Tag Archives: Opinions

Every Serialization Framework Should Have Its Own Transient Annotation

Post Syndicated from Bozho original https://techblog.bozho.net/every-serialization-framework-should-have-its-own-transient-annotation/

We’ve all used dozens of serialization frameworks – for JSON, XML, binary, and ORMs (which are effectively serialization frameworks for relational databases). And there’s always the moment when you need to exclude some field from an object – make it “transient”.

So far so good, but then comes the point where one object is used by several serialization frameworks within the same project/runtime. That’s not necessarily the case, but let me discuss the two alternatives first:

  • Use the same object for all serializations (JSON/XML for APIs, binary serialization for internal archiving, ORM/database) – preferred if there are only minor differences between the serialized/persisted fields. Using the same object saves a lot of tedious transferring between DTOs.
  • Use different DTOs for different serializations – that becomes a necessity when scenarios become more complex and using the same object becomes a patchwork of customizations and exceptions

Note that both strategies can exist within the same project – there are simple objects and complex objects, and you can keep a variety of DTOs only for the latter. But let’s discuss the first option.

If each serialization framework has its own “transient” annotation, it’s easy to tweak the serialization of one or two fields. More importantly, it will have predictable behavior. If not, then you may be forced to have separate DTOs even for classes where one field differs in behavior across the serialization targets.

For example, the other day I had the following surprise – we use Java binary serialization (ObjectOutputStream) for some internal buffering of large collections, and the objects are then indexed. In a completely separate part of the application, objects of the same class get indexed with additional properties that are irrelevant for the binary serialization and are therefore marked with the Java transient modifier. It turns out GSON also respects the “transient” modifier, and so these fields are never indexed.
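A minimal sketch of that kind of clash (the class and field names here are made up, but Gson really does exclude transient and static fields by default):

    import com.google.gson.Gson;
    import com.google.gson.GsonBuilder;
    import java.io.Serializable;
    import java.lang.reflect.Modifier;

    public class TransientClash {

        static class Event implements Serializable {
            String id = "evt-1";
            // excluded from ObjectOutputStream on purpose, but still needed for JSON indexing
            transient String indexOnlyScore = "0.92";
        }

        public static void main(String[] args) {
            Event event = new Event();

            // Default Gson honors the Java transient modifier, so the field silently disappears
            System.out.println(new Gson().toJson(event)); // {"id":"evt-1"}

            // One workaround: tell Gson to exclude only static fields, keeping transient ones
            Gson keepTransient = new GsonBuilder()
                    .excludeFieldsWithModifiers(Modifier.STATIC)
                    .create();
            System.out.println(keepTransient.toJson(event)); // includes indexOnlyScore
        }
    }

A framework-specific exclusion (Gson’s @Expose, Jackson’s @JsonIgnore, JPA’s @Transient) would have kept the two serialization targets independent, which is exactly the point of this post.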

In conclusion, this post has two points. The first is – expect any behavior from serialization frameworks and have tests to verify different serialization scenarios. And the second is for framework designers – don’t reuse transient modifiers/annotations from the language itself or from other frameworks, it’s counterintuitive.

The post Every Serialization Framework Should Have Its Own Transient Annotation appeared first on Bozho's tech blog.

A Developer Running For Parliament

Post Syndicated from Bozho original https://techblog.bozho.net/a-developer-running-for-parliament/

That won’t be a typical publication you’d see on a developer’s blog. But yes, I’m running for parliament (in my country, Bulgaria, an EU member). And judging by the current polls for the party I’m with, I’ll make it.

But why? Well, I’ll refer to four previous posts in this blog to illustrate my decision.

First, I used to be a government advisor 4 years ago, so I’ve already been aboard the “ship of public service”. What I didn’t realize back then was that in order to drive sustainable change in the digital realm of the public sector, you need to have a political debate about the importance and goals of those changes, rather than merely “ghost-writing” them.

A great strategy, a great law and even a great IT system are useless without the mental uptake by a sufficient number of people. That’s the reason one has to be at the forefront of political debate in order to make sure digital transformation is done right. And this forefront is parliament. I’m happy to have supported my party as an expert for the past four years and that this expertise is valued. That’s the biggest argument here – you need people like me, with deep technical knowledge and experience in many IT projects, to get things done right on every level. That’s certainly not a one-man task, though.

Second, it’s a challenge. I once wrote “What is challenging for developers”, and the last point there is “open-ended problems”. Digitally transforming an entire country is certainly a challenge in that category – there is no recipe, no manual for it.

Third, lawmaking is quite like programming (except it doesn’t regulate computer behavior, it regulates public life, which is far more complex and important). I already have decent lawmaking experience, and writing better, more precise and more “digital-friendly” laws is something that I like doing and something that I see as important.

Fourth, ethics has been important for me as a developer and it’s much more important for a politician.

For this blog it means I will be writing a bit more high-level stuff than day-to-day tips and advice. I hope I’ll still be able to (and sometimes have to) write some code in order to solve problems, but that won’t be enough material for blog posts. I’ll surely share thoughts on cybersecurity, the quality of public sector projects and system integration, though.

Software engineering and politics require very different skills. I think I am a good engineer (and I hope to remain one), and I have been a manager and a founder in the last couple of years as well. I’ve slowly, over time, developed my communication skills. National politics, even in a small country, is a tough endeavor, though. But as engineers we are constantly expanding our knowledge and skills, so I’ll try to transfer that mindset into a new realm.

The post A Developer Running For Parliament appeared first on Bozho's tech blog.

The Syslog Hell

Post Syndicated from Bozho original https://techblog.bozho.net/the-syslog-hell/

Syslog. You’ve probably heard about it, especially if you are into monitoring or security. Syslog is perceived to be the common, unified way for systems to send logs to other systems. Linux supports syslog, and many network and security appliances support syslog as a way to share their logs. On the other side, a syslog server receives all syslog messages. It sounds great in theory – having a simple, common way to represent log messages and send them across systems.

Reality couldn’t be further from that. Syslog is not one thing – there are multiple “standards”, and each of them is implemented incorrectly more often than not. Many vendors have their own way of representing data, and it’s all a big mess.

First, the RFCs. There are two RFCs – RFC3164 (“old” or “BSD” syslog) and RFC5424 (the new variant that obsoletes 3164). RFC3164 is not a standard, while RFC5424 is (mostly).

Those RFCs concern the contents of a syslog message. Then there’s RFC6587 which is about transmitting a syslog message over TCP. It’s also not a standard, but rather “an observation”. Syslog is usually transmitted over UDP, so fitting it into TCP requires some extra considerations. Now add TLS on top of that as well.

Then there are the content formats. RFC5424 defines a key-value structure, but RFC3164 does not – everything after the syslog header is just an unstructured message string. So many custom formats exist. For example, firewall vendors tend to define their own message formats. At least those are often documented (e.g. check WatchGuard and SonicWall), but parsing them requires a lot of custom knowledge about that vendor’s choices. Sometimes the documentation doesn’t fully reflect reality, though.
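For illustration, here is roughly what the two formats look like (the host and content are made up, modeled on the examples in the RFCs):

    <34>Oct 11 22:14:15 appgw su: 'su root' failed for admin on /dev/pts/8
    <165>1 2021-03-01T22:14:15.003Z appgw.example.com fw 1234 ID47 [meta@32473 src="10.0.0.5" dst="10.0.0.9"] Connection dropped

The first (RFC3164) carries nothing after the header but a free-form string; the second (RFC5424) carries key-value structured data in the bracketed section.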

Instead of vendor-specific formats, there are also de-facto standards like CEF and the less popular LEEF. They define a structure of the message and are actually syslog-independent (you can write CEF/LEEF to a file). But when syslog is used for transmitting CEF/LEEF, the message should respect RFC3164.

And now comes the “fun” part – incorrect implementations. Many vendors don’t really respect those documents. They come up with their own variations of even the simplest things like a syslog header. Date formats are all over the place, hosts are sometimes missing, priority is sometimes missing, non-host identifiers are used in place of hosts, colons are placed frivolously.

Parsing all of that mess is extremely “hacky”, with tons of regexes trying to account for all vendor quirks. I’m working on a SIEM, and our collector is open source – you can check our syslog package. Some vendor-specific parsers are still missing, but we are adding new ones constantly. The date formats in the CEF parser tell a good story.
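To give a flavor of it (this is a simplified illustration, not the actual parser from the collector), a lenient RFC3164-style header pattern might look like this in Java:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class LenientSyslogHeader {
        // priority and year are optional, and the host may be missing entirely in some vendor messages
        private static final Pattern HEADER = Pattern.compile(
                "^(?:<(?<pri>\\d{1,3})>)?"                                                      // optional <PRI>
                + "(?<ts>[A-Z][a-z]{2}\\s+\\d{1,2}(?:\\s+\\d{4})?\\s+\\d{2}:\\d{2}:\\d{2})\\s+" // BSD date, optional year
                + "(?:(?<host>[\\w.:-]+)\\s+)?"                                                 // optional host
                + "(?<rest>.*)$");

        public static void main(String[] args) {
            Matcher m = HEADER.matcher("<34>Oct 11 22:14:15 appgw su: 'su root' failed");
            if (m.matches()) {
                // if the host is missing, the tag gets captured as the host – exactly the kind
                // of guesswork that makes these parsers fragile
                System.out.println(m.group("pri") + " | " + m.group("ts") + " | "
                        + m.group("host") + " | " + m.group("rest"));
            }
        }
    }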

If it were just two RFCs, with one de-facto message format standard for one of them and a few options for TCP/UDP transmission, that would be fine. But what makes things hell is the fact that too many vendors decided not to care about what is in the RFCs. They decided that “hey, putting a year there is just fine” even though the RFC says “no”, that they don’t really need to set a host in the header, and that they don’t really need to implement anything new after their initial legacy stuff was created.

Too many vendors (of various security and non-security software) came up with their own way of essentially representing key-value pairs, too many vendors thought their date format is the right one, too many vendors didn’t take the time to upgrade their logging facility in the past 12 years.

Unfortunately that’s representative of our industry (yes, xkcd). Someone somewhere stitches something together and then decades later we have an incomprehensible patchwork of stringly-typed, randomly formatted stuff flying around whatever socket it finds suitable. And it’s never the right time and the right priority to clean things up, to get up to date, to align with others in the field. We, as an industry (both security and IT in general) are creating a mess out of everything. Yes, the world is complex, and technology is complex as well. Our job is to make it all palpable, abstracted away, simplified and standardized. And we are doing the opposite.

The post The Syslog Hell appeared first on Bozho's tech blog.

Developers Are Obsessed With Their Text Editors

Post Syndicated from Bozho original https://techblog.bozho.net/developers-are-obsessed-with-their-text-editors/

Developers are constantly discussing and even fighting about text editors and IDEs. Which one is better, why is it better, what’s the philosophy behind one or the other, which one makes you more productive, which one has better themes, which one is more customizable.

I myself have fallen victim to this trend, with several articles about why Emacs is not a good idea for Java, why I still use Eclipse (though I’d still prefer some IDEA features), and what’s the difference between an editor and an IDE (for those who’d complain about the imprecise title of this post).

Are text editors and IDEs important? Sure, they are among the main tools we use every day, and therefore they should be very, very good (more metaphors about violin players and tennis players, please). But most text editors and IDEs are good. They evolve, they copy each other, they attract their audiences. They are also good in different ways, but most of the top ones achieve their goal (otherwise they wouldn’t be so popular). Sure, someone prefers a certain feature to be implemented in a certain way, or demands having another feature (e.g. I demand having call hierarchies on all constructors and IDEA doesn’t give me that, duh…) But those things are rarely significant in the grand scheme of things.

The comparable insignificance comes from the structure of our work, or why we are now often called “software engineers” – it’s not about typing speed, or about the perfectly optimized tool for creating code. Our time is dedicated to thinking, designing, reading, naming things. And the quality of our code writing/editing/debugging tool is not at the top of the list of things that drive productivity and quality.

We should absolutely master our tools, though. Creating software requires much more than editing text. Refactoring, advanced search, advanced code navigation, debugging, hot-swap/hot-deploy/reload-on-save, version control integration – all of these things are important for doing our job better.

My point is that text editors or IDEs occupy too much of developers’ time and mind, with too little benefit. Next time you think it’s a good idea to argue about which editor/IDE a colleague SHOULD be using, think twice. It’s not a good investment of your time and energy. And next time you consider standardizing on an editor/IDE for the whole team, don’t. Leave people with their preference, it doesn’t affect team consistency.

The post Developers Are Obsessed With Their Text Editors appeared first on Bozho's tech blog.

Releasing Often Helps With Analyzing Performance Issues

Post Syndicated from Bozho original https://techblog.bozho.net/releasing-often-helps-with-analyzing-performance-issues/

Releasing often is a good thing. It’s cool, and helps us deliver new functionality quickly, but I want to share one positive side-effect – it helps with analyzing production performance issues.

We do releases every 5 to 10 days and after a recent release, the application CPU chart jumped twice (the lines are differently colored because we use blue-green deployment):

What are the typical ways to find performance issues with production loads?

  • Connect a profiler directly to production – tricky, as it requires managing network permissions and might introduce unwanted overhead
  • Run performance tests against a staging or local environment and do profiling there – good, except your performance tests might not hit exactly the functionality that causes the problem (this is what happened in our case, as it was some particular types of API calls that caused it, and they weren’t present in our performance tests). Also, performance tests can be tricky
  • Do a thread dump (and heap dump) and analyze them locally – a good step, but requires some luck and a lot of experience analyzing dumps, even if equipped with the right tools
  • Check your git history / release notes for what change might have caused it – this is what helped us resolve the issue. And it was possible because there were only 10 days of commits between the releases.

We could go through all of the commits and spot potential performance issues. Most of them turned out not to be a problem, and one seemingly unproblematic piece was discovered to be the problem after we commented it out for a brief period and deployed a quick release without it, to test the hypothesis. I’ll share a separate post about the particular issue, but we would have had to waste a lot more time if that release had had 3 months’ worth of commits rather than just 10 days.

Sometimes it’s not an obvious spike in CPU or memory, but a more gradual issue that you introduce at some point and that starts being a problem a few months later. That’s what happened a few months ago, when we noticed a steady growth in CPU usage along with the growth of ingested data. Logical in theory, but the CPU usage grew faster than the data ingestion rate, which isn’t good.

So we were able to answer the question “when did it start growing” and pinpoint the release that introduced the issue. Because that release had only 5 days of commits, it was much easier to find the culprit.

All of the above techniques are useful and should be employed at the right time. But releasing often gives you a hand with analyzing where a performance issue is coming from.

The post Releasing Often Helps With Analyzing Performance Issues appeared first on Bozho's tech blog.

Let’s Kill Security Questions

Post Syndicated from Bozho original https://techblog.bozho.net/lets-kill-security-questions/


Security questions still exist. They are less dominant now, but we haven’t yet condemned them as an industry hard enough so that they stop being added to authentication flows.

But they are bad. They are like passwords, but more easily guessable, because you have a password hint. And while there are opinions that they might be okay in certain scenarios, they have so many pitfalls that in practice we should just not consider them an option.

What are those pitfalls? Social engineering. Almost any security question’s answer is guessable by doing research on the target person online. We share more about our lives and don’t even realize how that affects us security-wise. Many security questions have a limited set of possible answers that can be enumerated with a brute force attack (e.g. what are the most common pet names; what are the most common last names in a given country for a given period of time, in order to guess someone’s mother’s maiden name; what are the high schools in the area where the person lives, and so on). So when someone wants to take over your account, if all they have to do is open your Facebook profile or try 20-30 options, you have no protection.

But what are they for in the first place? Account recovery. You have forgotten your password and the system asks you some details about you to allow you to reset it. We have already largely solved the problem of account recovery – send a password reset link to the email of the user. If the system itself is an email service, or in a couple of other scenarios, you can use a phone number, to which a one-time password is sent for recovery purposes (or a secondary email, for email providers).
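A minimal sketch of that flow (the store and mail sender are hypothetical collaborators here, and the URL is made up):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.SecureRandom;
    import java.time.Instant;
    import java.time.temporal.ChronoUnit;
    import java.util.Base64;

    public class PasswordResetService {

        interface ResetTokenStore { void save(String email, String tokenHash, Instant expiresAt); }
        interface MailSender { void send(String to, String subject, String body); }

        private final SecureRandom random = new SecureRandom();
        private final ResetTokenStore store;
        private final MailSender mailSender;

        public PasswordResetService(ResetTokenStore store, MailSender mailSender) {
            this.store = store;
            this.mailSender = mailSender;
        }

        public void startReset(String email) throws Exception {
            byte[] tokenBytes = new byte[32];
            random.nextBytes(tokenBytes);
            String token = Base64.getUrlEncoder().withoutPadding().encodeToString(tokenBytes);

            // store only a hash of the token, with a short expiry, so a leaked database
            // doesn't leak usable reset links
            String tokenHash = Base64.getEncoder().encodeToString(
                    MessageDigest.getInstance("SHA-256").digest(token.getBytes(StandardCharsets.UTF_8)));
            store.save(email, tokenHash, Instant.now().plus(30, ChronoUnit.MINUTES));

            // the link is only useful to whoever controls the mailbox
            mailSender.send(email, "Password reset",
                    "https://example.com/reset-password?token=" + token);
        }
    }

No secret answers involved – the security of the flow rests on access to the mailbox (or phone), which is the whole point.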

So if we have the account recovery problem largely solved, why are security questions still around? Inertia, I guess. And the five monkeys experiment. There is no good reason to have a security question if you can have a recovery email or phone. And you can safely consider that to be true (ok, maybe there are edge cases).

There are certain types of account recovery measures that resemble security questions and can be implemented as an additional layer, on top of a phone or email recovery. For more important services (e.g. your Facebook account or your main email), it may not be safe to consider just owning the phone or just having access to the associated email to be enough. Phones get stolen, emails get “broken into”. That’s why a security-like set of questions may serve as additional protection. For example – guessing recent activity. Facebook does that sometimes by asking you about your activity on the site or about your friends. This is not perfect, as it can be monitored by the malicious actor, but is an option. For your email, you can be asked which are the most recent emails you’ve sent, and be presented with options to choose from, including some made-up examples. These things are hard to implement because of geographic and language differences, but “guess your recent activity among these choices”, i.e. dynamically defined security questions, may be an acceptable additional step for account recovery.

But fixed security questions – no. Let’s kill those. I’m not the first to argue against security questions, but we need to be reminded that certain bad security practices should be left in the past.

Authentication is changing. We are desperately trying to get rid of the password itself (and still failing to do so), but before we manage to do so, we should first get rid of the “bad password in disguise”, the security question.

The post Let’s Kill Security Questions appeared first on Bozho's tech blog.

Discovering an OSSEC/Wazuh Encryption Issue

Post Syndicated from Bozho original https://techblog.bozho.net/discovering-an-ossec-wazuh-encryption-issue/

I’m trying to get the Wazuh agent (a fork of OSSEC, one of the most popular open source security tools, used for intrusion detection) to talk to our custom backend (namely, our LogSentinel SIEM Collector) to allow us to reuse the powerful Wazuh/OSSEC functionalities for customers that want to install an agent on each endpoint rather than just one collector that “agentlessly” reaches out to multiple sources.

But even though there’s good documentation on the message format and encryption, I couldn’t successfully decrypt the messages. (I’ll refer to both Wazuh and OSSEC, as the functionality is almost identical in both, with the distinction that Wazuh added AES support in addition to Blowfish.)

That led me to a two-day investigation of possible reasons. The first side-discovery was the undocumented OpenSSL auto-padding of keys and IVs described in my previous article. Then it led me to actually writing C code (and copying the relevant Wazuh/OSSEC pieces) in order to debug the issue. With Wazuh/OSSEC I was generating one ciphertext, and with Java and the openssl CLI – a different one.

I made sure the key, key size, IV and mode (CBC) were identical, that they were padded in the same way, and that OpenSSL’s EVP API was used correctly. All of that was confirmed, and yet there was a mismatch, and therefore I could not decrypt the Wazuh/OSSEC message on the other end.

After discovering the 0-padding, I also discovered a mistake in the documentation, which used a static IV of FEDCA9876543210 rather than the one found in the code, where the 0 precedes the 9 – FEDCA0987654321. But that didn’t fix the issue either, it only got me one step closer.

A side-note here on IVs – Wazuh/OSSEC is using a static IV, which is a bad practice. The issue was reported 5 years ago, but is minor, because they are using some additional randomness per message that remediates the use of a static IV; it’s just not idiomatic to do it that way and may have unexpected side-effects.

So, after debugging the C code, I got to a simple piece of code that could be used to reproduce the issue and asked a question on Stackoverflow. 5 minutes after posting the question I found another, related question that had the answer – using hex strings like that in C doesn’t work. Instead, they should be encoded: char *iv = (char *)"\xFE\xDC\xBA\x09\x87\x65\x43\x21\x00\x00\x00\x00\x00\x00\x00\x00";. So, the value is not the bytes corresponding to the hex string, but the ASCII codes of each character in the hex string. I validated that on the receiving Java end with code along these lines (a minimal sketch of the relevant part):
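    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class IvCheck {
        public static void main(String[] args) {
            // the bytes fed to the cipher are the ASCII codes of the characters of the IV string
            // (zero-padded to 16 bytes, per the OpenSSL auto-padding from the previous article),
            // not the bytes of the decoded hex value
            byte[] ivBytes = Arrays.copyOf("FEDCA0987654321".getBytes(StandardCharsets.US_ASCII), 16);
            System.out.println(Arrays.toString(ivBytes)); // [70, 69, 68, 67, 65, 48, ...]
        }
    }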

This has implications for the documentation, as well as for the whole scheme. Because the Wazuh/OSSEC AES key is MD5(password) + MD5(MD5(agentName) + MD5(agentID)){0, 15}, the second part is practically discarded: MD5(password) is 32 hex characters (= 32 ASCII codes/bytes), which is already the length of the AES key. This makes the key derived from a significantly smaller pool of options – each key byte can only take the 16 values of the hex digit characters, rather than all 256.
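A minimal sketch of why the second part never makes it into the key (the agent values are made up, and the {0, 15} part is read as a substring of the second component):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.Arrays;

    public class KeyDerivationCheck {
        public static void main(String[] args) throws Exception {
            String password = "secret", agentName = "agent-01", agentId = "001";

            // MD5(password) + MD5(MD5(agentName) + MD5(agentID)){0, 15}, all as hex strings
            String keyMaterial = md5Hex(password)
                    + md5Hex(md5Hex(agentName) + md5Hex(agentId)).substring(0, 15);

            // The AES key takes the ASCII bytes of that string, but MD5(password) alone is already
            // 32 hex characters = 32 bytes, so everything after it gets cut off.
            byte[] aesKey = Arrays.copyOf(keyMaterial.getBytes(StandardCharsets.US_ASCII), 32);
            System.out.println(new String(aesKey, StandardCharsets.US_ASCII).equals(md5Hex(password))); // true
        }

        private static String md5Hex(String input) throws Exception {
            byte[] digest = MessageDigest.getInstance("MD5").digest(input.getBytes(StandardCharsets.US_ASCII));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        }
    }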

I raised an issue with Wazuh. Although this can be seen as a vulnerability (due to the reduced key space), it’s rather minor from a security point of view, and as the communication mostly happens within the corporate network, I don’t think it has to be privately reported and fixed immediately.

Yet, I made a recommendation to introduce an additional configuration option that would allow a transition to an updated protocol without causing backward compatibility issues. In fact, I’d go further and recommend using TLS/DTLS rather than a home-grown, AES-based scheme. Mutual authentication can be achieved through TLS mutual authentication rather than through a shared secret.

It’s satisfying to discover issues in popular software, especially when they are not written in your “native” programming language. And as a rule of thumb – encodings often cause problems, so we should be extra careful with them.

The post Discovering an OSSEC/Wazuh Encryption Issue appeared first on Bozho's tech blog.

Is It Really Two-Factor Authentication?

Post Syndicated from Bozho original https://techblog.bozho.net/is-it-really-two-factor-authentication/

Terminology-wise, there is a clear distinction between two-factor authentication (multi-factor authentication) and two-step verification (authentication), as this article explains. 2FA/MFA is authentication using more than one factor, i.e. “something you know” (a password), “something you have” (a token, a card) and “something you are” (biometrics). Two-step verification is basically using two passwords – one permanent and another one that is short-lived and one-time.

At least that’s the theory. In practice it’s more complicated to say which authentication method belongs to which category (“something you X”). Let me illustrate that with a few examples:

  • An OTP hardware token is considered “something you have”. But it uses a shared symmetric secret with the server so that both can generate the same code at the same time (if using TOTP), or the same sequence. This means the secret is effectively “something you know”, because someone may steal it from the server, even though the hardware token is protected. Unless, of course, the server stores the shared secret in an HSM and does the OTP comparison on the HSM itself (some support that). And there’s still a theoretical possibility for the keys to leak prior to being stored on hardware. So is a hardware token “something you have” or “something you know”? For practical purposes it can be considered “something you have” (a minimal TOTP sketch follows after this list)
  • Smartphone OTP is often not considered as secure as a hardware token, but it should be, due to the secure storage of modern phones. The secret is shared once during enrollment (usually with on-screen scanning), so it should be “something you have” as much as a hardware token
  • SMS is not considered secure and often given as an example for 2-step verification, because it’s just another password. While that’s true, this is because of a particular SS7 vulnerability (allowing the interception of mobile communication). If mobile communication standards were secure, the SIM card would be tied to the number and only the SIM card holder would be able to receive the message, making it “something you have”. But with the known vulnerabilities, it is “something you know”, and that something is actually the phone number.
  • Fingerprint scanners represent “something you are”. And in most devices they are built in a way that the scanner authenticates to the phone (being cryptographically bound to the CPU) while transmitting the fingerprint data, so you can’t just intercept the bytes transferred and then replay them. That’s the theory; it’s not publicly documented how it’s implemented. But if it were not so, then “something you are” would be “something you have” – a sequence of bytes representing your fingerprint scan, and that can leak. This is precisely why biometric identification should only be done locally, on the phone, without any server interaction – the server can’t tell whether it is receiving freshly scanned data or captured and replayed data. That said, biometric factors are tied to the proper implementation of the authenticating smartphone application – if your, say, banking application needs a fingerprint scan to run, a malicious actor should not be able to bypass that by stealing shared credentials (userIDs, secrets) and making API calls to your service. So to the server there’s no “something you are”. It’s always “something that the client-side application has verified that you are, if implemented properly”
  • A digital signature (via a smartcard or yubikey or even a smartphone with secure hardware storage for private keys) is “something you have” – it works by signing one-time challenges sent by the server and verifying that the signature has been created by the private key associated with the previously enrolled public key. Knowing the public key gives you nothing, because of how public-key cryptography works. There’s no shared secret and no intermediary whose data flow can be intercepted. A private key is still “something you know”, but by putting it in hardware it becomes “something you have”, i.e. a true second factor. Of course, until someone finds out that the random generation of primes used for generating the private key has been broken and you can derive the private key from the public key (as happened recently with one vendor).
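To make the shared-secret point above concrete, here is a minimal TOTP sketch (RFC 6238 with the usual defaults – HMAC-SHA1, 30-second steps, 6 digits); the server must hold exactly the same secret to produce the same code:

    import java.nio.ByteBuffer;
    import java.time.Instant;
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    public class Totp {

        public static String code(byte[] sharedSecret, Instant now) throws Exception {
            long counter = now.getEpochSecond() / 30;                  // 30-second time step
            byte[] counterBytes = ByteBuffer.allocate(8).putLong(counter).array();

            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
            byte[] hash = mac.doFinal(counterBytes);

            int offset = hash[hash.length - 1] & 0x0f;                 // dynamic truncation (RFC 4226)
            int binary = ((hash[offset] & 0x7f) << 24)
                    | ((hash[offset + 1] & 0xff) << 16)
                    | ((hash[offset + 2] & 0xff) << 8)
                    | (hash[offset + 3] & 0xff);
            return String.format("%06d", binary % 1_000_000);
        }

        public static void main(String[] args) throws Exception {
            // both the token/app and the server derive the code from the same stored secret,
            // which is why "something you have" quietly depends on "something someone stores"
            byte[] secret = "12345678901234567890".getBytes();         // the RFC 6238 test secret
            System.out.println(code(secret, Instant.now()));
        }
    }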

There isn’t an obvious boundary between theoretical and practical. “Something you are” and “something you have” can eventually be turned into “something you know” (or “something someone stores”). Some theoretical attacks can become very practical overnight.

I’d suggest we stick to calling everything “two-factor authentication”, because it’s more important to have mass understanding of the usefulness of the technique than to nitpick on the terminology. 2FA does not solve phishing, unfortunately, but it solves leaked credentials, which is good enough, and everyone should have some form of it. Even SMS is better than nothing (obviously, for high-profile systems, digital signatures are the way to go).

The post Is It Really Two-Factor Authentication? appeared first on Bozho's tech blog.

Making Sense of the Information Security Landscape

Post Syndicated from Bozho original https://techblog.bozho.net/making-sense-of-the-information-security-landscape/

There are hundreds of different information security solutions out there and choosing which one to pick can be hard. Usually decisions are driven by recommendations, vendor familiarity, successful upsells, compliance needs, etc. I’d like to share my understanding of the security landscape by providing one-line descriptions of each of the different categories of products.

Note that these categories are sometimes not strictly defined and may overlap. They may have evolved over time, and a certain category can include several products from legacy categories. The explanations will be slightly simplified. For a generalization and summary, skip the list and go to the paragraph after it. This post aims to summarize a lot of Gartner and Forrester reports, as well as product data sheets, combined with some real-world observations, and to bring all of this to a technical level, rather than sticking to broad business-focused capabilities. I’ll split the products into several groups, though they may overlap.

Monitoring and auditing

  • SIEM (Security Information and Event Management) – collects logs from all possible sources (applications, OSs, network appliances) and raises alarms if there are anomalies
  • IDS (Intrusion Detection System) – listening to network packets and finding malicious signatures or statistical anomalies. There are multiple ways to listen to the traffic: proxy, port mirroring, network tap, host-based interface listener. Deep packet inspection is sometimes involved, which requires sniffing TLS at the host or terminating it at a proxy in order to be able to inspect encrypted communication (especially for TLS 1.3), effectively doing an MITM “attack” on the organization’s users.
  • IPS (Intrusion Prevention System) – basically a marketing upgrade of IDS with the added option to “block” traffic rather than just “report” the intrusion.
  • UEBA (User and Entity Behavior Analytics) – a system that listens to system activity (via logs and/or by directly monitoring endpoints for user and system activity, including via screen capture) and tries to identify user behavior patterns (as well as system component behavior patterns) and report on any anomalies and changes in the pattern, also classifying users as less or more “risky”. Recently UEBA has become part of next-gen SIEMs.
  • SUBA (Security User Behavior Analytics) – same as UEBA, but named after the purpose (security) rather than the entities monitored. Used by Forrester (whereas UEBA is used by Gartner)
  • DAM (Database Activity Monitoring) – tools that monitor and log database queries and configuration changes, looking for suspicious patterns and potentially blocking them based on policies. Implemented via proxy or agents installed at the host
  • DAP (Database Audit and Protection) – based on DAM, but with added features for content classification (similar to DLPs), vulnerability detection and more clever behavior analysis (e.g. through UEBA)
  • FIM (File Integrity Monitoring) – usually a feature of other tools, FIM is constantly monitoring files for potentially suspicious changes
  • SOC (Security Operations Center) – this is more of an organizational unit that employs multiple tools (usually a SIEM, DLP, CASB) to fully handle the security of an organization.

Access proxies

  • CASB (Cloud Access Security Broker) – a proxy (usually) that organizations go through when connecting to cloud services, allowing them to enforce security policies and detect anomalies, e.g. regarding authentication and authorization, or the input and retrieval of sensitive data. CASBs may involve additional encryption options for the data being used.
  • CSG (Cloud Security Gateway) – effectively the same as CASB
  • SWG (Secure Web Gateway) – a proxy for accessing the web, includes filtering malicious websites, filtering potentially malicious downloads, limiting uploads
  • SASE (Secure Access Service Edge) – like CASB/CSG, but also providing additional bundled functionalities like a Firewall, SWG, VPN, DNS management, etc.

Firewalls

  • WAF (Web Application Firewall) – a firewall (working as a reverse proxy) that you put in front of web applications to protect them from typical web vulnerabilities that may not be addressed by the application developer – SQL injections, XSS, CSRF, etc.
  • NF (Network Firewall) – the typical firewall that allows you to allow or block traffic based on protocol, port, source/destination
  • NGFW (Next Generation Firewall) – a firewall that combines a network firewall and a (web) application firewall and also provides analysis of the traffic, thus detecting potential anomalies/intrusions/data exfiltration

Data protection

  • DLP (Data Leak Prevention / Data Loss Prevention) – that’s a broad category of tools that aim at preventing data loss – mostly accidental, but sometimes malicious as well. Sometimes it involves installing an agent on each machine, in other cases it’s proxy-based. Many other solutions provide DLP functionality, like IPS/IDS, WAFs, CASBs, but DLPs are focused on inspecting user activities (including via UEBA/SUBA), network traffic (including via SWGs), communication (most often email) and publicly facing storage (e.g. FTP, S3), that may lead to leaking data. DLPs include discovering sensitive data in structured (databases) and unstructured (office documents) sources. Other DLP features are encryption of data at rest and tokenization of sensitive data.
  • ILDP (Information Leak Detection and Prevention) – same as DLP
  • IPC (Information Protection and Control) – same as DLP
  • EPS (Extrusion Prevention System) – same as DLP, focused on monitoring outbound traffic for exfiltration attempts
  • CMF (Content Monitoring and Filtering) – part of DLP. May overlap with SWG functionalities.
  • CIP (Critical Information Protection) – part of DLP, focused on critical information, e.g. through encryption and tokenization
  • CDP (Continuous Data Protection) – basically incremental/real-time backup management, with retention settings and possibly encryption

Vulnerability testing

  • RASP (Runtime Application Self-protection) – tools (usually in the form of libraries that are included in the application runtime) that monitor in real-time the application usage and can block certain actions (at binary level) or even shut down the application if a cyber attack is detected.
  • IAST (Interactive Application Security Testing) – similar to RASP, the subtle difference being that IAST is usually used in pre-production environments while RASP is used in production
  • SAST (Static Application Security Testing) – tools that scan application source code for vulnerabilities
  • DAST (Dynamic Application Security Testing) – tools that scan web applications for vulnerabilities through their exposed HTTP endpoints
  • VA (Vulnerability assessment) – a process helped by many tools (including those above, and more) for finding, assessing and eliminating vulnerabilities

Identity and access

  • IAM (Identity and Access Management) – products that allow organizations to centralize authentication and enrollment of their users, providing single-sign-on capabilities, centralized monitoring of authentication activity, applying access policies (e.g. working hours), enforcing 2FA, etc.
  • SSO (Single Sign-On) – the ability to use the same credentials for logging into multiple (preferably all) applications in an organization.
  • WAM (Web Access Management) – the “older” version of IAM, lacking flexibility and some features like centralized user enrollment/provisioning
  • PAM (Privileged access management) – managing credentials of privileged users (e.g. system administrators). Instead of having admin credentials stored in local password managers (or worse – sticky notes or files on the desktop), credentials are stored in a centralized, protected vault and “released” for use only after a certain approval process for executing a given admin task, in some cases monitoring and logging the executed activities. The PAM handles regular password changes. It basically acts as a proxy (though not necessarily in the network sense) between a privileged user and a system that requires elevated privileges.

Endpoint protection

  • AV (Anti-Virus) – the good old antivirus software that gets malicious software signatures from a centrally managed blacklist and blocks programs that match those signatures
  • NGAV (Next Generation Anti-Virus) – going beyond signature matching, NGAV looks for suspicious activities (e.g. filesystem, memory, registry access/modification) and uses policies and rules to block such activity even from previously unknown and not yet blacklisted programs. Machine learning is usually said to be employed, but in many cases that’s mostly marketing.
  • EPP (Endpoint Protection Platform) – includes NGAV as well as a management layer that allows centrally provisioning and managing policies, reporting and workflows for remediation
  • EDR (Endpoint Detection and Response) – using an agent to collect endpoint (device) data, centralize it, combine it with network logs and analyze that in order to detect malicious activity. After suspected malicious activity is detected, allow centralized response, including blocking/shutting down/etc. Compared to NGAV, EDR makes use of the data across the organization, while NGAV usually focuses on individual machines, but that’s not universally true
  • ATP (Advanced threat protection) – same as EDR
  • ATD (Advanced threat detection) – same as above, with just monitoring and analytics capabilities

Coordination and automation

  • UTM (Unified Threat Management) – combining multiple monitoring and prevention tools in one suite (antivirus/NGAV/EDR, DLP, firewalls, VPNs, etc.). The benefit is that you purchase one thing rather than finding your way through the jungle described above. At least that’s on paper; in reality you still get different modules, sometimes not even properly integrated with each other.
  • SOAR (Security Orchestration, Automation and Response) – tools for centralizing security alerts and configuring automated actions in response. Alert fatigue is a real thing with many false positives generated by tools like SIEMs/DLPs/EDRs. Reducing those false alarms is often harder than just scripting the way they are handled. SOAR provides that – it ingests alerts and allows you to use pre-built or custom response “cookbooks” that include checking data (e.g. whether an IP is in some blacklist, are there attachments of certain content type in a flagged email, whether an employee is on holiday, etc.), creating tickets and alerting via multiple channels (email/sms/other type of push)
  • TIP (Threat Intelligence Platform) – threat intelligence is often part of other solutions like SIEMs, EDRs and DLPs and involves collecting information (intelligence) about certain resources like IP addresses, domain names, certificates. When these items are discovered in the collected logs, the TIP can enrich the event with what it knows about the given item and even act in order to block a request, if a threat threshold is reached. In short – scanning public and private databases to find information about malicious actors and their assets.

Email

  • SEG (Secure email gateway) – a proxy for all incoming and outgoing email that scans them for malicious attachments, potential phishing and in some cases data exfiltration attempts.
  • MFT (Managed File Transfer) – a tool that allows sharing files securely with someone by replacing attachments. Shared files can be tracked, monitored, audited and scanned for vulnerabilities, and access can be cut once the file has been downloaded by the recipient, reducing the risk of data leaks.

DDoS

  • DDoS mitigation/protection – services that hide your actual IP in an attempt to block malicious DDoS traffic before it reaches your network (where it would already be too late). They usually rely on large global networks and data centers (called “scrubbing centers”) to send only clean traffic to your servers.

Compliance

  • GRC (Governance, Risk and Compliance) – a management tool for handling all the policies, audits, risk assessments, workflows and reports regarding different aspects of compliance, including security compliance
  • IRM (Integrated Risk Management) – allegedly philosophically different, more modern and advanced; in reality, the same as GRC with some additional monitoring features

So let’s summarize the ways that all of these solutions work:

  • Monitoring logs and other events
  • Inspecting incoming traffic and finding malicious activities
  • Inspecting outgoing traffic and applying policies
  • Application vulnerability detection
  • Automating certain aspects of the alerting, investigation and response handling

Monitoring (which is central to most tools) is usually done via proxies, port mirroring, network taps or host-based interface listeners, each having its pros and cons. Enforcement is almost always done via proxies. Bypassing these proxies should not be possible, but for cloud services you can’t really block access if the service is accessed outside your corporate environment (unless the SaaS provider has an IP whitelist feature).

In most cases, even though machine learning/AI is advertised as “the new thing”, tools make decisions based on configured policies (rules). Organizations are drowning in complex policies that they have to keep up to date and synchronize across tools. Policy management, especially given there’s no industry standard for how policies should be defined, is a huge burden. In theory, it gives flexibility and should be there; in practice it may lead to a messy and hard-to-manage environment.

Monitoring is universally seen as the way to receive actionable intelligence from systems. This is much messier in reality than in demos and often leads to systems being left unmonitored and alerts being ignored. Alert fatigue, which follows from the complexity of policy management, is a big problem in information security. SOAR is a way to remedy that, but it sounds like a band-aid on a broken process rather than a true solution – false alarms should be reduced rather than being closed quasi-automatically. If handling an alert is automatable, then the tool that generates it should be able to know it’s not a real problem.

The complexity of the security landscape is obviously huge – product categories are defined based on multiple criteria – what problem they solve, how they solve it, or to what extent they solve it. Is a SIEM also a DLP if it uses UEBA to block certain traffic (next-gen SIEMs may be able to invoke blocking actions, even if requiring another system to carry them out)? Is a DLP a CASB if it does encryption of data that’s stored in cloud services? Should you have an EPP and a SIEM, if the EPP gives you a good enough overview of the events being logged in your infrastructure? Is a CASB a WAF for SaaS? Is a SIEM a DAM if it supports native database audit logs? You can’t answer these questions at a category level, you have to look at particular products and how well they implement a certain feature.

Can you have a unified proxy (THE proxy) that monitors everything incoming and outgoing and collects that data, acting as a WAF, DLP, SIEM, CASB and SEG? Can you have just one agent that is both an EDR and a DLP? Well, certainly categories like SASE and UTM go in that direction, trying to ease the decision-making process.

I think it’s most important to start from the attack targets, rather than from the means to get there or the means to prevent getting there. Unfortunately, enterprise security is often driven by “I need to have this type of product”. This leads to semi-abandoned and partially configured tools for which organizations pay millions, because there are never enough people to go into the intricate details of yet another security solution, and organizations end up relying on consultants to set things up.

I don’t have solutions to the problems stated above, but I hope I’ve given a good overview of the landscape. And I think we should focus less on “security products” and more on “security techniques” and on people who can implement them. You don’t need a billion-dollar corporation to sell you a silver bullet (which you can’t fire anyway). You need trained experts. That’s hard. There aren’t enough of them. And the security team is often undervalued in the enterprise. Yes, cybersecurity is very important, but I’m not sure whether it will ever get enough visibility and be prioritized over purely business goals. And maybe it shouldn’t, if risk is properly calculated.

All the products above are ways to buy some feeling of security. If used properly and in the right combination, they can be more than a feeling. But too often a feeling is just good enough.

The post Making Sense of the Information Security Landscape appeared first on Bozho's tech blog.

Encryption Overview [Webinar]

Post Syndicated from Bozho original https://techblog.bozho.net/encryption-overview-webinar/

“Encryption” has turned into a buzzword, especially after privacy standards and regulation vaguely mention it and vendors rush to provide “encryption”. But what does it mean in practice? I did a webinar (hosted by my company, LogSentinel) to explain the various aspects and pitfalls of encryption.

You can register to watch the webinar here, or view it embedded below:

And here are the slides:

Of course, encryption is a huge topic, worth a whole course rather than just a webinar, but I hope I’m providing good starting points. An interesting technique that we employ in our company is “searchable encryption”, which allows you to have encrypted data and still search in it. There are many more very nice (and sometimes niche) applications of encryption and cryptography in general, as Bruce Schneier mentions in his recent interview. These applications can solve very specific problems with information security and privacy that we face today. We only need to make them mainstream, or at least increase awareness of them.
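To illustrate one common flavor of searchable encryption – a blind index for exact-match lookups (this is a general technique, not necessarily the exact approach presented in the webinar):

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    public class BlindIndex {

        // store the AES-encrypted value alongside a keyed hash ("blind index") of the normalized
        // plaintext; equality searches then compare blind indexes without ever decrypting
        public static String blindIndex(String plaintext, byte[] indexKey) throws Exception {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(indexKey, "HmacSHA256"));
            byte[] tag = mac.doFinal(plaintext.trim().toLowerCase().getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(tag);
        }

        public static void main(String[] args) throws Exception {
            byte[] indexKey = "this-should-come-from-a-key-manager".getBytes(StandardCharsets.UTF_8);
            // e.g. SELECT ... WHERE email_blind_index = ? – the database never sees the email itself
            System.out.println(blindIndex("Alice@example.com", indexKey));
            System.out.println(blindIndex(" alice@example.com", indexKey)); // same index after normalization
        }
    }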

The post Encryption Overview [Webinar] appeared first on Bozho's tech blog.

Seven Legacy Integration Patterns

Post Syndicated from Bozho original https://techblog.bozho.net/seven-legacy-integration-patterns/

If we have to integrate two (or more) systems nowadays, we know – we either use an API or, more rarely, some message queue.

Unfortunately, many systems in the world do not support API integration. And many more that don’t have APIs are being created as we speak. So when you inevitably have to integrate with them, you are left with imperfect choices to make. Below are seven patterns to integrate with legacy systems (or not-so-legacy systems that are built in legacy ways).

Initially I wanted to call these “bad integration patterns”. But if you don’t have other options, they are not bad – they are inevitable. What’s bad is the fact that so many systems continue to be built without integration in mind.

I’ve seen all of these, on more than one occasion. And I’ve heard many more stories about them. Unfortunately, they are not the exception (fortunately, they are also not the rule, at least not anymore).

  1. Files on FTP – one application uploads files (XML, CSV, other) to an FTP server (or another shared resource) and the other one reads them via a scheduled job, parses them and optionally spits out a response – either on the same FTP, or via email. Sharing files like that is certainly not ideal in terms of integration – you don’t get real-time status of your request, and other aspects are trickier to get right – versioning, high availability, authentication, security, traceability (audit trail).
  2. Shared database – two applications sharing the same database may sound like a recipe for disaster, but it’s not uncommon to see it in the wild. If you are lucky, one application will be read-only. But breaking changes to the database structure and security concerns are major issues. You can only use this type of integration if you expose your database directly to another application, which you normally don’t want to do.
  3. Full daily dump – instead of sharing an active database, some organizations do a full dump of their data every day or week and provide it to the other party for import. Obvious data privacy issues exist with that, as it’s a bad idea to have full dumps of your data flying around (in some cases on DVDs or portable HDDs), in addition to everything mentioned above – versioning, authentication, etc.
  4. Scraping – when an app has no API, it’s still possible to extract data from it or push data to it – via the user interface. With web applications that’s easier, as they “speak” HTML and HTTP. With desktop apps, screen scraping has emerged as an option. The so-called RPA software (Robotic process automation) relies on all types of scraping to integrate legacy systems. It’s very fragile and requires complicated (and sometimes expensive) tooling to get right. Not to mention the security aspect, which requires storing credentials in non-hashed form somewhere in order to let the scraper log in.
  5. Email – when the sending or receiving system doesn’t support other forms of integration, email comes as a last resort. If you can trigger something by connecting to a mailbox, or if an email is produced after some event happens in the sending system, this may be all you need to integrate. Obviously, email is a very bad means of integration – it’s unstructured, it can fail for many reasons, and it’s just not meant for software integration. You can attach structured data if you want to get extra inventive, but if you can get both ends to support the same format, you can probably get them extended to support proper APIs.
  6. Adapters – you can develop a custom module that has access to the underlying database but exposes a proper API (a minimal sketch follows after this list). That’s an almost acceptable solution, as you can have a properly written (sort-of) microservice independent of the original application, and other systems won’t know they are integrating with a legacy piece of software. It’s tricky to get right in some cases, however, as you have to understand the state space of the database well. Read-only is easy, writing is much harder or next to impossible.
  7. Paper – no, I’m not making this up. There are cases where one organization prints some data and then the other organization (or department) receives the paper documents (by mail or otherwise) and inputs them into its system. Expensive projects exist out there that aim to remove the paper component and introduce actual integration, as paper-based input is error-prone and slow. The few legitimate scenarios for a paper-based step are when you need extra security and a paper trail – combined with the fact that paper is effectively airgapped, that may give you what you need. But even then it shouldn’t be the only transport layer.
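As a sketch of the adapter approach (item 6), here is a minimal read-only example using only the JDK’s built-in HTTP server and JDBC – the connection details, table and column names are made up, and a suitable JDBC driver is assumed to be on the classpath:

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class LegacyCustomerAdapter {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/customers", exchange -> {
                try (Connection conn = DriverManager.getConnection(
                        "jdbc:postgresql://legacy-db:5432/erp", "readonly_user", System.getenv("DB_PASSWORD"))) {
                    String query = exchange.getRequestURI().getQuery();
                    String id = query == null ? "" : query.replace("id=", "");
                    try (PreparedStatement stmt = conn.prepareStatement(
                            "SELECT name, email FROM customers WHERE id = ?")) {
                        stmt.setString(1, id);
                        try (ResultSet rs = stmt.executeQuery()) {
                            // expose a plain JSON endpoint over the legacy schema
                            String json = rs.next()
                                    ? String.format("{\"id\":\"%s\",\"name\":\"%s\",\"email\":\"%s\"}",
                                            id, rs.getString("name"), rs.getString("email"))
                                    : "{}";
                            byte[] body = json.getBytes(StandardCharsets.UTF_8);
                            exchange.getResponseHeaders().add("Content-Type", "application/json");
                            exchange.sendResponseHeaders(200, body.length);
                            try (OutputStream os = exchange.getResponseBody()) {
                                os.write(body);
                            }
                        }
                    }
                } catch (Exception e) {
                    exchange.sendResponseHeaders(500, -1);
                }
            });
            server.start();
        }
    }

The other systems see a plain HTTP endpoint; the fragility of understanding the legacy schema stays contained in the adapter.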

If you need to do any of the above, it’s usually because at least one of the systems is stuck and can’t be upgraded. It’s either too legacy to touch, or the vendor is gone, or adding an API is “not on their roadmap” and would be too expensive.

If you are building a system, always provide an API. Some other system will have to integrate with it sooner or later. It’s not sustainable to build closed systems and delay the integration question until it’s needed. Assume it’s always needed.

Fancy ESBs may be able to patch things quickly with one of the approaches above and integrate the “unintegratable”, but heavy reliance on an ESB is an indicator of too many legacy or low-quality systems.

But simply having an API doesn’t cut it either. If you don’t support versioning and backward-compatible APIs, you’ll be in an even more fragile state, as you’ll be breaking existing integrations as you progress.

Enterprise integration is tricky. But, as with many things in software, it’s best handled in the applications that we build. If we build them right, things are much easier. Otherwise, organizations have to revert to the legacy approaches mentioned above and introduce complexity, fragility, security and privacy risks and a general feeling of low-quality that has to be supported by increasingly unhappy people.

The post Seven Legacy Integration Patterns appeared first on Bozho's tech blog.

Low-Code, Rapid Application Development and Digital Transformation

Post Syndicated from Bozho original https://techblog.bozho.net/low-code-rapid-application-development-and-digital-transformation/

Recently, many low-code/no-code solutions have gained speed in the enterprise, giving non-technical people the option to create simple applications. Analysts predict that the low-code industry will grow by 20+% each year. But what is low-code, why is it getting so popular and what are the issues with it?

Low-code is something that we’ve occasionally seen in the past decades – a drag-and-drop UI designer that allows you to create simple applications without coding skills. Products have matured and practically all offer similar features – the ability to design entity relationships in drag-and-drop entity-relationship diagrams, the ability to design UI via WYSIWYG, to design simple processes via BPMN-like notations, to call external services via importing web service definitions, to choose from pre-baked entity definitions and to fetch and store data in external databases and spreadsheets.

There are many tools in this domain – MS PowerApps, OutSystems, Appian, Mendix, Google’s recently acquired AppSheets, Ninox, WaveMaker, and many more. And they may differ slightly in their approach and in their feature set, but the whole point is to be able to easily create applications (web-based, mobile-based, hybrid) that solve some immediate pain that these users have, where going through a full-blown IT project with the associated development cost is an overkill.

And that sounds great – you don’t have to rely on the overly busy and often not too responsive IT department in your non-IT company; you just build something yourself to solve your immediate problem and to digitize your paper processes. And it can do that. To be honest, I like the idea of such tools existing, because a large portion of digital transformation is not handled well by huge, centralized systems and years-long development projects. A lot of agility is required in a landscape where there’s more and more demand for digital transformation and a limited supply of expensive developers. We have to admit that professional developers can’t be everywhere and solve every problem that could be solved by information technology. And so such tools democratize digital transformation, allowing non-technical people to create software.

Or at least that’s the theory. In practice this is challenging in many ways:

  • Some technical knowledge required – it’s cool to be able to draw an entity relationship diagram, but first you need to know what a data model is. It’s nice to be able to import a web service and be able to call it, but you have to know what a web service is. Integrating user directories implies you know what LDAP/AD is. I’m not sure non-technical people are able to make real use of these capabilities. Some tools still require simple code even to open a new dialog, and you have to copy-paste that code from somewhere.
  • Integration with on-premise systems – building something useful almost always requires integrating with an existing system. It’s fine to assume everything is in “the cloud”, but even the cloud can be seen as on-premise from the perspective of a 3rd party SaaS. Many tools that I’ve seen integrate with databases by supplying IP, username and password – something that is almost never the case and is a bad idea. The ability to connect to something within the organization’s infrastructure (even if that infrastructure is on Azure and you are using MS PowerApps) means permissions, network rules configuration, accounts, approvals. And you have to know what to request in order to get it.
  • Vendor lock-in – most low-code tools use proprietary formats for their meta-descriptions (some go so far as to store a binary representation of their metadata in a local SQLite database, for example). Some providers allow you to run the application on your own cloud by installing their server-side application, but some are purely SaaS. Once you build an application with one tool, you can’t really switch to another. Nice exceptions are WaveMaker and Skyve, which generate actual Java code that you can download at any time. Is that lock-in bad? Well, yes – if it happens that you need a feature that’s not yet available, or an integration that’s not yet there, you are stuck.
  • Shadow IT – all IT in an organization is assumed to be managed and observed by the IT department – in terms of monitoring, security, compliance, data protection, lifecycle management, technical support, etc. With low-code, these applications can exist without the IT department even knowing about them, and that poses many risks (which I'll discuss below).
  • Sustainability – what if a low-code company goes out of business, or gets acquired by someone who decides to sunset a product or a set of features? What happens when the employee that created the low-code app leaves and there's nobody who knows how to "program" with the selected tool in order to support the app? What happens when the low-code app becomes legacy itself? Because of the vendor lock-in and lack of any standardization, it's a big risk to take in terms of sustainability. Sure, you'll solve an immediate problem somehow, but you may create many more down the line.
  • Security – on the one hand, using a PaaS/SaaS may be perceived as coming with built-in security. On the other hand, non-technical people can’t assess the security of a given platform. And security can’t be fully “built-in” – you have to make sure that authentication is required, that apps are not visible outside some whitelisted office locations, that data can’t be exported by unauthorized people, that you have protection against XSS, CSRF, SQLi and whatnot. Even if the platform gives these options, a non-technical person doesn’t know they have to take care of them in the first place.
  • Compliance – many industries are regulated and there are horizontal regulations like the data protection ones (GDPR, CCPA). Data breaches often happen because data was fetched from its original well-protected storage and kept in various places (like a low-code app) that the data protection officer doesn’t know about. Encryption, anonymization, data minimization, retention periods – most low-code solutions don’t support these things out of the box and it’s unlikely that an employee not familiar with the specifics will walk the extra mile to have the app compliant.
  • Bugs outside of your control – when you have a bug in your software, you can fix it. If it’s in a library, you can patch it. If it’s in a set of tools of a third-party platform, you can’t do anything. During my testing of several low-code solutions I stumbled upon many bugs and inconsistencies.

Some of these problems exist in regular projects as well. Developers may not know how to make an application GDPR-compliant, security is too often overlooked in many projects, and technologies are sometimes selected without taking sustainability into account. But these problems are aggravated by low-code solutions.

The interesting thing is that low-code is one part of a broader spectrum of technologies. They start with "Rapid application development" (RAD) tools and frameworks and end with "no-code" (essentially less feature-rich low-code alternatives). Code-generation tools like Spring Roo are RAD, OpenXava is a RAD framework. Some low-code tools can actually be seen as RAD tools – the aforementioned WaveMaker can be used pretty easily by dev teams to quickly deliver simpler projects without sacrificing too much control (and I guess it is, having been acquired by a software development company).

Then there’s RPA – “robotic process automation”, which I’ll bluntly simplify as “low-code with screen scraping” – you automate some processes that involve legacy systems with which you can only extract information and perform actions by having “bots” press buttons on screens. RPA lies slightly outside the rapid application development spectrum, but it brings one important point – there are RPA developers. People that are not deeply technical and aren’t as qualified as developers, but who can still automate processes using the RPA tools. Same goes for low-code and some flavors of RAD – it is assumed that you don’t have to be technical to develop something, but actually there can be (and there are) dedicated experts that can build stuff with these tools. They are not actual developers, but if you can deliver a project with 5 low-code developers and one actual developer, this can dramatically cut costs.

Developers and large companies certainly have a "toolbox" which they reuse across projects – snippets from here and there, occasionally some library or microservice. But each new project involves a lot of boilerplate that even developers need to get rid of. I'm certainly not a big fan of code generation, but RAD has some merit and we have to explore it in order to improve efficiency without sacrificing quality. And to be able to provide the simple tools that would otherwise be built with sub-optimal low-code approaches.

Blurring the line between "developer" and "non-developer" is already happening, though. And while focusing on our fancy frameworks, we shouldn't lose sight of the tools for the less technically skilled; if only because there's a risk we'll have to support them in the long run. And because they will become part of a software ecosystem with which our software will have to interact.

Whether that’s the right and sustainable approach to digital transformation, I don’t know. I’d prefer, of course, everyone to be able to code at least simple things with the help of advanced RAD tools. And we’ll probably get there in a few decades. But as digital transformation needs to happen now, we may have to work with what we have and try to make it more secure, compliant, with less vendor lock-in and more visibility for the IT departments. Otherwise we risk creating an even bigger mess than we are at now.

The post Low-Code, Rapid Application Development and Digital Transformation appeared first on Bozho's tech blog.

Am I a Real Expert?

Post Syndicated from Bozho original https://techblog.bozho.net/am-i-a-real-expert/

The other day I had a conversation with a scientist friend who said something along the lines of "yes, I work in that general field, but I'm not an expert in your question in particular". IT is not science, of course, but I asked myself whether I am a real expert in the things that I do. And while it's nearly impossible to hit exactly the fine line between impostor syndrome and boasting, this post is neither and has a point, so bear with me.

I've been doing a lot of things in the general IT field – general purpose software engineering, IT architecture, information security, applications of cryptography, blockchain, e-government, algorithmic music composition, data analysis. And I've seen myself as having relatively expert knowledge. I even occasionally give TV and radio interviews, where I'm labelled as "Expert in X". But…

  • Am I a real expert in software engineering and software architecture? I've been doing that for 15+ years, and I follow and sometimes define or clarify best practices, I'm familiar with different methodologies and have been part of teams that implemented some of them correctly and efficiently. I have taken part in the decision-making process of building large systems with their architectural implications. But I've never been formally assigned as an "architect" (not that I insist), my UML skills are rather basic and I've never had to integrate dozens of legacy systems. I've never used formal methods for assessing software, I've made mistakes in selecting technologies, I've never done proper TDD and I have only a basic understanding of networking. Maybe just the sheer amount of years of experience positions me as an expert, maybe the variety of projects I've worked on.
  • Am I a real expert in cryptography? Almost certainly not. Yes, I'm using cryptographic building blocks regularly, I know what an initialization vector is and I've code reviewed a merkle tree implementation. I've read dozens of papers on cryptography and understood many of them. But some papers are Greek to me – I have no clue about the math behind cryptography. Sure, RSA is easy, but I have just a basic understanding of how elliptic curves work. On the other hand, I probably know more than 99% of the software industry, where the average person barely differentiates symmetric and asymmetric cryptography, IV is a roman numeral, and cryptography boils down to disabling TLS 1.0 on a web server.
  • Am I a real expert in information security? I've given talks on it, I'm in the infosec business, I know and follow best security practices, I know about sqlmap and I've even used Wireshark; I understand DEFCON talks and I've even decompiled several apps to find (and report) security vulnerabilities. But I'm no Mr. Robot-level hacker, nor am I a CISO in a large organization who has to plan and implement security measures on hundreds of systems. I haven't been part of red-teaming exercises and I haven't built or operated a security operations center (SOC). But maybe in an industry where even having heard of OWASP puts you in the top 10% and actively thinking about the security aspects of each new piece of code puts you in the top 1%, I'm an expert.
  • Am I a real blockchain expert? I know Bitcoin’s and Ethereum’s implementations, I have implemented something similar to bitcoin’s data structures, I know what a Patricia merkle tree is and I’ve built and pushed raw Ethereum transactions. But I’m no Vitalik Buterin, I can’t build something like Ethereum, I’m only vaguely familiar with distributed consensus algorithms and their pitfalls, and I haven’t written a smart contract more complicated than a tutorial example. I haven’t run a production deployment of Hyperledger (only a test one), and I largely ignore most of the new networks. You may say that one doesn’t need to be Vitalik or Satoshi to be an expert, and with most people seeing blockchain as “that thing that stores data in an unmodifiable way”, one could be an expert by just writing a Hello world smart contract.
  • Am I a real e-government expert? Sure, I’ve been an e-government advisor to a deputy prime minister, I’ve co-authored legislation and strategic documents and understand how and why e-government works in several EU countries, most notably Estonia, but do I have a holistic view? I have almost no idea of how the e-government is structured in South Korea, Singapore or UAE, for example, I haven’t written a single paper, and I haven’t measured the impact of legislative, organizational and technical measures that we proposed and applied. There are questions that I don’t know the answer to – e.g. how to make the pan-European eID framework actually work.

So the question is – what does it mean to be an expert anyway? We will always be somewhere on an “expert spectrum”. And in many cases our industry doesn’t apply even basic good practices, so even basic expertise can be very valuable. There are always people that know more than you on a given sub-sub-field, and there are always people that are better than you at most of the things that you do. The reputation of “expert” is something important, yet something so vague. Individually it’s good to know where one stands (Dunning-Kruger and everything), and to be aware of the limits of one’s knowledge and understanding. Knowing the things that you don’t know is a good start.

But in a broader context, who’s an expert? Imagine that after we recover from the COVID-19 crisis, there’s a cyber crisis. Who will be the IT experts to advise governments on the measures to be taken? University professors? Senior silicon valley technical people? Who will be on TV to discuss the cyber crisis in the role of “expert” – a senior engineer at a big bank, a junior developer, or someone that took CS 101 in university and happens to know the host? Who will drive the agenda and public opinion?

The level of our expertise matters primarily for our careers, but it also has other aspects outside of our immediate field. When a crisis hits (and even if it doesn't) it's important that we have real experts, that we listen to them and that we trust them. But also to realize that no expert knows everything about everything, and that many questions don't have absolute answers, even for experts. And that knowledge decays if not utilized, so "published a paper 30 years ago" may be irrelevant today.

I promised that the article would have a point. And it’s two-fold. First, make sure you know what you don’t know, so that you can explore it if needed. Second, we need to value expertise with its imperfections. There is no absolute expert in anything, there are only relative experts. And forgive me for going through my skills (or lack thereof), but that was the best way I could think of for illustrating my point in depth.

Finally, I hope there isn’t a global IT-related crisis. But as some consider it inevitable, we may think about the perception of expertise in our field and who can we trust with certain aspects. There is no “full stack” expert, as the field is too broad.

The post Am I a Real Expert? appeared first on Bozho's tech blog.

An AWS Elasticsearch Post-Mortem

Post Syndicated from Bozho original https://techblog.bozho.net/aws-elasticsearch-post-mortem/

So it happened that we had a production issue on the SaaS version of LogSentinel – our Elasticsearch stopped indexing new data. There was no data loss, as elasticsearch is just a secondary storage, but it caused some issues for our customers (they could not see the real-time data on their dashboards). Below is a post-mortem analysis – what happened, why it happened, how we handled it and how we can prevent it.

Let me start with a background of how the system operates – we accept audit trail entries (logs) through a RESTful API (or syslog), and push them to a Kafka topic. Then the Kafka topic is consumed to store the data in the primary storage (Cassandra) and index it for better visualization and analysis in Elasticsearch. The managed AWS Elasticsearch service was chosen because it saves you all the overhead of cluster management, and as a startup we want to minimize our infrastructure management efforts. That’s a blessing and a curse, as we’ll see below.

We have alerting enabled on many elements, including the Elasticsearch storage space and the number of application errors in the log files. This allows us to respond quickly to issues. So the "high number of application errors" alarm triggered. Indexing was blocked due to FORBIDDEN/8/index write. We have an internal call that re-enables it, so I tried to run it, but after less than a minute it was blocked again. This meant that our Kafka consumers failed to process the messages, which is fine, as we have a sufficient message retention period in Kafka, so no data could be lost.

I investigated the possible reasons for such a block. And there are two, according to Amazon – increased JVM memory pressure and low disk space. I checked the metrics and everything looked okay – JVM memory pressure was barely reaching 70% (and 75% is the threshold), and there was more than 200GiB of free storage. There was only one WARN in the elasticsearch application logs (it was "node failure", but after that there were no issues reported).

There was another strange aspect of the issue – there were twice as many nodes as configured. This usually happens during upgrades, as AWS uses blue/green deployment for Elasticsearch, but we hadn't done any upgrades recently. These additional nodes usually go away after a short period of time (after the redeployment/upgrade is ready), but they wouldn't go away in this case.

Being unable to SSH into the actual machine, being unable to unblock the indexing through Elasticsearch means, and being unable to shut down or restart the nodes, I raised a ticket with support. And after a few hours and a few exchanged messages, the problem was clear and resolved.

The main reason for the issue is 2-fold. First, we had a configuration that didn't reflect the cluster status – we had assumed a few more nodes, and our shard and replica configuration meant we had unassigned replicas (more on shards and replicas here and here). The best practice is to have nodes > number of replicas, so that each node gets one replica (plus the main shard). Having unassigned shard replicas is not bad per se, and there are legitimate cases for it. Ours can probably be seen as a misconfiguration, but not one with immediate negative effects. We chose those settings in part because it's not possible to change some settings in AWS after a cluster is created. And opening and closing indexes is not supported.

The second issue is the AWS Elasticsearch logic for calculating free storage in the circuit breaker that blocks indexing. So even though there was 200+ GiB of free space on each of the existing nodes, AWS Elasticsearch thought we were out of space and blocked indexing. There was no way for us to see that, as we only see the available storage, not what AWS thinks is available. The calculation takes the total number of shards+replicas and multiplies it by the per-shard storage, which means unassigned replicas that do not take actual space are counted as if they take up space. That logic is counterintuitive (if not plain wrong), and there is hardly a way to predict it.
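To make the difference concrete, here is a minimal sketch of the calculation as I understand it from AWS's explanation – the per-shard size, the shard counts and the node capacity below are hypothetical, and the exact internal formula is not documented:

public class FreeStorageSketch {
    public static void main(String[] args) {
        long gib = 1L << 30;
        long perShardBytes = 100 * gib;  // hypothetical average size of one shard copy
        int primaries = 5;
        int configuredReplicas = 2;      // per primary, including replicas that were never assigned
        int assignedReplicas = 1;        // per primary, what actually sits on the nodes
        long totalCapacity = 1200 * gib; // hypothetical total storage across the nodes

        // What the circuit breaker appears to assume: every configured shard copy takes space
        long assumedUsed = (long) primaries * (1 + configuredReplicas) * perShardBytes;
        // What is actually on disk: unassigned replicas take no space at all
        long actualUsed = (long) primaries * (1 + assignedReplicas) * perShardBytes;

        System.out.println("assumed free GiB: " + (totalCapacity - assumedUsed) / gib); // negative -> block
        System.out.println("actual free GiB:  " + (totalCapacity - actualUsed) / gib);  // plenty of space
    }
}

With these (made-up) numbers the cluster actually has 200 GiB free, but the assumed usage exceeds the capacity – which is consistent with the indexing block we observed.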

This logic appears to be triggered when blue/green deployment occurs – so in normal operation the actual remaining storage space is checked, but during upgrades, the shard-based check is triggered. That has blocked the entire cluster. But what triggered the blue/green deployment process?

We occasionally need access to Kibana, and because of our strict security rules it is not accessible to anyone by default. So we temporarily change the access policy to allow access from our office IP(s). This change is not expected to trigger a new deployment, and has never led to that. AWS documentation, however, states:

In most cases, the following operations do not cause blue/green deployments: Changing access policy, Changing the automated snapshot hour, If your domain has dedicated master nodes, changing data instance count.
There are some exceptions. For example, if you haven’t reconfigured your domain since the launch of three Availability Zone support, Amazon ES might perform a one-time blue/green deployment to redistribute your dedicated master nodes across Availability Zones.

There are other exceptions, apparently, and one of them happened to us. That led to the blue/green deployment, which in turn, because of our flawed configuration, triggered the index block based on the odd logic of treating unassigned replicas as taking up storage space.

How we fixed it – we recreated the index with fewer replicas and started a reindex (it takes data from the primary source and indexes it in batches). That reduced the size taken and AWS manually intervened to “unstuck” the blue/green deployment. Once the problem was known, the fix was easy (and we have to recreate the index anyway due to other index configuration changes). It’s appropriate to (once again) say how good AWS support is, in both fixing the issue and communicating it.

As I said in the beginning, this did not mean there was data loss, because we have Kafka keep the messages for a sufficient amount of time. However, once the index was writable, we expected the consumer to continue from the last successful message – we had specifically written transactional behaviour that committed the offsets only after successful storing in the primary storage and successful indexing. Unfortunately, the Kafka client we are using had auto-commit turned on, which we had overlooked. So the consumer had skipped past the failed messages. They are still in Kafka and we are processing them with a separate tool, but that showed us that our assumption was wrong and the fact that the code calls "commit" doesn't necessarily mean anything.
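For reference, this is roughly what the consumer setup should look like so that offsets are committed only after successful processing – a minimal sketch with the standard Kafka Java client, where the topic name and the processing method are placeholders rather than our actual code:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class AuditLogConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "indexing-consumer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The crucial setting: the default is "true", which commits offsets in the
        // background regardless of whether processing actually succeeded.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("audit-log")); // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    storeAndIndex(record.value()); // primary storage write + indexing
                }
                consumer.commitSync(); // commit only after everything above succeeded
            }
        }
    }

    private static void storeAndIndex(String entry) {
        // placeholder for the Cassandra write and the Elasticsearch indexing
    }
}

If storeAndIndex throws, the loop exits without committing, so the same messages get re-delivered on restart – which is the behaviour we had (wrongly) assumed we already had.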

So, the morals of the story:

  • Monitor everything. Bad things happen, it’s good to learn about them quickly.
  • Check your production configuration and make sure it’s adequate to the current needs. Be it replicas, JVM sizes, disk space, number of retries, auto-scaling rules, etc.
  • Be careful with managed cloud services. They save a lot of effort but also take control away from you. And they may have issues for which your only choice is contacting support.
  • If providing managed services, make sure you show enough information about potential edge cases. An error console, an activity console, or something, that would allow the customer to know what is happening.
  • Validate your assumptions about default settings of your libraries. (Ideally, libraries should warn you if you are doing something not expected in the current state of configuration)
  • Make sure your application is fault-tolerant, i.e. that failure in one component doesn’t stop the world and doesn’t lead to data loss.

To sum it up, a rare event unexpectedly triggered a blue/green deployment, where a combination of flawed configuration and flawed free space calculation resulted in an unwritable cluster. Fortunately, no data is lost and at least I learned something.

The post An AWS Elasticsearch Post-Mortem appeared first on Bozho's tech blog.

Where Is This Coming From?

Post Syndicated from Bozho original https://techblog.bozho.net/where-is-this-coming-from/

In enterprise software the top one question you have to answer as a developer almost every day is “Where is this coming from?”. When trying to fix bugs, when developing new features, when refactoring. You have to be able to trace the code flow and to figure out where a certain value is coming from.

And the bigger the codebase is, the more complicated it is to figure out where something (some value or combination of values) is coming from. In theory it's either from the user interface or from the database, but we all know it's always more complicated. I learned that very early in my career when I had to navigate a huge telecom codebase in order to implement features and fix bugs that were in total a few dozen lines of code.

Answering the question means navigating the code easily, debugging and tracing changes to the values passed around. And while that seems obvious, it isn’t so obvious in the grand scheme of things.

Frameworks, architectures, languages, coding styles and IDEs that obscure the answer to the question “where is this coming from?” make things much worse – for the individual developer and for the project in general. Let me give a few examples.

Scala, for which I have mixed feelings, gives you a lot of cool features. And some awful ones, like implicits. An implicit is something like a global variable, except there are nested implicit scopes. When you need one of those global variables, you just add the "implicit" keyword and you get the value from the inner-most scope available that matches the type of the parameter you want to set. And in larger projects it's not trivial to chase where that implicit value has been set. It can take hours of debugging to figure out why something has a particular value, only to figure out that some unrelated part of the code has touched the relevant implicits. That makes it really hard to trace where stuff is coming from and is therefore bad for enterprise codebases, at least for me.

Another Scala feature is partially applied functions. You have a function foo(a, b, c) (that's not the correct syntax, of course). You have one parameter known at some point, and the other two parameters known at a later point. So you can call the function partially and pass the resulting partially applied function to the next function, and so on until you have the other arguments available. So you can do bar(foo(a)) which means that in bar(..) you can call foo(b, c). Of course, at that point, the question "where did the value of a come from" is harder to answer. The feature is really cool if used properly (I've used it, and was proud of it), but it should be limited to smaller parts of the codebase. If you start tossing partially applied functions all over the place, it becomes a mess. And unfortunately, I've seen that as well.

Enough about Scala; the microservices architecture (which I also have mixed feelings about) also complicates the ability of a developer to trace what's happening. If for a given request you invoke 3-4 external systems, which both return and manipulate data, it becomes much harder to debug your application. Instead of putting a breakpoint or doing a call hierarchy, you have to track the parameters of each interaction with each microservice. It's news to nobody that microservices are harder to debug, but I just wanted to put that in the context of answering the "where is this coming from" question.

Dynamic typing is another example. I’ve included that as part of my arguments why I prefer static typing. Java IDEs have “Call hierarchy”. Which is the single most useful IDE functionality for large enterprise software (for me even more important than the refactoring functionality). You really can trace every bit of possible code flow, not only in your codebase, but also in your dependencies, which often hide the important details (chances are, you’ll be putting breakpoints and inspecting 3rd party code rather often). Dynamic typing doesn’t give you the ability to do that properly. doSomething called on an unknown-at-compile-time type can be any method with that name. And tracing where stuff is coming from becomes much harder.

Code generation is something that I’ve always avoided. It takes input from text files (in whatever language they are) and generates code, turning the question “where is this coming from” to “why has this been generated that way”.

Message queues and async programming in general – message passing obscures the source and destination of a given piece of data; a message queue adds complexity to the communication between modules. With microservices you at least have API calls; with queues, you have multiple abstractions between the sender and recipient (exchanges, topics, queues). And that's a general drawback of asynchronous programming – there's something in between the program flow that does "async magic" and spits something out on the other end – but is it transformed, is it delayed, is it lost and retried, is it still waiting?

By all these examples I'm not saying you should not use message queues, code generation, dynamic languages, microservices or Scala (though some of them I'd really advise against). All of these things have their strengths, and they have been chosen exactly for those strengths. A message queue was probably chosen because you want to really decouple producer and consumer. Scala was chosen for its expressiveness. Microservices were chosen because a monolith had become really hard to manage with multiple teams and multiple languages.

But we should try to minimize the “damage” of not being able to easily trace the program flow and not being able to quickly answer “where is this coming from”. Impose a “no-implicits” rule on your scala code base. Use code-generation for simpler components (e.g. DTOs for protobuf). Use message queues with predictable message/queue/topic/exchange names and some slightly verbose debug logging. Make sure your microservices have corresponding SDKs with consistent naming and that they can be run locally without additional effort to ease debugging.

It is expected that the bigger and more complex a project is, the harder it will be to trace where stuff is going. But do try to make it as easy as possible, even if it costs a little extra effort in the design and coding phase. You’ll be designing and writing that feature for a week. And you (and others) will be supporting and expanding it for the next 10 years.

The post Where Is This Coming From? appeared first on Bozho's tech blog.

One-Month of Microsoft DKIM Failure and Thoughts on Technical Excellence

Post Syndicated from Bozho original https://techblog.bozho.net/one-month-of-microsoft-dkim-failure-and-thoughts-on-technical-excellence/

Last month I published an article about getting email settings right. I had recently configured DKIM on our company domain and was happy to share the experience. However, the next day I started receiving DMARC failure reports saying that our Office365-backed emails are failing DKIM, which can mean they end up in spam.

That was quite surprising, as I had followed every step in the Microsoft guide. Let me first share a bit about that as well, as it will contribute to my final point.

In order to enable DKIM on a normal email provider, you just have to get the DKIM selector value from some settings screen and copy it in your DNS admin dashboard (in a new selector._domainkey record). Microsoft, instead, makes you do the following:

  • Connect to Exchange Powershell
  • Oh, you have 2FA enabled? Sorry, that doesn’t work – follow this tutorial instead
  • Oh, that fails when downloading their custom application? An obscure stackoverflow answer tells you that you should not download it with Firefox or Chrome – it fails to run then. You should download it with Internet Explorer (or Edge) and then it works. Just Microsoft.
  • Now that you are connected, follow the DKIM guide
  • In addition to powershell, you should also go into the Exchange admin section of your Office365 admin, and enable DKIM

Yes, you can imagine you can waste some time here. But it finally works, and you set the CNAME records and assume that what they wrote holds – namely, that the TXT records to which the CNAME records point are generated on Microsoft's end (for onmicrosoft.com), and so anyone trying to validate a signature will pick the right public key from there and everything will be perfect. Well, no. Here's an excerpt from a test mail I sent to my Gmail in December:

DKIM: ‘FAIL’ with domain logsentinel.com
ARC-Authentication-Results: i=2; mx.google.com;
dkim=temperror (no key for signature) [email protected] header.s=selector1

I immediately reported the issue in a way that I think is quite clear, although succinct:

Hello, I configured DKIM following your tutorial (https://docs.microsoft.com/en-us/microsoft-365/security/office-365-security/use-dkim-to-validate-outbound-email), however the TXT records at your end that are supposed to be generated are not generated – so the CNAME records I set point to a missing TXT record. I tried disabling and reenabling, and rotating the keys, but the TXT record is still missing, which means my emails get in spam folders, because the DMARC policy expects a DKIM record. Can you please fix that and create the required TXT records?

What followed was 35 days of long phone calls and repetitive emails with me trying to explain what the issue is and Microsoft support not getting it. As a sidenote, Office365 support center doesn’t support tickets with so much communication – it doesn’t have pagination so I can’t see the original note online (only in my inbox).

On the 5th minute of my first-of-several 20 minute calls my wife, who was around, already knew exactly what the issue is (she’s quite technical, yes), but somehow Microsoft support didn’t get it. We went through multiple requests for me providing DMARC reports (and just the gmail report wasn’t enough, it had to be from 2 others as well), screenshots of my DNS administration screen, screenshots to prove the record is missing (but mxtoolbox is some third party tool they can’t trust), even me sharing my computer for a remote desktop session (no way, company policy doesn’t allow random semi-competent people to touch my machine). I ended up providing an nslookup (a Microsoft tool) screenshot, to show that the record is missing.

In the meantime I tried key rotation again from the Exchange admin UI. That got stuck for a week, so I raised a separate ticket. While we were communicating on that ticket as well, the rotation finished, but the keys didn't change. What changed (as I later figured out) was the active selector. We'll get to that later.

After the initial few days I executed Get-DkimSigningConfig | fl to get the DKIM record values and just created TXT records with them (instead of CNAME), so that our mails don’t get in spam. Of course, having DKIM fail doesn’t automatically mean you’ll get in spam, even though our DMARC policy says so, but it’s a risk one shouldn’t take. With the TXT records set everything was working okay, so I didn’t have an urgent problem anymore. That allowed me to continue slowly trying to resolve the issue with Microsoft support, for another month.

Two weeks ago they acknowledged something is missing on their end. Miracle! So the last few days of the horror were about me having to set the CNAME records back (and get rid of my TXT records) so that they can fix the missing records on their end. I know, that sounds ridiculous – you don’t need a CNAME to point to your TXT in order to get that TXT to appear, but who knows what obscure and shitty programs and procedures they have, so I complied. What followed was a request for more screenshots and more DKIM reports.

No. You have nslookup. No screenshot of mine is better than an nslookup. The TTL for my entries is 300s, so there should be no issue with caching (and screenshots won't help with that either).

Then finally, after having spoken to or emailed probably 5-6 different people, there was an email that made sense:

Good Afternoon,

As per the public documentation you followed (https://docs.microsoft.com/en-us/microsoft-365/security/office-365-security/use-dkim-to-validate-outbound-email) it is mentioned to have TXT records created in Microsoft DNS server for both Selector CNAME Records. we agree with that.

Our product engineer team has confirmed that there is a design level change was rolled out and it is still not published in public documentation. To be precise, we don’t require TXT record for both CNAME Records published. The TXT record will be published only for the Active Selector.

LastChecked 2020-01-01 07:40:58Z
KeyCreationTime 2019-12-28 18:37:38Z
RotateOnDate 2020-01-01 18:37:38Z
SelectorBeforeRotateOnDate selector1
SelectorAfterRotateOnDate selector2

Above info confirms that the active selector is “Selector2” and we have the TXT record published. So we don’t have to wait for the TXT record for Selector1 in our scenario.

Also we require the Selector1 CNAME record in your DNS, whenever next rotate happen. Microsoft will publish the other TXT record. This shouldn’t affect your environment at any chance.

Alright. So, when I triggered key rotation, I fixed the results of the original bug that led to no records being present. The rotation made the active selector "selector2" and generated its TXT record. Selector1 is currently missing, but it's inactive, so that is not an issue. What "active selector" means is an interesting question. It's nowhere in the documentation, but there's this article from 2016 that sheds some light. Microsoft has to swap the selectors on each rotation. According to the article rotation happens every week, but that's no longer the case – since my manual rotation there hasn't been any automatic one.

So to summarize – there was an initial bug that happened who knows why, then my stuck-for-a-week manually triggered rotation fixed it, but because they haven't documented the actual process, there was no way for me to know that the missing TXT record is fine and will (hopefully) be generated on the next rotation (which is my current pending question for them – when will that happen, so that I can check if it worked).

To highlight the issues here. First, a bug. Second, missing/outdated documentation. Third, incompetent (though friendly) support. Fourth, a convoluted procedure to activate a basic feature. And that’s for not getting emails in spam in an email product.

I can expand those issues to any Microsoft cloud offering. We’ve been doing some testing on Azure and it’s a mess. At first an account didn’t even work. No reason, just fails. I had to use another account. The UI is buggy. Provisioning resources takes ages with no estimate. Documentation is far from perfect. Tools are meh.

And yet, Microsoft is growing its cloud business. Allegedly, they are the top cloud provider in the world. They aren't, obviously, but through bundling Office365 and Azure in the same reporting, they appear bigger than AWS. Azure is most likely smaller than AWS. But still, Satya Nadella has brought Microsoft to market success again. Because technical excellence doesn't matter in enterprise sales. You have your sales reps, nice conferences, probably some kickbacks, (alleged) easy integration with legacy Windows stuff that matters for most enterprise customers. And for the decision maker that chooses Microsoft over Amazon or Google, the argument of occasional bugs, poor documentation or bad support is irrelevant.

But it is relevant for the end results. For the engineers and the products they build. Unfortunately, tech companies have prioritized alleged business value over technical excellence so much that they can no longer deliver the business value. If the outgoing email of an entire organization goes to spam because Microsoft has a bug in Office365 DKIM, where's the business value of having a managed email provider? If the people-hours saved from going from on-prem to Azure are wasted because of poor technology, where's the business value? Probably decision makers don't factor that in, but they should.

I feel technical excellence has been in decline even in big tech companies, let alone smaller ones. And while that has led to projects going over budget, poor quality and poor security, the fact that the systems aren't that critical has made it possible for the tech sector to get away with that. And even grow.

But let me tell you of a big player in a different industry that stopped caring about technical excellence. Boeing. This article tells the story of shifting Boeing from an engineering organization to one that cuts quality expenses and outsources core capabilities. That led to the death of people. Their plane crashed because of that shift in priorities. That led to a steep decline in sales, a moderate decline in stock, and a bailout.

If you de-prioritize technical excellence, and you are in a critical sector, you lose. If you are an IT vendor, you can get away with it, for a while. Maybe forever? I hope not, otherwise everything in tech will be broken for a long time. And digital transformation will be a painful process around bugs and poor tools.

The post One-Month of Microsoft DKIM Failure and Thoughts on Technical Excellence appeared first on Bozho's tech blog.

One-Time Passwords Do Not Provide Non-Repudiation

Post Syndicated from Bozho original https://techblog.bozho.net/one-time-passwords-do-not-provide-non-repudiation/

The title is obvious and it could’ve been a tweet rather than a blogpost. But let me expand.

OTP, or one-time password, used to be mainstream with online banking. You get a dedicated device that generates a 6-digit code which you enter into your online banking in order to login or confirm a transaction. The device (token) is airgapped (with no internet connection), and has a preinstalled secret key that cannot be leaked. This key is symmetric and is used to (depending on the algorithm used) encrypt the current time, strip most of the ciphertext and turn the rest into 6 digits. The server owns the same secret key, does the same operation and compares the resulting 6 digits with the ones provided. If you want a more precise description of (T)OTP – check wikipedia.
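For reference, here is a minimal sketch of standard TOTP (RFC 6238, building on HOTP from RFC 4226) – technically it is an HMAC of a time-based counter rather than encryption, but the effect is as described above; the shared secret below is obviously just a placeholder:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.time.Instant;

public class TotpSketch {
    // Both the token and the server run exactly this, with the same shared secret.
    static int totp(byte[] sharedSecret, long unixTime) throws Exception {
        long counter = unixTime / 30;                        // 30-second time step
        byte[] message = ByteBuffer.allocate(8).putLong(counter).array();
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
        byte[] hash = mac.doFinal(message);
        int offset = hash[hash.length - 1] & 0x0f;           // dynamic truncation (RFC 4226)
        int binary = ((hash[offset] & 0x7f) << 24)
                   | ((hash[offset + 1] & 0xff) << 16)
                   | ((hash[offset + 2] & 0xff) << 8)
                   |  (hash[offset + 3] & 0xff);
        return binary % 1_000_000;                           // 6 digits
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "hypothetical-shared-secret".getBytes(); // placeholder, never hardcode secrets
        System.out.printf("%06d%n", totp(secret, Instant.now().getEpochSecond()));
    }
}

The important detail for the rest of this post is that the verifying side needs the very same secret – whoever holds it can produce a valid code.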

Non-repudiation is a property of, in this case, a cryptographic scheme, that means the author of a message cannot deny the authorship. In the banking scenario, this means you cannot dispute that it was indeed you who entered those 6 digits (because, allegedly, only you possess the secret key because only you possess the token).

Hardware tokens are going out of fashion and are being replaced by mobile apps that are much easier to use and still represent a hardware device. There is one major difference, though – the phone is connected to the internet, which introduces additional security risks. But if we try to ignore them, then it’s fine to have the secret key on a smartphone, especially if it supports secure per-app storage, not accessible by 3rd party apps, or even hardware security module with encryption support, which the latest ones do.

Software “tokens” (mobile apps) often rely on OTPs as well, even though they don’t display the code to the user. This makes little sense, as OTPs are short precisely because people need to enter them. If you send an OTP via a RESTful API, why not just send the whole ciphertext? Or better – do a digital signature on a server-provided challenge.

But that’s not the biggest problem with OTPs. They don’t give the bank (or whoever decides to use them as a means to confirm transactions) non-repudiation. Because OTPs rely on a shared secret. This means the bank, and any (curious) admin knows the shared secret. Especially if the shared secret is dynamically generated based on some transaction-related data, as I’ve seen in some cases. A bank data breach or a malicious insider can easily generate the same 6 digits as you, with your secure token (dedicated hardware or mobile).

Of course, there are exceptions. Apparently there's ITU-T X.1156 – a non-repudiation framework for OTPs. It involves trusted 3rd parties, though, which is unlikely in a bank scenario. There are also HSMs (server-side hardware security modules) that can store secret keys without any option to leak them to the outside world and that support TOTP natively. So the device generates the next 6 digits. Someone with access to the HSM can still generate such a sequence, but hopefully an audit trail would indicate that this happened, and it is far less of an issue than just a database breach. Obviously (or, hopefully?) secrets are not stored in plaintext even outside of HSMs, but chances are they are wrapped with a master key in an HSM. Which means that someone can bulk-decrypt them and leak them.

And this "can" is very important. No matter how unlikely it is, due to internal security policies and whatnot, there is no cryptographic non-repudiation for OTPs. A customer can easily deny entering the code to confirm a transaction. Whether this will hold in court is a complicated legal issue, where the processes in the bank can be audited to see if a breach is possible, whether any of the admins has a motive to abuse their privilege, whether there are logs of the user activities that can be linked to their device based on network logs from internet carriers, and so on. So you can achieve some limited level of non-repudiation through proper procedures. But it is not much different than authorizing the transaction with your password (at least the password is not a shared secret, hopefully).

But why bother with all of that if you have a mobile app with an internet connection? You can just create a digital signature on a unique per-transaction challenge and you get proper non-repudiation. The phone can generate a private key on enrollment and send its public key to the server, which can later be used to verify the signature. You don't have the bandwidth limitation of the human brain that led to the requirement of 6 digits, so the signature length can be arbitrarily long (even GPRS can handle digital signatures).
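A minimal sketch of that approach with the standard Java crypto APIs – in reality the private key would be generated and kept inside the phone's secure hardware and the challenge would come from the server per transaction, so treat this as an illustration of the flow rather than a production implementation:

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import java.security.Signature;

public class ChallengeSignatureSketch {
    public static void main(String[] args) throws Exception {
        // On enrollment: the device generates a key pair and sends only the public key to the server.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
        kpg.initialize(256);
        KeyPair deviceKeys = kpg.generateKeyPair();

        // Per transaction: the server issues a unique challenge (here just random bytes).
        byte[] challenge = new byte[32];
        SecureRandom.getInstanceStrong().nextBytes(challenge);

        // The device signs the challenge with its private key, which never leaves the phone.
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(deviceKeys.getPrivate());
        signer.update(challenge);
        byte[] signature = signer.sign();

        // The server verifies with the stored public key. It could not have produced the
        // signature itself, and that asymmetry is what gives non-repudiation.
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(deviceKeys.getPublic());
        verifier.update(challenge);
        System.out.println("valid: " + verifier.verify(signature));
    }
}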

Using the right tool for the job doesn’t have just technical implications, it also has legal and business implications. If you can achieve non-repudiation with standard public-key cryptography, if your customers have phones with secure hardware modules and you are doing online mobile-app-based authentication and authorization anyway, just drop the OTP.

The post One-Time Passwords Do Not Provide Non-Repudiation appeared first on Bozho's tech blog.

Solving Problems Properly Is Often Not Viable

Post Syndicated from Bozho original https://techblog.bozho.net/solving-problems-properly-is-often-not-viable/

How many times have you, as a software expert, seen some software or process and thought "damn, this can be done so much better"? Yes, a lot. But why, since large organizations spend a lot of money on IT? Is it because software is too complex, is it because of organizational issues, is it legacy software, or just the way things are?

And if what we see is so bad, isn't common market sense saying that surely these companies will be disrupted and made obsolete by a new and better competitor? I won't give a definitive answer as to why software is bad (I have already blamed developers, though I admit developers are only a part of the problem). But I'll try to reason about why it's not getting replaced under market forces.

Peter Thiel, in his book "Zero to One", says that a new product has to be at least 10 times better in order to be disruptive. Everything below that is just gradual improvement over the status quo. Google was 10 times better than AltaVista, Lycos and Yahoo as a way to find information on the web. Skype was 10 times better than whatever was out there before that. But most of the things that we see as broken don't get fixed because there's no 10x improvement that would kill them. Let's take a look at a few examples.

Banking. Banking is horrible. Online banking UX is usually on par with a legacy ERP, and bank transfers rely on a patchwork of national specifics and age-old international standards like SWIFT. I can't really think of anyone who really likes their online banking. Mobile banking is even worse, as, of course, it's tedious to copy-paste IBANs on a phone. Banks are, in many cases, still running COBOL on mainframes, and it's really hard and expensive to get something changed or fixed. Why are they still around? Because the experience of the online banking is of marginal importance to the bank's bottom line.

Yes, Revolut and TransferWise are cool tech billion-dollar-valuation startups that are "disrupting" banking. But not really. They are just fancier banks. Yes, finally you can do something on your mobile, you can be onboarded without going to a physical office, support is better, the UI is sleek and the UX makes sense. Do banks care? Not much. Because these startups are just small banks that are currently losing money in order to get to more people and make them spend small amounts online. None of that is a 10x improvement in the value provided. It surely is much better technically, but once you have to transfer money elsewhere, you hit SWIFT or ideally SEPA. Once you want to pay in store, you hit Visa or MasterCard. None of the legacy infrastructure is replaced; we just have better UX for, in many cases, sending money to those legacy banks.

Digital Identity. Anything online where it's important to know who's on the other end requires digital identity. And yet we are nowhere near solving that problem. There are country-specific solutions where governments and/or consortia of banks issue some form of digital identity which others can trust. But it's fragmented, with varying levels of security, and integration isn't necessarily easy. At the same time we have KYC companies that basically let you scan your customer's passport or ID, optionally do a liveness detection (automated or manual) via video conference, and then check the person against databases of terrorists or sanctioned individuals. So for each service you have to do that again, slightly differently, and if something changes (e.g. an address or name change), there's not really much you can do.

Ideally, digital identity is federated, uses a common standard and is easy to integrate, so that enrollment in sensitive online services that require some knowledge about a customer is straightforward. The identification process gives just as much information as you need. Identity can be managed by multiple entities, government or private, that can vouch that an individual is indeed who they claim to be. The way you identify, with the current technology, would be with a key securely stored in a mobile phone's secure storage, using WebAuthn or SAML. There should be a way to re-issue the key in case a phone is lost, and then we'll have an awesome, reusable digital identity to solve all online authentication and enrollment woes.

Alas, we are nowhere near that. Because what we currently have is working. It's working terribly, expensively and with a lot of hacks and patches, and most importantly – users have to enroll for every service separately. But from the point of view of each individual company, it's easier to just get some service for passport verification and then issue a username and password, with optional 2FA, and you are done. The system I described above is not 10 times better than the status quo. A blockchain-based self-sovereign identity, by the way, is also not 10 times better. And it has usability issues that even make it inferior to the status quo.

Privacy. Literally every day there's a huge data breach or a huge privacy issue (mostly with Facebook). Facebook has been giving our data to 3rd parties, because why not. Companies are collecting whatever they can get ahold of, without much care for protecting it. Sensitive personal data is still stored unencrypted and unpseudonymized, in SQL databases with barely tracked admin access; applications still export bulk sensitive data to Excel sheets sent over email or published to public S3 buckets. Passwords are still stored in plaintext or unsalted, and data is communicated over unencrypted connections.

We know how to fix all that; there are many best practices to address all of it. How many companies encrypt data at rest, which is the simplest and most obvious thing to do? How many encrypt data with separate keys to protect bulk exports? How many use pseudonymization when doing exports? The knowledge and best practices are out there, but our software is still built as a CRUD application over a SQL datastore, and the most important task is to get it done quickly, so let's not bother with privacy protections.
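Encrypting a sensitive field before it reaches the database is literally a few lines with standard APIs. A minimal sketch, assuming the key comes from somewhere safe like a KMS or HSM rather than being generated in place as it is here:

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class FieldEncryptionSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder key generation – in practice fetch the key from a KMS/HSM.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        byte[] iv = new byte[12];                            // 96-bit IV, recommended for GCM
        SecureRandom.getInstanceStrong().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("sensitive field value".getBytes(StandardCharsets.UTF_8));

        // Store the IV alongside the ciphertext instead of the plaintext column value.
        System.out.println("encrypted " + ciphertext.length + " bytes");
    }
}

None of this is hard; it's simply a matter of not bothering.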

And, from a business point of view, rightly so, unfortunately. Check the share price of a few recently breached companies. It drops for a few days and then bounces back up. And at the same time no privacy-preserving technology is 10 times better than what we have today, and no company will be 10 times better if it protects personal data. So why bother?

Security. In the long run cybersecurity is very important. In the short term, it isn't. Nothing bad will happen. The other day I inspected the 2FA application of a bank. It's full of mistakes and yet, through multiple hoops and security-through-obscurity, it reduces the risk of something very bad happening. And if it happens, well, insurance will cover the losses. Data breaches happen not only because we don't care about personal data, but also because we don't care about security. Chief information security officers find it hard to convince boards that their role is needed, let alone that it needs a budget. Security tools are a patchwork of barely working integrations. There was recently a series of rants from an infosec professional that rightly points out that our infosec tools are bad.

Are things improving? A little bit, mostly by introducing more secure protocols. Backward compatibility slows down the improvements, of course, but we’re getting there. But is there a security “fix” that makes things 10 times better, that changes the game, that cuts costs or risk so much? Nope.

These observations go for many more areas, both horizontal and vertical – social networks (where Facebook can’t even get sharing right, let alone data protection), public sector software is mostly still in the 90s, even online payments have not moved since PayPal at the beginning of the century – we still take our credit card from our wallets and type numbers and CVC codes, with all the associated fraud.

In many cases when there's no visible improvement for years, some proactive governmental structure like the European Commission decides to step in and write some piece of legislation that tries to fix things. For banks, for example, there's PSD2 (the 2nd payment services directive), which mandates open banking – all data about a bank account should be accessible via APIs to third parties who can manage it, display it properly, analyze it. Wonderful in theory, a mess in practice so far – APIs barely work, there's no standardization, and anyone who wants to offer a service on top of open banking APIs is suffering at the moment. SEPA, the single euro payment area standard, was introduced by a directive as well, more than a decade ago.

Regarding digital identity, there’s the eIDAS regulation which defines how governments should offer their eID cross border, so that, in theory, anyone can use any government or otherwise issued electronic identity to identify for any service across the EU. Things are not there yet at all, and the architecture is so complicated, I don’t think it will work as intended. The fragmented eID space will remain fragmented for all practical purposes.

For privacy there's, of course, GDPR. And while it resulted in more people trying to take care of personal data, it led to more documents being written than lines of code changed. Same for information security – the NIS directive tried to improve the security of providers of essential services. It's often unclear, though, what an essential service is to begin with. And then it's mostly organizational measures. Countries like the UK and Germany have good guidelines and frameworks to improve security, but we are yet to see the results.

Why is solving problems properly so hard? Legacy software, legacy standards and legacy thinking certainly don't help. But in many of these domains, solving problems properly does not bring sufficient business value. And we are not prepared to do things properly – while the knowledge on how to do things better exists, it's not mainstream. And it's not mainstream because it's not perceived as required.

We have half-assed our technology because it kinda works sufficiently to support the bottom line and to make users happy. We have muddled through the myriad of issues with the minimum effort required. Because even the maximum effort would not have been a sufficient improvement to change the game. Even the perfect re-imagined banking, digital identity, social network would not be significantly different for the end user. Sure, a little cheaper and a little more convenient, but not groundbreaking. Not to mention hidden horizontal aspects like security and privacy.

And we will continue that way, pushed by standards and regulations when nothing else helps, towards a messy gradual improvement. But it won't be worth the investment to do things right – why would you invest millions in quality software development when it will only marginally affect your business? It's only worth getting things to barely work.

The post Solving Problems Properly Is Often Not Viable appeared first on Bozho's tech blog.

A Technical Guide to CCPA

Post Syndicated from Bozho original https://techblog.bozho.net/a-technical-guide-to-ccpa/

CCPA, or the California Consumer Privacy Act, is the upcoming "small GDPR" that applies to all companies that have users from California (i.e. it has extraterritorial application). It is not as massive as GDPR, but you may want to follow its general recommendations.

A few years ago I wrote a technical GDPR guide. Now I'd like to do the same with CCPA. GDPR is much more prescriptive about the fact that you should protect users' data, whereas CCPA seems to be mainly concerned with the rights of the users – to be informed, to opt out of having their data sold, and to be forgotten. That focus is mainly because other laws in California and the US have provisions about protecting confidentiality of data and data breaches; in that regard GDPR is a more holistic piece of legislation, whereas CCPA covers mostly the aspect of users' rights (or "consumers", which is the term used in CCPA). I'll use "user" as it's the term more often used in technical discussions.

I'll list below some important points from CCPA – this is not an exhaustive list of requirements for a software system, but it aims to highlight some important bits. And, obviously, I'm not a lawyer, but I've been doing data protection consultations and products (like SentinelDB) for the past several years, so I'm qualified to talk about the technical side of privacy regulations.

  • Right of access – you should be able to export (in a human-readable format, and preferably in a machine-readable one as well) all the data that you have collected about an individual. Their account details, their orders, their preferences, their posts and comments, etc.
  • Deletion – you should delete any data you hold about the user. Exceptions apply, of course, including data used for prevention of fraud, other legal reasons, needed for debugging, necessary to complete the business requirement, or anything that the user can reasonably expect. From a technical perspective, this means you most likely have to delete what’s in your database, but other places where you have personal data, like logs or analytics, can be skipped (provided you don’t use it to reconstruct user profiles, of course)
  • Notify 3rd party providers that received data from you – when data deletion is requested, you have to somehow send notifications to wherever you've sent personal data. This can be a SaaS like Mailchimp, Salesforce or Hubspot, or it can be someone you sold the data to (apparently that's a major thing in CCPA). So ideally you should know where data has been sent and invoke APIs for forgetting it. Fortunately, most of these companies are already compliant with GDPR anyway, so they have these endpoints exposed. You just have to add the logic. If your company sells data by posting dumps to S3 or sending Excel sheets via email, you have a bigger problem, as you have to keep track of those activities and send unstructured requests (e.g. emails).
  • Data lineage – this is not spelled out as a requirement, but it follows from multiple articles, including the one about deletion as well as the one about disclosing who data was sent to and where data came from in your system (in order to know if you can re-sell it, among other things). In order to avoid buying expensive data lineage solutions, you can either keep a spreadsheet (in case of simpler processes), or come up with a meaningful way to tag your data. For example, use a separate table with columns (ID, table, sourceType, sourceId, sourceDetails), where ID and table identify a record of personal data in your database, sourceType is the way you have ingested the data (e.g. API call, S3, email) and sourceId is the identifier that you can use to track how it came into your system – API key, S3 bucket name, email "from", or even company registration ID (data might still be passed around on flash drives, I guess). A similar table covers the outgoing data (with targetType and targetId). It's a simplified implementation, but it might work in cases where a spreadsheet would be too cumbersome to take care of (see the sketch after this list).
  • Age restriction – if you've had the opportunity to know the age of a person whose data you have, you should check it. That means not ignoring the age or date of birth field when you import data from 3rd parties, and also politely asking users about their age. You can't sell that data, so you need to know which records are automatically opted out. If you never ever sell data, well, it's still a good idea to keep track of it (per GDPR).
  • Don’t discriminate if users have used their privacy rights – that’s more of a business requirement, but as technical people we should know that we are not allowed to have logic based on users having used their CCPA (or GDPR) rights. From a data organization perspective, I’d put rights requests in a separate database than the actual data to make it harder to fulfill such requirements. You can’t just do a SQL query to check if someone should get a better price, you should do cross system integration and that might dissuade product owners from breaking the law; furthermore it will be a good sign in case of audits.
  • “Do Not Sell My Personal Information” – this should be on the homepage if you have to comply with CCPA. It’s a bit of a harsh requirement, but it should take users to a form where they can opt out of having their data sold. As mentioned in a previous point, this could be a different system to hold users’ CCPA preferences. It might be easier to just have a set of columns in the users’ table, of course.
  • Identifying users is an important aspect. CCPA speaks about "verifiable requests". So if someone drops you an email saying "I want my data deleted", you should be able to confirm it's really them. In an online system that can be a button in the user profile (for opting out, for deletion, or for data access) – if they know the password, it's fairly certain it's them. However, in some cases users don't have accounts in the system, and then there should be other ways to identify them. SSN sounds like one, and although it's a terrible thing to use for authentication, given the lack of universal digital identity, especially in the US, it's hard not to use it at least as part of the identifying information. But it can't be the only thing – it's not a password, it's an identifier. So users sharing their SSN (if you have it), their phone or address, passport or driving license might be some data points to collect for identifying them. Note that once you collect that data, you can't use it for other purposes, even if you are tempted to. CCPA also requires toll-free phone support, which is hardly practical for non-US companies even if they have customers in California, but it poses the question of identifying people online based on real-world data rather than account credentials. And please don't ask users for their passwords over the phone; just initiate a request on their behalf in the system and direct them to log in and confirm it. There should be additional guidelines for identifying users as per 1798.185(a)(7).
  • Deidentification and aggregate consumer information – aggregated information, e.g. statistics, is not personal data, unless you are able to extract personal data from it (e.g. if the statistics are split per town and age and you have only two users in a given town, you can easily see who is who). Aggregated data is differentiated from deidentified data, which is data that has its identifiers removed. Simply removing identifiers, though, might not be sufficient to deidentify data – based on several other data points, like IP address (+ logs), physical address (+ snail mail history) or phone (+ phone book), one can be uniquely identified. If you can't reasonably identify a person based on a set of data, it can be considered deidentified. Do make the mental exercise of thinking how to deidentify your data, as it then becomes much easier to share it (or sell it) to third parties. Probably nobody minds being part of aggregated statistics sold to someone, or an anonymized account used for trend analysis.
  • Pseudonymization is a measure to be taken in many scenarios to protect data. CCPA mentions it particularly in a research context, but I'd support a generic pseudonymization functionality. That means replacing the identifying information with a pseudonym that's not reversible unless a secret piece of data is used. Think of it (and you can do that quite literally) as encrypting the identifier(s) with a secret key to form the pseudonym. You can then give that data to third parties to work with (e.g. to do market segmentation) and have them give it back to you. You then decrypt the pseudonyms and fill the obtained market segment(s) into your own database. The 3rd party doesn't get personal information, but you still get the relevant data (a minimal sketch follows the list).
  • Audit trail is not explicitly stated as a requirement, but since you have the obligation to handle users' requests and track the use of their data in and outside of your system, it's a good idea to have a form of audit trail – who did what with which data; who handled a particular user request; how the user was identified in order to perform the request, etc.
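To make the lineage table above a bit more concrete, here is a minimal sketch of the incoming side as a JPA entity. The class, column and enum names are my own illustration (nothing here is prescribed by CCPA), and a plain SQL table or even a spreadsheet would do just as well for simpler setups:

```java
import javax.persistence.*;
import java.time.Instant;

// Minimal sketch of the "incoming data lineage" table described above:
// one row per record of personal data, pointing at where that record came from.
@Entity
@Table(name = "incoming_data_lineage")
public class IncomingDataLineage {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // Which record of personal data this row describes
    @Column(name = "record_id", nullable = false)
    private Long recordId;

    @Column(name = "record_table", nullable = false)
    private String recordTable;

    // How the data was ingested
    @Enumerated(EnumType.STRING)
    @Column(name = "source_type", nullable = false)
    private SourceType sourceType;

    // Identifier of the source: API key, S3 bucket name, email "from", company registration ID
    @Column(name = "source_id")
    private String sourceId;

    @Column(name = "source_details")
    private String sourceDetails;

    @Column(name = "received_at", nullable = false)
    private Instant receivedAt = Instant.now();

    public enum SourceType { API_CALL, S3, EMAIL, FLASH_DRIVE, OTHER }

    // getters and setters omitted
}
```

A mirror-image OutgoingDataLineage entity with targetType and targetId columns would cover the other direction – who each record has been sent or sold to.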

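The pseudonymization point also translates quite directly into code. Below is a minimal sketch (again my own illustration, not an official recipe) that forms the pseudonym by encrypting the identifier with AES-GCM; key management is deliberately out of scope. Note that with a random IV the same identifier yields a different pseudonym on each call – fine if you pseudonymize once per export and decrypt on the way back, but if you need stable pseudonyms across exports, a deterministic scheme (or a keyed hash, if you don't need reversibility) would be the alternative:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

// The pseudonym is the AES-GCM encryption of the identifier,
// so only the key holder can map it back to the original user.
public class Pseudonymizer {

    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    public Pseudonymizer(SecretKey key) {
        this.key = key;
    }

    public String pseudonymize(String identifier) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
        // Pseudonym = base64(iv || ciphertext) – safe to hand to a third party
        byte[] out = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
    }

    public String depseudonymize(String pseudonym) throws Exception {
        byte[] in = Base64.getUrlDecoder().decode(pseudonym);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        GCMParameterSpec spec = new GCMParameterSpec(TAG_BITS, Arrays.copyOfRange(in, 0, IV_BYTES));
        cipher.init(Cipher.DECRYPT_MODE, key, spec);
        byte[] plaintext = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
        return new String(plaintext, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        Pseudonymizer p = new Pseudonymizer(keyGen.generateKey());
        String pseudonym = p.pseudonymize("user-42@example.com");
        System.out.println(pseudonym + " -> " + p.depseudonymize(pseudonym));
    }
}
```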
As CCPA is not concerned with data confidentiality requirements, I won’t repeat my GDPR advice about using encryption whenever possible (notably, for backups), or about internal security measures for authentication.

CCPA is focused on the rights of your users and you should be able to handle those rights (and track how you handled them). You can have manual and spreadsheet-based processes if you are not too big, and you should definitely check with your legal team if and to what extent CCPA applies to your company. But if you have implemented the GDPR data subject rights, it's likely that you are already compliant with CCPA in terms of the overall system architecture, except for a few minor details.

The post A Technical Guide to CCPA appeared first on Bozho's tech blog.

Blockchain Overview – Types, Use-Cases, Security and Usability [slides]

Post Syndicated from Bozho original https://techblog.bozho.net/blockchain-overview-types-use-cases-security-and-usability-slides/

This week I have a talk at a meetup about blockchain beyond the hype – its actual implementation issues and proper use-cases.

The slides can be found here:

The main takeaways are:

  • Think of blockchain in specifics, not in high-level “magic”
  • Tamper-evident data structures are cool and you should be familiar with them – Merkle trees, hash chains, etc. They are useful for other things as well, e.g. certificate transparency (a minimal hash chain sketch follows this list)
  • Blockchain and its cryptography are perfect for protecting data integrity, which is part of the CIA triad of information security
  • Many proposed use-cases can be solved with centralized solutions + trusted timestamps instead
  • Usability is a major issue when it comes to wider adoption
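To illustrate the tamper-evident point, here is a minimal hash chain sketch of my own (not taken from the slides, and assuming a recent JDK for records and HexFormat): every entry stores the hash of the previous entry, so modifying any past record invalidates everything recorded after it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

// Minimal hash chain: each entry stores the hash of (previous hash + its own data),
// so changing any earlier entry breaks verification of everything after it.
public class HashChain {

    public record Entry(String data, String previousHash, String hash) {}

    private final List<Entry> entries = new ArrayList<>();

    public void append(String data) throws Exception {
        String previousHash = entries.isEmpty() ? "" : entries.get(entries.size() - 1).hash();
        entries.add(new Entry(data, previousHash, sha256(previousHash + data)));
    }

    public boolean verify() throws Exception {
        String previousHash = "";
        for (Entry entry : entries) {
            if (!entry.previousHash().equals(previousHash)
                    || !entry.hash().equals(sha256(previousHash + entry.data()))) {
                return false;
            }
            previousHash = entry.hash();
        }
        return true;
    }

    private static String sha256(String input) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(input.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest);
    }

    public static void main(String[] args) throws Exception {
        HashChain chain = new HashChain();
        chain.append("document #1 timestamped");
        chain.append("document #2 timestamped");
        System.out.println("chain valid: " + chain.verify()); // true until an entry is edited
    }
}
```

A Merkle tree applies the same idea hierarchically, which is what enables compact proofs and things like certificate transparency logs.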

As with anything in technology – use the right tool for the job, as no solution solves every problem.

The post Blockchain Overview – Types, Use-Cases, Security and Usability [slides] appeared first on Bozho's tech blog.