Tag Archives: Opinions

Idea: A Generic P2P Network Client

Post Syndicated from Bozho original https://techblog.bozho.net/idea-a-generic-p2p-network-client/

Every now and then one has a half-baked idea about some project that they aren’t likely to be able to do because of lack of time. I’ve written about such random app ideas before, but they were mostly about small apps.

Here I’d like to share an idea for something a bit bigger (and therefore harder to spare time for) – a generic P2P network client. P2P networks are popular in various domains, most notably file sharing and cryptocurrencies. However, in theory they can be applied to many more problems – social networks, search engines, ride sharing, distributed AI, etc. All of these examples have been implemented in a p2p context, and they even work okay, but they lack popularity.

Popularity is actually the biggest issue with these applications – in order to get a service to be popular, in many cases you need a network effect – a p2p file-sharing network with 100 users doesn’t benefit from being p2p. A social network with 100 users is, well, useless. And it is hard to get traction with these p2p services because they require an additional step – installing software. You can’t just open a webpage and register, you have to install some custom software that will be used to join the p2p network.

P2P networks are distributed, i.e. there is no central node that has control over what happens. What control there is lies in the binary that gets installed – and it’s usually open source. And you need that binary in order to establish an overlay network. These networks reuse the internet’s transport layer, but do not rely on the world wide web standards, and most importantly, don’t rely heavily on DNS (except when run for the first time, when they do use it to find a few known seed nodes). So once you are connected to the network, you don’t need to make HTTP or DNS queries; everything stays within the specifics of the particular protocol (e.g. bittorrent).

Not only do you have to trust and install some piece of software – you also have to be part of the network and exchange data regularly with peers. So it really doesn’t scale if you want to be part of dozens of p2p networks – each of them might be hungry for resources, and you have to keep dozens of applications running all the time (and launching on startup).

So here’s the idea – why don’t we have a generic p2p client? A piece of software that establishes the link to other peers and is agnostic about what data is going to be transferred. From what I’ve seen, the p2p layer is pretty similar in different products – you try to find peers in your immediate network, and if none are found, you connect to a known seed node (first by DNS, which uses DNS round-robin, and then by a list of hardcoded IPs). When you connect, the seed node gives you a list of peers to connect to. Each of those peers has knowledge of other peers, so you can quickly be connected to a significant number of peer nodes (I’m obviously simplifying the flow, but that’s roughly how it works).
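The bootstrap flow above can be sketched in a few lines of Java. This is a simplification with made-up names – DNS resolution and the "give me your peers" exchange are injected as plain functions, so the discovery logic itself is visible (and testable) without any actual networking:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Function;

// Simplified peer discovery: resolve seed hostnames (DNS round-robin),
// fall back to hardcoded IPs if resolution yields nothing, then ask
// each reachable seed for its list of known peers.
class Bootstrap {
    static List<String> discoverPeers(
            List<String> seedHosts,
            List<String> hardcodedIps,
            Function<String, List<String>> resolve,       // DNS lookup (injected)
            Function<String, List<String>> askForPeers) { // peer-exchange call (injected)
        LinkedHashSet<String> candidates = new LinkedHashSet<>();
        for (String host : seedHosts) {
            candidates.addAll(resolve.apply(host));       // round-robin DNS may return several IPs
        }
        if (candidates.isEmpty()) {
            candidates.addAll(hardcodedIps);              // last-resort hardcoded seeds
        }
        LinkedHashSet<String> peers = new LinkedHashSet<>(candidates);
        for (String seed : candidates) {
            peers.addAll(askForPeers.apply(seed));        // each peer knows other peers
        }
        return new ArrayList<>(peers);
    }
}
```

Real clients add retries, peer scoring and NAT traversal on top, but the shape of the bootstrap is roughly this.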

Once you have an established list of peers, you start doing the application-specific stuff – sharing files, downloading a cryptocurrency ledger, sharing search indexes, sharing a social network profile database, etc. But the p2p network part can be, I think, generalized.

So we can have that generic client and make it pluggable – every application developer can write their own application on top of it. The client will not only be a single point for the user, but can also manage resources automatically – inbound and outbound traffic, CPU/GPU usage (e.g. in case of cryptocurrency mining). And all of these applications (i.e. plugins) can either be installed by downloading them from the vendor’s website (which is somewhat similar to the original problem), or can be downloaded from a marketplace available within the client itself. That would obviously mean a centralized marketplace, unless the marketplace itself is a p2p application that’s built into the client.
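What might such a plugin SPI look like? The interface and names below are entirely hypothetical – the point is only that the client owns discovery and transport, while a plugin sees opaque messages and a way to send replies:

```java
import java.util.function.BiConsumer;

// Hypothetical plugin SPI for the generic client: the client handles peer
// discovery and transport; a plugin only deals with application payloads.
interface P2PApplication {
    String id();                                    // e.g. "file-sharing", "chat"
    void onMessage(String fromPeer, byte[] payload,
                   BiConsumer<String, byte[]> send); // reply channel provided by the client
}

// A trivial plugin that echoes every message back to its sender.
class EchoPlugin implements P2PApplication {
    public String id() { return "echo"; }
    public void onMessage(String fromPeer, byte[] payload,
                          BiConsumer<String, byte[]> send) {
        send.accept(fromPeer, payload);
    }
}
```

A real SPI would also need lifecycle hooks, resource quotas and a sandbox, but even this toy shape shows how the p2p layer and the application layer can be decoupled.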

So, effectively, you’d be able to plug in your next file sharing solution, your next cryptocurrency, encrypted messaging, or your next distributed social network. And your users won’t have the barrier of installing yet another desktop app. That alone does not solve the network effect, as you still need enough users to add your plugin to their client (and for many to have the client to begin with), but it certainly makes it easier for them. Imagine if we didn’t have the Android and Apple app stores and we had to find relevant apps by other means.

This generic client can possibly be even a browser plugin, so that it’s always on when you are online. It doesn’t have to be, but it might ease adoption. And while it will be complicated to write it as a plugin, it’s certainly possible – there are already p2p solutions working as browser plugins.

Actually, many people think of blockchain smart contracts as a way to do exactly that – have distributed applications. But they have an unnecessary limitation – they work on data that’s shared on a blockchain. And in some cases you don’t need that. You don’t need consensus in the cryptocurrency sense. For example in file sharing, all you need to do is compute the hash of the file (and of its chunks) and start sending them to interested peers. No need to store the file on the blockchain. Same with instant messaging – you don’t need to store the message on a shared immutable database, you only need to send it to the recipients. So smart contracts are not as generic a solution as what I’m proposing.
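The file sharing case is easy to illustrate – advertising verifiable chunks needs nothing more than a standard hash function, no shared ledger. A minimal sketch (SHA-256, arbitrary chunk size):

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

// Split data into fixed-size chunks and hash each one, the way file-sharing
// protocols advertise pieces that peers can verify independently.
class Chunker {
    static List<String> chunkHashes(byte[] data, int chunkSize) throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        List<String> hashes = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sha256.reset();
            sha256.update(data, off, len);              // hash only this chunk
            hashes.add(HexFormat.of().formatHex(sha256.digest()));
        }
        return hashes;
    }
}
```

A peer that downloads a chunk recomputes its hash and compares – trust without any blockchain involved.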

Whether a generic client can accommodate an unlimited number of different protocols and use cases, what the communication protocol would look like, what programming languages it should support, what security implications that has for the client (e.g. what sandbox the client provides), and what UI markup will be used are all important operational details, but they are beside the point of this post.

You may wonder whether there isn’t anything similar done already. Well, I couldn’t find one. But there’s a lot done that can support such a project: Telehash (a mesh protocol), JXTA (a p2p protocol) and its Chaupal implementation, libp2p and Chimera (p2p libraries), Kademlia (a distributed hash table).

Whether such a project is feasible – certainly. Whether its adoption is inevitable – not so certainly, as it would require immediate usefulness in order to be installed in the first place. So, like every “platform”, it will face a chicken-and-egg problem – will people install it if there are no useful plugins, and will people write plugins if there are no user installations? That is solvable in a number of ways (e.g. paying developers initially to write plugins, or bundling some standard applications like file sharing and instant messaging). It could be a business opportunity (monetized through the marketplace or subscriptions) as well as a community project.

I’m just sharing the idea, hoping that someone with more time and more knowledge of distributed networks might pick it up and make the best of it. If not, well, it’s always nice to think about what the future of the internet could look like. Centralization is inevitable, so I don’t see p2p getting rid of centralized services anytime soon (or ever), but some things arguably work better and safer when truly decentralized.

The post Idea: A Generic P2P Network Client appeared first on Bozho's tech blog.

First Thoughts About Facebook’s Libra Cryptocurrency

Post Syndicated from Bozho original https://techblog.bozho.net/first-thoughts-about-facebooks-libra-cryptocurrency/

Facebook announced today that by 2020 they will roll out Libra – their blockchain-based cryptocurrency. It is, of course, major news, as it has the potential to disrupt the payment and banking sector. If you want to read all the surrounding newsworthy details, you can read the TechCrunch article. I will instead focus on a few observations and thoughts about Libra – from a few perspectives – technical, legal/compliance, and possibly financial.

First, replacing banks and bank transfers and credit cards and payment providers and ATMs with just your smartphone sounds appealing. Why hasn’t anyone tried to do that so far? Well, many have tried, but you can’t just have the technology and move towards gradual adoption. You can’t even do it if you are Facebook. You can, however, do it if you are Facebook, backed by Visa, Mastercard, Uber, and many, many more big names on the market. So Facebook got that right – they made a huge coalition that can drive such a drastic change forward.

I have several reservations, though. And I’ll go through them one by one.

  • There is not much completed yet – there is a website, a technical paper and an open-source prototype, but nothing anywhere near production. The authors of the paper write that the Move language is still being designed (and it’s not the existing Move language). The open-source prototype is still a prototype (and one that’s a bit hard to read, though that might be because of the choice of Rust). This means they will be working on tight schedules to make this global payment system operational. The technical paper is a bit confusing and hard to follow. Maybe they rushed it out and didn’t have time to polish it, and maybe it’s just the beginning of a series of papers. Or maybe it’s my headache and the paper is fine.
  • The throughput sounds insufficient – the authors of the paper claim they don’t have any performance results yet, but they expect to support around 1000 transactions per second. This section of the paper assumes that state channels will be used and most transactions won’t happen on-chain, but that’s a questionable assumption, as state channels are not universally applicable. And 1000 TPS is low compared to current Visa + Mastercard volumes (estimated at around 5000 transactions per second). Add bank transfers, Western Union and payment providers and you get a much higher number. If from the start you are presumably capped at 1000, will it really scale to become a global payment network? Optimizing throughput is not something that just happens – other cryptocurrencies have struggled with it for years.
  • Single merkle tree – I’m curious whether that will work well in practice. A global merkle tree that is being concurrently and deterministically updated is not an easy task (ordering matters, timestamps are tricky). Current blockchain implementations use one merkle tree per block, constructed from all the transactions that fall within that block when the block is “completed”. In Libra there will be no blocks (so what does “blockchain” mean anyway?), so the merkle tree will grow forever (similar to the certificate transparency log).
  • How to get money in? – Facebook advertises Libra as a way for people without bank accounts to make payments online. But it’s not clear how they will get Libra tokens in the first place. In third world countries, consumers’ interaction is with the telecoms, where they get their cheap smartphones and mobile data cards. The realistic way for people to get Libra tokens would be if Facebook had contracts with telecoms, but how and whether that is going to be arranged is an open question.
  • Key management – key management has always been, and will probably always be, a usability issue for end users of blockchain. Users don’t want to manage their keys, and they can’t do that well. Yes, there are key management services and there are seed phrases, but that’s still not as good as a centralized account that can be restored if you lose your password. And if you lose your keys, or your password (from which a key is derived to encrypt your keys and store them in the cloud), then you can lose access to your account forever. That’s a real risk and it’s highly undesirable for a payment system.
  • Privacy – Facebook claims it won’t associate Libra accounts with Facebook accounts in order to target ads. And even if we can trust them, there’s an inherent privacy concern – all payments will be visible to anyone who has access to the ledger. Yes, a person can have multiple accounts, but people will likely only use one. It’s not a matter of what can be done by tech-savvy users, but what will be done in reality. And ultimately, someone will have to know who those addresses (public keys) belong to (see the next point). A zcash-based system would be privacy-preserving, but it can’t be compliant.
  • Regulatory compliance – if you are making a global payment system, you’d have to satisfy regulators around the world, especially the SEC, ESMA and the ECB. That doesn’t mean you can’t be disruptive, but you have to take into account real-world problems, like KYC (know your customer) processes as well as other financial limitations (you can’t print money, after all). Implementing KYC means people being able to prove who they are before they can open an account. This isn’t rocket science, but it isn’t trivial either, especially in third world countries. And money launderers are extremely inventive, so if Facebook (or the Libra association) doesn’t get its processes right, it risks becoming very lucrative for criminals and regime officials around the world. Facebook is continuously failing to stop fake news and disinformation on the social network despite the many urges from the public and legislators, so I’m not sure we can trust them with being “a global compliance officer”. Yes, partners like Visa and Mastercard can probably help here, but I’m not sure how involved they’ll be.
  • Oligarchy – while it has “blockchain” in the tagline, it’s not a democratizing solution. It’s a private blockchain based on trusted entities that can participate in it. If it’s going to be a global payment network, it will be a de-facto payment oligarchy. How is that worse than what we have now? Well, now at least banks are somewhat independent power players. The Libra association is one player with one goal. And then antitrust cases will follow (as they probably will immediately, as Facebook long banned cryptocurrency ads on its website only to unveil its own).
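To make the merkle tree point above concrete, here is the classic per-block construction that current blockchains use – leaves hashed, then hashed pairwise until a single root remains. This is a simplified sketch (real implementations add domain separation and other details):

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Per-block Merkle root: any change to any transaction changes the root,
// which is what makes a block's contents tamper-evident.
class Merkle {
    static byte[] sha256(byte[]... parts) throws Exception {
        MessageDigest d = MessageDigest.getInstance("SHA-256");
        for (byte[] p : parts) d.update(p);
        return d.digest();
    }

    static byte[] root(List<byte[]> transactions) throws Exception {
        List<byte[]> level = new ArrayList<>();
        for (byte[] tx : transactions) level.add(sha256(tx)); // leaf hashes
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left; // duplicate last if odd
                next.add(sha256(left, right));
            }
            level = next;
        }
        return level.get(0);
    }
}
```

Libra’s design instead grows one tree forever, appending leaves like a certificate transparency log does, which raises the concurrency and ordering questions mentioned above.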

Facebook has the power to make e-commerce websites accept payments in Libra. But will consumers want it? Is it at least 10 times better than using your credit card, or is it only marginally better, so that nobody is bothered to learn yet another thing (and manage their keys)? Will third world countries really be able to benefit from such a payment system, or will others prove more practical (e.g. M-Pesa)? Will the technology be good enough or scalable enough? Will regulators allow it, or will Facebook be able to escape their grasp with a few large fines until it has conquered the market?

I don’t know the answers, but I’m skeptical about changing the money industry that easily. And given Facebook’s many privacy blunders, I’m not sure they will be able to cope well in a much more scrutinized domain. “Move fast and break things” can easily become “Move fast and sponsor terrorism”, and that’s only partly a joke.

I hope we get fast and easy digital payments, as bank transfers and even credit cards feel too outdated. But I’m sure it’s not easy to meet the complex requirements of reality.


Reflection is the most important Java API

Post Syndicated from Bozho original https://techblog.bozho.net/reflection-is-the-most-important-java-api/

The other day I was wondering – which is the most important Java API? Which of the SE and EE APIs is the one that makes most of the Java ecosystem possible and could not simply have been recreated as a 3rd party library?

And as you’ve probably guessed by the title – I think it’s the Reflection API. Yes, it’s inevitably part of every project, directly or indirectly. But that’s true for many more APIs, notably the Collection API. But what’s important about the Reflection API is that it enabled most of the popular tools and frameworks today – Spring, Hibernate, a ton of web frameworks.

Most of the other APIs can be implemented outside of the JDK. The Collections API could very well be commons-collections or Guava. It’s better that it’s part of the JDK, but we could’ve managed without it (it appeared in Java 1.2). But the Reflection API couldn’t. It had to be an almost integral part of the language.

Without reflection, you could not have any of the fancy tools that we use today. Not ORMs, not dependency injection frameworks, and not most of the web frameworks. Well, technically, you could at some point have had them – using SPI, or using java-config only. One may argue that if it weren’t for reflection, we’d have skipped the whole XML configuration era and gone directly to code-based configuration. But it’s not just configuration that relies on reflection in all these frameworks. Even if Spring could have its beans instantiated during configuration and initialized by casting them to InitializingBean, how would you handle autowired injection without reflection (“manually” doesn’t count, as it’s not autowiring)? In Hibernate, the introspection and JavaBean APIs might seem sufficient, but when you dig deeper, they are not. And handling annotations would not be possible in general.
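A toy version of autowiring makes the point concrete – a runtime annotation, a field scan, and injection from a “context” map, which is roughly what a DI container does under the hood (a drastically simplified sketch, of course):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.Map;

// A runtime-visible annotation, analogous to Spring's @Autowired.
@Retention(RetentionPolicy.RUNTIME)
@interface Autowired {}

class Repository { String find() { return "data"; } }
class Service { @Autowired Repository repo; }

// The "container": scan fields, look each annotated field's type up in the
// context, and inject it. None of this is expressible without reflection.
class Injector {
    static void inject(Object bean, Map<Class<?>, Object> context) throws Exception {
        for (Field field : bean.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(Autowired.class)) {
                field.setAccessible(true);                    // bypass private access
                field.set(bean, context.get(field.getType())); // wire the dependency
            }
        }
    }
}
```

Everything here – discovering fields, reading annotations, setting private state – goes through `java.lang.reflect`, which is exactly why the frameworks could not exist without it.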

And without these frameworks, Java would not have been the widespread technology that it is today. If we did not have the huge open-source ecosystem, Java would have been rather niche [citation needed]. Of course, that’s not the only factor – there are multiple things that the language designers and then the JVM implementors got right. But reflection is, I think, one of those things.

Yes, using reflection feels hacky. Reflection in the non-framework code feels like a last resort thing – you only use it if a given library was not properly designed for extension but you need to tweak it a tiny bit to fit your case. But even if you have zero reflection code in your codebase, your project is likely full of it and would not have been possible without it.

The need to use reflection might be seen as one of the deficiencies of the language – you can’t do important stuff with what the language gives you, so you resort to a magic API that gives you unrestricted access to otherwise (allegedly) carefully designed APIs. But I’d say that even having reflection is a de-facto language feature. And it is one that probably played a key role in making Java so popular and widespread.


The Positive Side-Effects of Blockchain

Post Syndicated from Bozho original https://techblog.bozho.net/the-positive-side-effects-of-blockchain/

Blockchain is a relatively niche technology at the moment, and even though there’s a lot of hype, its applicability is limited. I’ve been skeptical about its ability to solve all the world’s problems, as many claim it can, and would rather focus it on solving particular business issues related to trust.

But I’ve been thinking about the positive side-effects, and it might actually be one of the best things that have happened to software recently. I don’t like big claims, and this sounds like one, but bear with me.

Maybe it won’t find its place in much of the business software out there. Maybe in many cases you don’t need a distributed solution because the business case does not lend itself to one. And certainly you won’t be trading virtual coins in unregulated exchanges.

But because of the hype, now everyone knows the basic concepts and building blocks of blockchain. And they are cryptographic – they are hashes, digital signatures, timestamps, merkle trees, hash chains. Every technical and non-technical person in IT has by now at least read a little bit about blockchain to understand what it is.

So as a side effect, most developers and managers are now trust-conscious, and by extension – security conscious. I know it may sound far-fetched, but before blockchain how many developers and managers knew what a digital signature is? Hashes were somewhat more prevalent mostly because of their (sometimes incorrect) use to store passwords, but the PKI was mostly arcane knowledge.

And yes, we all know how TLS certificates work (although, do we?) and that a private key has to be created and used with them, and probably some had a theoretical understanding of digital signatures. And we knew encryption was kind of a good idea at rest and in transit. But putting that in the context of “trust”, “verifiability” and “non-repudiation” was, in my view, something that few people have done mentally.

And now, even by not using blockchain, developers and managers would have the trust concept lurking somewhere in the back of their mind. And my guess would be that more signatures, more hashes and more trusted timestamps will be used just because someone thought “hey, we can make this less prone to manipulation through this cool cryptography that I was reminded about because of blockchain”.
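Those building blocks are all sitting in the standard JDK. A sketch of the hash-then-sign-then-verify flow that underlies the “trust” and “non-repudiation” concepts (a freshly generated EC key pair here stands in for a properly managed one):

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.Signature;

// Hash a document, sign the hash, verify the signature - the cryptographic
// core that blockchain reminded everyone about, with no blockchain in sight.
class TrustBasics {
    static boolean signAndVerify(byte[] doc) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(doc); // fingerprint of the data

        KeyPair keys = KeyPairGenerator.getInstance("EC").generateKeyPair();
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(keys.getPrivate());
        signer.update(hash);
        byte[] signature = signer.sign();          // only the private key holder can produce this

        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(keys.getPublic());
        verifier.update(hash);
        return verifier.verify(signature);         // anyone with the public key can check it
    }
}
```

Add a trusted timestamp to the signed hash and you have most of what “verifiability” means in practice.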

Blockchain won’t be the new internet, but it already has impact on the mode of thinking of people in the software industry. Or at least I hope so.


Audit Trail in IT Context

Post Syndicated from Bozho original https://techblog.bozho.net/audit-trail-in-it-context/

An audit trail (or audit log) is something both intuitive and misleading at the same time. There are many definitions of an audit trail, and all of them give you an idea of what it is about:

A system that traces the detailed transactions relating to any item in an accounting record.

A record of the changes that have been made to a database or file.

An audit trail (also called audit log) is a security-relevant chronological record, set of records, and/or destination and source of records that provide documentary evidence of the sequence of activities that have affected at any time a specific operation, procedure, or event.

An audit trail is a step-by-step record by which accounting or trade data can be traced to its source

The definitions are clear, but they rarely give enough detail on how they apply to a particular IT setup. The Wikipedia article is pretty informative, but I prefer referring to the NIST document about audit trails. This relatively short document from more than 20 years ago covers many of the necessary details.

I won’t repeat the NIST document, but I’d like to focus on the practical aspects of an audit trail in order to answer the question in the title – what is an audit trail? In the context of a typical IT setup, the audit trail includes all or some of the following:

  • Application-specific audit trail – ideally, each application records business-relevant events. They may be logged in text files or in separate database tables. They allow reconstructing the history much better than the arbitrary noisy logging that is usually in place
  • Application logs – this is a broader category as it includes logs that are not necessarily part of the audit trail (e.g. debug messages, exception stacktraces). Nevertheless, they may be useful, especially in case there is no dedicated application-specific audit trail functionality
  • Database logs – whether it is logged queries, change data capture or change tracking functionality, or some native audit trail functionality
  • Operating system logs – for Linux that would include the /var/log/audit/audit.log (or similar files), /var/log/auth.log. For Windows it would include the Windows Event logs for the Security and System groups.
  • Access logs – access logs for web servers can be part of the audit trail especially for internal systems where a source IP address can more easily be mapped to particular users.
  • Network logs – network equipment (routers, firewalls) generate a lot of data that may be seen as part of the audit trail (although it may be very noisy)

All of these events can (and should) be collected in a central place where they can be searched, correlated and analyzed.

Ideally, an audit trail event has an actor, an action, details/payload and optionally an entity (type + id) which is being accessed or modified. Having this information in a structured form allows for better analysis and forensics later.
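One possible shape for such a structured event as a Java record (the field set below is just a plausible illustration, not a standard):

```java
import java.time.Instant;

// A structured audit event: who did what, to which entity, with what details.
// Keeping these fields separate (rather than one free-form log line) is what
// makes later searching, correlation and forensics practical.
record AuditEvent(
        Instant timestamp,   // when it happened
        String actor,        // who: user id or service account
        String action,       // what: "LOGIN", "UPDATE", "DELETE", ...
        String entityType,   // on what kind of thing: e.g. "Invoice"
        String entityId,     // which one: e.g. "inv-42"
        String details) {}   // payload: old/new values, source IP, etc.
```

With events in this form, questions like “show everything actor X did to entity Y last month” become simple queries instead of log-grepping exercises.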

Once the audit trail is collected, two other properties become very important:

  • Availability – is the audit trail available at all times, and how far back in time can it be accessed (also referred to as “retention”)? Typically application, system and network logs are kept for shorter periods of time (1 to 3 months), which is far from ideal for an audit trail. Many standards and regulations require retention periods of up to 2 years
  • Integrity – data integrity is too often ignored. But if you can’t prove the integrity of your logs, both internally and to third parties (auditors, courts), then they are of no use.
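One common technique for making tampering detectable is hash chaining – each entry’s hash covers the previous entry’s hash, so editing or deleting any record invalidates every hash after it. A minimal sketch (production systems would add digital signatures and trusted timestamps on top):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

// Hash-chained audit log entries: hash(n) = SHA-256(hash(n-1) || entry(n)).
// Verifying the chain means recomputing it from the start and comparing.
class HashChain {
    static String link(String previousHash, String entry) throws Exception {
        MessageDigest d = MessageDigest.getInstance("SHA-256");
        d.update(previousHash.getBytes(StandardCharsets.UTF_8)); // anchor to history
        d.update(entry.getBytes(StandardCharsets.UTF_8));        // cover this entry
        return HexFormat.of().formatHex(d.digest());
    }
}
```

Publishing the latest chain head to an external party periodically makes even a full rewrite of the log provable as tampering.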

Many organizations understand that the integrity of their audit trail is important only after a security incident takes place and they realize they cannot rely on their audit logs. It is better to protect the audit trail before something bad happens. And not just for some abstract reasons like “best practices” or “improved security”. A secure audit trail allows organizations to:

  • Identify that something wrong has happened. If the audit trail is not protected, a malicious actor can make it seem like nothing anomalous has taken place
  • Prove what happened and who did it. Digital forensics is important for large organizations. Especially in cases where lawsuits are involved.
  • Reconstruct the original data. The audit trail is not a replacement for a full backup, but it can allow you to reconstruct the data that was modified and the time it was modified, so that either you know which backup to use, or you avoid restoring from the backup altogether by just relying on the data provided in the audit trail.
  • Be compliant. Because of all the reasons stated above, many standards and regulations require a secure audit trail. Simple logs are usually not compliant, even though auditors may turn a blind eye.

The audit trail is an important concept, and has very practical implications. It is a widely researched topic, but in practice many IT setups lack sufficient audit trail capabilities. Partly because the risks are not immediately obvious, and partly because it is a technically challenging task.

(The article is originally published on our corporate blog, but I think it can be interesting here as well, so I copied most of it, leaving out the corporate references and calls to action)


7 Questions To Ask Yourself About Your Code

Post Syndicated from Bozho original https://techblog.bozho.net/7-questions-to-ask-yourself-about-your-code/

I was thinking the other day – why is writing good code so hard? Why has the industry still not gotten to producing quality software, despite years of efforts, best practices, methodologies and tools? The answer to those questions is anything but simple. It involves economic incentives, market realities, deadlines, formal education, industry standards, an insufficient number of developers on the market, etc.

As an organization, in order to produce quality software, you have to do a lot. Set up processes, get your recruitment right, be able to charge the overhead of quality to your customers, and actually care about it.

But even with all the measures taken, you can’t guarantee quality code. First, because that’s subjective, but second, because it always comes down to the individual developers. And not simply whether they are capable of writing quality software, but also whether they are actually doing it.

And as a developer, you may fit the process and still produce mediocre code. This is why my thoughts took me to the code from the eyes of the developer, but in the context of software as a whole. Tools can automatically catch code style issues, cyclomatic complexity, large methods, too many method parameters, circular dependencies, etc. But even if you cover those, you are still not guaranteed to have produced quality software.

So I came up with seven questions that we as developers should ask ourselves each time we commit code.

  1. Is it correct? – does the code implement the specification? If there is no clear specification, did you make a sufficient effort to find out the expected behaviour? And is that behaviour tested somehow – by automated tests preferably, or at least by manual testing?
  2. Is it complete? – does it take care of all the edge cases, regardless of whether they are defined in the spec or not. Many edge cases are technical (broken connections, insufficient memory, changing interfaces, etc.).
  3. Is it secure? – does it prevent misuse, does it follow best security practices, does it validate its input, does it prevent injections, etc. Is it tested to prove that it is secure against these known attacks. Security is much more than code, but the code itself can introduce a lot of vulnerabilities.
  4. Is it readable and maintainable? – does it allow other people to easily read it, follow it and understand it? Does it have proper comments, describing how a certain piece of code fits into the big picture, does it break down code in small, readable units.
  5. Is it extensible? – does it allow being extended with additional use cases, does it use the appropriate design patterns that allow extensibility, is it parameterizable and configurable, does it allow writing new functionality without breaking old one, does it cover a sufficient percentage of the existing functionality with tests so that changes are not “scary”.
  6. Is it efficient? – does it work well under high load, does it care about algorithmic complexity (without being prematurely optimized), does it use batch processing, does it avoid loading big chunks of data in memory at once, does it make proper use of asynchronous processing?
  7. Is it something to be proud of? – does it represent every good practice that your experience has taught you? Not every piece of code is glorious, as most perform mundane tasks, but is the code something to be proud of or something you’d hope nobody sees? Would you be okay to put it on GitHub?

I think we can internalize those questions. Will asking them constantly make a difference? I think so. Would we magically get quality software if every developer asked themselves these questions about their code? Certainly not. But we’d have better code, when combined with existing tools, processes and practices.

Quality software depends on many factors, but developers are one of the most important ones. Bad software is too often our fault, and by asking ourselves the right questions, we can contribute to good software as well.


Implicit _target=”blank”

Post Syndicated from Bozho original https://techblog.bozho.net/implicit-_targetblank/

The target="_blank" href attribute has been the subject of many discussions. When is it right to use it, should we use it at all, is it actually deprecated, is it good user experience, does it break user expectations, etc.

And I have a strange proposal for improving the standard behaviour in browsers – implicit target="_blank" in certain contexts. But let’s try to list when target="_blank" is a good idea:

  • On pages with forms when the user may need additional information in order to fill the form but you don’t want them to leave the form and lose their input
  • On homepage-like websites – e.g. twitter and facebook, where your behaviour is “browsing” and opening a link here and there. It may be applied to things like reddit or hacker news, though it’s currently not implemented that way there
  • In comment/review sections where links are user-provided – this is similar to the previous one, as the default behaviour is browsing through multiple comments and possibly following some of them

The typical argument here is that if a user wants to open a new page, they can do that with either the context menu or ctrl+click. But not many users know that feature, and even fewer use it. And so many of these pages are confusing, and combined with a sometimes broken back-button it becomes a nightmare.

In some of these cases javascript is used to make things better (and more complicated). In the case of forms, javascript is added to warn people against leaving the page with an unfinished form. Javascript is used to turn certain links into target="_blank" ones. Some people try to open new tabs with javascript.

So my proposal is to introduce a new attribute, open-links, with the following values covering both the plain and on-form behaviour:

  • open-links="new-tab" – open a new tab for each link on the page or in the current div/section/…
  • open-links="new-window" – same as above, but open a new window (almost never a good idea)
  • open-links="new-tab-on-form" – only open a new tab if there is a form on the page (possibly another requirement can be that the form is being partially filled)
  • open-links="new-window-on-form" – same as above, but with a new window
  • open-links="warn-on-form" – open the link in the same tab, but if there’s a partially filled form, warn the user they will lose their input

The default value, I think, should be new-tab-on-form.
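To make the proposal concrete, here is a small sketch of the decision logic a browser could apply on a link click. It is purely illustrative – the attribute values are the ones proposed above, while the function name and return values are invented for this example:

```python
# Hypothetical decision logic for the proposed open-links attribute.
# The attribute values come from the proposal above; everything else
# (function name, return values) is invented for illustration.

def resolve_link_action(open_links: str, has_form: bool, form_dirty: bool) -> str:
    """Decide how a link click should be handled: 'same-tab', 'new-tab',
    'new-window' or 'warn' (prompt about losing unsaved form input)."""
    if open_links == "new-tab":
        return "new-tab"
    if open_links == "new-window":
        return "new-window"
    if open_links == "new-tab-on-form":
        return "new-tab" if has_form else "same-tab"
    if open_links == "new-window-on-form":
        return "new-window" if has_form else "same-tab"
    if open_links == "warn-on-form":
        return "warn" if has_form and form_dirty else "same-tab"
    return "same-tab"  # absent/unknown value: today's default behaviour
```

With new-tab-on-form as the default, a page with a form would open links in a new tab, while a plain article page would keep today’s behaviour.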

It might introduce new complexities and may confuse users further. But I think it’s worth trying to fix this important part of the web rather than leaving each website handle it on their own (or forgetting to handle it).

The post Implicit _target=”blank” appeared first on Bozho's tech blog.

How to create secure software? Don’t blink! [talk]

Post Syndicated from Bozho original https://techblog.bozho.net/how-to-create-secure-software-dont-blink-talk/

Last week Acronis (famous for their TrueImage) organized a conference in Sofia about cybersecurity for developers and I was invited to give a talk.

It’s always hard to pick a topic for a talk at a developer conference that is not too narrowly focused – if you choose something too high-level, you can be useless to the audience and seen as a “bullshitter”; if you pick something too specific, half of the audience may be bored because it is not their area of work.

So I chose the middle ground — an overview talk with as much specifics as possible in it. I tried to tell interesting stories of security vulnerabilities to illustrate my points. The talk is split in several parts:

  • Purpose of attacks
  • Front-end vulnerabilities (examples and best practices)
  • Back-end vulnerabilities (examples and best practices)
  • Infrastructure vulnerabilities (examples and best practices)
  • Human factor vulnerabilities (examples and best practices)
  • Thoughts on how this fits into the bigger picture of software security

You can watch the 30 minutes video here:

If you would like to download my slides, click here, or view them at SlideShare:

The point is – security is hard and there are a million things to watch for and a million things that can go wrong. You should minimize risk by knowing and following as many best practices as possible. And you should not assume you are secure, as even the best companies make rookie mistakes.

The security mindset, which is partly formalized by secure coding practices, is at the core of building secure software. Asking yourself constantly “what could go wrong” will make software more secure. How to make all software more secure, not just the software we ourselves are creating, is a whole other topic, and a less technical one, going through public policy, financial incentives to invest in security, and so on.

For technical people it’s important to know how to make a focused effort to make a system more secure. And to apply what we know, because we may know a lot and postpone it for “some other sprint”.

And as a person from the audience asked me – is not blinking really the way? Of course not, that effort won’t be justified. But if we cover as many of the risks as possible, that will give us some time to blink.

The post How to create secure software? Don’t blink! [talk] appeared first on Bozho's tech blog.

Blockchain – What Is It Good For? [slides]

Post Syndicated from Bozho original https://techblog.bozho.net/blockchain-what-is-it-good-for-slides/

Last week I gave a 20 minute talk on the way I see blockchain applicability. I’ve always been skeptical of the blockchain hype, having voiced my concerns, my rants and other thoughts on the matter.

I’ve followed actual blockchain projects that didn’t really need blockchain but managed to yield some very good results by digitizing processes, by eliminating human error, and occasionally, by guaranteeing the integrity of data. And recently I read an article that put these observations into perspective – that blockchain is just a tool for digital transformation (a buzzword broadly meaning “doing things on a computer and more efficiently”). Rarely is distributed consensus needed, let alone public ledgers. But that doesn’t matter, as long as the technology has led to some processes being digitized and transformed.

So here are the slides from my talk:

And people are usually surprised that I have a blockchain-related company and I’m so skeptical at the same time. But that’s actually logical – I know how the technology works, what problems it solves and how it can be applied in a broad set of domains. And that’s precisely why I don’t think it’s a revolution. It’s a wonderful piece of technological innovation that will no doubt solve some problems much better than they were solved before, but it won’t be the new internet and it won’t change everything.

Doesn’t that skepticism hurt my credibility as a founder of a blockchain-related startup? Not at all – I don’t want to get a project just because of a buzzword – that’s not sustainable anyway. I want to get it because it solves a real problem that the customer has. And to solve it the right way, i.e. with the best technologies available. And blockchain’s underlying mechanisms are a great tool in the toolbox. Just not a revolution.

In order to be revolutionary, something has to bring at least a 10 times improvement over existing practices, or make a lot of things possible that weren’t possible before. Blockchain is neither. I got a question from the audience – “well, isn’t it a 10 times innovation in payments?”. My counter-question was: “Have you ever bought something with cryptocurrencies?”. Well, no. It also doesn’t improve cross-organization integration tenfold. Yes, it might help to establish a shared database, but you could’ve done that with existing technology if you needed to.

But if the blockchain hype helped people realize that digital events can be protected, and that stakeholders can exchange data and present proofs to each other that they haven’t modified the data, who cares if the ultimate implementation will be based on Ethereum, Hyperledger, Corda, or just a clever use of digital signatures, timestamps and web services, or perhaps simply merkle trees.
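As a small illustration of that last option, computing a Merkle root over a set of records is enough for two parties to later prove the shared data hasn’t been modified – no blockchain required. This is only a sketch; real deployments would also need inclusion and consistency proofs:

```python
# Minimal Merkle-root computation: pairwise-hash records up to a single
# root hash. If all stakeholders keep the root, any later modification of
# a record is detectable. Sketch only - inclusion proofs are omitted.

import hashlib

def merkle_root(records: list) -> str:
    level = [hashlib.sha256(r).digest() for r in records]
    while len(level) > 1:
        if len(level) % 2:                  # odd level: duplicate the last node
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()
```

Each side computes the root independently over its copy of the data; if the roots match, the datasets match.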

I hope that blockchain gets demystified soon and we all start speaking the same language (so that I don’t need to reassure an audience at a banking summit that – no, we are not doing cryptocurrencies in our blockchain company). Once we get there, we’ll be able to efficiently solve the problems of digital transformation. As for the digital revolution – it is already happening. We are moving everything online. And yes, with centralized services rather than distributed p2p networks, but that’s not a technical issue, it’s a socioeconomic one. And technology by itself is rarely a solution to such problems.

The post Blockchain – What Is It Good For? [slides] appeared first on Bozho's tech blog.

Technical Innovation vs. Process Innovation

Post Syndicated from Bozho original https://techblog.bozho.net/technical-innovation-vs-process-innovation/

We often talk about “innovation”, and “digital innovation” (or “technical innovation”) in particular, when it comes to tech startups. It has, unfortunately, become a cliche, and now “innovation” is devoid of meaning. I’ve been trying to do some meaningful analysis of the “innovation landscape” and to classify what is being called “innovation”.

And the broad classification I got to is “technical innovation” vs “process innovation”. In the majority of cases, tech startups are actually process innovations. They get existing technology and try to optimize some real world process with it. These processes include “communicating with friends online”, “getting in touch with business contacts online”, “getting a taxi online”, “getting a date online”, “ordering food online”, “uploading photos online”, and so on. There is no inherent technical innovation in any of these – they either introduce new (and better) processes, or they optimize existing ones.

And don’t get me wrong – these are all very useful things. In fact, this is what “digital transformation” means – doing things electronically that were previously done in an analogue way, or doing things that were previously not possible in the analogue world. And the better you imagine or reimagine the process, the more effective your company will be.

In many cases these digital transformation tools have to deal with real-world complexities – legislation, entrenched behaviour, edge cases. E.g. you can easily write food delivery software. You get the order, you notify the store, you optimize the delivery people’s routes to collect and deliver as much food as possible, and you’re good to go. And then you “hit” the real world, where there are traffic jams, temporarily closed streets, restricted parking, unresponsive restaurants, unresponsive customers, keeping the online menu and what’s in stock in sync, worsened weather conditions, messed up orders, part-time job regulations that differ by country, and so on. And you have to navigate that maze in order to deliver a digitally transformed food delivery service.

There is nothing technically complex about that – any kid with some PHP and JS knowledge can write the software by finding answers to the programming hurdles on Stackoverflow. In that sense, it is not so technically innovative. The hard part is the processes and the real-world complexities. And of course, turning that into a profitable business.

In the long run, these non-technical innovations end up producing technical innovation. Facebook had nothing interesting on the technical side in the beginning. Then it had millions of users and had to scale, and then it became interesting – how to process so much data, how to scale to multiple parts of the world, how to optimize the storage of so many photos, and so on. Facebook gave us Cassandra, Twitter gave us Snowflake, LinkedIn gave us Kafka. There are many more examples, and it’s great that these companies open source some of their internally developed technologies. But these are side-effects of the scale, not an inherent technical innovation that led to the scale in the first place.

And then there’s the technical innovation companies. I think it’s a much more rare phenomenon and the prime example is Google – the company started as a consequence of a research paper. Roughly speaking, the paper outlined a technical innovation in search that made all other approaches to search obsolete. We can say that Bitcoin was such an innovation, as well. In some cases it’s not the founders that develop the original research, but they derive their product from existing computer science research. They combine multiple papers, adapt them to the needs of the real world (because, as we know, research papers often rely on a “spherical horse in vacuum”) and build something useful.

As a personal side-note here, some of my (side) projects were purely process innovations – I once made an online bus ticket reservation service (before such a thing existed in my country), then I made a social network aggregator (that was arguably better than existing ones at the time). And they were much less interesting than my more technically innovative projects, like Computoser (which has some original research) or LogSentinel (which combines several research papers into a product).

A subset of the technical innovation is the so called “deep tech” – projects that are supposed to enable future innovation. This can be simplified as “applied research”. Computer vision, AI, biomedical. This is where you need a lot of R&D, not simply “pouring” code for a few months.

Just as “process innovation” companies eventually lead to technical innovation, technical innovation companies eventually (or immediately) lead to process improvements. Google practically changed the way we consume information, so its impact on processes is rather high. And to me, that’s the goal of each company – to get to change behaviour. It’s much more interesting to do that using never-before-done technical feats, but if you can do it without the technical bits (i.e. by simply building a website/app using current web/mobile frameworks), good for you.

If you become a successful company, you’ll necessarily have both types of innovation, regardless of how you started. And in order to have a successful company, you have to improve processes and change behaviour. You have to do digital transformation. In the long run, it doesn’t make that much of a difference which was first – the technology or the process innovation. Although from business and investment perspective, it’s easier for competitors to copy the processes and harder to copy internal R&D.

Whether we should call process innovation “technical innovation” – I don’t think so, but that ship has already sailed – anything that uses technology is now “technical innovation”, even if it’s a WordPress site. But for technical people it’s much more challenging and rewarding to be based on actual technical innovation. We just have to remember that we have to solve real-world problems, improve or introduce processes and change behaviour.

The post Technical Innovation vs. Process Innovation appeared first on Bozho's tech blog.

Resources on Distributed Hash Tables

Post Syndicated from Bozho original https://techblog.bozho.net/resources-on-distributed-hash-tables/

Distributed p2p technologies have always been fascinating to me. Bittorrent is cool not because you can download pirated content for free, but because it’s an amazing piece of technology.

At some point I read and researched a lot about how DHTs (distributed hash tables) work. DHTs are not part of the original bittorrent protocol, but after trackers came increasingly under threat of being shut down for copyright infringement, “trackerless” features were added to the protocol. A DHT is distributed among all peers and holds information about which peer holds what data. Once you are connected to a peer, you can query it for its knowledge of who has what.

During my research (which was with no particular purpose) I took notes on many resources that I thought useful for understanding how DHTs work and possibly implementing something on top of them in the future. In fact, a DHT is a “shared database”, “just like” a blockchain. You can’t trust it as much, but proving digital events does not require a blockchain anyway. My point here is – there is a lot more cool stuff to distributed / p2p systems than blockchain. And maybe way more practical stuff.

It’s important to note that the DHT used in BitTorrent (the so-called Mainline DHT) is based on Kademlia. You’ll see a lot about it below.
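The core of Kademlia is deceptively simple: node IDs and content keys live in the same ID space, and “closeness” between them is measured by XOR. A toy sketch of that metric, with tiny integer IDs instead of 160-bit ones:

```python
# Toy sketch of Kademlia's XOR metric: a key is stored on (and looked up
# from) the nodes whose IDs are closest to it by XOR distance. Real
# Kademlia uses 160-bit IDs and iterative lookups over routing buckets.

def xor_distance(id_a: int, id_b: int) -> int:
    return id_a ^ id_b

def closest_nodes(key: int, node_ids: list, k: int = 3) -> list:
    """The k nodes a lookup for this key would converge on."""
    return sorted(node_ids, key=lambda n: xor_distance(key, n))[:k]
```

A lookup repeatedly asks the closest nodes it knows about for even closer nodes, converging on the peers responsible for the key in a logarithmic number of steps.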

Anyway, the point of this post is to share the resources that I collected. For my own reference and for everyone who wants to start somewhere on the topic of DHTs.

I hope the list is interesting and useful. It’s not trivial to think of other uses of DHTs, but simply knowing about them and how they work is a good thing.

The post Resources on Distributed Hash Tables appeared first on Bozho's tech blog.

Algorithmic and Technological Transparency

Post Syndicated from Bozho original https://techblog.bozho.net/algorithmic-and-technological-transparency/

Today I had a talk at OpenFest about algorithmic and technological transparency. It was a somewhat abstract and high-level talk, but I hoped to make it more meaningful than if some random guy just spoke about techno-dystopias.

I don’t really like these kinds of talks where someone who has no idea what a “training set” is, how dirty the data is and how to write five lines in Python talks about an AI revolution. But I think being a technical person lets me put some details into the high-level stuff and make it less useless.

The problem I’m trying to address is opaque systems – we don’t know how recommendation systems work, why we are seeing certain videos or ads, how decisions are taken, what happens with our data, and how secure the systems we use are, including our cars and future self-driving cars.

Algorithms have real impact on society – recommendation engines make conspiracy theories mainstream, motivate fascists, create echo chambers. Algorithms decide whether we get loans, welfare, a conviction, or even if we get hit by a car (as in the classical trolley problem).

How do we make the systems more transparent, though? There are many approaches, some overlapping. Textual description of how things operate, tracking the actions of back-office admins and making them provable to third parties, implementing “Why am I seeing this?”, tracking each step in machine learning algorithms, publishing datasets, publishing source code (as I’ve previously discussed open-sourcing some aspects of self-driving cars).

We don’t have to reveal trade secrets and lose competitive advantage in order to be transparent. Transparency doesn’t have to come at the expense of profit. In some cases it can even improve the perception of a company.

I think we should start with defining best practices, then move to industry codes and standards, and if that doesn’t work, carefully think of focused regulation (but keep in mind that politicians are mostly clueless about technology and may do more damage).

The slides from the talk are below (the talk itself was not in English).

It can be seen as our responsibility as engineers to steer our products in that direction. Not all of them require or would benefit from such transparency (rarely anyone cares about how an accounting system works), and it would normally be the product people that can decide what should become more transparent, but we also have a stake and should be morally accountable for what we build.

I hope we start thinking and working towards more transparent systems. And of course, each system has its specifics and ways to become more transparent, but I think it should be a general mode of thinking – how do we make our systems not only better and more secure, but also more understandable so that we don’t become the “masterminds” behind black-boxes that decide lives.

The post Algorithmic and Technological Transparency appeared first on Bozho's tech blog.

Random App Ideas

Post Syndicated from Bozho original https://techblog.bozho.net/random-app-ideas/

Every now and then you start thinking “wouldn’t it be nice to have an app for X”. When I was in that situation, I took a note. Then that note grew, I cleaned up some absurd ones and added some more. I implemented some of these ideas of mine, the rest formed a “TODO” list.

At some point I realized I won’t be able to implement them, as the time needed is not something I’m likely to have in the near future. But I didn’t like to just delete the notes. So here they are – my random app ideas. Probably useless, but maybe a little interesting.

  • Receipts via smartphone NFC – there are apps that let you track your expenses, but they are tedious. There are apps that try to OCR receipts, but they vary too much in different parts of the world for that to be consistent (there are companies like Receipt Bank that do something like that, but it’s not the way I’d like this problem solved in the long run). So I thought stores could offer the option for NFC receipts – you just tap your phone to a device (or another phone) and get the receipt in an electronic form. Not the picture, but the raw data – items, prices, taxes, issuer. Then you have the option to print it if you like. Of course I realized at some point that legislation should first allow for that – in many cases you must issue a paper receipt and the digital one is not “legal”. But the idea still remains viable and probably not hard to implement. It will allow much easier tracking of expenses by users and will save a lot of paper.
  • Am I drunk? – breathalyzer add-ons for phones are not something many people would buy. My idea was to detect drunkenness based on motion – the accelerometer (and possibly other sensors) could be trained to recognize a drunken walk on the street, thus informing you that you are… drunk. What’s the use of that? Maybe if the app alerts you while you are walking to your car, you’ll be less likely to drive.
  • Mobile scavenger hunt – scavenger hunts have always been interesting. I remember we devised one for a friend’s birthday a few years ago. It would be nice, though, to be able to build such scavenger hunts dynamically, based on multiple configurable building blocks. The app could generate multiple outcomes based on certain rules (mostly using the GPS of the phone). It could be timed, it could include scanning barcodes (including those on actual products), visiting particular addresses, listening to songs and their lyrics, extracting data from Wikipedia articles, etc. “Generate scavenger hunt” (for a friend or randomly for yourself) might be interesting.
  • Sentiment analysis prior to commenting – something that somehow plugs into facebook or other social media (or screen-scrapes it?), plus a browser plugin (for desktop), that does sentiment analysis on the facebook comments you are currently typing. And recommends against posting them if they are too negative, personal attacks, etc. Kind of like a cool-down timer saying “think twice before destroying the online discussion with that comment”. It is a problem of online communication that too often it gets out of control (and the same people wouldn’t be as rude in real life as they are online).
  • OCR shop signs – this was more of a Google Glass idea than a regular phone app, as it requires capturing one’s surroundings. The idea is to crowdsource the current shops and restaurants and put them on a map. At the moment we rely on old and incomplete data – either data that the owners of a shop or restaurant have entered, or that some contributor has added. However, shops and restaurants change often and you may not know that something has moved. It doesn’t sound that useful, but worth a thought.
  • Algorithmic impressionist painter – you train an algorithm and it creates a random picture. Now, that has already been done (I’ve been having the idea ever since I created my algorithmic music composer). And I’ve heard critics destroy the algorithmic paintings. But it’s an interesting experiment nonetheless.
  • Life game – basically record your actions and get rewarded for good deeds. The game can give particular challenges (e.g. “go help a soup kitchen”), and bonuses. The challenge here lies in data protection – depending on the level of detail, the application may have too much personal data that should not leak. Encrypting everything with a private key stored in the secure storage of your device may be one way to resolve that. I know it sounds a bit dystopian and now, after seeing something like that implemented in China, I’m happy that I can say “oh, I had this idea six years ago” and I’ve grown since then. Anyway, in my rendition it’s a voluntary game, not a state-imposed citizen score. But re-purposing technologies is a specialty of regimes, so maybe it’s a bad idea to build such a system in the first place.
  • Random hugs – you know those “free hugs” signs. You can have an app that detects other users of the same app via WiFi direct (or other GPS or non-GPS proximity/distance detection) and offers both of you a random hug. Yes, yes, I know it is awkward to hug strangers and it may be abused for sexual misconduct. In my defense, when I had the idea six years ago, the public attention and knowledge about sexual abuse was not on the level it is today. Still, with a proper rating system and “report abuse”, this may not be an issue. And as hugs are considered a good thing for psychological health, it might not be such a dumb idea as it sounds.
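Of these, the sentiment-analysis idea is perhaps the easiest to prototype. Here is a deliberately naive, lexicon-based sketch – a real implementation would use a trained model, and the word lists are invented purely for illustration:

```python
# Naive lexicon-based sketch of "sentiment analysis prior to commenting".
# The word lists are purely illustrative; a real version would use a
# trained sentiment model rather than keyword counting.

NEGATIVE = {"idiot", "stupid", "hate", "terrible", "worst"}
POSITIVE = {"thanks", "great", "good", "love", "helpful"}

def comment_warning(comment: str, threshold: int = -1) -> bool:
    """True if the comment is negative enough to show a
    'think twice before posting this' prompt."""
    words = [w.strip(".,!?").lower() for w in comment.split()]
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return score <= threshold
```

Even something this crude could power a cool-down prompt; the hard part is making the classification good enough not to annoy people with false positives.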

In all cases above, there are interesting technical challenges – devising a standard receipt format and optimizing the UX of receipt exchange, training a model to detect drunkenness based on accelerometer readings, training a painting algorithm, doing sentiment analysis on small pieces of text, using niche technologies like WiFi direct for proximity, data protection and cryptography, OCR-ing pictures of surroundings. This is why I’m sharing them in this blog.
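The first of those challenges, a standard receipt format, could start from something as simple as a structured payload exchanged over NFC. The field names below are invented for illustration; a real format would have to follow local fiscal legislation:

```python
# A hypothetical raw-data receipt (items, prices, taxes, issuer)
# serialized for NFC transfer. Field names are illustrative, not any
# real fiscal standard.

import json
from dataclasses import dataclass, field, asdict

@dataclass
class ReceiptItem:
    name: str
    quantity: int
    unit_price_cents: int

@dataclass
class Receipt:
    issuer: str
    tax_id: str
    currency: str
    tax_cents: int
    items: list = field(default_factory=list)

    def total_cents(self) -> int:
        return sum(i.quantity * i.unit_price_cents for i in self.items) + self.tax_cents

    def to_nfc_payload(self) -> bytes:
        """Raw bytes an expense-tracking app would receive over NFC."""
        return json.dumps(asdict(self)).encode()
```

Using integer cents avoids floating-point rounding issues; the receiving app can parse, store and aggregate the raw data instead of OCR-ing a picture.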

This is in line with my previous posts about making side projects with technologies that are new for you. The app may end up useless, or not used by many people, but you’ve still learned an important skill and applied it in practice.

The post Random App Ideas appeared first on Bozho's tech blog.

Models for Electronic Identification

Post Syndicated from Bozho original https://techblog.bozho.net/models-for-electronic-identification/

Electronic identity is an important concept as it lies on the crossroads of the digital, the physical and the legal worlds. How do you prove your identity without being physically present? I’ve previously given an overview talk on electronic identification, tackled the high-level and philosophical aspects, and criticized biometric-only identification as insecure. Now I want to list a few practical ways in which eID can be implemented.

First, it’s important to mention the eIDAS regulation which makes electronic identity a legal reality. It says what an electronic identification scheme is and what are its legal effects (proving that you are who you claim to be). And it defines multiple “levels of assurance”, i.e. the levels of security of a method of authentication. But it is a broad framework that doesn’t tell you how to do things in the technical world.

And while electronic identification is mostly cited in the context of government and municipal services, it applies to private companies as well. Currently in the US, for example, the SSN is used for electronic identification. This is a very poor approach, as leaking the SSN allows for identity theft. In the EU there are many different approaches, from the Estonian PKI-based approach to the UK’s Verify initiative, which relies on the databases of private companies.

You can see electronic identity as a more legally-meaningful login. You still perform a login, in many cases using username and password as one of the factors, but it carries additional information – who is the actual person behind that login. In some cases it doesn’t even have to give information on who the person is – it can just confirm that such a person exists, and some attributes of that person – age (e.g. if you want to purchase alcohol online), city of registration (e.g. if you want to use municipal services), conviction status (e.g. if applying to be a driver in an Uber-like service). It is also very useful when doing anti-money-laundering checks (e.g. if you are a payment provider, an online currency or crypto-currency exchange, etc.)
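To illustrate that attribute-only confirmation, an eID provider could return just the claims a service needs, sealed so they can later be verified. Everything below is invented for illustration – real schemes use digital signatures and standardized claim formats (e.g. SAML assertions or OpenID Connect claims), with HMAC standing in for a signature here:

```python
# Hypothetical attribute-only eID response: the relying party learns that
# a verified person is over 18 and where they are registered, without
# learning who they are. HMAC sealing stands in for a real signature.

import hashlib, hmac, json

def attribute_assertion(over_18: bool, city: str, provider_key: bytes) -> dict:
    claims = {"over_18": over_18, "city_of_registration": city}
    payload = json.dumps(claims, sort_keys=True).encode()
    seal = hmac.new(provider_key, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "seal": seal}

def verify_assertion(assertion: dict, provider_key: bytes) -> bool:
    """Detects any tampering with the sealed claims."""
    payload = json.dumps(assertion["claims"], sort_keys=True).encode()
    expected = hmac.new(provider_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, assertion["seal"])
```

The point of the shape, not the crypto: the response carries attributes and a proof of their origin, never the person’s full identity.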

Electronic identification schemes can be public and private. Public ones are operated by governments (federal or state in the case of the US) or by a particular institution (e.g. the one that issues driver’s licenses). Private ones can be operated by any company that has had the ability to verify your physical identity – e.g. by you going and signing a contract with them. A bank, a telecom, a utility company.

I will use “authentication” and “identification” interchangeably, and for practical purposes this is sort-of true. They differ, of course, as authentication is proving you are who you are, and identification is uniquely identifying you among others (and they have slightly different meanings when it comes to cryptography, but let’s not get carried away in terminology).

Enrollment is the process of signing you up in the electronic identification scheme’s database. It can include a typical online registration step, but it has to do proper identity verification. This can be done in three ways:

  • In-person – you physically go to a counter to have your identity verified. This is easy in the EU where ID cards exist, and a bit harder in the US, where you are not required to actually have an identity document (though you may have one of several). In that case you’d have to bring a birth certificate, utility bills or whatever the local legislation requires
  • Online – any combination of the following may be deemed acceptable, depending on the level of assurance needed: a videoconferencing call; selfie with an identity document; separate picture of an identity document; camera-based liveness detection; matching of selfie with government-registered photo. Basically, a way to say that 1. I have this document 2. I am the person on the document. This could be automated or manual. But does not require physical presence.
  • By proxy – by relying on another eID provider that can confirm your identity. This is an odd option, but you can cascade eID schemes.

And then there are the technical aspects – what do you add to “username and password” to make identity theft less likely or nearly impossible:

  • OTP (one-time passwords). This can be a hardware OTP token (e.g. RSA SecurID) or a software-based TOTP (like Google Authenticator). The principle of both is the same – the client and the server share a secret, and based on the current time, generate a 6-digit password. Note that storing the secrets on the server side is not trivial – ideally that should be on an HSM (hardware security module) that can do native OTP, otherwise the secrets can leak and your users can be impersonated (The HSM is supposed to not let any secret key leave the hardware). There are less secure OTP approaches, like SMS or other types of messages – you generate one and send it to a registered phone, viber, telegram, email, etc. Banks often use that for their login, but it cannot be used across organizations, as it would require the secrets to be shared. Because the approach is centralized, you can easily revoke an OTP, e.g. declare a phone or OTP device as stolen and then go get a new one / register a new phone.
  • PKI-based authentication – when you verify the person’s identity, have them generate a private key, and issue a X.509 certificate for the corresponding public key. That way the user can use the private key to authenticate (the most straightforward way – TLS mutual authentication, where the user signs a challenge with the private key to confirm they are the “owners” of the certificate). The certificate would normally hold some identifier which can then be used to fetch data from databases. Alternatively, the data can be on the certificate itself, but that has some privacy implications and is rarely a good option. This option can be used across institutions, as you can prove you are the person that owns a private key without the other side needing to share a secret with you. They only need the certificate, and it is public anyway. Another benefit of PKI-based authentication is revokability – in case the user’s private key is somehow compromised, you can easily revoke a certificate (publish it in a CRL, for example).
  • Biometrics – when you are enrolled, you scan a fingerprint, a palm, an iris, a retina or whatever the current cool biometric tech is. I often argue that this cannot be your main factor of authentication. It can and sometimes should be used as an additional safeguard, but it has a big problem – it cannot be revoked. Once a biometric identifier is leaked, it is impossible to stop people from using it. And while they may not be able to fool scanners (although for fingerprints that has been proven easy in the past), the scanners communicate with a server which performs authentication. An attacker may simply spoof a scanner and make it seem to the server that the biometric data was properly obtained. If that has to be avoided, the scanners themselves have to be identified by signing a challenge with a private key in a secure hardware module, which makes the whole process too complicated to be meaningful. But then again – the biometric factor is useful and important, as we’ll see below.
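The TOTP mechanism from the first bullet fits in a few lines – client and server run the same computation over the shared secret and the current 30-second window. This follows RFC 6238 (happy path only, with no clock-drift tolerance):

```python
# TOTP per RFC 6238: HMAC the current time-step counter with the shared
# secret, then dynamically truncate to a short numeric code. Both client
# and server compute this independently and compare the results.

import hashlib, hmac, struct, time

def totp(secret: bytes, at=None, digits: int = 6, step: int = 30) -> str:
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                      # dynamic truncation (RFC 4226)
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % 10 ** digits
    return str(code).zfill(digits)
```

With the RFC 6238 test vector (ASCII secret “12345678901234567890” at t=59) this yields 94287082 for 8 digits – which is exactly why the server-side secret store is the crown jewel: anyone holding the secrets can generate valid codes.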

The typical “factors” in an authentication process are: something you know (passwords), something you have (OTP token, smartcard, phone) and something you are (biometrics). The “something you have” part is what generates multiple variations to the PKI approach mentioned above:

  • Use an unprotected storage on a computer to store the private key – obviously not secure enough, as the private key can be easily extracted and your identity can thus be stolen. But it has to be mentioned, as it can be useful in lower-risk scenarios
  • Use a smartcard – a hardware device that can handle PKI operations (signing, encryption) and does not let private keys leave the hardware. Smartcards are tricky, as they require a reader (usually plugged in via USB) and vendor-specific drivers and “magic” to have browser support. Depending on the circumstances, it could be a good approach, as it is the most secure – there is no way for someone to impersonate you, apart from stealing your smartcard and knowing both your smartcard PIN and your password. The problem with plugging in the smartcard can be alleviated by utilizing NFC with a smartphone (so you just place the card on the back of the smartphone in order to authenticate), but that leads to a lot of other problems, e.g. how to protect the communication from eavesdropping and MITM attacks (as far as I know, there is no established standard for that, except for NFC-SEC, which I think is not universally supported). The smartcard can be put on a national ID card, a separate card, or even the chips in existing bank cards can be reused (though issuers are reluctant to share the chip with functionality (applets) other than the EMV one).
  • Use a smartphone – smartphones these days have secure storage capabilities (e.g. Android’s Trusted Execution Environment or the iPhone’s Secure Enclave). A few years ago, when I looked into this more thoroughly, these secure modules were not perfect and there were known attacks, but they have certainly improved. You can in many cases rely on a smartphone to protect the private key. Then, in order to authenticate, you’d need a PIN or biometrics to unlock the phone. Here’s where biometrics come in really handy – when they don’t leave the phone, so even if leaked, they cannot be used to impersonate you. They can only be used to potentially make a fake fingerprint to unlock the phone (which would also have to be stolen). And of course, there’s still the password (“something you know”).
  • Remote HSM – the private keys can be stored remotely, on a hardware security module, so that they cannot physically leave the hardware. However, the hardware is not under your physical control and unlocking it requires just a PIN, which turns this scheme into just “something you know” (times 2, if you add the password). Remote identification and remote signing schemes are becoming popular, and in order for them to be secure, you also have to somehow associate the device with the particular user and their private key on the HSM. This can be done in a combination of ways, including the IMEI of the phone (which is spoofable, though) and utilizing some of the aforementioned options – the protected storage of the phone and OTPs handled behind the scenes. (Note: the keys on the HSM should be in the protected storage. Having them in an external database encrypted by the master key is not good enough, as they can still leak.) If you are going to rely on the smartphone secure storage anyway, what’s the benefit of the remote HSM? It’s twofold – first, losing the phone doesn’t mean you cannot use the same key again, and second, it reduces the risk of leaking the key, as the HSM is theoretically more secure than the smartphone storage.
  • Hybrid / split key – the last two approaches – the smartphone secure storage and the remote HSM – can be combined for additional security. You can have the key split in two – part in the smartphone, part on the HSM. That way you reduce the risk of the key leaking. Losing the phone, however, would mean the key should be regenerated and new certificates should be issued, but that may be okay depending on the use case.
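The split-key idea from the last bullet can be illustrated with the simplest possible scheme – XOR splitting, where neither share alone reveals anything about the key. This is just a sketch; real remote-signing deployments use proper threshold cryptography, and the names here are purely illustrative.

```python
# Split-key sketch: the full key never exists in one place — one share
# stays in the phone's secure storage, the other on the remote HSM.
import os

def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split a key into two shares; both are required to reconstruct it."""
    phone_share = os.urandom(len(key))              # uniformly random
    hsm_share = bytes(a ^ b for a, b in zip(key, phone_share))
    return phone_share, hsm_share

def combine(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine the two shares into the original key."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))

key = os.urandom(32)
phone_share, hsm_share = split_key(key)
assert combine(phone_share, hsm_share) == key
# Either share alone is uniformly random, so a leak of just one of them
# (phone theft, HSM breach) does not compromise the key.
```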

As you can see, the smartphone secure storage is becoming an important aspect of electronic identification that is both secure and usable. It allows easily adding a biometric factor without the need to be able to revoke it. And it doesn’t rely on a clunky smartcard that you can’t easily plug in.

This is not everything that can be said about secure electronic identification, but I think it’s enough details to get a good picture. It’s not trivial and getting it wrong may lead to real-world damages. It is viewed as primarily government-related, but the eID market in Europe is likely going to grow (partly thanks to unifying the legislation by eIDAS) and many private providers will take part in that. In the US the problem of identity theft and the horrible practice of using the SSN for authentication is being recognized and it’s likely that legislative efforts will follow to put electronic identification on track and in turn foster a market for eID solutions (which is currently a patchwork of scanning and manual verification of documents).

The ultimate goal is to be both secure and usable. And that’s always hard. But thanks to the almost ubiquitous smartphone, it is now possible (though backup options should exist for people that don’t have smartphones). Electronic identification is a key enabler for the so called “digital transformation”, and getting it right is crucial for the digital economy. Apologies for the generic high-level sentence, but I do think we should have technical discussions at the same time as policy discussions, otherwise the two diverge and policy monsters are born.

The post Models for Electronic Identification appeared first on Bozho's tech blog.

Typical Workarounds For Compliant Logs

Post Syndicated from Bozho original https://techblog.bozho.net/typical-workarounds-for-compliant-logs/

You may think you have logs. Chances are, you can rely on them only for tracing exceptions and debugging. But you can’t rely on them for compliance, forensics, or any legal matter. And that may be none of your concern as an engineer, but it is one of those important non-functional requirements that, if not met, are ultimately our fault.

Because of my audit trail startup, I’m obviously both biased and qualified to discuss logs (I’ve previously described what audit trail / audit logs are and how they can be maintained). And while they are only a part of the security aspects, and certainly not very exciting, they are important, especially from a legal standpoint. That’s why many standards and laws – including ISO 27001, PCI-DSS, HIPAA, SOC 2, PSD2, GDPR – require audit trail functionality. And most of them have very specific security requirements.

PCI-DSS has a bunch of sections with audit trail related requirements:

10.2.3
“Malicious users often attempt to alter audit logs to hide their actions, and a record of access allows an organization to trace any inconsistencies or potential tampering of the logs to an individual account. [..]”

10.5 Secure audit trails so they cannot be altered. Often a malicious individual who has entered the network will attempt to edit the audit logs in order to hide their activity. Without adequate protection of audit logs, their completeness, accuracy, and integrity cannot be guaranteed, and the audit logs can be rendered useless as an investigation tool after a compromise.

10.5.5
Use file-integrity monitoring or change-detection software on logs to ensure that existing log data cannot be changed without generating alerts (although new data being added should not cause an alert).
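The change-detection behaviour requirement 10.5.5 describes – existing log data must not change silently, while appends raise no alert – can be sketched in a few lines. This is a minimal illustration with hypothetical log entries, not a substitute for a real file-integrity monitoring product.

```python
# Append-only check: record a checkpoint (length + hash) of the log;
# later, the first `length` bytes must still hash to the recorded value.
import hashlib

def checkpoint(log: bytes) -> tuple[int, str]:
    """Record the current length and content hash of the log."""
    return len(log), hashlib.sha256(log).hexdigest()

def verify(log: bytes, length: int, digest: str) -> bool:
    """True if existing data is unchanged; appended data is allowed."""
    return (len(log) >= length
            and hashlib.sha256(log[:length]).hexdigest() == digest)

log = b"2018-04-01 login admin\n"
length, digest = checkpoint(log)

log += b"2018-04-02 config change\n"            # append: no alert
assert verify(log, length, digest)

tampered = b"2018-03-31 login admin\n" + log[length:]   # edited entry
assert not verify(tampered, length, digest)     # would trigger an alert
```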

ISO 27001 Annex A also speaks about protecting the audit trail against tampering

A.12.4.2 Protection of log information
Logging facilities and log information shall be protected against tampering and unauthorized access

From my experience, sadly, logs are rarely fully compliant. However, auditors are mostly fine with that and certification is given, even though logs can be tampered with. I decided to collect and list the typical workarounds to the secure, tamper-evident/tamper-protected audit logs.

  • We don’t need them secured – you almost certainly do. If you need to be compliant – you do. If you don’t need to be compliant, but you have a high-value / high-impact system, you do. If it doesn’t have to be compliant and it’s low-value or low-impact, then yes – you don’t need much security for that anyway (but be careful not to underestimate the needed security)
  • We store it in multiple places with different access – this is based on the assumption that multiple administrators won’t conspire to tamper with the logs. And you can’t guarantee that, of course. But even if you are sure they won’t, you can’t prove that to external parties. Imagine someone sues you and you provide logs as evidence. If the logs are not tamper-evident, the other side can easily claim you have fabricated logs and make them inadmissible evidence.
  • Our system is so complicated, nobody knows how to modify data without breaking our integrity checks – this is the “security through obscurity” approach and it will probably work well … until it doesn’t.
  • We store it with external provider – external log providers usually claim they provide compliance. And they do provide many aspects of compliance, mainly operational security around the log collection systems. Besides, in that case you (or your admins) can’t easily modify the externally stored records. Some providers may give you the option to delete records, which isn’t great for audit trail. The problem with such services is that they keep the logs operational for relatively short periods of time and then export them in a form that can be tampered with. Furthermore, you can’t really be sure that the logs are not tampered with. Yes, the provider is unlikely to care about your logs, but having that as the main guarantee doesn’t sound perfect.
  • We are using trusted timestamping – and that’s great, it covers many aspects of log integrity. AWS is timestamping their CloudTrail log files and it’s certainly a good practice. However, it falls short in just one scenario – someone deleting an entire timestamped file. And because it’s a whole file, rather than record-by-record, you won’t know which record exactly was targeted. There’s another caveat – if the TSA is under your control, you can backdate timestamps and therefore can’t prove that you didn’t fabricate logs.

These approaches are valid and are somewhere on a non-zero point on the compliance spectrum. Is having four copies of the data accessible to different admins better than having just one? Yup. Is timestamping with a local TSA better than not timestamping? Sure. Is using an external service more secure than using a local one? Yes. Are they sufficient to get certified? Apparently yes. Is this the best that can be done? No. Do you need the best? Sometimes yes.

Most of these measures are organizational rather than technical, and organizational measures can be reversed or circumvented much more easily than technical ones. And I may be too paranoid, but when I was a government advisor, I had to be paranoid when it comes to security.

And what do I consider a non-workaround? There is a lot of research around tamper-evident logging, tamper-evident data structures, merkle trees, hash chains. Here’s just one example. It should have been mainstream by now, but it isn’t. We’ve apparently settled on suboptimal procedures (even when it comes to cost), and we’ve interpreted the standards loosely, looking for low-hanging fruit. And that’s fine for some organizations. I guess.
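To make the hash-chain idea mentioned above concrete, here is a minimal sketch: each record’s hash covers the previous hash, so modifying or deleting any earlier record invalidates every hash that follows it. The record contents are illustrative; a real implementation would also anchor the chain externally (e.g. via trusted timestamps) so the whole chain can’t be regenerated.

```python
# Minimal hash chain — the core primitive of tamper-evident logging.
import hashlib

def chain(records: list[bytes]) -> list[str]:
    """Return the running hash for each record, chained to the previous."""
    hashes, prev = [], "0" * 64          # fixed genesis value
    for record in records:
        prev = hashlib.sha256(prev.encode() + record).hexdigest()
        hashes.append(prev)
    return hashes

records = [b"user=alice action=login", b"user=bob action=delete"]
original = chain(records)

# Tampering with the first record changes every subsequent hash,
# so the mismatch at the head of the chain is detectable.
records[0] = b"user=alice action=logout"
assert chain(records)[-1] != original[-1]
```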

It takes time for a good practice to become mainstream, and it has to be obvious. Tamper-evident logging isn’t obvious. We have gradually become more aware of properly storing passwords, using TLS, multi-factor authentication. But security is rarely a business priority, as seen in reports about what drives security investments (it’s predominantly “compliance”).

As a practical conclusion – if you are going to settle for a workaround, at least choose a better one. Not having audit trail or not making any effort to protect it from tampering should be out of the question.

The post Typical Workarounds For Compliant Logs appeared first on Bozho's tech blog.

Proving Digital Events (Without Blockchain)

Post Syndicated from Bozho original https://techblog.bozho.net/proving-digital-events-without-blockchain/

Recently technical and non-technical people alike started to believe that the best (and only) way to prove that something has happened in an information system is to use a blockchain. But there are other ways to achieve that that are arguably better and cheaper. Of course, blockchain can be used to do that, and it will do it well, but it is far from the only solution to this problem.

Blockchain proves that some event has occurred by putting it into a tamper-evident data structure (a hash chain of the roots of merkle trees of transactions) and distributing that data structure across multiple independent actors so that “tamper-evident” becomes “tamper-proof” (sort of). So if an event is stored on a blockchain, and the chain is intact (and others have confirmed it’s intact), this is a technical guarantee that it had indeed happened and was neither back-dated, nor modified.
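The merkle-tree half of that structure is easy to sketch: the root hash commits to every transaction, so changing any one of them changes the root. This is a simplified illustration (odd levels are handled by duplicating the last node, as Bitcoin does); the transaction contents are placeholders.

```python
# Merkle root sketch: a single hash that commits to all leaves.
import hashlib

def merkle_root(leaves: list[bytes]) -> str:
    """Hash leaves pairwise, level by level, down to a single root."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                      # odd count: duplicate last
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

txs = [b"tx1", b"tx2", b"tx3"]
root = merkle_root(txs)
# Altering any transaction yields a different root.
assert merkle_root([b"tx1", b"tx2", b"tampered"]) != root
```

Chaining such roots block by block gives exactly the “hash chain of merkle roots” described above.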

An important note here – I’m stressing “digital” events, because no physical event can be truly guaranteed electronically. The fact that someone has to enter the physical event into a digital system makes this process error-prone and the question becomes “was the event correctly recorded” rather than “was it modified once it was recorded”. And yes, you can have “certified” / “approved” recording devices that automate transferring physical events to the digital realm, e.g. certified speed cameras, but the certification process is a separate topic. So we’ll stay purely in the digital realm (and ignore all provenance use cases).

There are two aspects to proving digital events – technical and legal. Once you get to court, you’re unlikely to easily convince a judge that “byzantine fault tolerance guarantees tamper-proof hash chains”. You need a legal framework to allow for treating digital proofs as legally binding.

Luckily, Europe has such a legal framework – Regulation (EU) 910/2014. It classifies trust services in three categories – basic, advanced and qualified. Qualified ones are always supplied by a qualified trust service provider. The benefit of qualified signatures and timestamps is that the burden of proof is on the one claiming that the event didn’t actually happen (or was modified). If a digital event is signed with a qualified electronic signature or timestamped with a qualified timestamp, and someone challenges that occurrence of the event, it is they that should prove that it didn’t happen.

Advanced and basic services still bear legal strength – you can bring a timestamped event to court and prove that you’ve kept your keys securely so that nobody could have backdated an event. And the court should acknowledge that, because it’s in the law.

Having said that, the blockchain, even if it’s technically more secure, is not the best option from a legal point of view. Timestamps on blocks are not put there by qualified trust service providers, but by nodes on the system, and therefore could be seen as non-qualified electronic timestamps. Signatures on transactions have a similar problem – they are made by anonymous actors on the network, rather than individuals whose identity is linked to the signature, therefore making them legally weaker.

On the technical side, we have been able to prove events even before blockchain – with digital signatures and trusted timestamps. Once you do a, say, RSA signature (encrypt the hash of the content with your private key, so that anyone knowing your public key can decrypt it and match it to the hash of the content you claim to have signed, thus verifying that it is indeed you who signed it), you cannot deny having signed it (non-repudiation). The signature also protects the integrity of the data (it can’t be changed without breaking the signature). It is also known who signed it, owning the private key (authentication). Having these properties on a piece of data (“event”), you can use it to prove that this event has indeed occurred.
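The RSA signing flow just described can be sketched as follows – a minimal illustration assuming the Python `cryptography` package, with an invented event payload. In practice you’d use an existing key pair whose public key sits in a CA-issued certificate.

```python
# Signing a digital event with RSA: integrity, authentication,
# and non-repudiation in one operation.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# The signer's key pair (normally long-lived and CA-certified).
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

event = b'{"user": "alice", "action": "transfer", "amount": 100}'
signature = private_key.sign(event, padding.PKCS1v15(), hashes.SHA256())

# Anyone with the public key can check the signature; verify() raises
# InvalidSignature if the event or the signature was altered.
private_key.public_key().verify(
    signature, event, padding.PKCS1v15(), hashes.SHA256())
```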

You can’t, however, prove when it occurred – for that you need trusted timestamping. Usually that means a third-party provider signs the data you send them and includes the current timestamp in the signed response. That way, using public key cryptography and a few centralized authorities (the CA and the TSA), we’ve been able to prove the existence of digital events.
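A toy version of that TSA interaction looks like this. It is only a sketch of the idea – real trusted timestamping follows RFC 3161 with certified TSA keys and a standardized token format, and all names here are illustrative.

```python
# Trusted timestamping sketch: the TSA signs (hash of data + its time),
# so the data's existence at that moment can later be proven.
import hashlib, json, time
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

tsa_key = ec.generate_private_key(ec.SECP256R1())   # held by the TSA

def timestamp(data: bytes) -> dict:
    """TSA side: sign the data's hash together with the current time."""
    token = json.dumps({"hash": hashlib.sha256(data).hexdigest(),
                        "time": int(time.time())}).encode()
    return {"token": token,
            "signature": tsa_key.sign(token, ec.ECDSA(hashes.SHA256()))}

def verify(data: bytes, ts: dict) -> bool:
    """Verifier side: check the TSA signature, then match the hash."""
    tsa_key.public_key().verify(    # raises if the token was altered
        ts["signature"], ts["token"], ec.ECDSA(hashes.SHA256()))
    return json.loads(ts["token"])["hash"] == hashlib.sha256(data).hexdigest()

ts = timestamp(b"contract v1")
assert verify(b"contract v1", ts)       # original data checks out
assert not verify(b"contract v2", ts)   # different data does not
```

Note that only the data’s hash is sent to the TSA, so the provider never sees the content it is timestamping.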

And yes, relying on centralized infrastructure is not perfect. But apart from a few extreme cases, you don’t need 100% protection for 100% of your events. That is not to say that you should go entirely unprotected and hope that an event has occurred simply because it is in some log file.

Relying on plain log files for proving things happened is a “no-go”, as I’ve explained in a previous post about audit trail. You simply can’t prove you didn’t back-date or modify the event data.

But you can rely on good old PKI to prove digital events (of course, blockchain also relies on public key cryptography). And the blockchain approach will not necessarily be better in court.

In a private blockchain you can, of course, utilize centralized components, like a TSA (Time stamping authority) or a CA to get the best of both worlds. And adding hash chains and merkle trees to the mix is certainly great from a technical perspective (which is what I’ve been doing recently). But you don’t need a distributed consensus in order to prove something digital happened – we’ve had the tools for that even before proof-of-work distributed consensus existed.

The post Proving Digital Events (Without Blockchain) appeared first on Bozho's tech blog.

The Benefits of Side Projects

Post Syndicated from Bozho original https://techblog.bozho.net/the-benefits-of-side-projects/

Side projects are the things you do at home, after work, for your own “entertainment”, or to satisfy your desire to learn new stuff, in case your workplace doesn’t give you that opportunity (or at least not enough of it). Side projects are also a way to build stuff that you think is valuable but not necessarily “commercialisable”. Many side projects are open-sourced sooner or later and some of them contribute to the pool of tools at other people’s disposal.

I’ve outlined one recommendation about side projects before – do them with technologies that are new to you, so that you learn important things that will keep you better positioned in the software world.

But there are more benefits than that – serendipitous benefits, for example. And I’d like to tell some personal stories about that. I’ll focus on a few examples from my list of side projects to show how, through a sort-of butterfly effect, they helped shape my career.

The computoser project, no matter how cool algorithmic music composition is, didn’t manage to have much of a long-term impact. But it did teach me something apart from niche musical theory – how to read a bulk of scientific papers (mostly computer science) and understand them without being formally trained in the particular field. We’ll see how that was useful later.

Then there was the “State alerts” project – a website that scraped content from public institutions in my country (legislation, legislation proposals, decisions by regulators, new tenders, etc.), made them searchable, and “subscribable” – so that you get notified when a keyword of interest is mentioned in newly proposed legislation, for example. (I obviously subscribed for “information technologies” and “electronic”).

And that project turned out to have a significant impact on the following years. First, I chose a new technology to write it with – Scala. Which turned out to be of great use when I started working at TomTom, and on the 3rd day I was transferred to a Scala project, which was way cooler and much more complex than the original one I was hired for. It was a bit ironic, as my colleagues had just read that “I don’t like Scala” a few weeks earlier, but nevertheless, that was one of the most interesting projects I’ve worked on, and it went on for two years. Had I not known Scala, I’d probably have left TomTom much earlier (as the other project was restructured a few times), and I would not have learned many of the scalability, architecture and AWS lessons that I did learn there.

But the very same project had an even more important follow-up. Because of its “civic hacking” flavour, I was invited to join an informal group of developers (later formalized as an NGO) who create tools that are useful for society (something like MySociety.org). That group gathered regularly, discussed both tools and policies, and at some point we put up a list of policy priorities that we wanted to lobby policy makers for. One of them was open source for the government, the other one was open data. As a result of our interaction with an interim government, we donated the official open data portal of my country, which is functioning to this day.

As a result of that, a few months later we got a proposal from the deputy prime minister’s office to “elect” one of the group as an advisor to the cabinet. And we decided that could be me. So I went for it and became advisor to the deputy prime minister. The job was nothing like anything one could imagine, and it was challenging and fascinating. We managed to pass legislation, including laws that require open source for custom projects, eID and open data. And all of that would not have been possible without my little side project.

As for my latest side project, LogSentinel – it became my current startup company. And not without help from the previous two mentioned above – the computer science paper reading was of great use when I was navigating the crypto papers landscape, and from the government job I not only gained invaluable legal knowledge, but I also “got” a co-founder.

Some other side projects died without much fanfare, and that’s fine. But the ones above shaped my “story” in a way that would not have been possible otherwise.

And I agree that such serendipitous chain of events could have happened without side projects – I could’ve gotten these opportunities by meeting someone at a bar (unlikely, but who knows). But we, as software engineers, are capable of tilting chance towards us by utilizing our skills. Side projects are our “extracurricular activities”, and they often lead to unpredictable, but rather positive chains of events. They would rarely be the only factor, but they are certainly great at unlocking potential.

The post The Benefits of Side Projects appeared first on Bozho's tech blog.

Bad Software Is Our Fault

Post Syndicated from Bozho original https://techblog.bozho.net/bad-software-is-our-fault/

Bad software is everywhere. One can even claim that every software is bad. Cool companies, tech giants, established companies, all produce bad software. And no, yours is not an exception.

Who’s to blame for bad software? It’s all complicated and many factors are intertwined – there’s business requirements, there’s organizational context, there’s lack of sufficient skilled developers, there’s the inherent complexity of software development, there’s leaky abstractions, reliance on 3rd party software, consequences of wrong business and purchase decisions, time limitations, flawed business analysis, etc. So yes, despite the catchy title, I’m aware it’s actually complicated.

But in every “it’s complicated” scenario, there’s always one or two factors that are decisive. All of them contribute somehow, but the major drivers are usually a handful of things. And in the case of bad software, I think it’s the fault of technical people. Developers, architects, ops.

We don’t seem to care about best practices. And I’ll do some nasty generalizations here, but bear with me. We can spend hours arguing about tabs vs spaces, curly bracket on new line, git merge vs rebase, which IDE is better, which framework is better and other largely irrelevant stuff. But we tend to ignore the important aspects that span beyond the code itself. The context in which the code lives, the non-functional requirements – robustness, security, resilience, etc.

We don’t seem to get security. Even trivial stuff such as user authentication is almost always implemented wrong. These days Twitter and GitHub realized they have been logging plain-text passwords, for example, but that’s just the tip of the iceberg. Too often we ignore the security implications.

“But the business didn’t request the security features”, one may say. The business never requested 2-factor authentication, encryption at rest, PKI, secure (or any) audit trail, log masking, crypto shredding, etc., etc. Because the business doesn’t know these things – we do and we have to put them on the backlog and fight for them to be implemented. Each organization has its specifics and tech people can influence the backlog in different ways, but almost everywhere we can put things there and prioritize them.

The other aspect is testing. We should all be well aware by now that automated testing is mandatory. We have all the tools in the world for unit, functional, integration, performance and whatnot testing, and yet many software projects lack the necessary test coverage to be able to change stuff without accidentally breaking things. “But testing takes time, we don’t have it”. We are perfectly aware that testing saves time, as we’ve all had those “not again!” recurring bugs. And yet we think of all sorts of excuses – “let the QAs test it”, “we have to ship this now, we’ll test it later”, “this is too trivial to be tested”, etc.

And you may say it’s not our job. We don’t define what has to be done, we just do it. We don’t define the budget, the scope, the features. We just write whatever has been decided. And that’s plain wrong. It’s not our job to make money out of our code, and it’s not our job to define what customers need, but apart from that everything is our job. The way the software is structured, the security aspects and security features, the stability of the code base, the way the software behaves in different environments. The non-functional requirements are our job, and putting them on the backlog is our job.

You’ve probably heard that every software becomes “legacy” after 6 months. And that’s because of us, our sloppiness, our inability to mitigate external factors and constraints. Too often we create a mess through “just doing our job”.

And of course that’s a generalization. I happen to know a lot of great professionals who don’t make these mistakes, who strive for excellence and implement things the right way. But our industry as a whole doesn’t. Our industry as a whole produces bad software. And it’s our fault, as developers – as the only people who know why a certain piece of software is bad.

In a talk of his, Bob Martin warns us of the risks of our sloppiness. We have been building websites so far, but we are more and more building stuff that interacts with the real world, directly and indirectly. Ultimately, lives may depend on our software (like the recent unfortunate death caused by a self-driving car). And I’ll agree with Uncle Bob that it’s high time we self-regulate as an industry, before some technically incompetent politician decides to do that.

How, I don’t know. We’ll have to think more about it. But I’m pretty sure it’s our fault that software is bad, and no amount of blaming the management, the budget, the timing, the tools or the process can eliminate our responsibility.

Why do I insist on bashing my fellow software engineers? Because if we start looking at software development with more responsibility; with the fact that if it fails, it’s our fault, then we’re more likely to get out of our current bug-ridden, security-flawed, fragile software hole and really become the experts of the future.

The post Bad Software Is Our Fault appeared first on Bozho's tech blog.

OMG The Stupid It Burns

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/04/omg-stupid-it-burns.html

This article, pointed out by @TheGrugq, is stupid enough that it’s worth rebutting.

The article starts with the question “Why did the lessons of Stuxnet, Wannacry, Heartbleed and Shamoon go unheeded?”. It then proceeds to ignore the lessons of those things.
Some of the actual lessons should be things like how Stuxnet crossed air gaps, how Wannacry spread through flat Windows networking, how Heartbleed comes from technical debt, and how Shamoon furthers state aims by causing damage.
But this article doesn’t cover the technical lessons. Instead, it thinks the lesson should be the moral lesson, that we should take these things more seriously. But that’s stupid. It’s the sort of lesson taught by people who know nothing about the topic. When you have nothing of value to contribute to a topic you can always take the moral high road and criticize everyone for being morally weak for not taking it more seriously. Obviously, since doctors haven’t cured cancer yet, it’s because they don’t take the problem seriously.
The article continues to ignore the lesson of these cyber attacks and instead regales us with a list of military lessons from WW I and WW II. This makes the same flaw that many in the military make, trying to understand cyber through analogies with the real world. It’s not that such lessons could have no value, it’s that this article contains a poor list of them. It seems to consist of a random list of events that appeal to the author rather than events that have bearing on cybersecurity.
Then, in case we don’t get the point, the article bullies us with hyperbole, cliches, buzzwords, bombastic language, famous quotes, and citations. It’s hard to see how most of them actually apply to the text. Rather, it seems like they are included simply because he really really likes them.
The article invests much effort in discussing the buzzword “OODA loop”. Most attacks in cyberspace don’t have one. Instead, attackers flail around, trying lots of random things, overcoming defense with brute-force rather than an understanding of what’s going on. That’s obviously the case with Wannacry: it was an accident, with the perpetrator experimenting with what would happen if they added the ETERNALBLUE exploit to their existing ransomware code. The consequence was beyond anybody’s ability to predict.
You might claim that this is just the first stage, that they’ll loop around, observe Wannacry’s effects, orient themselves, decide, then act upon what they learned. Nope. Wannacry burned the exploit. It’s essentially removed any vulnerable systems from the public Internet, thereby making it impossible to use what they learned. It’s still active a year later, with infected systems behind firewalls busily scanning the Internet so that if you put a new system online that’s vulnerable, it’ll be taken offline within a few hours, before any other evildoer can take advantage of it.
See what I’m doing here? Learning the actual lessons of things like Wannacry? The thing the above article fails to do??
The article has a humorous paragraph on “defense in depth”, misunderstanding the term. To be fair, it’s the cybersecurity industry’s fault: they adopted then redefined the term. That’s why there’s two separate articles on Wikipedia: one for the old military term (as used in this article) and one for the new cybersecurity term.
As used in the cybersecurity industry, “defense in depth” means having multiple layers of security. Many organizations put all their defensive efforts on the perimeter, and none inside a network. The idea of “defense in depth” is to put more defenses inside the network. For example, instead of just one firewall at the edge of the network, put firewalls inside the network to segment different subnetworks from each other, so that a ransomware infection in the customer support computers doesn’t spread to sales and marketing computers.
The article talks about exploiting WiFi chips to bypass defense in depth measures like browser sandboxes. This conflates different types of attacks. A WiFi attack is usually considered a local attack, from somebody next to you in a bar, rather than a remote attack from a server in Russia. Moreover, far from disproving “defense in depth”, such WiFi attacks highlight the need for it. Namely, phones need to be designed so that successful exploitation of other microprocessors (namely, the WiFi, Bluetooth, and cellular baseband chips) can’t directly compromise the host system. In other words, once exploited with “Broadpwn”, a hacker would need to extend the exploit chain with another vulnerability in the host’s Broadcom WiFi driver, rather than immediately exploiting a DMA attack across PCIe. This suggests that if PCIe is used to interface with peripherals in the phone, an IOMMU should be used, for “defense in depth”.
Cybersecurity is a young field. There are lots of useful things that outsider non-techies can teach us. Lessons from military history would be well-received.
But that’s not this story. Instead, this story is by an outsider who tells us we don’t know what we are doing, claims that they do, and then proceeds to prove that they don’t. Their argument is based on moral suasion and on bullying us with what appears on the surface to be intellectual rigor, but is in fact devoid of anything smart.
My fear, here, is that I’m going to be in a meeting where somebody has read this pretentious garbage, explaining to me why “defense in depth” is wrong and how we need to OODA faster. I’d rather nip this in the bud, pointing out that if you found anything interesting in that article, you are wrong.

Audit Trail Overview

Post Syndicated from Bozho original https://techblog.bozho.net/audit-trail-overview/

As part of my current project (secure audit trail) I decided to run a survey on the use of audit trail “in the wild”.

I haven’t written in detail about this project of mine (unlike with some other projects). Mostly because it’s commercial and I don’t want to use my blog as a direct promotion channel (though I am doing that at the moment, ironically). But the aim of this post is to shed some light on how audit trail is used.

The survey can be found here. The questions are basically: does your current project have audit trail functionality, and if yes, is it protected from tampering. If not – do you think you should have such functionality.

The results are interesting (although there were only around 50 respondents):

So more than half of the systems (on which respondents are working) don’t have audit trail. While audit trail is recommended by information security and related standards, it may not find a place in the “busy schedule” of a software project, even though it’s fairly easy to provide a trivial implementation (e.g. I’ve written how to quickly set one up with Hibernate and Spring).
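To make the distinction concrete, here is a minimal sketch of such a trivial audit trail in plain Java (names and structure are illustrative; the linked post shows a Hibernate/Spring variant). Note that it offers no tamper protection at all, which is exactly the problem discussed next:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// A trivial, unprotected audit trail: it records who did what and when,
// but anyone with write access to the underlying storage can silently
// modify or delete entries.
public class SimpleAuditTrail {
    public record AuditEntry(Instant timestamp, String actor, String action, String details) {}

    private final List<AuditEntry> entries = new ArrayList<>();

    public void log(String actor, String action, String details) {
        entries.add(new AuditEntry(Instant.now(), actor, action, details));
    }

    public List<AuditEntry> entries() {
        return List.copyOf(entries); // read-only view for auditors
    }

    public static void main(String[] args) {
        SimpleAuditTrail trail = new SimpleAuditTrail();
        trail.log("alice", "LOGIN", "from 10.0.0.5");
        trail.log("alice", "VIEW_RECORD", "patient #42");
        System.out.println(trail.entries().size()); // 2
    }
}
```

In a real system the list would of course be a database table or a log file, which is precisely where the tampering risk lies.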

A trivial implementation might do in many cases, but if the audit log is critical (e.g. access to sensitive data, performing financial operations, etc.), then relying on a trivial implementation might not be enough. In other words, if the sysadmin can access the database and delete or modify the audit trail, then it doesn’t serve much purpose. Hence the next question – how is the audit trail protected from tampering:

And apparently, of the less than 50% of projects with audit trail, around 50% have no technical guarantees that the audit trail can’t be tampered with. My guess is it’s more, because people have different understandings of what technical measures are sufficient. E.g. someone may think that digitally signing your log files (or log records) is sufficient, but in fact it isn’t, as whole files (or records) can be deleted (or fully replaced) without a way to detect that. Timestamping can help (and a good audit trail solution should have that), but it doesn’t guarantee the order of events or prevent a malicious actor from deleting entries or inserting fake ones. And if timestamping is done on a log file level, then any not-yet-timestamped log file is vulnerable to manipulation.
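The point about per-record signing can be demonstrated in a few lines. In this sketch (using an HMAC as a stand-in for a digital signature; names are made up) an attacker deletes a whole record, and every remaining record still verifies, so the deletion is undetectable:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Each record is individually authenticated. Removing an entire record
// leaves all remaining records valid, so per-record signing alone
// cannot detect deletion.
public class PerRecordSigning {
    public record SignedRecord(String data, byte[] mac) {}

    private final Mac mac;

    public PerRecordSigning(byte[] key) throws Exception {
        mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
    }

    public SignedRecord sign(String data) {
        return new SignedRecord(data, mac.doFinal(data.getBytes()));
    }

    public boolean verify(SignedRecord r) {
        return Arrays.equals(mac.doFinal(r.data().getBytes()), r.mac());
    }

    public static void main(String[] args) throws Exception {
        PerRecordSigning signer = new PerRecordSigning("secret-key".getBytes());
        List<SignedRecord> log = new ArrayList<>();
        log.add(signer.sign("alice LOGIN"));
        log.add(signer.sign("alice DELETE patient #42"));
        log.add(signer.sign("alice LOGOUT"));

        log.remove(1); // attacker removes the incriminating record

        // Every remaining record still verifies -- the deletion is invisible.
        System.out.println(log.stream().allMatch(signer::verify)); // true
    }
}
```

This is why linking records together (hash chaining, discussed below among the approaches) matters, not just authenticating each one.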

I’ve written about event logs before and their two flavours – event sourcing and audit trail. An event log can effectively be considered audit trail, but you’d need additional security to avoid the problems mentioned above.

So, let’s see what would various levels of security and usefulness of audit logs look like. There are many papers on the topic (e.g. this and this), and they often go into the intricate details of how logging should be implemented. I’ll try to give an overview of the approaches:

  • Regular logs – rely on regular INFO log statements in the production logs to look for hints of what has happened. This may be okay, but it’s harder to find evidence in them (as there is non-auditable data in those log files as well), and it’s not very secure – usually logs are collected (e.g. with Graylog), and whoever has access to the log collector’s database (or search engine, in the case of Graylog) can manipulate the data without being caught
  • Designated audit trail – whether it’s stored in the database or in log files. It has the proper business-event level granularity, but again doesn’t prevent or detect tampering. With lower-risk systems that may be perfectly okay.
  • Timestamped logs – whether it’s log files or (harder to implement) database records. Timestamping is good, but if it’s not an external service, a malicious actor can get access to the local timestamping service and issue fake timestamps to re-timestamp tampered files. Even if the timestamping is not compromised, whole entries can be deleted. The fact that they are missing can sometimes be deduced from other factors (e.g. hour of rotation), but regularly verifying that is extra effort and may not always be feasible.
  • Hash chaining – each entry (or sequence of log files) could be chained (just like blockchain blocks) – the next one containing the hash of the previous one. This is a good solution (whether it’s local, external or 3rd party), but it carries the risk of someone modifying or deleting a record, taking your entire chain and re-hashing it. All the checks will pass, but the data will not be correct.
  • Hash chaining with anchoring – the head of the chain (the hash of the last entry/block) could be “anchored” to an external service that is outside the capabilities of a malicious actor. Ideally, a public blockchain, alternatively – paper, a public service (twitter), email, etc. That way a malicious actor can’t just rehash the whole chain, because any check against the external service would fail.
  • WORM storage (write once, read many). You could send your audit logs almost directly to WORM storage, where it’s impossible to replace data. However, that is not ideal, as WORM storage can be slow and expensive. AWS Glacier, for example, has rather long retrieval times, which makes searching through recent data impractical (it’s actually cheaper than S3, and you can have expiration policies). But having to support your own WORM storage is expensive. It is a good idea to eventually send the logs to WORM storage, but “fresh” audit trail should probably not be “archived” yet, so that it stays searchable and some actionable insight can be gained from it.
  • All-in-one – applying all of the above “just in case” may be unnecessary for every project out there, but that’s what I decided to do at LogSentinel. Business-event granularity with timestamping, hash chaining, anchoring, and eventually putting to WORM storage – I think that provides both security guarantees and flexibility.
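The hash chaining and anchoring approaches from the list above can be sketched in a few lines of plain Java (a simplified illustration, not LogSentinel’s actual implementation). The attacker in the example re-hashes the whole chain after removing a record, producing an internally consistent chain, but the externally anchored head hash exposes the tampering:

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

// Each entry's hash covers the previous entry's hash, so a record can't be
// modified or removed without re-hashing everything after it. "Anchoring"
// the head hash with an external service defeats even a full re-hash.
public class HashChainedLog {
    public record Entry(String data, String prevHash, String hash) {}

    private final List<Entry> entries = new ArrayList<>();

    private static String sha256(String input) throws Exception {
        return HexFormat.of().formatHex(
            MessageDigest.getInstance("SHA-256").digest(input.getBytes()));
    }

    public void append(String data) throws Exception {
        String prev = entries.isEmpty() ? "GENESIS" : entries.get(entries.size() - 1).hash();
        entries.add(new Entry(data, prev, sha256(prev + "|" + data)));
    }

    // The head hash is what you would publish to an external anchor
    // (a public blockchain, twitter, paper, etc.).
    public String headHash() {
        return entries.get(entries.size() - 1).hash();
    }

    public static void main(String[] args) throws Exception {
        HashChainedLog log = new HashChainedLog();
        log.append("alice LOGIN");
        log.append("alice VIEW_RECORD patient #42");
        log.append("alice LOGOUT");
        String anchored = log.headHash(); // published externally

        // Attacker rebuilds the whole chain without the middle record.
        HashChainedLog tampered = new HashChainedLog();
        tampered.append("alice LOGIN");
        tampered.append("alice LOGOUT");

        // The re-hashed chain is internally consistent, but its head
        // no longer matches the externally anchored hash.
        System.out.println(tampered.headHash().equals(anchored)); // false
    }
}
```

Without the external anchor, the tampered chain would pass every internal consistency check, which is exactly the weakness the anchoring step addresses.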

I hope the overview is useful and the results from the survey shed some light on how this aspect of information security is underestimated.

The post Audit Trail Overview appeared first on Bozho's tech blog.