Tag Archives: Opinions

The Personal Data Store Pattern

Post Syndicated from Bozho original https://techblog.bozho.net/the-personal-data-store-pattern/

With the recent trend towards data protection and privacy, as well as the requirements of data protection regulations like GDPR and CCPA, some organizations are trying to reorganize their personal data so that it has a higher level of protection.

One path that I’ve seen organizations take is to apply what I call the “personal data store” pattern: extract all personal data from existing systems and store it in a single place, where it’s accessible via APIs (or in some cases directly through the database). The personal data store is well guarded, has a proper audit trail and anomaly detection, and offers privacy-preserving features.

It makes sense to focus one’s data protection efforts predominantly in one place rather than scatter them across dozens of systems. Of course, it’s far from trivial to migrate so much data from legacy systems to a new module and then upgrade those systems so they can still request and use the data when needed. That’s why in some cases the pattern is applied only to sensitive data – medical, biometric, credit cards, etc.

For the sake of completeness, there’s something else called “personal data stores” – an architecture where the users themselves store their own data in order to be in control. While this is nice in theory, in practice very few users have the capacity to do so. And while I admire the Solid project, for example, I don’t think it is a viable pattern for many organizations, as in many cases users don’t directly interact with the company, yet the company still processes large amounts of their personal data.

So, the personal data store pattern is an architectural approach to personal data protection. It can be implemented as a “personal data microservice” with CRUD operations on predefined data entities, as an external service (e.g. SentinelDB, a project of mine), or simply as a centralized database with a proxy in front of it to control the access patterns. You can imagine it as externalizing your application’s “users” table and its related tables.

It sounds a little bit like a data warehouse for personal data, but the major difference is that it’s used for operational data, rather than (just) analysis and reporting. All (or most) of your other applications/microservices interact constantly with the personal data store whenever they need to access or update (or “forget”) personal data.
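To make that operational role concrete, here is a minimal sketch of the interface such a store might expose. All names here (PersonalDataStore, the in-memory stand-in, the field layout) are hypothetical illustrations, not any real product’s API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical operational contract of a personal data store
interface PersonalDataStore {
    Optional<Map<String, String>> get(String personId);        // read for operational use
    void upsert(String personId, Map<String, String> fields);  // create or update
    Map<String, String> export(String personId);               // data portability (GDPR export)
    void forget(String personId);                              // right to erasure
}

// Trivial in-memory stand-in, just to make the contract concrete
class InMemoryPersonalDataStore implements PersonalDataStore {
    private final Map<String, Map<String, String>> records = new HashMap<>();

    public Optional<Map<String, String>> get(String id) {
        return Optional.ofNullable(records.get(id));
    }

    public void upsert(String id, Map<String, String> fields) {
        records.merge(id, new HashMap<>(fields), (existing, update) -> {
            existing.putAll(update);
            return existing;
        });
    }

    public Map<String, String> export(String id) {
        return records.getOrDefault(id, Map.of());
    }

    public void forget(String id) {
        records.remove(id);
    }
}
```

A real implementation would of course sit behind REST or SQL, enforce access control, and write to the audit trail on every one of these calls.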

Some of the main features of such a personal data store, the combination of which protect against data breaches, in my view, include:

  • Easy to use interface (e.g. RESTful web services or simply SQL) – systems that integrate with the personal data store should be built in a way that a simple DAO layer implementation gets swapped, and data that was previously accessed from a local database is now obtained from the personal data store. This is not always easy, as ORM technologies add a layer of complexity.
  • High level of general security – servers protected with 2FA, access control, segregated networks, restricted physical access, firewalls, intrusion prevention systems, etc. The good thing is that it’s easier to apply all the best practices to a single system than to apply them (and keep them applied) to every system.
  • Encryption – but not just “data at rest” encryption; especially sensitive data can and should be encrypted with well protected and rotated keys. That way the “honest but curious” admin won’t be able to extract anything from the underlying database.
  • Audit trail – all infosec and data protection standards and regulations focus on accountability and traceability. There should not be a way to extract or modify personal data without leaving a trace (and ideally, that trace should be protected as well).
  • Anomaly detection – checking whether there is something strange/anomalous in the data access patterns. Such strange access patterns can mean a data breach is happening, and the personal data store can actively block it. There is a lot of software out there that does anomaly detection on network traffic, but it’s much better if the rules (or machine learning) are domain-specific. “Monitor for increased traffic to those servers” is one thing, but it’s much better to be able to say “monitor for out-of-the-ordinary accesses to personal data of such and such kind”.
  • Pseudonymization – many systems that need the personal data don’t actually need to know who it is about. That includes marketing (including outsourcing to 3rd parties), reporting functionalities, etc. So the personal data store can return data that does not allow a person to be identified, but carries a pseudo-ID instead. That way, when updates are made back to the personal data store, they can still refer to a particular person via the pseudonymous ID, but the application that extracted the data in the first place doesn’t get to know who the data was about. This is useful in scenarios where data has to be (temporarily or not) stored in a database that lies outside the personal data store.
  • Authentication – if the company offers user authentication, this can be done via the personal data store. Passwords, two-factor authentication secrets and other means of authentication are personal data, and important personal data at that. An organization may use single sign-on internally (e.g. Active Directory), but it doesn’t make sense to put customers there too, so they are usually stored in a database. During authentication, the personal data store accepts all necessary credentials (username, password, 2FA code) and returns a token to be used for subsequent calls or as a session cookie.
  • GDPR (or CCPA or similar) functionalities – e.g. export of all data about a person, forgetting a person. That’s an often overlooked problem, but “give me all data about me that you have” is an enormous issue with large companies that have dozens of systems. It’s next to impossible to extract the data in a sensible way from all the systems. Tracking all these requests is itself a requirement, so the personal data store can keep track of them to present to auditors if needed.
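As an illustration of the pseudonymization point above, a stable pseudo-ID can be derived with a keyed hash (HMAC): the same person always maps to the same pseudonym, but without the key – which never leaves the personal data store – the mapping can’t be reversed or recomputed. This is a sketch under those assumptions, not a complete scheme (a real deployment would also need key rotation and per-consumer scoping):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative pseudonymization helper; the secret key stays inside the data store
class Pseudonymizer {
    private final SecretKeySpec key;

    Pseudonymizer(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    // Same input + same key => same pseudo-ID, so updates can be routed back
    // to the right person without revealing who that person is
    String pseudoId(String personId) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] tag = mac.doFinal(personId.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(tag);
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```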

That’s all easier said than done. In organizations that already have many systems running side by side and processing personal data, migration can be costly. So it’s a good idea to introduce the pattern as early as possible, and to have a plan (even if it spans years) to move at least the sensitive personal data to the well protected silo. Building this silo is a data engineering effort, a system refactoring effort and an organizational effort. The benefits, though, are reduced long-term cost and reduced risks of data breaches and non-compliance.

The post The Personal Data Store Pattern appeared first on Bozho's tech blog.

Cybersecurity Is Very Important

Post Syndicated from Bozho original https://techblog.bozho.net/cybersecurity-is-very-important/

A few months ago an essay titled “Cybersecurity is not very important” appeared. The essay is well written and interesting but I’d like to argue against its main point.

And that is actually hard – the essay has many good points, and although it has a contrarian feel, it actually isn’t saying anything outrageous. But I still don’t agree with the conclusion. I suggest reading it (or skimming it) first before continuing here, although this article is generally self-sufficient.

I agree with many things in the essay, most importantly that there is no 100% protection and it’s all about minimizing the risk. I also agree that cybersecurity is a complex set of measures that span not only the digital world, but the physical one as well. And I agree that even though, after watching a few videos from DEF CON, BlackHat or CCC, one feels that everything is fundamentally broken and going to live in the mountains is the only sane strategy to survive an impending digital apocalypse, this is not the case – we have a somewhat okayish level of protection for the more important parts of the digital world. Certainly exploitable, but not trivially so.

There are, though, a few main claims that I’d like to address:

  • There has not been any catastrophic cybersecurity event – the author claims that the fact that there was no digital Pearl Harbor or 9/11 suggests that we’ve been investing just the right amount of effort in cybersecurity. I don’t think that’s a fair comparison. Catastrophic events like those cost human lives as an immediate result of a physical action. No digital event can cause immediate loss of human life. However, it can cause indirect loss of human life, and it probably already has – take the famous data breach of an extramarital affair dating site – do we know how many people were killed in Pakistan or Saudi Arabia because infidelity (or homosexuality) was exposed? How many people died because hospitals were victims of ransomware? How many people died when the Ukrainian power grid was attacked, leaving 20% of Kyiv without power and therefore without heat, light or emergency care? What about the (luckily unsuccessful) attempt to sabotage a Saudi Arabian petrochemical plant and cause an explosion? There are many more of these events, and they are already a reality. There are no visible explosions yet, which would make it easier to compare them to Pearl Harbor or 9/11, but they are serious and deadly nonetheless. And while natural disasters, road incidents and other issues claim more victims, there isn’t a trivial way to calculate the “return on investment” in lives saved. And isn’t a secure charity for improving hurricane protection in third world nations better than one that gets hacked and has all of its funds stolen?
  • People have not adopted easy security measures because they were minor inconveniences – for example, 2-factor authentication has been around for ages, but only recently did we begin using it. That is true, of course, but the reason might not be that it has been mostly fine to lack 2FA so far, but that society hadn’t yet realized the risks. Humans are very bad at intuitively judging risk, especially when they don’t have enough information. Now that we have more information, we are slightly better at estimating that, yes, adding a second factor is important for some systems. Security measures get adopted when we realize the risk, not only when there is more of it. Another reason people have not adopted cybersecurity measures is that they don’t know about them. Because the area is relatively young, expertise is rare. This discrepancy between the ubiquity of information technology and the lack of technical expertise (not to mention security expertise) has been an issue for a long time.
  • The digital world plays too small a role in our world when we put things in perspective – humans play a small role in the world if you put them in a big enough perspective, but that doesn’t mean we are not important. And the digital world is playing an increasingly important role in our world – we can’t so easily continue to claim that cybersecurity is not important. And overall, the claim that so far everything has been (almost) smooth sailing can’t be transformed into the argument that it is going to stay that way, only with gradual improvement over time. If IT is playing an exponentially more important role (and it is), then our focus on information security can’t grow linearly. I know you can’t plot these things on a graph without looking stupid, but you get the gist.
  • We have managed to muddle through without too much focus on cybersecurity – yes, we have. But we will find it increasingly harder to do so. Also, we have successfully muddled through many eras of human history while doing things wrong (for example, the Maya civilization collapsed partly because it mismanaged its environment). Generally, the fact that something hasn’t gone terribly wrong is a bad argument that we are doing fine. Systemic issues get ever more entrenched while on the surface it may look like we are successfully muddling through. I’m not saying that is certainly the case for cybersecurity, but it might very well be.

While arguing against the author’s points is an interesting exercise, it doesn’t directly prove that cybersecurity is indeed important.

First, we don’t have good comparative estimates of the cost – to the economy and to human life – of investment in cybersecurity as opposed to other areas, so I don’t think we can claim cybersecurity is not important. There are, for example, estimates of the cost of a data breach, and it averages several million dollars. If you stand to directly and indirectly lose several million dollars with a likelihood of 30% (according to multiple reports), I guess you should invest a few hundred thousand.
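The arithmetic behind that guess is just expected loss. A sketch using the rough figures above (the numbers are illustrative, not precise estimates):

```java
// Expected loss = cost of the incident × probability it happens.
// A risk-neutral organization should be willing to spend up to that
// amount on prevention. Figures are the article's rough numbers.
class BreachRisk {
    static double expectedLoss(double breachCost, double probability) {
        return breachCost * probability;
    }
}
```

At a roughly $4M average breach cost and 30% likelihood, the expected loss is around $1.2M, which makes a few hundred thousand in prevention look cheap.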

Second, it is harder to internalize the risk of incidents in the digital world compared to those in the physical world. We are generally bad at evaluating risk, and I think the indirection that the digital world brings degrades our ability to make risk-based decisions even further. The complexity of software complicates things still more – even technical people can’t always imagine the whole complexity of the systems they are working with. So we may not feel cybersecurity is important even though facts and figures show otherwise.

But for me the most important reason for the importance of cybersecurity is that we are currently laying a shaky foundation for our future world. Legacy software, legacy protocols and legacy standards are extremely hard to get rid of once they are ubiquitous. And if they are insecure by design, because they are not built with security in mind, there is no way that software that relies on them can be secure.

If we don’t get cybersecurity right soon, everything that relies on the foundations that we build today will be broken. And no, you can’t simply replace your current set of systems with new, more secure ones. Organizations are stuck with old systems not because they don’t want new and better ones, but because replacing them is hard – it involves migration, user training, making sure all edge cases are covered, informing customers, etc. Protocols and standards are even harder to change – see how long it took for TLS 1.3 to come along, for example. And network standards still have vulnerabilities without good mitigations (or that lacked them until recently) – take SS7 attacks on mobile networks, ARP spoofing, or BGP hijacking.

If we don’t agree that cybersecurity is very important, future technology will be based on an insecure layer that it will try to fix with clumsy abstractions. And then at some point everything may collapse, at a moment when we are so dependent on it that the collapse will be a major disruption in the way humanity operates. That may sound futuristic, but with technology you have no option but to be futuristic. We must build systems today that will withstand the test of time. And this is already very hard – maybe because we didn’t think cybersecurity was important enough.

I’m not saying we should pour millions into cybersecurity starting tomorrow. But I’d be happy to see a security mindset in everyone who works with technology, as well as in everyone who makes decisions involving technology. Not paranoid, but security conscious. Not “100% secure or bust”, but taking all known protection measures.

Cybersecurity is important. And it will be even more important in the upcoming decades.


Protecting JavaScript Files (From Magecart-Style Attacks)

Post Syndicated from Bozho original https://techblog.bozho.net/protecting-javascript-files-from-magecart-attacks/

Most web pages now consist of multiple JavaScript files that are included in various ways (via <script> tags or in some more dynamic fashion, bundled and minified or not). But since these scripts interact with everything on the page, they can be a security risk.

And Magecart showcased that risk – the group attacked multiple websites, including British Airways and Ticketmaster, and stole a few hundred thousand credit card numbers.

It is a simple attack: the attacker inserts a malicious JavaScript snippet into a trusted JavaScript file, collects credit card details entered into payment forms and sends them to an attacker-owned website. Obviously, the easy part is writing the malicious JavaScript; the hard part is getting it onto the target website.

Many websites rely on externally hosted assets (including scripts) – be it a CDN, or a dedicated asset server (as in the case of British Airways). These externally hosted assets may be vulnerable in several ways:

  • Asset servers may be less protected than the actual server, because they are just static assets, what could go wrong?
  • Credentials to access CDN configuration may be leaked which can lead to an attacker replacing the original source scripts with their own
  • Man-in-the-middle attacks are possible if the asset server is misconfigured (e.g. allowing TLS downgrade attack)
  • The external service (e.g. CDN) that was previously trusted can go rogue – that’s unlikely with big providers, but smaller and cheaper ones are less predictable

Once the attackers have replaced the script, they are silently collecting data until they are caught. And this can be a long time.

So how do we protect against those attacks? A typical piece of advice is to introduce a content security policy, which prevents scripts from untrusted domains from being executed. This is a good idea, but it doesn’t help in the scenario where a trusted domain is compromised. There are several main approaches, which I’ll summarize below:

  • Subresource integrity – this is a browser feature that lets you specify the hash of a script file and validates that hash when the page loads. If it doesn’t match the hash of the actually loaded script, the script is blocked. This sounds great, but has several practical complications. First, it means you need to complicate your build pipeline so that it calculates the hashes of minified and bundled resources and injects those hashes into the page templates. It’s a tedious process, but it’s doable. Then there are the dynamically loaded scripts, where you can’t use this feature, and the browsers that don’t support it fully (Edge, IE and Safari on mobile). And finally, if you don’t have a good build pipeline (which many small websites don’t), a very small legitimate change in the script can break your entire website.
  • Don’t use external services – that sounds straightforward, but it isn’t always possible. CDNs exist for a reason and optimize your site’s loading speed and therefore its ranking; internal policies may require using a dedicated asset server; sometimes plugins (e.g. for WordPress) may fetch external resources. An exception to this rule is allowed if you somehow sandbox the third party script (e.g. via an iframe, as explained in the link above)
  • Secure all external servers properly – if you can do that, that’s great – upgrade the supported cipher suites, monitor for 0days, use only highly trusted CDNs. Regardless of anything, you should obviously always strive to do that. But it requires expertise and resources, which may not be available to every company and every team.
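For reference, the subresource integrity value from the first bullet is just a base64-encoded digest with an algorithm prefix. A sketch of what the build step computes (the output goes into the script tag’s integrity attribute, e.g. `<script src="app.js" integrity="sha384-..." crossorigin="anonymous">`):

```java
import java.security.MessageDigest;
import java.util.Base64;

// Computes an SRI value for a script's bytes, as a build pipeline step would
class SriHash {
    static String integrity(byte[] scriptBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-384");
            // SRI format: "<algorithm>-<base64 of digest>"
            return "sha384-" + Base64.getEncoder().encodeToString(md.digest(scriptBytes));
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```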

There is one more scenario that may sound strange – if an attacker hacks into your main application server(s), they can replace the scripts with whatever they want. It sounds strange at first, because if they have access to the server, it’s game over anyway. But it’s not always full access with RCE – it might be limited access. Credit card numbers are usually not stored in plain text in the database, so having access to the application server may not mean having access to the credit card numbers. And changing the custom backend code to collect the data is much more unpredictable and time-consuming than just replacing the scripts with malicious ones. None of the options above protect against that, as in this case the attacker may be able to change the expected hash for the subresource integrity check.

Because of the limitations of the above approaches, at my company we decided to provide a tool to monitor your website for such attacks. It’s called Scriptinel.com (short for Script Sentinel) and is currently in early beta. It’s mainly targeted at small website owners who can’t apply any of the three approaches above, but it can be used for sophisticated websites as well.

What it does is straightforward – it scans a given URL, extracts all scripts from it (even the dynamic ones), and starts monitoring them for changes with periodic requests. If it discovers a change, it notifies the website owner so that they can react.
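The core of that change detection can be sketched in a few lines: keep the last known hash per script URL and flag any difference on the next poll. The HTTP fetching and owner notification are omitted here, and the names are illustrative – this is not Scriptinel’s actual code:

```java
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Toy change detector: remembers the hash of each monitored script and
// reports when a later poll sees different content at the same URL
class ScriptMonitor {
    private final Map<String, String> knownHashes = new HashMap<>();

    // Returns true if the script at this URL changed since the last check.
    // The first observation just records the baseline and returns false.
    boolean changed(String url, byte[] currentContent) {
        String hash = sha256Hex(currentContent);
        String previous = knownHashes.put(url, hash);
        return previous != null && !previous.equals(hash);
    }

    private static String sha256Hex(byte[] data) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```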

This means that the attacker may have a few minutes to collect data, but time is an important factor here – this is not a “SELECT *” data breach; it relies on customers using the website. So a few minutes minimizes the damage. And it doesn’t break your website (I guess we can provide a script to include that blocks the page if Scriptinel has found discrepancies). It also doesn’t require changes in the build process to include hashes. Of course, such a reactive approach is not perfect, especially if there is nobody to react, but monitoring is a good idea regardless of whether other approaches are used.

There is the issue of protected pages and pages that are not directly accessible via a GET request – e.g. a payment page. For those you can enter your JavaScript files individually, rather than having the tool scan the page. We can add a more sophisticated user journey scan, specifying credentials and the steps to reach the protected pages, but for now that seems unnecessary.

How does it solve the “main server compromised” problem? Well, nothing solves that perfectly, as the attacker can make changes that serve the legitimate version of the script to your monitoring servers (identifying them by IP) and the modified scripts to everyone else. This can be done on compromised external asset servers as well (though not with leaked CDN credentials). However, this implies the attacker knows that Scriptinel is used, knows the IP addresses of our scanners, and has gained sufficient control to serve different versions based on IP. This raises the bar significantly, and can even be made impossible to pull off if we regularly change the IP addresses within a significantly large IP range.

Such functionality may be available in some enterprise security suites, though I’m not aware of it (if it exists somewhere, please let me know).

Overall, the problem is niche but tough, and not solving it can lead to serious data breaches even if your database is perfectly protected. Scriptinel is a simple-to-use, good-enough solution (and one that’s arguably better than the other options).

Good information security is the right combination of knowledge, implementation of best practices, and tools to help you with that. And maybe Scriptinel is one such tool.


Let’s Annotate Our Methods With The Features They Implement

Post Syndicated from Bozho original https://techblog.bozho.net/lets-annotate-our-methods-with-the-features-they-implement/

Writing software consists of very little actual “writing”, and much more thinking, designing, reading, “digging”, analyzing, debugging, refactoring, aligning with others and attending meetings.

The reading and digging part is where you try to understand what has been implemented before, why it has been implemented, and how it works. In larger projects it becomes increasingly hard to find what is happening and why – there are so many classes involved, and so many methods participate in implementing a particular feature.

That’s probably because there is a mismatch between the programming units (classes, methods) and the business logic units (features). Product owners want a “password reset” feature, and they don’t care if it’s done using framework configuration, custom code split in three classes, or one monolithic controller method that does that job.

This mismatch is partially addressed by so-called BDD (behaviour-driven development), as business people can define scenarios in a formalized language (although they rarely do – it’s still up to the QAs or developers to write the tests). But having your tests organized around features and behaviours doesn’t mean the code is, and BDD doesn’t help in making your way through the codebase in search of why and how something is implemented.

Another issue is linking a piece of code to the issue tracking system. Source control conventions and hooks allow for setting the issue tracker number as part of the commit, and then, when browsing the code, you can annotate the file and see the issue number. However, due to the many changes, even a very strict team will end up with methods that are related to multiple issues, and you can’t easily tell which is the relevant one.

Yet another issue with the lack of a “feature” unit in programming languages is that you can’t trivially reuse existing projects to start a new one. We’ve all been there – you have a similar project and you want a skeleton to get things running faster. And while there are many tools to help with that (Spring Boot, Spring Roo, and other scaffolding utilities), they can rarely deliver what you need – you always have to tweak something, delete something, customize some configuration, as defaults are almost never practical.

And I have a simple proposal that will help with the issues above. As with any complex problem, simple ideas don’t solve everything, but are at least a step forward.

The proposal is in the title – let’s annotate our methods with the features they implement. Let’s have @Feature(name = "Forgotten password", issueTrackerCode="PROJ-123"). A method can implement multiple features, but that is generally discouraged by best practices (e.g. the single responsibility principle). The granularity of “feature” is something that has to be determined by each team and is the tricky part – sometimes an epic describes a feature, sometimes individual stories or even subtasks do. A definition of a feature should be agreed upon and every new team member should be told what to do and how to interpret it.
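Here is how that annotation could be declared in Java, with runtime retention so tools can scan for it, plus the example usage from above. This is a sketch of the proposal, not an existing library:

```java
import java.lang.annotation.*;

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
@Repeatable(Features.class)
@interface Feature {
    String name();
    String issueTrackerCode() default "";
}

// Container annotation required by @Repeatable, for the (discouraged)
// case of a method implementing multiple features
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
@interface Features {
    Feature[] value();
}

// Example usage, as in the post
class PasswordService {
    @Feature(name = "Forgotten password", issueTrackerCode = "PROJ-123")
    void resetPassword(String email) { /* the actual implementation */ }
}

// Helper so tools can look up the feature of a given method via reflection
class FeatureLookup {
    static Feature on(Class<?> cls, String method, Class<?>... params) {
        try {
            return cls.getDeclaredMethod(method, params).getAnnotation(Feature.class);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }
}
```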

There is of course a lot of complexity, e.g. for generic methods like DAO methods, utility methods, or methods that are reused in too many places. But they also represent features – it’s just that these features are horizontal. “Data access layer” is a feature – a more technical one indeed, but it counts, and maybe deserves a story in the issue tracker.

Your features can actually be listed in one or several enums, grouped by type – business, horizontal, performance, etc. That way you can even compose features – e.g. account creation contains the logic itself, database access, a security layer.

How does such a proposal help?

  • Raises awareness of the single responsibility of methods and of the need for readable code
  • Provides a rationale for the existence of each method. Even if a proper comment is missing, the annotation will put a method (or a class) in context
  • Helps navigating code and fixing issues (if you can see all places where a feature is implemented, you are more likely to spot an issue)
  • Allows tools to analyze your features – amount, complexity, how chaotic a feature is spread across the code base, test coverage per feature, etc.
  • Allows tools to use existing projects for scaffolding for new ones – you specify the features you want to have, and they are automatically copied

At this point I’m supposed to give a link to a GitHub project for a feature annotation library. But it doesn’t make sense to have a single-annotation project. It can easily be part of Guava or something similar, or it can be manually created in each project. The complex part – the tools that will do the scanning and analysis – deserves separate projects, but unfortunately I don’t have time to write one.
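Still, the simplest such tool – “which methods implement which feature” – is only a few lines of reflection. A sketch, with a minimal @Feature re-declared so the snippet stands alone and a made-up example class:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal re-declaration so this snippet is self-contained
@Retention(RetentionPolicy.RUNTIME)
@interface Feature { String name(); }

class FeatureReport {
    // Maps feature name -> methods implementing it, across the given classes
    static Map<String, List<String>> byFeature(Class<?>... classes) {
        Map<String, List<String>> report = new TreeMap<>();
        for (Class<?> cls : classes) {
            for (Method m : cls.getDeclaredMethods()) {
                Feature f = m.getAnnotation(Feature.class);
                if (f != null) {
                    report.computeIfAbsent(f.name(), k -> new ArrayList<>())
                          .add(cls.getSimpleName() + "." + m.getName());
                }
            }
        }
        return report;
    }
}

// Hypothetical annotated class to run the report against
class AccountService {
    @Feature(name = "Account creation") void createAccount() {}
    @Feature(name = "Forgotten password") void resetPassword() {}
}
```

From this kind of index, the metrics mentioned above (feature spread, counts, coverage per feature) are straightforward aggregations.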

But even without the tools, the concept of annotating methods with their high-level features is, I think, a useful one. Instead of trying to deduce why a method is there, what requirements it implements (and whether all necessary tests were written at the time), such an annotation can come in handy.


Idea: A Generic P2P Network Client

Post Syndicated from Bozho original https://techblog.bozho.net/idea-a-generic-p2p-network-client/

Every now and then one has a half-baked idea about some project that they aren’t likely to be able to do because of lack of time. I’ve written about such random app ideas before, but they were mostly about small apps.

Here I’d like to share an idea for something a bit bigger (and therefore harder to spare time for) – a generic P2P network client. P2P networks are popular in various domains, most notably file sharing and cryptocurrencies. However, in theory they can be applied to many more problems: social networks, search engines, ride sharing, distributed AI, etc. All of these examples have been implemented in a p2p context, and they even work okay, but they lack popularity.

The popularity is actually the biggest issue with these applications – in order for a service to become popular, in many cases you need a network effect: a p2p file sharing network with 100 users doesn’t benefit from being p2p, and a social network with 100 users is, well, useless. And it is hard to get traction with these p2p services because they require an additional step – installing software. You can’t just open a webpage and register; you have to install some custom software that will be used to join the p2p network.

P2P networks are distributed, i.e. there is no central node that has control over what happens. That control is held over the binary that gets installed – and it’s usually open source. And you need that binary in order to establish an overlay network. These networks reuse the internet’s transport layer, but do not rely on the world wide web standards, and most importantly, don’t rely heavily on DNS (except, they actually do when run for the first time in order to find a few known seed nodes). So once you are connected to the network, you don’t need to make HTTP or DNS queries, everything stays in the specifics of the particular protocol (e.g. bittorrent).

Not only do you have to trust and install some piece of software, you also have to be part of the network and exchange data regularly with peers. So it really doesn’t scale if you want to be part of dozens of p2p networks – each of them might be hungry for resources, and you have to keep dozens of applications running all the time (and launching on startup).

So here’s the idea – why don’t we have a generic p2p client? A piece of software that establishes the links to other peers and is agnostic about what data is going to be transferred. From what I’ve seen, the p2p layer is pretty similar in different products – you try to find peers in your immediate network; if none are found, you connect to a known seed node (first by DNS, which uses DNS round-robin, and then by a list of hardcoded IPs), and when you connect, the seed node gives you a list of peers to connect to. Each of those peers has knowledge of other peers, so you can quickly be connected to a significant number of peer nodes (I’m obviously simplifying the flow, but that’s roughly how it works).
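That discovery flow can be modeled without any networking at all – the map below stands in for “ask this node for the peers it knows”. A toy sketch of the logic only, not a real protocol implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simulated bootstrap: start from a seed node and transitively collect
// peers until enough are known. A real client would do each "ask" over
// the network, with timeouts and peer scoring.
class PeerDiscovery {
    static Set<String> discover(String seed, Map<String, List<String>> knownPeersOf, int maxPeers) {
        Set<String> found = new LinkedHashSet<>();
        Deque<String> toAsk = new ArrayDeque<>(List.of(seed));
        while (!toAsk.isEmpty() && found.size() < maxPeers) {
            String node = toAsk.poll();
            for (String peer : knownPeersOf.getOrDefault(node, List.of())) {
                if (found.add(peer)) {        // newly discovered peer
                    toAsk.add(peer);          // ask it for more peers later
                    if (found.size() >= maxPeers) break;
                }
            }
        }
        return found;
    }
}
```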

Once you have an established list of peers, you start doing the application-specific stuff – sharing files, downloading a cryptocurrency ledger, sharing search indexes, sharing a social network profile database, etc. But the p2p network part can be, I think, generalized.

So we can have that generic client and make it pluggable – every application developer can write their own application on top of it. The client will not only be a single point of entry for the user, but can also manage resources automatically – inbound and outbound traffic, CPU/GPU usage (e.g. in the case of cryptocurrency mining). And all of these applications (i.e. plugins) can either be installed by downloading them from the vendor’s website (which is somewhat similar to the original problem), or be downloaded from a marketplace available within the client itself. That would obviously mean a centralized marketplace, unless the marketplace itself is a p2p application built into the client.
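What might the plugin contract of such a client look like? A sketch with made-up names – the client owns discovery and transport, and a plugin only sees application-level events for its own protocol:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plugin interface: the generic client handles peers and transport,
// the plugin handles application semantics
interface P2PApplication {
    String protocolId();                               // e.g. "filesharing/1.0" (made up)
    void onPeerConnected(String peerId);
    void onMessage(String fromPeerId, byte[] payload); // application-specific data
}

class GenericP2PClient {
    private final Map<String, P2PApplication> plugins = new HashMap<>();

    void register(P2PApplication app) {
        plugins.put(app.protocolId(), app);
    }

    // The real client would call this when a peer sends data for a given protocol
    void deliver(String protocolId, String fromPeer, byte[] payload) {
        P2PApplication app = plugins.get(protocolId);
        if (app != null) app.onMessage(fromPeer, payload);
    }
}
```

This routing-by-protocol design is also where resource management would hook in – the client can throttle or account per plugin, since all traffic passes through it.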

So, effectively, you’d be able to plug in your next file sharing solution, your next cryptocurrency, encrypted messaging, or your next distributed social network. And your users won’t have the barrier of installing yet another desktop app. That alone does not solve the network effect, as you still need enough users to add your plugin to their client (and for many to have the client to begin with), but it certainly makes it easier for them. Imagine if we didn’t have the Android and Apple app stores and had to find relevant apps by other means.

This generic client can possibly be even a browser plugin, so that it’s always on when you are online. It doesn’t have to be, but it might ease adoption. And while it will be complicated to write it as a plugin, it’s certainly possible – there are already p2p solutions working as browser plugins.

Actually, many people think of blockchain smart contracts as a way to do exactly that – have distributed applications. But they have an unnecessary limitation – they work on data that’s shared on a blockchain. And in some cases you don’t need that. You don’t need consensus in the cryptocurrency sense. For example, in file sharing, all you need to do is compute the hash of the file (and of its chunks) and start sending them to interested peers. No need to store the file on the blockchain. Same with instant messaging – you don’t need to store the message on a shared immutable database, you only need to send it to the recipients. So smart contracts are not as generic a solution as what I’m proposing.

Whether a generic client can accommodate an unlimited number of different protocols and use cases, what the communication protocol would look like, what programming languages it should support and what security implications that has for the client (e.g. what sandbox the client provides), and what UI markup will be used are all important operational details, but they are beside the point of this post.

You may wonder whether there isn’t anything similar done already. Well, I couldn’t find one. But there’s a lot done that can support such a project: Telehash (a mesh protocol), JXTA (a p2p protocol) and its Chaupal implementation, libp2p and Chimera (p2p libraries), Kademlia (a distributed hash table).

Whether such a project is feasible – certainly. Whether its adoption is inevitable – not so certain, as it would require immediate usefulness in order to be installed in the first place. So, like every “platform”, it will face a chicken-and-egg problem – will people install it if there are no useful plugins, and will people write plugins if there are no user installations? That is solvable in a number of ways (e.g. paying developers initially to write plugins, or bundling some standard applications like file sharing and instant messaging). It could be a business opportunity (monetized through the marketplace or subscriptions) as well as a community project.

I’m just sharing the idea, hoping that someone with more time and more knowledge of distributed networks might pick it up and make the best of it. If not, well, it’s always nice to think about what the future of the internet might look like. Centralization is inevitable, so I don’t see p2p getting rid of centralized services anytime soon (or ever), but some things arguably work better and safer when truly decentralized.

The post Idea: A Generic P2P Network Client appeared first on Bozho's tech blog.

First Thoughts About Facebook’s Libra Cryptocurrency

Post Syndicated from Bozho original https://techblog.bozho.net/first-thoughts-about-facebooks-libra-cryptocurrency/

Facebook announced today that by 2020 they will roll out Libra – their blockchain-based cryptocurrency. It is, of course, major news, as it has the potential to disrupt the payment and banking sector. If you want to read all the surrounding newsworthy details, you can read the TechCrunch article. I will instead focus on a few observations and thoughts about Libra – from a few perspectives – technical, legal/compliance, and possibly financial.

First, replacing banks, bank transfers, credit cards, payment providers and ATMs with just your smartphone sounds appealing. Why hasn’t anyone managed to do that so far? Well, many have tried, but you can’t just have the technology and move towards gradual adoption. You can’t even do it if you are Facebook. You can, however, do it if you are Facebook, backed by Visa, Mastercard, Uber, and many, many more big names on the market. So Facebook got that right – they made a huge coalition that can drive such a drastic change forward.

I have several reservations, though. And I’ll go through them one by one.

  • There is not much completed – there is a website, a technical paper and an open-source prototype. It’s not anywhere near production. The authors of the paper write that the Move language is still being designed (and it’s not the existing Move language). The open-source prototype is still a prototype (and one that’s a bit hard to read, though that might be because of the choice of Rust). This means they will work with tight schedules to make this global payment system operational. The technical paper is a bit confusing and hard to follow. Maybe they rushed it out and didn’t have time to polish it, and maybe it’s just the beginning of a series of papers. Or maybe it’s my headache and the paper is fine.
  • The throughput sounds insufficient – the authors of the paper claim that they don’t have any performance results yet, but they expect to support around 1000 transactions per second. This section of the paper assumes that state channels will be used and most transactions won’t happen on-chain, but that’s a questionable assumption, as the use of state channels is not universally applicable. And 1000 TPS is too low compared to current Visa + Mastercard volumes (estimated at around 5000 transactions per second). Add bank transfers, Western Union and payment providers and you get a much higher number. If from the start you are presumably capped at 1000, would it really scale to become a global payment network? Optimizing throughput is not something that just happens – other cryptocurrencies have struggled with that for years.
  • Single merkle tree – I’m curious whether that will work well in practice. A global merkle tree that is concurrently and deterministically updated is not an easy task (ordering matters, timestamps are tricky). Current blockchain implementations use one merkle tree per block, constructed from all the transactions that fall within that block when the block is “completed”. In Libra there will be no blocks (so what does “blockchain” mean anyway?), so the merkle tree will always grow (similar to the certificate transparency log).
  • How to get money in? – Facebook advertises Libra as a way for people without bank accounts to make payments online. But it’s not clear how they will get Libra tokens in the first place. In third world countries, consumers interact with the telecoms from which they get their cheap smartphones and mobile data cards. The realistic way for people to get Libra tokens would be if Facebook had contracts with telecoms, but how and whether that is going to be arranged is an open question.
  • Key management – key management has always been, and will probably always be, a usability issue for blockchain use by end users. Users don’t want to manage their keys, and they can’t do that well. Yes, there are key management services and there are seed phrases, but that’s still not as good as a centralized account that can be restored if you lose your password. And if you lose your keys, or your password (from which a key is derived to encrypt your keys and store them in the cloud), you can lose access to your account forever. That’s a real risk and it’s highly undesirable for a payment system.
  • Privacy – Facebook claims it won’t associate Libra accounts with Facebook accounts in order to target ads. But even if we can trust them, there’s an inherent privacy concern – all payments will be visible to anyone who has access to the ledger. Yes, a person can have multiple accounts, but people will likely only use one. It’s not a matter of what can be done by tech-savvy users, but what will be done in reality. And ultimately, someone will have to know who those addresses (public keys) belong to (see the next point). A Zcash-based system would be privacy-preserving, but it can’t be compliant.
  • Regulatory compliance – if you are making a global payment system, you have to satisfy regulators around the world, especially the SEC, ESMA and the ECB. That doesn’t mean you can’t be disruptive, but you have to take into account real-world problems, like KYC (know your customer) processes as well as other financial limitations (you can’t print money, after all). Implementing KYC means people being able to prove who they are before they can open an account. This isn’t rocket science, but it isn’t trivial either, especially in third world countries. And money launderers are extremely inventive, so if Facebook (or the Libra association) doesn’t get its processes right, it risks becoming very lucrative for criminals and regime officials around the world. Facebook is continuously failing to stop fake news and disinformation on the social network despite much urging from the public and legislators, so I’m not sure we can trust them with being “a global compliance officer”. Yes, partners like Visa and Mastercard can probably help here, but I’m not sure how involved they’ll be.
  • Oligarchy – while it has “blockchain” in the tagline, it’s not a democratizing solution. It’s a private blockchain based on trusted entities that can participate in it. If it’s going to be a global payment network, it will be a de-facto payment oligarchy. How is that worse than what we have now? Well, now at least banks are somewhat independent power players. The Libra association is one player with one goal. And then antitrust cases will follow (as they probably will immediately, as Facebook has long banned cryptocurrency ads on its website only to unveil its own).
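To make the single-merkle-tree point above more concrete, here is what the per-block merkle root construction used by current blockchains looks like. This is an illustrative sketch only – real implementations hash serialized transactions and handle odd leaf counts in protocol-specific ways (the carry-over rule below is just one convention):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.*;

public class MerkleRoot {
    static byte[] sha256(byte[] in) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(in);
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    // Pairwise-hash one level of nodes into the next, repeating until a
    // single root remains. An odd trailing node is carried up unchanged
    // (one of several conventions used in practice).
    static byte[] root(List<byte[]> level) {
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                if (i + 1 < level.size()) {
                    byte[] both = Arrays.copyOf(level.get(i), 64); // left hash
                    System.arraycopy(level.get(i + 1), 0, both, 32, 32); // + right
                    next.add(sha256(both));
                } else {
                    next.add(level.get(i));
                }
            }
            level = next;
        }
        return level.get(0);
    }

    // hash each "transaction" to get the leaves, then fold them to a root
    static byte[] rootOf(String... txs) {
        List<byte[]> leaves = new ArrayList<>();
        for (String tx : txs) leaves.add(sha256(tx.getBytes(StandardCharsets.UTF_8)));
        return root(leaves);
    }

    public static void main(String[] args) {
        System.out.println(Base64.getEncoder().encodeToString(rootOf("tx1", "tx2", "tx3")));
    }
}
```

The root is cheap to recompute per block because the block is finite and ordered. A single, ever-growing global tree, updated concurrently, gives up exactly that property – which is why the Libra design is the more interesting (and riskier) engineering problem.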

Facebook has the power to make e-commerce websites accept payments in Libra. But will consumers want it? Is it at least 10 times better than using your credit card, or only marginally better, so that nobody bothers to learn yet another thing (and manage their keys)? Will third world countries really be able to benefit from such a payment system, or will others prove more practical (e.g. M-Pesa)? Will the technology be good enough or scalable enough? Will regulators allow it, or will Facebook be able to escape their grasp with a few large fines until they’ve conquered the market?

I don’t know the answers, but I’m skeptical about changing the money industry that easily. And given Facebook’s many privacy blunders, I’m not sure they will be able to cope well in a much more scrutinized domain. “Move fast and break things” can easily become “Move fast and sponsor terrorism”, and that’s only partly a joke.

I hope we get fast and easy digital payments, as bank transfers and even credit cards feel too outdated. But I’m sure it’s not easy to meet the complex requirements of reality.

The post First Thoughts About Facebook’s Libra Cryptocurrency appeared first on Bozho's tech blog.

Reflection is the most important Java API

Post Syndicated from Bozho original https://techblog.bozho.net/reflection-is-the-most-important-java-api/

The other day I was wondering which is the most important Java API. Which of the SE and EE APIs is the one that makes most of the Java ecosystem possible, and which could not simply have been recreated as a 3rd-party library?

And as you’ve probably guessed by the title – I think it’s the Reflection API. Yes, it’s inevitably part of every project, directly or indirectly. But that’s true for many more APIs, notably the Collection API. But what’s important about the Reflection API is that it enabled most of the popular tools and frameworks today – Spring, Hibernate, a ton of web frameworks.

Most of the other APIs can be implemented outside of the JDK. The Collections API could very well be commons-collections or Guava. It’s better that it’s part of the JDK, but we could’ve managed without it (it appeared in Java 1.2). But the Reflection API couldn’t. It almost had to be an integral part of the language.

Without reflection, you could not have any of the fancy tools that we use today. Not ORMs, not dependency injection frameworks, and not most of the web frameworks. Well, technically, you could at some point have them – using SPI, or using java-config only. One may argue that if it wasn’t for reflection, we’d have skipped the whole XML configuration era and gone directly to code-based configuration. But it’s not just configuration that relies on reflection in all these frameworks. Even if Spring could have its beans instantiated during configuration and initialized by casting them to InitializingBean, how would you handle autowired injection without reflection (“manually” doesn’t count, as it’s not autowiring)? In Hibernate, the introspection and Java Beans APIs might seem sufficient, but when you dig deeper, they are not. And handling annotations would not be possible in general.
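To see why reflection is irreplaceable for autowiring, here is a toy injector – a very simplified version of the trick DI containers use, with a made-up `@Inject` annotation and made-up classes (nothing Spring-specific):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;

public class MiniInjector {
    // a toy stand-in for an @Autowired-style annotation
    @Retention(RetentionPolicy.RUNTIME)
    @interface Inject {}

    static class Repository {
        String find() { return "data"; }
    }

    static class Service {
        @Inject Repository repository; // filled in by the injector below
        String handle() { return repository.find(); }
    }

    // instantiate the class, then scan its fields and instantiate and
    // assign anything marked @Inject - the core move behind DI containers
    static <T> T wire(Class<T> type) {
        try {
            T instance = type.getDeclaredConstructor().newInstance();
            for (Field f : type.getDeclaredFields()) {
                if (f.isAnnotationPresent(Inject.class)) {
                    f.setAccessible(true);
                    f.set(instance, f.getType().getDeclaredConstructor().newInstance());
                }
            }
            return instance;
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        Service service = wire(Service.class);
        System.out.println(service.handle()); // prints "data"
    }
}
```

Note that `Service` never constructs its own `Repository` and nothing in the code mentions the field by name – discovering the annotated field and setting it is only possible through `java.lang.reflect`.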

And without these frameworks, Java would not have been the widespread technology that it is today. If we did not have the huge open-source ecosystem, Java would have been rather niche [citation needed]. Of course, that’s not the only factor – there are multiple things that the language designers and then the JVM implementors got right. But reflection is, I think, one of those things.

Yes, using reflection feels hacky. Reflection in the non-framework code feels like a last resort thing – you only use it if a given library was not properly designed for extension but you need to tweak it a tiny bit to fit your case. But even if you have zero reflection code in your codebase, your project is likely full of it and would not have been possible without it.

The need to use reflection might be seen as one of the deficiencies of the language – you can’t do important stuff with what the language gives you, so you resort to a magic API that gives you unrestricted access to otherwise (allegedly) carefully designed APIs. But I’d say that even having reflection is a de-facto language feature. And it is one that probably played a key role in making Java so popular and widespread.

The post Reflection is the most important Java API appeared first on Bozho's tech blog.

The Positive Side-Effects of Blockchain

Post Syndicated from Bozho original https://techblog.bozho.net/the-positive-side-effects-of-blockchain/

Blockchain is a relatively niche technology at the moment, and even though there’s a lot of hype, its applicability is limited. I’ve been skeptical about its ability to solve all the world’s problems, as many claim, and would rather see it focused on solving particular business issues related to trust.

But I’ve been thinking about the positive side effects, and it might actually be one of the best things that have happened to software recently. I don’t like big claims and this sounds like one, but bear with me.

Maybe it won’t find its place in much of the business software out there. Maybe in many cases you don’t need a distributed solution because the business case does not lend itself to one. And certainly you won’t be trading virtual coins in unregulated exchanges.

But because of the hype, now everyone knows the basic concepts and building blocks of blockchain. And they are cryptographic – they are hashes, digital signatures, timestamps, merkle trees, hash chains. Every technical and non-technical person in IT has by now at least read a little bit about blockchain to understand what it is.

So as a side effect, most developers and managers are now trust-conscious, and by extension – security-conscious. I know it may sound far-fetched, but before blockchain, how many developers and managers knew what a digital signature is? Hashes were somewhat more prevalent, mostly because of their (sometimes incorrect) use for storing passwords, but PKI was mostly arcane knowledge.

And yes, we all know how TLS certificates work (although, do we?) and that a private key has to be created and used with them, and probably some had a theoretical understanding of digital signatures. And we knew encryption was kind of a good idea at rest and in transit. But putting that in the context of “trust”, “verifiability” and “non-repudiation” was, in my view, something that few people have done mentally.

And now, even by not using blockchain, developers and managers would have the trust concept lurking somewhere in the back of their mind. And my guess would be that more signatures, more hashes and more trusted timestamps will be used just because someone thought “hey, we can make this less prone to manipulation through this cool cryptography that I was reminded about because of blockchain”.
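And that cryptography is not exotic at all – signing a piece of data and verifying it takes only a few lines with the standard `java.security` API. A minimal sketch (the message and the choice of EC keys and SHA256withECDSA are arbitrary, just a common combination):

```java
import java.nio.charset.StandardCharsets;
import java.security.*;

public class SignDemo {
    // generate an EC key pair; the private key signs, the public key verifies
    static KeyPair keyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("EC");
            gen.initialize(256);
            return gen.generateKeyPair();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    static byte[] sign(byte[] data, PrivateKey key) {
        try {
            Signature sig = Signature.getInstance("SHA256withECDSA");
            sig.initSign(key);
            sig.update(data);
            return sig.sign();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    static boolean verify(byte[] data, byte[] signature, PublicKey key) {
        try {
            Signature sig = Signature.getInstance("SHA256withECDSA");
            sig.initVerify(key);
            sig.update(data);
            return sig.verify(signature);
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        KeyPair kp = keyPair();
        byte[] msg = "invoice #42, amount 100".getBytes(StandardCharsets.UTF_8);
        byte[] sig = sign(msg, kp.getPrivate());
        System.out.println(verify(msg, sig, kp.getPublic())); // true
        msg[0] ^= 1; // tamper with a single byte of the message
        System.out.println(verify(msg, sig, kp.getPublic())); // false
    }
}
```

Flipping one byte breaks verification – that is the whole “non-repudiation and manipulation resistance” idea in a dozen lines, waiting to be applied outside of blockchains too.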

Blockchain won’t be the new internet, but it already has an impact on how people in the software industry think. Or at least I hope so.

The post The Positive Side-Effects of Blockchain appeared first on Bozho's tech blog.

Audit Trail in IT Context

Post Syndicated from Bozho original https://techblog.bozho.net/audit-trail-in-it-context/

An audit trail (or audit log) is something both intuitive and misleading at the same time. There are many definitions of an audit trail, and all of them give you an idea of what it is about:

A system that traces the detailed transactions relating to any item in an accounting record.

A record of the changes that have been made to a database or file.

An audit trail (also called audit log) is a security-relevant chronological record, set of records, and/or destination and source of records that provide documentary evidence of the sequence of activities that have affected at any time a specific operation, procedure, or event.

An audit trail is a step-by-step record by which accounting or trade data can be traced to its source

The definitions are clear, but they rarely give enough detail on how they apply to a particular IT setup. The Wikipedia article is pretty informative, but I prefer referring to the NIST document about audit trails. This relatively short document from more than 20 years ago covers many of the necessary details.

I won’t repeat the NIST document, but I’d like to focus on the practical aspects of an audit trail in order to answer the question in the title – what is an audit trail? In the context of a typical IT setup, the audit trail includes all or some of the following:

  • Application-specific audit trail – ideally, each application records business-relevant events. They may be logged in text files or in separate database tables. They allow reconstructing the history much better than the arbitrary noisy logging that is usually in place
  • Application logs – this is a broader category as it includes logs that are not necessarily part of the audit trail (e.g. debug messages, exception stacktraces). Nevertheless, they may be useful, especially in case there is no dedicated application-specific audit trail functionality
  • Database logs – whether it is logged queries, change data capture or change tracking functionality, or some native audit trail functionality
  • Operating system logs – for Linux that would include /var/log/audit/audit.log (or similar files) and /var/log/auth.log. For Windows it would include the Windows Event logs for the Security and System groups.
  • Access logs – access logs for web servers can be part of the audit trail especially for internal systems where a source IP address can more easily be mapped to particular users.
  • Network logs – network equipment (routers, firewalls) generate a lot of data that may be seen as part of the audit trail (although it may be very noisy)

All of these events can (and should) be collected in a central place where they can be searched, correlated and analyzed.

Ideally, an audit trail event has an actor, an action, details/payload and optionally an entity (type + id) which is being accessed or modified. Having this information in a structured form allows for better analysis and forensics later.
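As an illustration, such a structured event could be modeled like this – the field names and the pipe-delimited log format are just one possible choice, not a standard:

```java
import java.time.Instant;

public class AuditEventDemo {
    // actor, action, an optional entity reference and a free-form payload
    record AuditEvent(Instant timestamp, String actor, String action,
                      String entityType, String entityId, String details) {
        // render the event as a single structured log line
        String toLogLine() {
            return String.join("|", timestamp.toString(), actor, action,
                    entityType, entityId, details);
        }
    }

    public static void main(String[] args) {
        AuditEvent e = new AuditEvent(Instant.parse("2019-06-18T10:15:30Z"),
                "user:123", "UPDATE", "Invoice", "42", "amount changed");
        System.out.println(e.toLogLine());
    }
}
```

The point is that every event carries the same machine-parseable fields, so “who did what to which entity, and when” can be queried directly instead of being grepped out of free-text log messages.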

Once the audit trail is collected, two other properties become very important:

  • Availability – is the audit trail available at all times, and how far back in time can it be accessed (also referred to as “retention”)? Typically application, system and network logs are kept for shorter periods of time (1 to 3 months), which is far from ideal for an audit trail. Many standards and regulations require higher retention periods of up to 2 years
  • Integrity – data integrity is too often ignored. But if you can’t prove the integrity of your logs, both internally and to third parties (auditors, courts), then they are of no use.
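A common way to make log tampering detectable is hash chaining: each entry’s hash covers the previous entry’s hash, so modifying any record breaks everything after it. A simplified sketch (a production system would add trusted timestamps, signing, or anchoring the latest hash externally, so the chain itself can’t simply be recomputed by an attacker):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.*;

public class HashChainedLog {
    final List<String> entries = new ArrayList<>();
    final List<String> hashes = new ArrayList<>();
    String lastHash = "genesis";

    static String sha256Hex(String in) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(in.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    // each entry's hash covers the previous hash, linking the records
    void append(String entry) {
        lastHash = sha256Hex(lastHash + "|" + entry);
        entries.add(entry);
        hashes.add(lastHash);
    }

    // recompute the chain from the start; any modified entry breaks it
    boolean verify() {
        String h = "genesis";
        for (int i = 0; i < entries.size(); i++) {
            h = sha256Hex(h + "|" + entries.get(i));
            if (!h.equals(hashes.get(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        HashChainedLog log = new HashChainedLog();
        log.append("user:123 UPDATE Invoice/42");
        log.append("user:123 DELETE Invoice/43");
        System.out.println(log.verify());                    // true
        log.entries.set(0, "user:999 UPDATE Invoice/42");    // tamper
        System.out.println(log.verify());                    // false
    }
}
```

This is the same building block used by blockchains and certificate transparency logs, applied to something as mundane as an application audit trail.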

Many organizations understand that the integrity of their audit trail is important only after a security incident takes place and they realize they cannot rely on their audit logs. It is better to protect the audit trail before something bad happens. And not just for some abstract reasons like “best practices” or “improved security”. A secure audit trail allows organizations to:

  • Identify that something wrong has happened. If the audit trail is not protected, a malicious actor can make it seem like nothing anomalous has taken place
  • Prove what happened and who did it. Digital forensics is important for large organizations. Especially in cases where lawsuits are involved.
  • Reconstruct the original data. The audit trail is not a replacement for a full backup, but it can allow you to reconstruct the data that was modified and when it was modified, so that either you know which backup to use, or you avoid restoring from the backup altogether by relying on the data in the audit trail.
  • Be compliant. Because of all the reasons stated above, many standards and regulations require a secure audit trail. Simple logs are usually not compliant, even though auditors may turn a blind eye.

The audit trail is an important concept, and has very practical implications. It is a widely researched topic, but in practice many IT setups lack sufficient audit trail capabilities. Partly because the risks are not immediately obvious, and partly because it is a technically challenging task.

(The article is originally published on our corporate blog, but I think it can be interesting here as well, so I copied most of it, leaving out the corporate references and calls to action)

The post Audit Trail in IT Context appeared first on Bozho's tech blog.

7 Questions To Ask Yourself About Your Code

Post Syndicated from Bozho original https://techblog.bozho.net/7-questions-to-ask-yourself-about-your-code/

I was thinking the other day – why is writing good code so hard? Why has the industry still not got to producing quality software, despite years of effort, best practices, methodologies and tools? The answer to those questions is anything but simple. It involves economic incentives, market realities, deadlines, formal education, industry standards, an insufficient number of developers on the market, etc.

As an organization, in order to produce quality software, you have to do a lot. Set up processes, get your recruitment right, be able to charge the overhead of quality to your customers, and actually care about it.

But even with all the measures taken, you can’t guarantee quality code. First, because that’s subjective, but second, because it always comes down to the individual developers. And not simply whether they are capable of writing quality software, but also whether they are actually doing it.

And as a developer, you may fit the process and still produce mediocre code. This is why my thoughts took me to the code from the eyes of the developer, but in the context of the software as a whole. Tools can automatically catch code style issues, cyclomatic complexity, large methods, too many method parameters, circular dependencies, etc. But even if you cover those, you are still not guaranteed to have produced quality software.

So I came up with seven questions that we as developers should ask ourselves each time we commit code.

  1. Is it correct? – does the code implement the specification? If there is no clear specification, did you make a sufficient effort to find out the expected behaviour? And is that behaviour tested somehow – by automated tests preferably, or at least by manual testing.
  2. Is it complete? – does it take care of all the edge cases, regardless of whether they are defined in the spec or not. Many edge cases are technical (broken connections, insufficient memory, changing interfaces, etc.).
  3. Is it secure? – does it prevent misuse, does it follow best security practices, does it validate its input, does it prevent injections, etc. Is it tested to prove that it is secure against these known attacks. Security is much more than code, but the code itself can introduce a lot of vulnerabilities.
  4. Is it readable and maintainable? – does it allow other people to easily read it, follow it and understand it? Does it have proper comments, describing how a certain piece of code fits into the big picture, does it break down code in small, readable units.
  5. Is it extensible? – does it allow being extended with additional use cases, does it use the appropriate design patterns that allow extensibility, is it parameterizable and configurable, does it allow writing new functionality without breaking the old one, does it cover a sufficient percentage of the existing functionality with tests so that changes are not “scary”.
  6. Is it efficient? – does it work well under high load, does it care about algorithmic complexity (without being prematurely optimized), does it use batch processing where appropriate, does it avoid loading big chunks of data into memory at once, does it make proper use of asynchronous processing.
  7. Is it something to be proud of? – does it represent every good practice that your experience has taught you? Not every piece of code is glorious, as most perform mundane tasks, but is the code something to be proud of or something you’d hope nobody sees? Would you be okay to put it on GitHub?

I think we can internalize those questions. Will asking them constantly make a difference? I think so. Would we magically get quality software if every developer asked themselves these questions about their code? Certainly not. But we’d have better code, when combined with existing tools, processes and practices.

Quality software depends on many factors, but developers are one of the most important ones. Bad software is too often our fault, and by asking ourselves the right questions, we can contribute to good software as well.

The post 7 Questions To Ask Yourself About Your Code appeared first on Bozho's tech blog.

Implicit _target=”blank”

Post Syndicated from Bozho original https://techblog.bozho.net/implicit-_targetblank/

The target="_blank" href attribute has been the subject of many discussions. When is it right to use it, should we use it at all, is it actually deprecated, is it good user experience, does it break user expectations, etc.

And I have a strange proposal for improving the standard behaviour of browsers – an implicit target="_blank" in certain contexts. But let’s first try to list when target="_blank" is a good idea:

  • On pages with forms when the user may need additional information in order to fill the form but you don’t want them to leave the form and lose their input
  • On homepage-like websites – e.g. Twitter and Facebook, where your behaviour is “browsing” and opening a link here and there. It may also apply to things like Reddit or Hacker News, though it’s currently not implemented that way there
  • In comment/review sections where links are user-provided – this is similar to the previous one, as the default behaviour is browsing through multiple comments and possibly following some of them

The typical argument here is that if a user wants to open a page in a new tab, they can do that with either the context menu or ctrl+click. But not many users know that feature, and even fewer use it. And so many of these pages are confusing, and combined with a sometimes broken back button it becomes a nightmare.

In some of these cases JavaScript is used to make things better (and more complicated). In the case of forms, JavaScript is added to warn people against leaving the page with an unfinished form. JavaScript is used to turn certain links into target="_blank" ones. Some people try to open new tabs with JavaScript.

So my proposal is to introduce a new open-links attribute with the following values:

  • open-links="new-tab" – open a new tab for each link on the page or in the current div/section/…
  • open-links="new-window" – same as above, but open a new window (almost never a good idea)
  • open-links="new-tab-on-form" – only open a new tab if there is a form on the page (possibly with the additional requirement that the form is partially filled)
  • open-links="new-window-on-form" – same as above, but with a new window
  • open-links="warn-on-form" – open the link in the same tab, but if there’s a partially filled form, warn the user they will lose their input

The default value, I think, should be new-tab-on-form.

It might introduce new complexities and may confuse users further. But I think it’s worth trying to fix this important part of the web rather than leaving each website handle it on their own (or forgetting to handle it).

The post Implicit _target=”blank” appeared first on Bozho's tech blog.

How to create secure software? Don’t blink! [talk]

Post Syndicated from Bozho original https://techblog.bozho.net/how-to-create-secure-software-dont-blink-talk/

Last week Acronis (famous for their TrueImage) organized a conference in Sofia about cybersecurity for developers and I was invited to give a talk.

It’s always hard to pick a topic for a talk at a developer conference that is not too narrowly focused — if you choose something too high-level, you can be useless to the audience and seen as a “bullshitter”; if you pick something too specific, half of the audience may be bored because it is not their area of work.

So I chose the middle ground — an overview talk with as many specifics as possible. I tried to tell interesting stories of security vulnerabilities to illustrate my points. The talk is split into several parts:

  • Purpose of attacks
  • Front-end vulnerabilities (examples and best practices)
  • Back-end vulnerabilities (examples and best practices)
  • Infrastructure vulnerabilities (examples and best practices)
  • Human factor vulnerabilities (examples and best practices)
  • Thoughts on how this fits into the bigger picture of software security

You can watch the 30 minutes video here:

If you would like to download my slides, click here. or view them at SlideShare:

The point is — security is hard and there are a million things to watch for and a million things that can go wrong. You should minimize risk by knowing and following as many best practices as possible. And you should not assume you are secure, as even the best companies make rookie mistakes.

The security mindset, which is partly formalized by secure coding practices, is at the core of having secure software. Asking yourself constantly “what could go wrong?” will make software more secure. How to make all software more secure, not just the software we are creating, is a whole other topic, and a less technical one – it goes through public policies, financial incentives to invest in security, and so on.

For technical people it’s important to know how to make a focused effort to make a system more secure. And to apply what we know, because we may know a lot and postpone it for “some other sprint”.

And as a person from the audience asked me — is not blinking really the way? Of course not, that effort won’t be justified. But if we cover as much of the risks as possible, that will give us some time to blink.

The post How to create secure software? Don’t blink! [talk] appeared first on Bozho's tech blog.

Blockchain – What Is It Good For? [slides]

Post Syndicated from Bozho original https://techblog.bozho.net/blockchain-what-is-it-good-for-slides/

Last week I gave a 20-minute talk on the way I see blockchain applicability. I’ve always been skeptical of the blockchain hype, having voiced my concerns, my rants and other thoughts on the matter.

I’ve followed actual blockchain projects that didn’t really need blockchain but managed to yield some very good results by digitizing processes, eliminating human error, and occasionally guaranteeing the integrity of data. And recently I read an article that put these observations into perspective – that blockchain is just a tool for digital transformation (a buzzword broadly meaning “doing things on a computer, more efficiently”). Distributed consensus is rarely needed, let alone public ledgers. But that doesn’t matter, as long as the technology has led to some processes being digitized and transformed.

So here are the slides from my talk:

And people are usually surprised that I have a blockchain-related company and I’m so skeptical at the same time. But that’s actually logical – I know how the technology works, what problems it solves and how it can be applied in a broad set of domains. And that’s precisely why I don’t think it’s a revolution. It’s a wonderful piece of technological innovation that will no doubt solve some problems much better than they were solved before, but it won’t be the new internet and it won’t change everything.

Doesn’t that skepticism hurt my credibility as a founder of a blockchain-related startup? Not at all – I don’t want to get a project just because of a buzzword – that’s not sustainable anyway. I want to get it because it solves a real problem that the customer has. And to solve it the right way, i.e. with the best technologies available. And blockchain’s underlying mechanisms are a great tool in the toolbox. Just not a revolution.

In order to be revolutionary, something has to bring at least a tenfold improvement over existing practices, or make possible a lot of things that weren’t possible before. Blockchain does neither. I got a question from the audience – “well, isn’t it a 10x innovation in payments?”. My counter-question was: “Have you ever bought something with cryptocurrency?”. Well, no. It doesn’t improve cross-organization integration tenfold either. Yes, it might help establish a shared database, but you could have done that with existing technology if you needed to.

But if the blockchain hype helped people realize that digital events can be protected, and that stakeholders can exchange data and present proofs to each other that they haven’t modified it, who cares whether the ultimate implementation is based on Ethereum, Hyperledger, Corda, a clever use of digital signatures, timestamps and web services, or perhaps simply Merkle trees?
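To make the “simply Merkle trees” option concrete, here is a minimal sketch of how publishing (or timestamping) a single root hash lets stakeholders later prove an event log was not modified. The event names are hypothetical, and the odd-level padding (duplicating the last node) is one of several possible conventions, not a standard:

```python
import hashlib

def merkle_root(leaves):
    """Compute a Merkle root over raw leaf values (minimal illustrative scheme)."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2 == 1:        # duplicate the last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical digital events shared between two organizations
events = [b"order-created", b"order-paid", b"order-shipped"]
root = merkle_root(events)

# Changing any single event changes the root, so a published/timestamped
# root is enough to detect tampering later.
tampered = merkle_root([b"order-created", b"order-PAID", b"order-shipped"])
assert root != tampered
```

Anchoring that 32-byte root with a trusted timestamp (or a digital signature) gives the integrity guarantee without any distributed ledger.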

I hope that blockchain gets demystified soon and we all start speaking the same language (so that I don’t need to reassure an audience at a banking summit that – no, we are not doing cryptocurrencies in our blockchain company). Once we get there, we’ll be able to efficiently solve the problems of digital transformation. As for the digital revolution – it is already happening. We are moving everything online. And yes, with centralized services rather than distributed p2p networks, but that’s not a technical issue, it’s a socioeconomic one. And technology by itself is rarely a solution to such problems.

The post Blockchain – What Is It Good For? [slides] appeared first on Bozho's tech blog.

Technical Innovation vs. Process Innovation

Post Syndicated from Bozho original https://techblog.bozho.net/technical-innovation-vs-process-innovation/

We often talk about “innovation”, and “digital innovation” (or “technical innovation”) in particular, when it comes to tech startups. It has, unfortunately, become a cliché, and “innovation” is now devoid of meaning. I’ve been trying to put together a meaningful analysis of the “innovation landscape” and to classify what is being called “innovation”.

And the broad classification I got to is “technical innovation” vs “process innovation”. In the majority of cases, tech startups are actually process innovations. They get existing technology and try to optimize some real world process with it. These processes include “communicating with friends online”, “getting in touch with business contacts online”, “getting a taxi online”, “getting a date online”, “ordering food online”, “uploading photos online”, and so on. There is no inherent technical innovation in any of these – they either introduce new (and better) processes, or they optimize existing ones.

And don’t get me wrong – these are all very useful things. In fact, this is what “digital transformation” means – doing things electronically that were previously done in an analogue way, or doing things that were previously not possible in the analogue world. And the better you imagine or reimagine the process, the more effective your company will be.

In many cases these digital transformation tools have to deal with real-world complexities – legislation, entrenched behaviour, edge cases. E.g. you can easily write food delivery software. You get the order, you notify the store, you optimize the delivery people’s routes to collect and deliver as much food as possible, and you’re good to go. And then you “hit” the real world, where there are traffic jams, temporarily closed streets, restricted parking, unresponsive restaurants, unresponsive customers, keeping the online menu and stock in sync, bad weather, messed-up orders, part-time job regulations that differ by country, and so on. And you have to navigate that maze in order to deliver a digitally transformed food delivery service.

There is nothing technically complex about that – any kid with some PHP and JS knowledge can write the software by finding answers to the programming hurdles on Stack Overflow. In that sense, it is not so technically innovative. The hard part is the processes and the real-world complexities. And, of course, turning that into a profitable business.

In the long run, these non-technical innovations end up producing technical innovation. Facebook had nothing interesting on the technical side in the beginning. Then it had millions of users and had to scale, and then it became interesting – how to process so much data, how to scale to multiple parts of the world, how to optimize the storage of so many photos, and so on. Facebook gave us Cassandra, Twitter gave us Snowflake, LinkedIn gave us Kafka. There are many more examples, and it’s great that these companies open source some of their internally developed technologies. But these are side-effects of the scale, not an inherent technical innovation that led to the scale in the first place.

And then there are the technical innovation companies. I think they are a much rarer phenomenon, and the prime example is Google – the company started as a consequence of a research paper. Roughly speaking, the paper outlined a technical innovation in search that made all other approaches to search obsolete. We can say that Bitcoin was such an innovation as well. In some cases it’s not the founders that develop the original research, but they derive their product from existing computer science research. They combine multiple papers, adapt them to the needs of the real world (because, as we know, research papers often rely on a “spherical horse in a vacuum”) and build something useful.

As a personal side-note here, some of my (side) projects were purely process innovations – I once made an online bus ticket reservation service (before such a thing existed in my country), then I made a social network aggregator (that was arguably better than existing ones at the time). And they were much less interesting than my more technically innovative projects, like Computoser (which has some original research) or LogSentinel (which combines several research papers into a product).

A subset of the technical innovation is the so called “deep tech” – projects that are supposed to enable future innovation. This can be simplified as “applied research”. Computer vision, AI, biomedical. This is where you need a lot of R&D, not simply “pouring” code for a few months.

Just as “process innovation” companies eventually lead to technical innovation, technical innovation companies eventually (or immediately) lead to process improvements. Google practically changed the way we consume information, so its impact on processes is rather high. And to me, that’s the goal of every company – to change behaviour. It’s much more interesting to do that using never-before-done technical feats, but if you can do it without the technical bits (i.e. by simply building a website/app using current web/mobile frameworks), good for you.

If you become a successful company, you’ll necessarily have both types of innovation, regardless of how you started. And in order to have a successful company, you have to improve processes and change behaviour. You have to do digital transformation. In the long run, it doesn’t make that much of a difference which was first – the technology or the process innovation. Although from business and investment perspective, it’s easier for competitors to copy the processes and harder to copy internal R&D.

Whether we should call process innovation “technical innovation” – I don’t think so, but that ship has sailed: anything that uses technology is now “technical innovation”, even if it’s a WordPress site. But for technical people it’s much more challenging and rewarding to build on actual technical innovation. We just have to remember that we have to solve real-world problems, improve or introduce processes, and change behaviour.

The post Technical Innovation vs. Process Innovation appeared first on Bozho's tech blog.

Resources on Distributed Hash Tables

Post Syndicated from Bozho original https://techblog.bozho.net/resources-on-distributed-hash-tables/

Distributed p2p technologies have always been fascinating to me. Bittorrent is cool not because you can download pirated content for free, but because it’s an amazing piece of technology.

At some point I read and researched a lot about how DHTs (distributed hash tables) work. DHTs are not part of the original BitTorrent protocol, but after trackers came under increasing threat of being shut down for copyright infringement, “trackerless” features were added to the protocol. A DHT is distributed among all peers and holds information about which peer holds what data. Once you are connected to a peer, you can query it for its knowledge of who has what.

During my research (which had no particular purpose) I took notes on many resources that I found useful for understanding how DHTs work and possibly implementing something on top of them in the future. In fact, a DHT is a “shared database”, “just like” a blockchain. You can’t trust it as much, but proving digital events does not require a blockchain anyway. My point here is – there is a lot more cool stuff in distributed/p2p systems than blockchain. And maybe way more practical stuff.

It’s important to note that the DHT used in BitTorrent is Kademlia. You’ll see a lot about it below.
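The core of Kademlia can be sketched in a few lines: node and key identifiers live in the same 160-bit space (BitTorrent’s DHT uses SHA-1), and “distance” is simply the XOR of two identifiers, so a key is stored on the nodes whose IDs are XOR-closest to it. The peer names below are hypothetical, and this sketch omits the routing table (k-buckets) and the network layer entirely:

```python
import hashlib

def node_id(name: str) -> int:
    """160-bit Kademlia-style identifier (SHA-1, as in BitTorrent's DHT)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of the two identifiers."""
    return a ^ b

def closest_nodes(target: int, nodes: list, k: int = 2) -> list:
    """The k nodes 'closest' to a key are the ones responsible for storing it."""
    return sorted(nodes, key=lambda n: xor_distance(n, target))[:k]

# Hypothetical peer names; a real network learns peers by querying others
nodes = [node_id("peer-%d" % i) for i in range(8)]
key = node_id("some-infohash")
print(closest_nodes(key, nodes))
```

The XOR metric is what makes lookups logarithmic: each query to a known peer returns peers whose IDs share a longer prefix with the target, roughly halving the remaining distance.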

Anyway, the point of this post is to share the resources that I collected. For my own reference and for everyone who wants to start somewhere on the topic of DHTs.

I hope the list is interesting and useful. It’s not trivial to think of other uses of DHTs, but simply knowing about them and how they work is a good thing.

The post Resources on Distributed Hash Tables appeared first on Bozho's tech blog.

Algorithmic and Technological Transparency

Post Syndicated from Bozho original https://techblog.bozho.net/algorithmic-and-technological-transparency/

Today I gave a talk at OpenFest about algorithmic and technological transparency. It was a somewhat abstract and high-level talk, but I hoped to make it more meaningful than if some random guy just spoke about techno-dystopias.

I don’t really like the kind of talk where someone who has no idea what a “training set” is, how dirty real data is, or how to write five lines of Python speaks about an AI revolution. But I think being a technical person lets me put some details into the high-level stuff and make it less useless.

The problem I’m trying to address is opaque systems – we don’t know how recommendation systems work, why we are seeing certain videos or ads, how decisions are taken, what happens with our data and how secure the systems we use are, including our cars and future self-driving cars.

Algorithms have real impact on society – recommendation engines make conspiracy theories mainstream, motivate fascists, create echo chambers. Algorithms decide whether we get loans, welfare, a conviction, or even if we get hit by a car (as in the classical trolley problem).

How do we make the systems more transparent, though? There are many, partly overlapping, approaches: textual descriptions of how things operate; tracking the actions of back-office admins and making them provable to third parties; implementing “Why am I seeing this?” features; tracking each step in machine learning algorithms; publishing datasets; publishing source code (as I’ve previously discussed with open-sourcing some aspects of self-driving cars).

We don’t have to reveal trade secrets and lose competitive advantage in order to be transparent. Transparency doesn’t have to come at the expense of profit. In some cases it can even improve the perception of a company.

I think we should start by defining best practices, then move to industry codes and standards, and if that doesn’t work, carefully consider focused regulation (but keep in mind that politicians are mostly clueless about technology and may do more damage).

The slides from the talk are below (the talk itself was not in English).

It can be seen as our responsibility as engineers to steer our products in that direction. Not all of them require or would benefit from such transparency (rarely does anyone care how an accounting system works), and it would normally be the product people who decide what should become more transparent, but we also have a stake and should be morally accountable for what we build.

I hope we start thinking and working towards more transparent systems. And of course, each system has its specifics and ways to become more transparent, but it should be a general mode of thinking – how do we make our systems not only better and more secure, but also more understandable, so that we don’t become the “masterminds” behind black boxes that decide people’s lives.

The post Algorithmic and Technological Transparency appeared first on Bozho's tech blog.

Random App Ideas

Post Syndicated from Bozho original https://techblog.bozho.net/random-app-ideas/

Every now and then you think “wouldn’t it be nice to have an app for X”. Whenever I was in that situation, I took a note. Then the note grew; I cleaned out some absurd ideas and added more. I implemented some of these ideas of mine, and the rest formed a “TODO” list.

At some point I realized I wouldn’t be able to implement them, as the time needed is not something I’m likely to have in the near future. But I didn’t want to just delete the notes. So here they are – my random app ideas. Probably useless, but maybe a little interesting.

  • Receipts via smartphone NFC – there are apps that let you track your expenses, but they are tedious. There are apps that try to OCR receipts, but receipts vary too much across the world for that to be consistent (there are companies like Receipt Bank that do something like that, but it’s not the way I’d like this problem solved in the long run). So I thought stores could offer the option of NFC receipts – you just tap your phone to a device (or another phone) and get the receipt in electronic form. Not a picture, but the raw data – items, prices, taxes, issuer. Then you have the option to print it if you like. Of course, I realized at some point that legislation would first have to allow for that – in many cases you must issue a paper receipt and the digital one is not “legal”. But the idea still remains viable and probably not hard to implement. It would allow much easier tracking of expenses by users and save a lot of paper.
  • Am I drunk? – breathalyzer add-ons for phones are not something many people would buy. My idea was to detect drunkenness based on motion – the accelerometer (and possibly other sensors) could be trained to recognize a drunken walk, thus informing you that you are… drunk. What’s the use of that? Maybe if the app alerts you while you’re walking to your car, you’ll be less likely to drive.
  • Mobile scavenger hunt – scavenger hunts have always been interesting. I remember we devised one for a friend’s birthday a few years ago. It would be nice, though, to be able to build such scavenger hunts dynamically, based on multiple configurable building blocks. The app could generate multiple outcomes based on certain rules (mostly using the GPS of the phone). It could be timed, it could include scanning barcodes (including those on actual products), visiting particular addresses, listening to songs and their lyrics, extracting data from Wikipedia articles, etc. “Generate scavenger hunt” (for a friend or randomly for yourself) might be interesting.
  • Sentiment analysis prior to commenting – something that somehow plugs into Facebook or other social media (or screen-scrapes it?), plus a browser plugin (for desktop), that does sentiment analysis on the comment you are currently typing. And recommends against posting if it is too negative, a personal attack, etc. Kind of like a cool-down timer saying “think twice before destroying the online discussion with that comment”. It is a problem of online communication that it too often gets out of control (and the same people wouldn’t be as rude in real life as they are online)
  • OCR shop signs – this was more of a Google Glass app than one for a regular phone, as it requires capturing one’s surroundings. The idea is to crowdsource the current shops and restaurants and put them on a map. At the moment we rely on old and incomplete data – either data that the owner of a shop or restaurant has put up, or that some contributor has added. However, shops and restaurants change often and you may not know that something has moved. It doesn’t sound that useful, but worth a thought.
  • Algorithmic impressionist painter – you train an algorithm and it creates a random picture. Now, that has already been done (I’ve been having the idea ever since I created my algorithmic music composer). And I’ve heard critics destroy the algorithmic paintings. But it’s an interesting experiment nonetheless.
  • Life game – basically record your actions and get rewarded for good deeds. The game can give particular challenges (e.g. “go help a soup kitchen”), and bonuses. The challenge here lies in data protection – depending on the level of detail, the application may have too much personal data that should not leak. Encrypting everything with a private key stored in the secure storage of your device may be one way to resolve that. I know it sounds a bit dystopian and now, after seeing something like that implemented in China, I’m happy that I can say “oh, I had this idea six years ago” and I’ve grown since then. Anyway, in my rendition it’s a voluntary game, not a state-imposed citizen score. But re-purposing technologies is a specialty of regimes, so maybe it’s a bad idea to build such a system in the first place.
  • Random hugs – you know those “free hugs” signs. You could have an app that detects other users of the same app via WiFi Direct (or other GPS or non-GPS proximity/distance detection) and offers both of you a random hug. Yes, yes, I know it is awkward to hug strangers and it may be abused for sexual misconduct. In my defense, when I had the idea six years ago, the public attention to and knowledge about sexual abuse was not at the level it is today. Still, with a proper rating system and “report abuse”, this may not be an issue. And as hugs are considered good for psychological health, it might not be as dumb an idea as it sounds.

In all cases above, there are interesting technical challenges – devising a standard receipt format and optimizing the UX of receipt exchange, training a model to detect drunkenness based on accelerometer readings, training a painting algorithm, doing sentiment analysis on small pieces of text, using niche technologies like WiFi direct for proximity, data protection and cryptography, OCR-ing pictures of surroundings. This is why I’m sharing them in this blog.
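Of the challenges above, sentiment analysis on short text is the easiest to sketch. A real system would use a trained model; the hypothetical word lists below are just the crudest possible lexicon-based stand-in, to show the shape of the problem:

```python
# Hypothetical word lists -- a real implementation would use a trained
# sentiment model, not a hand-picked lexicon.
NEGATIVE = {"idiot", "stupid", "hate", "moron", "dumb"}
POSITIVE = {"thanks", "great", "agree", "interesting", "helpful"}

def comment_tone(text: str) -> str:
    """Warn before posting if a comment skews negative (naive lexicon score)."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score < 0:
        return "Think twice before posting this."
    return "ok"

print(comment_tone("You are an idiot and I hate this post!"))
# -> Think twice before posting this.
```

Even this toy version illustrates why short comments are hard: sarcasm, negation (“not great”) and misspellings defeat a word-set approach immediately, which is why the real challenge is the model, not the plugin.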

This is in line with my previous posts about making side projects with technologies that are new to you. The app may end up useless, or not used by many people, but you’ve still learned an important skill and applied it in practice.

The post Random App Ideas appeared first on Bozho's tech blog.

Models for Electronic Identification

Post Syndicated from Bozho original https://techblog.bozho.net/models-for-electronic-identification/

Electronic identity is an important concept, as it lies at the crossroads of the digital, the physical and the legal worlds. How do you prove your identity without being physically present? I’ve previously given an overview talk on electronic identification, tackled the high-level and philosophical aspects, and criticized biometric-only identification as insecure. Now I want to list a few practical ways in which eID can be implemented.

First, it’s important to mention the eIDAS regulation, which makes electronic identity a legal reality. It says what an electronic identification scheme is and what its legal effects are (proving that you are who you claim to be). And it defines multiple “levels of assurance”, i.e. the levels of security of an authentication method. But it is a broad framework that doesn’t tell you how to do things in the technical world.

And while electronic identification is mostly cited in the context of government and municipal services, it applies to private companies as well. Currently in the US, for example, the SSN is used for electronic identification. This is a very poor approach, as leaking the SSN enables identity theft. In the EU there are many different approaches, from the Estonian PKI-based approach to the UK’s Verify initiative, which relies on the databases of private companies.

You can see electronic identity as a more legally-meaningful login. You still perform a login, in many cases using username and password as one of the factors, but it carries additional information – who is the actual person behind that login. In some cases it doesn’t even have to reveal who the person is – it can just confirm that such a person exists, along with some attributes of that person: age (e.g. if you want to purchase alcohol online), city of registration (e.g. if you want to use municipal services), conviction status (e.g. if applying to be a driver in an Uber-like service). It is also very useful when doing anti-money-laundering checks (e.g. if you are a payment provider, an online currency or cryptocurrency exchange, etc.)

Electronic identification schemes can be public and private. Public ones are operated by a government (federal or state in the case of the US) or a particular institution (e.g. the one that issues driver’s licenses). Private ones can be operated by any company that has had the ability to verify your physical identity – e.g. by you going in and signing a contract with them: a bank, a telecom, a utility company.

I will use “authentication” and “identification” interchangeably, and for practical purposes this is sort of true. They differ, of course: authentication is proving you are who you claim to be, and identification is uniquely identifying you among others (the terms have slightly different meanings in cryptography, but let’s not get carried away with terminology).

Enrollment is the process of signing you up in the electronic identification scheme’s database. It can include a typical online registration step, but it has to do proper identity verification. This can be done in three ways:

  • In-person – you physically go to a counter to have your identity verified. This is easy in the EU, where ID cards exist, and a bit harder in the US, where you are not required to actually have an identity document (though you may have one of several). In that case you’d have to bring a birth certificate, utility bills, or whatever the local legislation requires
  • Online – any combination of the following may be deemed acceptable, depending on the level of assurance needed: a videoconferencing call; a selfie with an identity document; a separate picture of an identity document; camera-based liveness detection; matching a selfie with a government-registered photo. Basically, a way to establish that: 1. I have this document; 2. I am the person on the document. This could be automated or manual, but it does not require physical presence.
  • By proxy – by relying on another eID provider that can confirm your identity. This is an odd option, but you can cascade eID schemes.

And then there are the technical aspects – what do you add to “username and password” to make identity theft less likely or nearly impossible:

  • OTP (one-time passwords). This can be a hardware OTP token (e.g. RSA SecurID) or a software-based TOTP (like Google Authenticator). The principle of both is the same – the client and the server share a secret and, based on the current time, generate a 6-digit password. Note that storing the secrets on the server side is not trivial – ideally that should be on an HSM (hardware security module) that can do native OTP, otherwise the secrets can leak and your users can be impersonated (the HSM is supposed to never let a secret key leave the hardware). There are less secure OTP approaches, like SMS or other types of messages – you generate one and send it to a registered phone, Viber, Telegram, email, etc. Banks often use that for their login, but it cannot be used across organizations, as it would require the secrets to be shared. Because the approach is centralized, you can easily revoke an OTP – e.g. declare a phone or OTP device stolen and then get a new device / register a new phone.
  • PKI-based authentication – when you verify the person’s identity, have them generate a private key, and issue an X.509 certificate for the corresponding public key. That way the user can use the private key to authenticate (the most straightforward way is TLS mutual authentication, where the user signs a challenge with the private key to prove they are the “owner” of the certificate). The certificate would normally hold some identifier which can then be used to fetch data from databases. Alternatively, the data can be on the certificate itself, but that has privacy implications and is rarely a good option. This option can be used across institutions, as you can prove you are the person that owns a private key without the other side needing to share a secret with you. They only need the certificate, and it is public anyway. Another benefit of PKI-based authentication is revocability – in case the user’s private key is somehow compromised, you can easily revoke the certificate (publish it in a CRL, for example).
  • Biometrics – when you are enrolled, you scan a fingerprint, a palm, an iris, a retina or whatever the current cool biometric tech is. I often argue that this cannot be your main factor of authentication. It can and sometimes should be used as an additional safeguard, but it has a big problem – it cannot be revoked. Once a biometric identifier is leaked, it is impossible to stop people from using it. And while they may not be able to fool scanners (although for fingerprints that has been proven easy in the past), the scanners communicate with a server which performs authentication. An attacker may simply spoof a scanner and make it seem to the server that the biometric data was properly obtained. If that has to be avoided, the scanners themselves have to be identified by signing a challenge with a private key in a secure hardware module, which makes the whole process too complicated to be meaningful. But then again – the biometric factor is useful and important, as we’ll see below.
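The TOTP variant from the first bullet is fully specified in RFCs 4226 and 6238 and fits in a few lines of standard-library Python. The secret below is the RFC test secret (never reuse it in production), and the expected output is the official RFC 6238 test vector:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, t=None, digits=6, step=30):
    """RFC 6238 TOTP: HMAC-SHA1 over the current 30-second time counter."""
    counter = int((time.time() if t is None else t) // step)
    key = base64.b32decode(secret_b32, casefold=True)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                       # dynamic truncation (RFC 4226)
    code = (int.from_bytes(mac[offset:offset + 4], "big") & 0x7FFFFFFF) % 10 ** digits
    return str(code).zfill(digits)

# RFC 6238 test secret "12345678901234567890", base32-encoded
secret = base64.b32encode(b"12345678901234567890").decode()
print(totp(secret, t=59, digits=8))  # -> 94287082 (RFC 6238 test vector)
```

Both sides compute the same code independently, which is exactly why the shared secret must be guarded on the server (ideally inside an HSM, as the bullet notes): anyone who reads it can generate valid codes forever.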

The typical “factors” in an authentication process are: something you know (passwords), something you have (OTP token, smartcard, phone) and something you are (biometrics). The “something you have” part is what generates multiple variations to the PKI approach mentioned above:

  • Use an unprotected storage on a computer to store the private key – obviously not secure enough, as the private key can be easily extracted and your identity can thus be stolen. But it has to be mentioned, as it can be useful in lower-risk scenarios
  • Use a smartcard – a hardware device that can handle PKI operations (signing, encryption) and does not let private keys leave the hardware. Smartcards are tricky, as they require a reader (usually plugged in via USB) and vendor-specific drivers and “magic” to get browser support. Depending on the circumstances, it could be a good approach, as it is the most secure – there is no way for someone to impersonate you, short of stealing your smartcard and knowing both your smartcard PIN and your password. The inconvenience of plugging in the smartcard can be alleviated by utilizing NFC with a smartphone (so you just place the card on the back of the smartphone in order to authenticate), but that leads to a lot of other problems, e.g. how to protect the communication from eavesdropping and MITM attacks (as far as I know, there is no established standard for that, except for NFC-SEC, which I think is not universally supported). The smartcard can be put on a national ID card or a separate card, or even the chips in existing bank cards can be reused (though issuers are reluctant to share the chip with applets other than the EMV one).
  • Use a smartphone – smartphones these days have secure storage capabilities (e.g. Android’s Trusted Execution Environment or the iPhone’s Secure Enclave). A few years ago, when I did a more thorough review, these secure modules were not perfect and there were known attacks, but they have certainly improved. You can in many cases rely on a smartphone to protect the private key. Then, in order to authenticate, you’d need a PIN or biometrics to unlock the phone. Here’s where biometrics come in really handy – when they don’t leave the phone, so even if leaked, they cannot be used to impersonate you. They can only be used to potentially make a fake fingerprint to unlock the phone (which would also have to be stolen). And of course, there’s still the password (“something you know”).
  • Remote HSM – the private keys can be stored remotely, on a hardware security module, so that they cannot physically leave the hardware. However, the hardware is not under your physical control, and unlocking it requires just a PIN, which turns this scheme into just “something you know” (times two, if you add the password). Remote identification and remote signing schemes are becoming popular, and in order for them to be secure, you also have to somehow associate the device with the particular user and their private key on the HSM. This can be done in a combination of ways, including the IMEI of the phone (which is spoofable, though) and some of the aforementioned options – the protected storage of the phone and OTPs handled behind the scenes. (Note: the keys on the HSM should be in its protected storage. Having them in an external database encrypted by the master key is not good enough, as they can still leak.) If you are going to rely on the smartphone secure storage anyway, what’s the benefit of the remote HSM? It’s twofold – first, losing the phone doesn’t mean you cannot use the same key again, and second, it reduces the risk of leaking the key, as the HSM is theoretically more secure than the smartphone storage
  • Hybrid / split key – the last two approaches – the smartphone secure storage and the remote HSM – can be combined for additional security. You can have the key split in two – part on the smartphone, part on the HSM. That way you reduce the risk of the key leaking. Losing the phone, however, would mean the key has to be regenerated and new certificates issued, but that may be okay depending on the use case.
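The split-key idea above can be sketched with simple two-of-two XOR secret sharing. This is a toy illustration of the principle (neither share alone reveals anything about the key), not the threshold-signature schemes a production remote-signing service would actually use:

```python
import secrets

def split_key(key: bytes) -> tuple[bytes, bytes]:
    # Two-of-two XOR secret sharing: share1 is random, share2 = key XOR share1.
    # Either share alone is indistinguishable from random noise.
    share1 = secrets.token_bytes(len(key))
    share2 = bytes(a ^ b for a, b in zip(key, share1))
    return share1, share2

def recombine(share1: bytes, share2: bytes) -> bytes:
    # XOR the shares back together to recover the original key.
    return bytes(a ^ b for a, b in zip(share1, share2))
```

One share would live in the phone’s secure storage, the other on the HSM; the key only ever exists in combined form at the moment of use.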

As you can see, smartphone secure storage is becoming an important aspect of electronic identification that is both secure and usable. It allows easily adding a biometric factor without needing the ability to revoke it, since the biometric data never leaves the phone. And it doesn’t rely on a clunky smartcard that you can’t easily plug in.

This is not everything that can be said about secure electronic identification, but I think it’s enough detail to get a good picture. It’s not trivial, and getting it wrong may lead to real-world damage. It is viewed as primarily government-related, but the eID market in Europe is likely going to grow (partly thanks to the unification of legislation by eIDAS) and many private providers will take part in that. In the US, the problem of identity theft and the horrible practice of using the SSN for authentication is being recognized, and it’s likely that legislative efforts will follow to put electronic identification on track, in turn fostering a market for eID solutions (which is currently a patchwork of scanning and manual verification of documents).

The ultimate goal is to be both secure and usable. And that’s always hard. But thanks to the almost ubiquitous smartphone, it is now possible (though backup options should exist for people who don’t have smartphones). Electronic identification is a key enabler of the so-called “digital transformation”, and getting it right is crucial for the digital economy. Apologies for the generic high-level sentence, but I do think we should have technical discussions at the same time as policy discussions, otherwise the two diverge and policy monsters are born.

The post Models for Electronic Identification appeared first on Bozho's tech blog.

Typical Workarounds For Compliant Logs

Post Syndicated from Bozho original https://techblog.bozho.net/typical-workarounds-for-compliant-logs/

You may think you have logs. Chances are, you can rely on them only for tracing exceptions and debugging. But you can’t rely on them for compliance, forensics, or any legal matter. And that may be none of your concern as an engineer, but it is one of those important non-functional requirements that, if not met, are ultimately our fault.

Because of my audit trail startup, I’m obviously both biased and qualified to discuss logs (I’ve previously described what audit trail / audit logs are and how they can be maintained). And while they are only a part of the security picture, and certainly not very exciting, they are important, especially from a legal standpoint. That’s why many standards and laws – including ISO 27001, PCI-DSS, HIPAA, SOC 2, PSD2 and GDPR – require audit trail functionality. And most of them have very specific security requirements.

PCI-DSS has a bunch of sections with audit trail related requirements:

10.2.3
“Malicious users often attempt to alter audit logs to hide their actions, and a record of access allows an organization to trace any inconsistencies or potential tampering of the logs to an individual account. [..]”

10.5 Secure audit trails so they cannot be altered. Often a malicious individual who has entered the network will attempt to edit the audit logs in order to hide their activity. Without adequate protection of audit logs, their completeness, accuracy, and integrity cannot be guaranteed, and the audit logs can be rendered useless as an investigation tool after a compromise.

10.5.5
Use file-integrity monitoring or change-detection software on logs to ensure that existing log data cannot be changed without generating alerts (although new data being added should not cause an alert).
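The “existing data must not change, but appends are fine” property from 10.5.5 can be approximated with a simple prefix hash: periodically record the log file’s length and digest, and on each check re-hash only that prefix. This is a minimal sketch of the idea; real file-integrity monitoring tools also watch permissions, rotation and deletion:

```python
import hashlib

def checkpoint(log_path: str) -> tuple[int, str]:
    # Record the log file's current length and the hash of its contents.
    with open(log_path, "rb") as f:
        data = f.read()
    return len(data), hashlib.sha256(data).hexdigest()

def check(log_path: str, length: int, digest: str) -> bool:
    # Re-hash only the first `length` bytes: newly appended records pass,
    # but any change to pre-checkpoint records flips the digest -> alert.
    with open(log_path, "rb") as f:
        prefix = f.read(length)
    return hashlib.sha256(prefix).hexdigest() == digest
```

The checkpoints themselves must of course be stored somewhere the log’s administrators can’t reach, otherwise they can be recomputed along with the tampered log.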

ISO 27001 Annex A also speaks about protecting the audit trail against tampering:

A.12.4.2 Protection of log information
Logging facilities and log information shall be protected against tampering and unauthorized access

From my experience, sadly, logs are rarely fully compliant. However, auditors are mostly fine with that and certification is given, even though logs can be tampered with. I decided to collect and list the typical workarounds for the secure, tamper-evident/tamper-protected audit logs.

  • We don’t need them secured – you almost certainly do. If you need to be compliant – you do. If you don’t need to be compliant, but you have a high-value / high-impact system – you do. If it doesn’t have to be compliant and it’s low-value or low-impact, then yes – you don’t need much security for that anyway (but be careful not to underestimate the security you do need).
  • We store it in multiple places with different access – this is based on the assumption that multiple administrators won’t conspire to tamper with the logs. And you can’t guarantee that, of course. But even if you are sure they won’t, you can’t prove that to external parties. Imagine someone sues you and you provide logs as evidence. If the logs are not tamper-evident, the other side can easily claim you have fabricated logs and make them inadmissible evidence.
  • Our system is so complicated, nobody knows how to modify data without breaking our integrity checks – this is the “security through obscurity” approach and it will probably work well … until it doesn’t.
  • We store it with an external provider – external log providers usually claim they provide compliance. And they do provide many aspects of compliance, mainly operational security around the log collection systems. Besides, in that case you (or your admins) can’t easily modify the externally stored records. Some providers may give you the option to delete records, which isn’t great for an audit trail. The problem with such services is that they keep the logs operational for relatively short periods of time and then export them in a form that can be tampered with. Furthermore, you can’t really be sure that the logs are not tampered with. Yes, the provider is unlikely to care about your logs, but having that as the main guarantee doesn’t sound perfect.
  • We are using trusted timestamping – and that’s great; it covers many aspects of log integrity. AWS is timestamping their CloudTrail log files and it’s certainly a good practice. However, it falls short in one scenario – someone deleting an entire timestamped file. And because it’s a whole file, rather than record-by-record, you won’t know which record exactly was targeted. There’s another caveat – if the TSA is under your control, you can backdate timestamps and therefore can’t prove that you didn’t fabricate logs.

These approaches are valid and are somewhere on a non-zero point on the compliance spectrum. Is having four copies of the data accessible to different admins better than having just one? Yup. Is timestamping with a local TSA better than not timestamping? Sure. Is using an external service more secure than using a local one? Yes. Are they sufficient to get certified? Apparently yes. Is this the best that can be done? No. Do you need the best? Sometimes yes.

Most of these measures are organizational rather than technical, and organizational measures can be reversed or circumvented much more easily than technical ones. And I may be too paranoid, but when I was a government advisor, I had to be paranoid when it comes to security.

And what do I consider a non-workaround? There is a lot of research around tamper-evident logging, tamper-evident data structures, Merkle trees and hash chains. Here’s just one example. It should have been mainstream by now, but it isn’t. We’ve apparently settled on suboptimal procedures (even when it comes to cost), and we’ve interpreted the standards loosely, looking for low-hanging fruit. And that’s fine for some organizations. I guess.
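The core hash-chain idea is small enough to sketch: each appended record commits to the hash of the previous one, so modifying or deleting any record invalidates everything after it. This is a minimal illustration; real tamper-evident logs additionally sign or trusted-timestamp the chain head so the whole chain can’t be silently rebuilt:

```python
import hashlib
import json

def append(log: list[dict], entry: dict) -> None:
    # Link each record to its predecessor via the previous record's hash.
    prev = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({"entry": entry, "prev": prev}, sort_keys=True).encode()
    log.append({"entry": entry, "prev": prev,
                "hash": hashlib.sha256(payload).hexdigest()})

def verify(log: list[dict]) -> bool:
    # Recompute every hash from the start; any edit or deletion breaks the chain.
    prev = "0" * 64
    for record in log:
        payload = json.dumps({"entry": record["entry"], "prev": prev},
                             sort_keys=True).encode()
        if record["prev"] != prev or \
           record["hash"] != hashlib.sha256(payload).hexdigest():
            return False
        prev = record["hash"]
    return True
```

Unlike the organizational workarounds above, this gives you something you can hand to a third party for verification.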

It takes time for a good practice to become mainstream, and it has to be obvious. Tamper-evident logging isn’t obvious. We have gradually become more aware of properly storing passwords, using TLS, and multi-factor authentication. But security is rarely a business priority, as seen in reports about what drives security investments (it’s predominantly “compliance”).

As a practical conclusion – if you are going to settle for a workaround, at least choose a better one. Not having audit trail or not making any effort to protect it from tampering should be out of the question.

The post Typical Workarounds For Compliant Logs appeared first on Bozho's tech blog.

Proving Digital Events (Without Blockchain)

Post Syndicated from Bozho original https://techblog.bozho.net/proving-digital-events-without-blockchain/

Recently, technical and non-technical people alike have started to believe that the best (and only) way to prove that something has happened in an information system is to use a blockchain. But there are other ways to achieve that, which are arguably better and cheaper. Of course, blockchain can be used to do that, and it will do it well, but it is far from the only solution to this problem.

The way blockchain proves that some event has occurred is by putting it into a tamper-evident data structure (a hash chain of the roots of Merkle trees of transactions) and distributing that data structure across multiple independent actors, so that “tamper-evident” becomes “tamper-proof” (sort of). So if an event is stored on a blockchain, and the chain is intact (and others have confirmed it’s intact), this is a technical guarantee that it had indeed happened and was neither back-dated nor modified.
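The Merkle-tree part of that structure can be sketched in a few lines: hash the transactions pairwise, level by level, until a single root remains. Changing any transaction changes the root, so the root alone (which is what goes into the hash chain of block headers) commits to every transaction in the block. This is a simplified, Bitcoin-style construction for illustration:

```python
import hashlib

def merkle_root(transactions: list[bytes]) -> bytes:
    # Hash each transaction, then repeatedly hash adjacent pairs
    # until a single 32-byte root remains.
    level = [hashlib.sha256(tx).digest() for tx in transactions]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the odd node out (Bitcoin-style)
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

Note that nothing here requires a distributed network – the same commitment structure works in a centralized audit log, which is the point of this post.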

An important note here – I’m stressing “digital” events, because no physical event can be truly guaranteed electronically. The fact that someone has to enter the physical event into a digital system makes the process error-prone, and the question becomes “was the event correctly recorded” rather than “was it modified once it was recorded”. And yes, you can have “certified” / “approved” recording devices that automate transferring physical events to the digital realm, e.g. certified speed cameras, but the certification process is a separate topic. So we’ll stay purely in the digital realm (and ignore all provenance use cases).

There are two aspects to proving digital events – technical and legal. Once you get to court, you are unlikely to easily convince a judge that “byzantine fault tolerance guarantees tamper-proof hash chains”. You need a legal framework that allows treating digital proofs as legally binding.

Luckily, Europe has such a legal framework – Regulation (EU) 910/2014. It classifies trust services in three categories – basic, advanced and qualified. Qualified ones are always supplied by a qualified trust service provider. The benefit of qualified signatures and timestamps is that the burden of proof is on the one claiming that the event didn’t actually happen (or was modified). If a digital event is signed with a qualified electronic signature or timestamped with a qualified timestamp, and someone challenges that occurrence of the event, it is they that should prove that it didn’t happen.

Advanced and basic services still carry legal weight – you can bring a timestamped event to court and prove that you’ve kept your keys secure, so that nobody could have backdated the event. And the court should acknowledge that, because it’s in the law.

Having said that, blockchain, even if technically more secure, is not the best option from a legal point of view. Timestamps on blocks are not put there by qualified trust service providers, but by nodes on the network, and could therefore be seen as non-qualified electronic timestamps. Signatures on transactions have a similar problem – they are made by anonymous actors on the network, rather than by individuals whose identity is linked to the signature, making them legally weaker.

On the technical side, we have been able to prove events even before blockchain – with digital signatures and trusted timestamps. Once you make, say, an RSA signature (encrypt the hash of the content with your private key, so that anyone knowing your public key can decrypt it and match it against the hash of the content you claim to have signed, thus verifying that it was indeed you who signed it), you cannot deny having signed it (non-repudiation). The signature also protects the integrity of the data (it can’t be changed without breaking the signature). And it is known who signed it – the owner of the private key (authentication). Having these properties on a piece of data (an “event”), you can use it to prove that the event has indeed occurred.
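The hash-then-sign mechanics can be shown with textbook RSA. To keep the example self-contained it uses tiny hard-coded primes and no padding – purely to illustrate the flow; real signatures use 2048+ bit keys with a padding scheme like PSS, generated by a vetted cryptography library:

```python
import hashlib

# Toy parameters for illustration only -- never hard-code primes in practice.
p, q = 61, 53
n = p * q                          # public modulus
e = 17                             # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (requires Python 3.8+)

def sign(message: bytes) -> int:
    # "Encrypt" the message hash with the private key.
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(h, d, n)

def verify_signature(message: bytes, signature: int) -> bool:
    # Anyone holding (n, e) can "decrypt" the signature and compare hashes.
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == h
```

The asymmetry is the whole point: only the private-key holder can produce a value that “decrypts” to the hash, which is what gives non-repudiation.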

You can’t, however, prove when it occurred – for that you need trusted timestamping: usually a third-party provider signing the data you send them and including the current timestamp in the signed response. That way, using public key cryptography and a few centralized authorities (the CA and the TSA), we’ve been able to prove the existence of digital events.
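The timestamping protocol itself is simple: the client sends only a hash (the TSA never sees the document), and the authority binds that hash to the current time and authenticates the pair. In this sketch an HMAC stands in for the TSA’s signature; a real TSA returns an RFC 3161 signed response that anyone can verify against the TSA’s certificate:

```python
import hashlib
import hmac
import json
import time

# Stand-in for the TSA's private key; a real TSA signs with an X.509-backed key.
TSA_KEY = b"tsa-demo-key"

def timestamp(doc_hash: str) -> dict:
    # Bind the submitted hash to the current time and authenticate the pair.
    token = {"hash": doc_hash, "time": int(time.time())}
    payload = json.dumps(token, sort_keys=True).encode()
    token["sig"] = hmac.new(TSA_KEY, payload, hashlib.sha256).hexdigest()
    return token

def verify_timestamp(token: dict) -> bool:
    # Recompute the authenticator; any change to hash or time invalidates it.
    payload = json.dumps({"hash": token["hash"], "time": token["time"]},
                         sort_keys=True).encode()
    expected = hmac.new(TSA_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(token["sig"], expected)
```

Because the time value is inside the authenticated payload, backdating the token after the fact is detectable – which is exactly the property plain log files lack.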

And yes, relying on centralized infrastructure is not perfect. But apart from a few extreme cases, you don’t need 100% protection for 100% of your events. That is not to say that you should go entirely unprotected and hope that an event has occurred simply because it is in some log file.

Relying on plain log files for proving things happened is a “no-go”, as I’ve explained in a previous post about audit trail. You simply can’t prove you didn’t back-date or modify the event data.

But you can rely on good old PKI to prove digital events (of course, blockchain also relies on public key cryptography). And the blockchain approach will not necessarily be better in court.

In a private blockchain you can, of course, utilize centralized components, like a TSA (time stamping authority) or a CA, to get the best of both worlds. And adding hash chains and Merkle trees to the mix is certainly great from a technical perspective (which is what I’ve been doing recently). But you don’t need distributed consensus in order to prove something digital happened – we had the tools for that even before proof-of-work distributed consensus existed.

The post Proving Digital Events (Without Blockchain) appeared first on Bozho's tech blog.