Tag Archives: Opinions

Integrity Guarantees of Blockchains In Case of Single Owner Or Colluding Owners

2021-11-28 Bozho

Post Syndicated from Bozho original https://techblog.bozho.net/integrity-guarantees-of-blockchains-in-case-of-single-owner-or-colluding-owners/

The title may sound as a paper title, rather than a blogpost, because it was originally an idea for such, but I’m unlikely to find the time to put a proper paper about it, so here it is – a blogpost.

Blockchain has been touted as the ultimate integrity guarantee – if you “have blockchain”, nobody can tamper with your data. Of course, reality is more complicated, and even in the most distributed of ledgers, there are known attacks. But most organizations that are experimenting with blockchain, rely on a private network, sometimes having themselves as the sole owner of the infrastructure, and sometimes sharing it with just a few partners.

The point of having the technology in the first place is to guarantee that once collected, data cannot be tampered with. So let’s review how that works in practice.

First, we have two define two terms – “tamper-resistant” (sometimes referred to as tamper-free) and “tamper-evident”. “Tamper-resistant” means nobody can ever tamper with the data and the state of the data structure is always guaranteed to be without any modifications. “Tamper-evident”, on the other hand, means that a data structure can be validated for integrity violations, and it will be known that there have been modifications (alterations, deletions or back-dating of entries). Therefore, with tamper-evident structures you can prove that the data is intact, but if it’s not intact, you can’t know the original state. It’s still a very important property, as the ability to prove that data is not tampered with is crucial for compliance and legal aspects.

Blockchain is usually built ontop of several main cryptographic primitives: cryptographic hashes, hash chains, Merkle trees, cryptographic timestamps and digital signatures. They all play a role in the integrity guarantees, but the most important ones are the Merkle tree (with all of its variations, like a Patricia Merkle tree) and the hash chain. The original bitcoin paper describes a blockchain to be a hash chain, based on the roots of multiple Merkle trees (which form a single block). Some blockchains rely on a single, ever-growing merkle tree, but let’s not get into particular implementation details.

In all cases, blockchains are considered tamper-resistant because their significantly distributed in a way that enough number of members have a copy of the data. If some node modifies that data, e.g. 5 blocks in the past, it has to prove to everyone else that this is the correct merkle root for that block. You have to have more than 50% of the network capacity in order to do that (and it’s more complicated than just having them), but it’s still possible. In a way, tamper resistance = tamper evidence + distributed data.

But many of the practical applications of blockchain rely on private networks, serving one or several entities. They are often based on proof of authority, which means whoever has access to a set of private keys, controls what the network agree on. So let’s review the two cases:

Multiple owners – in case of multiple node owners, several of them can collude to rewrite the chain. The collusion can be based on mutual business interest (e.g. in a supply chain, several members may team up against the producer to report distorted data), or can be based on security compromise (e.g. multiple members are hacked by the same group). In that case, the remaining node owners can have a backup of the original data, but finding out whether the rest were malicious or the changes were legitimate part of the business logic would require a complicated investigation.
Single owner – a single owner can have a nice Merkle tree or hash chain, but an admin with access to the underlying data store can regenerate the whole chain and it will look legitimate, while in reality it will be tampered with. Splitting access between multiple admins is one approach (or giving them access to separate nodes, none of whom has access to a majority), but they often drink beer together and collusion is again possible. But more importantly – you can’t prove to a 3rd party that your own employees haven’t colluded under orders from management in order to cover some tracks to present a better picture to a regulator.

In the case of a single owner, you don’t even have a tamper-evident structure – the chain can be fully rewritten and nobody will understand that. In case of multiple owners, it depends on the implementation. There will be a record of the modification at the non-colluding party, but proving which side “cheated” would be next to impossible. Tamper-evidence is only partially achieved, because you can’t prove whose data was modified and whose data hasn’t (you only know that one of the copies has tampered data).

In order to achieve tamper-evident structure with both scenarios is to use anchoring. Checkpoints of the data need to be anchored externally, so that there is a clear record of what has been the state of the chain at different points in time. Before blockchain, the recommended approach was to print it in newspapers (e.g. as an ad) and because it has a large enough circulation, nobody can collect all newspapers and modify the published checkpoint hash. This published hash would be either a root of the Merkle tree, or the latest hash in a hash chain. An ever-growing Merkle tree would allow consistency and inclusion proofs to be validated.

When we have electronic distribution of data, we can use public blockchains to regularly anchor our internal ones, in order to achieve proper tamper-evident data. We, at LogSentinel, for example, do exactly that – we allow publishing the latest Merkle root and the latest hash chain to Ethereum. Then even if those with access to the underlying datastore manage to modify and regenerate the entire chain/tree, there will be no match with the publicly advertised values.

How to store data on publish blockchains is a separate topic. In case of Ethereum, you can put any payload within a transaction, so you can put that hash in low-value transactions between two own addresses (or self-transactions). You can use smart-contracts as well, but that’s not necessary. For Bitcoin, you can use OP_RETURN. Other implementations may have different approaches to storing data within transactions.

If we want to achieve tamper-resistance, we just need to have several copies of the data, all subject to tamper-evidence guarantees. Just as in a public network. But what a public network gives is is a layer, which we can trust with providing us with the necessary piece for achieving local tamper evidence. Of course, going to hardware, it’s easier to have write-only storage (WORM, write once, ready many). The problem with it, is that it’s expensive and that you can’t reuse it. It’s not so much applicable to use-cases that require short-lived data that requires tamper-resistance.

So in summary, in order to have proper integrity guarantees and the ability to prove that the data in a single-owner or multi-owner private blockchains hasn’t been tampered with, we have to send publicly the latest hash of whatever structure we are using (chain or tree). If not, we are only complicating our lives by integrating a complex piece of technology without getting the real benefit it can bring – proving the integrity of our data.

The post Integrity Guarantees of Blockchains In Case of Single Owner Or Colluding Owners appeared first on Bozho's tech blog.

Hypotheses About What Happened to Facebook

2021-10-06 Bozho

Post Syndicated from Bozho original https://techblog.bozho.net/hypotheses-about-what-happened-to-facebook/

Facebook was down. I’d recommend reading Cloudflare’s summary. Then I recommend reading Facebook’s own account on the incident. But let me expand on that. Facebook published announcements and withdrawals for certain BGP prefixes which lead to removing its DNS servers from “the map of the internet” – they told everyone “the part of our network where our DNS servers are doesn’t exist”. That was the result of a backbone self-inflicted failure due to a bug in the auditing tool that checks whether the commands executed aren’t doing harmful things.

Facebook owns a lot of IPs. According to RIPEstat they are part of 399 prefixes (147 of them are IPv4). The DNS servers are located in two of those 399. Facebook uses a.ns.facebook.com, b.ns.facebook.com, c.ns.facebook.com and d.ns.facebook.com, which get queries whenever someone wants to know the IPs of Facebook-owned domains. These four nameservers are served by the same Autonomous System from just two prefixes – 129.134.30.0/23 and 185.89.218.0/23. Of course “4 nameservers” is a logical construct, there are probably many actual servers behind that (using anycast).

I wrote a simple “script” to fetch all the withdrawals and announcements for all Facebook-owned prefixes (from the great API of RIPEstats). Facebook didn’t remove itself from the map entirely. As CloudFlare points out, it was just some prefixes that are affected. It can be just these two, or a few others as well, but it seems that just a handful were affected. If we sort the resulting CSV from the above script by withdrawals, we’ll notice that 129.134.30.0/23 and 185.89.218.0/23 are the pretty high up (alongside 185.89 and 123.134 with a /24, which are all included in the /23). Now that perfectly matches Facebook’s account that their nameservers automatically withdraw themselves if they fail to connect to other parts of the infrastructure. Everything may have also been down, but the logic for withdrawal is present only in the networks that have nameservers in them.

So first, let me make three general observations that are not as obvious and as universal as they may sound, but they are worth discussing:

Use longer DNS TTLs if possible – if Facebook had 6 hour TTL on its domains, we may have not figured out that their name servers are down. This is hard to ask for such a complex service that uses DNS for load-balancing and geographical distribution, but it’s worth considering. Also, if they killed their backbone and their entire infrastructure was down anyway, the DNS TTL would not have solved the issue. But
We need improved caching logic for DNS. It can’t be just “present or not”; DNS caches may keep “last known good state” in case of SERVFAIL and fallback to that. All of those DNS resolvers that had to ask the authoritative nameserver “where can I find facebook.com” knew where to find facebook.com just a minute ago. Then they got a failure and suddenly they are wondering “oh, where could Facebook be?”. It’s not that simple, of course, but such cache improvement is worth considering. And again, if their entire infrastructure was down, this would not have helped.
Consider having an authoritative nameserver outside your main AS. If something bad happens to your AS routes (regardless of the reason), you may still have DNS working. That may have downsides – generally, it will be hard to manage and sync your DNS infrastructure. But at least having a spare set of nameservers and the option to quickly point glue records there is worth considering. It would not have saved Facebook in this case, as again, they claim the entire infrastructure was inaccessible due to a “broken” backbone.
Have a 100% test coverage on critical tools, such as the auditing tool that had a bug. 100% test coverage is rarely achievable in any project, but in such critical tools it’s a must.

The main explanation is the accidental outage. This is what Facebook engineers explain in the blogpost and other accounts, and that’s what seems to have happened. However, there are alternative hypotheses floating around, so let me briefly discuss all of the options.

Accidental outage due to misconfiguration – a very likely scenario. These things may happen to everyone and Facebook is known for it “break things” mentality, so it’s not unlikely that they just didn’t have the right safeguards in place and that someone ran a buggy update. The scenarios why and how that may have happened are many, and we can’t know from the outside (even after Facebook’s brief description). This remains the primary explanation, following my favorite Hanlon’s razor. A bug in the audit tool is absolutely realistic (btw, I’d love Facebook to publish their internal tools).
Cyber attack – It cannot be known by the data we have, but this would be a sophisticated attack that gained access to their BGP administration interface, which I would assume is properly protected. Not impossible, but a 6-hour outage of a social network is not something a sophisticated actor (e.g. a nation state) would invest resources in. We can’t rule it out, as this might be “just a drill” for something bigger to follow. If I were an attacker that wanted to take Facebook down, I’d try to kill their DNS servers, or indeed, “de-route” them. If we didn’t know that Facebook lets its DNS servers cut themselves from the network in case of failures, the fact that so few prefixes were updated might be in indicator of targeted attack, but this seems less and less likely.
Deliberate self-sabotage – 1.5 billion records are claimed to be leaked yesterday. At the same time, a Facebook whistleblower is testifying in the US congress. Both of these news are potentially damaging to Facebook reputation and shares. If they wanted to drown the news and the respective share price plunge in a technical story that few people understand but everyone is talking about (and then have their share price rebound, because technical issues happen to everyone), then that’s the way to do it – just as a malicious actor would do, but without all the hassle to gain access from outside – de-route the prefixes for the DNS servers and you have a “perfect” outage. These coincidences have lead people to assume such a plot, but from the observed outage and the explanation given by Facebook on why the DNS prefixes have been automatically withdrawn, this sounds unlikely.

Distinguishing between the three options is actually hard. You can mask a deliberate outage as an accident, a malicious actor can make it look like a deliberate self-sabotage. That’s why there are speculations. To me, however, by all of the data we have in RIPEStat and the various accounts by CloudFlare, Facebook and other experts, it seems that a chain of mistakes (operational and possibly design ones) lead to this.

The post Hypotheses About What Happened to Facebook appeared first on Bozho's tech blog.

Digital Transformation and Technological Utopianism

2021-09-18 Bozho

Post Syndicated from Bozho original https://techblog.bozho.net/digital-transformation-and-technological-utopianism/

Today I read a very interesting article about the prominence of Bulgarian hackers (in the black-hat sense) and virus authors in the 90s, linking that to the focus on technical education in the 80s, lead by the Bulgarian communist party in an effort to revive communism through technology.

Near the end of the article I was pleasantly surprised to read my name, as a political candidate who advocates for digital e-government and transformation of the public sector. The article then ended with something that I’m in deep disagreement with, but that has merit, and is worth discussing (and you can replace “Bulgaria” with probably any country there):

Of course, the belief that all the problems of a corrupt Bulgaria can be solved through the perfect tools is not that different to the Bulgarian Communist Party’s old dream that central planning through electronic brains would create communism. In both cases, the state is to be stripped back to a minimum

My first reaction was to deny ever claiming that the state would be stripped back to a minimum, as it will not (risking to enrage my libertarian readers), or to argue that I’ve never claimed there are “perfect tools” that can solve all problems, nor that digital transformation is the only way to solve those problems. But what I’ve said or written has little to do with the overall perception of techno-utopianism that IT people-turned-policy makers are usually struggling with.

So I decided to clearly state what e-government and digital transformation of the public sector is about.

First, it’s just catching up to the efficiency of the private sector. Sadly, there’s nothing visionary about wanting to digitize paper processes and provide services online. It’s something that’s been around for two decades in the private sector and the public sector just has to catch up, relying on all the expertise accumulated in those decades. Nothing grandiose or mind-boggling, just not being horribly inefficient.

When the world grows more complex, legislation and regulation grows more complex, the government gets more and more functions and more and more details to care about. There are more topics to have policy about (and many to take an informed decision to NOT have a policy about). All of that, today, can’t rely on pen-and-paper and a few proverbial smart and well-intentioned people. The government needs technology to catch up and do its job. It has had the luxury to not have competition and therefore it lagged behind. When there are no market forces to drive the digital transformation, what’s left is technocratic politicians. This efficiency has nothing to do with ideology, left or right. You can have “small government” and still have it inefficient and incapable of making sense of the world.

Second, technology is an enabler. Yes, it can help solve the problems with corruption, nepotism, lack of accountability. But as a tool, not as the solution itself. Take open data, for example (something I’ve been working on five years ago when Bulgaria jumped to the top of the EU open data index). Just having the data out there is an important effort, but by itself it doesn’t solve any problem. You need journalists, NGOs, citizens and a general understanding in society what transparency means. Same for accountability – it’s one thing to have every document digitized, every piece of data – published and every government official action leaving an audit trail; it’s a completely different story to have society act on those things – to have the institutions to investigate, to have the public pressure to turn that into political accountability.

Technology is also a threat – and that’s beyond the typical cybersecurity concerns. It poses the risk of dangerous institutions becoming too efficient; of excessive government surveillance; of entrenched interests carving their ways into the digital systems to perpetuate their corrupt agenda. I’m by no means ignoring those risks – they are real already. The Nazis, for example, were extremely efficient in finding the Jewish population in the Netherlands because the Dutch were very good at citizen registration. This doesn’t mean that you shouldn’t have an efficient citizen registration system. It means that it’s not good or bad per se.

And that gets us to the question of technological utopianism, of which I’m sometimes accused (though not directly in the quoted article). When you are an IT person, you have a technical hammer and everything may look like a binary nail. That’s why it’s very important to have a glimpse on humanities sides as well. Technology alone will not solve anything. And my blockchain skepticism is a hint in that direction – many blockchain enthusiasts are claiming that blockchain will solve many problems in many areas of life. It won’t. At least not just through clever cryptography and consensus algorithms. I once even wrote a sci-fi story about exactly the aforementioned communist dream of a centralized computer brain that solves all social issues while people are left to do what they want. And argued that no matter how perfect it is, it won’t work in a non-utopian human world. In other words, I’m rather critical of techno-utopianism as well.

The communist party, according to the author, saw technology as a tool by which the communist government would achieve its ideological goal.

My idea is quite different. First, technology necessary for “catching up” of the public sector, and second, I see technology as an enabler. What for – whether it’s for accountability or surveillance, fight with corruption or entrenching corruption even further – it’s our role as individuals, as society, and (in my case) as politicians, to formulate and advocate for. We have to embed our values, after democratic debate, into the digital tools (e.g. by making them privacy-preserving). But if we want to have good governance, and to be good at policy-making in the 21st century, we need digital tools, fully understanding their pitfalls and without putting them on a pedestal.

The post Digital Transformation and Technological Utopianism appeared first on Bozho's tech blog.

Every Serialization Framework Should Have Its Own Transient Annotation

2021-06-26 Bozho

Post Syndicated from Bozho original https://techblog.bozho.net/every-serialization-framework-should-have-its-own-transient-annotation/

We’ve all used dozens of serialization frameworks – for JSON, XML, binary, and ORMs (which are effectively serialization frameworks for relational databases). And there’s always the moment when you need to exclude some field from an object – make it “transient”.

So far so good, but then comes the point where one object is used by several serialization frameworks within the same project/runtime. That’s not necessarily the case, but let me discuss the two alternatives first:

Use the same object for all serializations (JSON/XML for APIs, binary serialization for internal archiving, ORM/database) – preferred if there are only minor differences between the serialized/persisted fields. Using the same object saves a lot of tedious transferring between DTOs.
Use different DTOs for different serializations – that becomes a necessity when scenarios become more complex and using the same object becomes a patchwork of customizations and exceptions

Note that both strategies can exist within the same project – there are simple objects and complex objects, and you can only have a variety of DTOs for the latter. But let’s discuss the first option.

If each serialization framework has its own “transient” annotation, it’s easy to tweak the serialization of one or two fields. More importantly, it will have predictable behavior. If not, then you may be forced to have separate DTOs even for classes where one field differs in behavior across the serialization targets.

For example the other day I had the following surprise – we use Java binary serialization (ObjectOutputStream) for some internal buffering of large collections, and the objects are then indexed. In a completely separate part of the application, objects of the same class get indexed with additional properties that are irrelevant for the binary serialization and therefore marked with the Java transient modifier. It turns out, GSON respects the “transient” modifier and these fields are never indexed.

In conclusion, this post has two points. The first is – expect any behavior from serialization frameworks and have tests to verify different serialization scenarios. And the second is for framework designers – don’t reuse transient modifiers/annotations from the language itself or from other frameworks, it’s counterintuitive.

The post Every Serialization Framework Should Have Its Own Transient Annotation appeared first on Bozho's tech blog.

A Developer Running For Parliament

2021-06-15 Bozho

Post Syndicated from Bozho original https://techblog.bozho.net/a-developer-running-for-parliament/

That won’t be a typical publication you’d see on a developer’s blog. But yes, I’m running for parliament (in my country, Bulgaria, an EU member). And judging by the current polls for the party I’m with, I’ll make it.

But why? Well, I’ll refer to four previous posts in this blog to illustrate my decision.

First, I used to be a government advisor 4 years ago. So the “ship of public service” has sailed. What I didn’t realize back then was that in order to drive sustainable change in the digital realm of the public sector, you need to have a political debate about the importance and goals of those changes, not merely “ghost-writing” them.

A great strategy and a great law and even a great IT system is useless without the mental uptake by a sufficient amount of people. So, that’s the reason one has to be on the forefront of political debate in order to make sure digital transformation is done right. And this forefront is parliament. I’m happy to have supported my party as an expert for the past four years and that expertise is valued. That’s the biggest argument here – you need people like me, with deep technical knowledge and experience in many IT projects, to get things done right on every level. That’s certainly not a one-man task, though.

Second, it’s a challenge. I once wrong “What is challenging for developers” and the last point is “open ended problems”. Digitally transforming an entire country is certainly a challenge in the last category – “open ended problems”. There is no recipe, no manual for that.

Third, lawmaking is quite like programming (except it doesn’t regulate computer behavior, it regulates public life, which is far more complex and important). I already have a decent lawmaking experience and writing better, more precise and more “digital-friendly” laws is something that I like doing and something that I see as important.

Fourth, ethics has been important for me as a developer and it’s much more important for a politician.

For this blog it means I will be writing a bit more high-level stuff than day-to-day tips and advice. I hope I’ll still be able to (and sometimes have to) write some code in order to solve problems, but that won’t be enough material for blogposts. But I’ll surely share thoughts on cybersecurity, quality of public sector projects and system integration.

Software engineering and politics require very different skills. I think I am a good engineer (and I hope to remain so), and I have been a manager and a founder in the last couple of years as well. I’ve slowly, over time, developed my communication skills. National politics, even in a small country, is a tough feat, though. But as engineers we are constantly expanding our knowledge and skills, so I’ll try to transfer that mindset into a new realm.

The post A Developer Running For Parliament appeared first on Bozho's tech blog.

The Syslog Hell

2021-05-10 Bozho

Post Syndicated from Bozho original https://techblog.bozho.net/the-syslog-hell/

Syslog. You’ve probably heard about that, especially if you are into monitoring or security. Syslog is perceived to be the common, unified way that systems can send logs to other systems. Linux supports syslog, many network and security appliances support syslog as a way to share their logs. On the other side, a syslog server is receiving all syslog messages. It sounds great in theory – having a simple, common way to represent logs messages and send them across systems.

Reality can’t be further from that. Syslog is not one thing – there are multiple “standards”, and each of those is implemented incorrectly more often than not. Many vendors have their own way of representing data, and it’s all a big mess.

First, the RFCs. There are two RFCs – RFC3164 (“old” or “BSD” syslog) and RFC5424 (the new variant that obsoletes 3164). RFC3164 is not a standard, while RFC5424 is (mostly).

Those RFCs concern the contents of a syslog message. Then there’s RFC6587 which is about transmitting a syslog message over TCP. It’s also not a standard, but rather “an observation”. Syslog is usually transmitted over UDP, so fitting it into TCP requires some extra considerations. Now add TLS on top of that as well.

Then there are content formats. RFC5424 defines a key-value structure, but RFC 3164 does not – everything after the syslog header is just a non-structured message string. So many custom formats exist. For example firewall vendors tend to define their own message formats. At least they are often documented (e.g. check WatchGuard and SonicWall), but parsing them requires a lot of custom knowledge about that vendor’s choices. Sometimes the documentation doesn’t fully reflect the reality, though.

Instead of vendor-specific formats, there are also de-facto standards like CEF and the less popular LEEF. They define a structure of the message and are actually syslog-independent (you can write CEF/LEEF to a file). But when syslog is used for transmitting CEF/LEEF, the message should respect RFC3164.

And now comes the “fun” part – incorrect implementations. Many vendors don’t really respect those documents. They come up with their own variations of even the simplest things like a syslog header. Date formats are all over the place, hosts are sometimes missing, priority is sometimes missing, non-host identifiers are used in place of hosts, colons are placed frivolously.

Parsing all of that mess is extremely “hacky”, with tons of regexes trying to account for all vendor quirks. I’m working on a SIEM, and our collector is open source – you can check our syslog package. Some vendor-specific parsers are yet missing, but we are adding new ones constantly. The date formats in the CEF parser tell a good story.

If it were just two RFCs with one de-facto message format standard for one of them and a few option for TCP/UDP transmission, that would be fine. But what makes things hell is the fact that too many vendors decided not to care about what is in the RFCs, they decided that “hey, putting a year there is just fine” even though the RFC says “no”, that they don’t really need to set a host in the header, and that they didn’t really need to implement anything new after their initial legacy stuff was created.

Too many vendors (of various security and non-security software) came up with their own way of essentially representing key-value pairs, too many vendors thought their date format is the right one, too many vendors didn’t take the time to upgrade their logging facility in the past 12 years.

Unfortunately that’s representative of our industry (yes, xkcd). Someone somewhere stitches something together and then decades later we have an incomprehensible patchwork of stringly-typed, randomly formatted stuff flying around whatever socket it finds suitable. And it’s never the right time and the right priority to clean things up, to get up to date, to align with others in the field. We, as an industry (both security and IT in general) are creating a mess out of everything. Yes, the world is complex, and technology is complex as well. Our job is to make it all palpable, abstracted away, simplified and standardized. And we are doing the opposite.

The post The Syslog Hell appeared first on Bozho's tech blog.

Developers Are Obsessed With Their Text Editors

2021-05-01 Bozho

Post Syndicated from Bozho original https://techblog.bozho.net/developers-are-obsessed-with-their-text-editors/

Developers are constantly discussing and even fighting about text editors and IDEs. Which one is better, why is it better, what’s the philosophy behind one or the other, which one makes you more productive, which one has better themes, which one is more customizable.

I myself have fallen victim to this trend, with several articles about why Emacs is not a good idea for Java, why I still use Eclipse (though I’d still prefer some IDEA features), and what’s the difference between an editor and an IDE (for those who’d complain about the imprecise title of this post).

Are text editors and IDEs important? Sure, they are one of our main tools that we use everyday and therefore it should be very, very good (more metaphors about violin players and tennis players, please). But most text editors and IDEs are good. They evolve, they copy each other, they attract their audiences. They are also good in different ways, but most of the top ones achieve their goal (otherwise they wouldn’t be so popular). Sure, someone prefers a certain feature to be implemented in a certain way, or demands having another feature (e.g. I demand having call hierarchies on all constructors and IDEA doesn’t give me that, duh…) But those things are rarely significant in the grand scheme of things.

The comparable insignificance comes from the structure of our work, or why we are being now often called “software engineers” – it’s not about typing speed, or the perfectly optimized tool for creating code. Our time is dedicated to thinking, designing, reading, naming things. And the quality of our code writing/editing/debugging tool is not on the top of the list of things that drive productivity and quality.

We should absolutely master our tools, though. Creating software requires much more than editing text. Refactoring, advanced search, advanced code navigation, debugging, hot-swap/hot-deploy/reload-on-save, version control integration – all of these things are important for doing our job better.

My point is that text editors or IDEs occupy too much of developers’ time and mind, with too little benefit. Next time you think it’s a good idea to argue about which editor/IDE a colleague SHOULD be using, think twice. It’s not a good investment of your time and energy. And next time you consider standardizing on an editor/IDE for the whole team, don’t. Leave people with their preference, it doesn’t affect team consistency.

The post Developers Are Obsessed With Their Text Editors appeared first on Bozho's tech blog.

Releasing Often Helps With Analyzing Performance Issues

2020-12-26 Bozho

Post Syndicated from Bozho original https://techblog.bozho.net/releasing-often-helps-with-analyzing-performance-issues/

Releasing often is a good thing. It’s cool, and helps us deliver new functionality quickly, but I want to share one positive side-effect – it helps with analyzing production performance issues.

We do releases every 5 to 10 days and after a recent release, the application CPU chart jumped twice (the lines are differently colored because we use blue-green deployment):

What are the typical ways to find performance issues with production loads?

Connect a profiler directly to production – tricky, as it requires managing network permissions and might introduce unwanted overhead
Run performance tests against a staging or local environment and do profiling there – good, except your performance tests might not hit exactly the functionality that causes the problem (this is what happens in our case, as it was some particular types of API calls that caused it, which weren’t present in our performance tests). Also, performance tests can be tricky
Do a thread dump (and heap dump) and analyze them locally – a good step, but requires some luck and a lot of experience analyzing dumps, even if equipped with the right tools
Check your git history / release notes for what change might have caused it – this is what helped us resolve the issue. And it was possible because there were only 10 days of commits between the releases.

We could go through all of the commits and spot potential performance issues. Most of them turned out not to be a problem, and one seemingly unproblematic pieces was discovered to be the problem after commenting it out for a brief period a deploying a quick release without it, to test the hypothesis. I’ll share a separate post about the particular issue, but we would have to waste a lot more time if that release has 3 months worth of commits rather than 10 days.

Sometimes it’s not an obvious spike in the CPU or memory, but a more gradual issue that you introduce at some point and it starts being a problem a few months later. That’s what happened a few months ago, when we noticed a stead growth in the CPU with the growth of ingested data. Logical in theory, but the CPU usage grew faster than the data ingestion rate, which isn’t good.

So we were able to answer the question “when did it start growing” in order to be able to pinpoint the release that introduced the issue. because the said release had only 5 days of commits, it was much easier to find the culprit.

All of the above techniques are useful and should be employed at the right time. But releasing often gives you a hand with analyzing where a performance issues is coming from.

The post Releasing Often Helps With Analyzing Performance Issues appeared first on Bozho's tech blog.

Let’s Kill Security Questions

2020-11-20 Bozho

Post Syndicated from Bozho original https://techblog.bozho.net/lets-kill-security-questions/

Let’s kill security questions

Security questions still exist. They are less dominant now, but we haven’t yet condemned them as an industry hard enough so that they stop being added to authentication flows.

But they are bad. They are like passwords, but more easily guessable, because you have a password hint. And while there are opinions that they might be okay in certain scenarios, they have so many pitfalls that in practice we should just not consider them an option.

What are those pitfalls? Social engineering. Almost any security question’s answer is guessable by doing research on the target person online. We share more about our lives and don’t even realize how that affects us security-wise. Many security questions have a limited set of possible answers that can be enumerated with a brute force attack (e.g. what are the most common pet names; what are the most common last names in a given country for a given period of time, in order to guess someone’s mother’s maiden name; what are the high schools in the area where the person lives, and so on). So when someone wants to takeover your account, if all they have to do is open your Facebook profile or try 20-30 options, you have no protection.

But what are they for in the first place? Account recovery. You have forgotten your password and the system asks you some details about you to allow you to reset your password. We already have largely solved the problem of account recovery – send a reset password link to the email of the user. If the system itself is an email service, or in a couple of other scenarios, you can use a phone number, where a one-time password is sent for recovery purposes (or a secondary email, for email providers).

So we have the account recovery problem largely solved, why are security questions still around? Inertia, I guess. And the five monkeys experiment. There is no good reason to have a security question if you can have recovery email or phone. And you can safely consider that to be true (ok, maybe there are edge cases).

There are certain types of account recovery measures that resemble security questions and can be implemented as an additional layer, on top of a phone or email recovery. For more important services (e.g. your Facebook account or your main email), it may not be safe to consider just owning the phone or just having access to the associated email to be enough. Phones get stolen, emails get “broken into”. That’s why a security-like set of questions may serve as additional protection. For example – guessing recent activity. Facebook does that sometimes by asking you about your activity on the site or about your friends. This is not perfect, as it can be monitored by the malicious actor, but is an option. For your email, you can be asked what are the most recent emails that you’ve sent, and be presented with options to choose from, with some made up examples. These things are hard to implement because of geographic and language differences, but “guess your recent activity among these choices”, e.g. dynamically defined security questions, may be an acceptable additional step for account recovery.

But fixed security questions – no. Let’s kill those. I’m not the first to argue against security questions, but we need to be reminded that certain bad security practices should be left in the past.

Authentication is changing. We are desperately trying to get rid of the password itself (and still failing to do so), but before we manage to do so, we should first get rid of the “bad password in disguise”, the security question.

The post Let’s Kill Security Questions appeared first on Bozho's tech blog.

Discovering an OSSEC/Wazuh Encryption Issue

2020-10-12 Bozho

Post Syndicated from Bozho original https://techblog.bozho.net/discovering-an-ossec-wazuh-encryption-issue/

I’m trying to get the Wazuh agent (a fork of OSSEC, one of the most popular open source security tools, used for intrusion detection) to talk to our custom backend (namely, our LogSentinel SIEM Collector) to allow us to reuse the powerful Wazuh/OSSEC functionalities for customers that want to install an agent on each endpoint rather than just one collector that “agentlessly” reaches out to multiple sources.

But even though there’s a good documentation on the message format and encryption, I couldn’t get to successfully decrypt the messages. (I’ll refer to both Wazuh and OSSEC, as the functionality is almost identical in both, with the distinction that Wazuh added AES support in addition to blowfish)

That lead me to a two-day investigation on possible reasons. The first side-discovery was the undocumented OpenSSL auto-padding of keys and IVs described in my previous article. Then it lead me to actually writing C code (an copying the relevant Wazuh/OSSEC pieces) in order to debug the issue. With Wazuh/OSSEC I was generating one ciphertext and with Java and openssl CLI – a different one.

I made sure the key, key size, IV and mode (CBC) are identical. That they are equally padded and that OpenSSL’s EVP API is correctly used. All of that was confirmed and yet there was a mismatch, and therefore I could not decrypt the Wazuh/OSSEC message on the other end.

After discovering the 0-padding, I also discovered a mistake in the documentation, which used a static IV of FEDCA9876543210 rather than the one found in the code, where the 0 preceded 9 – FEDCA0987654321. But that didn’t fix the issue either, only got me one step closer.

A side-note here on IVs – Wazuh/OSSEC is using a static IV, which is a bad practice. The issue is reported 5 years ago, but is minor, because they are using some additional randomness per message that remediates the use of a static IV; it’s just not idiomatic to do it that way and may have unexpected side-effects.

So, after debugging the C code, I got to a simple code that could be used to reproduce the issue and asked a question on Stackoverflow. 5 minutes after posting the question I found another, related question that had the answer – using hex strings like that in C doesn’t work. Instead, they should be encoded: char *iv = (char *)"\xFE\xDC\xBA\x09\x87\x65\x43\x21\x00\x00\x00\x00\x00\x00\x00\x00";. So, the value is not the bytes corresponding to the hex string, but the ASCII codes of each character in the hex string. I validated that in the receiving Java end with this code:

This has an implication on the documentation, as well as on the whole scheme as well. Because the Wazuh/OSSEC AES key is: MD5(password) + MD5(MD5(agentName) + MD5(agentID)){0, 15}, the 2nd part is practically discarded, because the MD5(password) is 32 characters (= 32 ASCII codes/bytes), which is the length of the AES key. This makes the key derived from a significantly smaller pool of options – the permutations of 16 bytes, rather than of 256 bytes.

I raised an issue with Wazuh. Although this can be seen as a vulnerability (due to the reduced key space), it’s rather minor from security point of view, and as communication is mostly happening within the corporate network, I don’t think it has to be privately reported and fixed immediately.

Yet, I made a recommendation for introducing an additional configuration option to allow to transition to the updated protocol without causing backward compatibility issues. In fact, I’d go further and recommend using TLS/DTLS rather than a home-grown, AES-based scheme. Mutual authentication can be achieved through TLS mutual authentication rather than through a shared secret.

It’s satisfying to discover issues in popular software, especially when they are not written in your “native” programming language. And as a rule of thumb – encodings often cause problems, so we should be extra careful with them.

The post Discovering an OSSEC/Wazuh Encryption Issue appeared first on Bozho's tech blog.

Is It Really Two-Factor Authentication?

2020-09-24 Bozho

Post Syndicated from Bozho original https://techblog.bozho.net/is-it-really-two-factor-authentication/

Terminology-wise, there is a clear distinction between two-factor authentication (multi-factor authentication) and two-step verification (authentication), as this article explains. 2FA/MFA is authentication using more than one factors, i.e. “something you know” (password), “something you have” (token, card) and “something you are” (biometrics). Two-step verification is basically using two passwords – one permanent and another one that is short-lived and one-time.

At least that’s the theory. In practice it’s more complicated to say which authentication methods belongs to which category (“something you X”). Let me illustrate that with a few emamples:

An OTP hardware token is considered “something you have”. But it uses a shared symmetric secret with the server so that both can generate the same code at the same time (if using TOTP), or the same sequence. This means the secret is effectively “something you know”, because someone may steal it from the server, even though the hardware token is protected. Unless, of course, the server stores the shared secret in an HSM and does the OTP comparison on the HSM itself (some support that). And there’s still a theoretical possibility for the keys to leak prior to being stored on hardware. So is a hardware token “something you have” or “something you know”? For practical purposes it can be considered “something you have”
Smartphone OTP is often not considered as secure as a hardware token, but it should be, due to the secure storage of modern phones. The secret is shared once during enrollment (usually with on-screen scanning), so it should be “something you have” as much as a hardware token
SMS is not considered secure and often given as an example for 2-step verification, because it’s just another password. While that’s true, this is because of a particular SS7 vulnerability (allowing the interception of mobile communication). If mobile communication standards were secure, the SIM card would be tied to the number and only the SIM card holder would be able to receive the message, making it “something you have”. But with the known vulnerabilities, it is “something you know”, and that something is actually the phone number.
Fingerprint scanners represent “something you are”. And in most devices they are built in a way that the scanner authenticates to the phone (being cryptographically bound to the CPU) while transmitting the fingerprint data, so you can’t just intercept the bytes transferred and then replay them. That’s the theory; it’s not publicly documented how it’s implemented. But if it were not so, then “something you are” is “something you have” – a sequence of bytes representing your fingerprint scan, and that can leak. This is precisely why biometric identification should only be done locally, on the phone, without any server interaction – you can’t make sure the server is receiving sensor-scanned data or captured and replayed data. That said, biometric factors are tied to the proper implementation of the authenticating smartphone application – if your, say, banking application needs a fingerprint scan to run, a malicious actor should not be able to bypass that by stealing shared credentials (userIDs, secrets) and do API calls to your service. So to the server there’s no “something you are”. It’s always “something that the client-side application has verified that you are, if implemented properly”
A digital signature (via a smartcard or yubikey or even a smartphone with secure hardware storage for private keys) is “something you have” – it works by signing one-time challenges, sent by the server and verifying that the signature has been created by the private key associated with the previously enrolled public key. Knowing the public key gives you nothing, because of how public-key cryptography works. There’s no shared secret and no intermediary whose data flow can be intercepted. A private key is still “something you know”, but by putting it in hardware it becomes “something you have”, i.e. a true second factor. Of course, until someone finds out that the random generation of primes used for generating the private key has been broken and you can derive the private key form the public key (as happened recently with one vendor).

There isn’t an obvious boundary between theoretical and practical. “Something you are” and “something you have” can eventually be turned into “something you know” (or “something someone stores”). Some theoretical attacks can become very practical overnight.

I’d suggest we stick to calling everything “two-factor authentication”, because it’s more important to have mass understanding of the usefulness of the technique than to nitpick on the terminology. 2FA does not solve phishing, unfortunately, but it solves leaked credentials, which is good enough and everyone should have some form of it. Even SMS is better than nothing (obviously, for high-profile systems, digital signatures is the way to go).

The post Is It Really Two-Factor Authentication? appeared first on Bozho's tech blog.

Noise