
Discovering an OSSEC/Wazuh Encryption Issue

Post Syndicated from Bozho original https://techblog.bozho.net/discovering-an-ossec-wazuh-encryption-issue/

I’m trying to get the Wazuh agent (a fork of OSSEC, one of the most popular open source security tools, used for intrusion detection) to talk to our custom backend (namely, our LogSentinel SIEM Collector) to allow us to reuse the powerful Wazuh/OSSEC functionalities for customers that want to install an agent on each endpoint rather than just one collector that “agentlessly” reaches out to multiple sources.

But even though there is good documentation on the message format and encryption, I couldn’t successfully decrypt the messages. (I’ll refer to both Wazuh and OSSEC, as the functionality is almost identical in both, with the distinction that Wazuh added AES support in addition to Blowfish.)

That led me to a two-day investigation of possible reasons. The first side-discovery was the undocumented OpenSSL auto-padding of keys and IVs described in my previous article. Then it led me to actually writing C code (and copying the relevant Wazuh/OSSEC pieces) in order to debug the issue. With Wazuh/OSSEC I was generating one ciphertext, and with Java and the openssl CLI – a different one.

I made sure the key, key size, IV and mode (CBC) were identical, that they were equally padded and that OpenSSL’s EVP API was used correctly. All of that was confirmed, and yet there was a mismatch, and therefore I could not decrypt the Wazuh/OSSEC message on the other end.

After discovering the 0-padding, I also discovered a mistake in the documentation, which used a static IV of FEDCBA9876543210 rather than the one found in the code, where the 0 precedes the 9 – FEDCBA0987654321. But that didn’t fix the issue either; it only got me one step closer.

A side note here on IVs – Wazuh/OSSEC uses a static IV, which is bad practice. The issue was reported 5 years ago, but it is minor, because they add some per-message randomness that remediates the use of a static IV; it’s just not idiomatic to do it that way and it may have unexpected side effects.

So, after debugging the C code, I got to a simple piece of code that could be used to reproduce the issue and asked a question on Stack Overflow. Five minutes after posting the question I found another, related question that had the answer – using hex strings like that in C doesn’t work. Instead, they should be encoded: char *iv = (char *)"\xFE\xDC\xBA\x09\x87\x65\x43\x21\x00\x00\x00\x00\x00\x00\x00\x00";. So, the value actually used is not the bytes corresponding to the hex string, but the ASCII codes of each character in the hex string. I validated that on the receiving Java end with this code:
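
A minimal Java sketch of that check – it treats the IV (and the key) as the ASCII bytes of the hex string itself, assuming AES in CBC mode with standard padding; the key value here is made up purely for illustration:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class WazuhIvCheck {
    public static void main(String[] args) throws Exception {
        // The IV is the ASCII bytes of the 16-character hex string, not the decoded hex bytes
        byte[] iv = "FEDCBA0987654321".getBytes(StandardCharsets.US_ASCII);
        // Made-up 32-character value standing in for an MD5 hex digest; 32 ASCII bytes = AES-256 key
        byte[] key = "00112233445566778899aabbccddeeff".getBytes(StandardCharsets.US_ASCII);

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal("test message".getBytes(StandardCharsets.UTF_8));

        // Compare this with what the C code produces for the same plaintext and key material
        System.out.println(Base64.getEncoder().encodeToString(ciphertext));
    }
}
```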

This has implications for the documentation, as well as for the whole scheme. Because the Wazuh/OSSEC AES key is: MD5(password) + MD5(MD5(agentName) + MD5(agentID)){0, 15}, the 2nd part is practically discarded, because the MD5(password) hex digest is 32 characters (= 32 ASCII codes/bytes), which is exactly the length of the AES key. This means the key is derived from a significantly smaller pool of options – each key byte is one of just 16 possible values (the ASCII codes of the hex digits), rather than one of 256.
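
Here is a hedged sketch of that derivation as I read it from the documentation – the class, the helper method and the sample values are mine, not Wazuh’s:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Illustrative only: shows why the agent-specific part of the key material gets discarded.
public class WazuhKeySketch {
    static String md5Hex(String input) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(input.getBytes(StandardCharsets.US_ASCII));
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String password = "secret", agentName = "agent-01", agentId = "001"; // made-up values
        String keyMaterial = md5Hex(password) + md5Hex(md5Hex(agentName) + md5Hex(agentId));

        // An MD5 hex digest is already 32 characters = 32 ASCII bytes = a full AES-256 key,
        // so truncating the material to the key length keeps only MD5(password)
        byte[] aesKey = keyMaterial.substring(0, 32).getBytes(StandardCharsets.US_ASCII);

        // Each key byte is the ASCII code of a hex digit: only 16 possible values per byte
        System.out.println(new String(aesKey, StandardCharsets.US_ASCII));
    }
}
```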

I raised an issue with Wazuh. Although this can be seen as a vulnerability (due to the reduced key space), it’s rather minor from a security point of view, and as the communication mostly happens within the corporate network, I don’t think it has to be privately reported and fixed immediately.

Yet, I made a recommendation to introduce an additional configuration option that would allow a transition to an updated protocol without causing backward compatibility issues. In fact, I’d go further and recommend using TLS/DTLS rather than a home-grown, AES-based scheme. Mutual authentication can be achieved through TLS mutual authentication rather than through a shared secret.

It’s satisfying to discover issues in popular software, especially when they are not written in your “native” programming language. And as a rule of thumb – encodings often cause problems, so we should be extra careful with them.


Is It Really Two-Factor Authentication?

Post Syndicated from Bozho original https://techblog.bozho.net/is-it-really-two-factor-authentication/

Terminology-wise, there is a clear distinction between two-factor authentication (multi-factor authentication) and two-step verification (authentication), as this article explains. 2FA/MFA is authentication using more than one factor, i.e. “something you know” (password), “something you have” (token, card) and “something you are” (biometrics). Two-step verification is basically using two passwords – one permanent and another one that is short-lived and one-time.

At least that’s the theory. In practice it’s more complicated to say which authentication method belongs to which category (“something you X”). Let me illustrate that with a few examples:

  • An OTP hardware token is considered “something you have”. But it uses a symmetric secret shared with the server, so that both can generate the same code at the same time (if using TOTP), or the same sequence – see the TOTP sketch after this list. This means the secret is effectively “something you know”, because someone may steal it from the server, even though the hardware token itself is protected. Unless, of course, the server stores the shared secret in an HSM and does the OTP comparison on the HSM itself (some support that). And there’s still a theoretical possibility for the keys to leak prior to being stored on hardware. So is a hardware token “something you have” or “something you know”? For practical purposes it can be considered “something you have”.
  • Smartphone OTP is often not considered as secure as a hardware token, but it should be, due to the secure storage of modern phones. The secret is shared once during enrollment (usually by scanning an on-screen QR code), so it should be “something you have” as much as a hardware token is.
  • SMS is not considered secure and is often given as an example of 2-step verification, because it’s just another password. While that’s true, this is largely because of a particular SS7 vulnerability (allowing the interception of mobile communication). If mobile communication standards were secure, the SIM card would be tied to the number and only the SIM card holder would be able to receive the message, making it “something you have”. But with the known vulnerabilities, it is “something you know”, and that something is actually the phone number.
  • Fingerprint scanners represent “something you are”. And in most devices they are built in a way that the scanner authenticates to the phone (being cryptographically bound to the CPU) while transmitting the fingerprint data, so you can’t just intercept the bytes transferred and then replay them. That’s the theory; it’s not publicly documented how it’s implemented. But if it were not so, then “something you are” becomes “something you have” – a sequence of bytes representing your fingerprint scan, and that can leak. This is precisely why biometric identification should only be done locally, on the phone, without any server interaction – you can’t be sure whether the server is receiving freshly scanned sensor data or captured and replayed data. That said, biometric factors are tied to the proper implementation of the authenticating smartphone application – if your, say, banking application needs a fingerprint scan to run, a malicious actor should not be able to bypass that by stealing shared credentials (user IDs, secrets) and doing API calls to your service. So to the server there’s no “something you are”. It’s always “something that the client-side application has verified that you are, if implemented properly”.
  • A digital signature (via a smartcard or a YubiKey or even a smartphone with secure hardware storage for private keys) is “something you have” – it works by signing one-time challenges sent by the server and verifying that the signature has been created by the private key associated with the previously enrolled public key. Knowing the public key gives you nothing, because of how public-key cryptography works. There’s no shared secret and no intermediary whose data flow can be intercepted. A private key is still “something you know”, but by putting it in hardware it becomes “something you have”, i.e. a true second factor. Of course, until someone finds out that the random generation of primes used for generating the private key has been broken and you can derive the private key from the public key (as happened recently with one vendor).
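
To make the shared-secret point from the first bullet concrete, here is a minimal TOTP sketch in the style of RFC 6238 (HMAC-SHA1, 30-second steps, 6 digits). The secret below is the well-known RFC test value; a real token and a real server each hold the same secret and run the same computation:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.time.Instant;

public class TotpSketch {
    static int code(byte[] sharedSecret, long unixSeconds) throws Exception {
        long counter = unixSeconds / 30;                              // 30-second time step
        byte[] msg = ByteBuffer.allocate(8).putLong(counter).array(); // big-endian counter
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
        byte[] h = mac.doFinal(msg);
        int offset = h[h.length - 1] & 0x0f;                          // dynamic truncation
        int bin = ((h[offset] & 0x7f) << 24) | ((h[offset + 1] & 0xff) << 16)
                | ((h[offset + 2] & 0xff) << 8) | (h[offset + 3] & 0xff);
        return bin % 1_000_000;                                       // 6 digits
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "12345678901234567890".getBytes();            // RFC test secret
        System.out.printf("%06d%n", code(secret, Instant.now().getEpochSecond()));
    }
}
```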

There isn’t an obvious boundary between theoretical and practical. “Something you are” and “something you have” can eventually be turned into “something you know” (or “something someone stores”). Some theoretical attacks can become very practical overnight.

I’d suggest we stick to calling everything “two-factor authentication”, because it’s more important to have mass understanding of the usefulness of the technique than to nitpick on the terminology. 2FA does not solve phishing, unfortunately, but it solves leaked credentials, which is good enough, and everyone should have some form of it. Even SMS is better than nothing (obviously, for high-profile systems, digital signatures are the way to go).


Making Sense of the Information Security Landscape

Post Syndicated from Bozho original https://techblog.bozho.net/making-sense-of-the-information-security-landscape/

There are hundreds of different information security solutions out there and choosing which one to pick can be hard. Usually decisions are driven by recommendations, vendor familiarity, successful upsells, compliance needs, etc. I’d like to share my understanding of the security landscape by providing one-line descriptions of each of the different categories of products.

Note that these categories are sometimes not strictly defined and they may overlap. They have evolved over time, and a certain category can include several products from legacy categories. The explanations will be slightly simplified; for a generalization and summary, skip the lists and go to the paragraphs after them. This post aims to summarize a lot of Gartner and Forrester reports, as well as product data sheets, combined with some real-world observations, and to bring all of that to a technical level, rather than stay at broad, business-focused capabilities. I’ll split the products into several groups, though these groups also overlap.

Monitoring and auditing

  • SIEM (Security Information and Event Management) – collects logs from all possible sources (applications, OSs, network appliances) and raises alarms if there are anomalies
  • IDS (Intrusion Detection System) – listens to network packets and looks for malicious signatures or statistical anomalies. There are multiple ways to listen to the traffic: proxy, port mirroring, network tap, host-based interface listener. Deep packet inspection is sometimes involved, which requires sniffing TLS at the host or terminating it at a proxy in order to be able to inspect encrypted communication (especially for TLS 1.3), effectively doing an MITM “attack” on the organization’s users.
  • IPS (Intrusion Prevention System) – basically a marketing upgrade of IDS with the added option to “block” traffic rather than just “report” the intrusion.
  • UEBA (User and Entity Behavior Analytics) – a system that observes user and system activity (via logs and/or by directly monitoring endpoints, including via screen capture), tries to identify behavior patterns for users and system components, and reports on any anomalies and changes in those patterns, also classifying users as less or more “risky”. Recently UEBA has become part of next-gen SIEMs
  • SUBA (Security User Behavior Analytics) – same as UEBA, but named after the purpose (security) rather than the entities monitored. Used by Forrester (whereas UEBA is used by Gartner)
  • DAM (Database Activity Monitoring) – tools that monitor and log database queries and configuration changes, looking for suspicious patterns and potentially blocking them based on policies. Implemented via proxy or agents installed at the host
  • DAP (Database Audit and Protection) – based on DAM, but with added features for content classification (similar to DLPs), vulnerability detection and more clever behavior analysis (e.g. through UEBA)
  • FIM (File Integrity Monitoring) – usually a feature of other tools, FIM constantly monitors files for potentially suspicious changes; a toy illustration of the idea follows this list
  • SOC (Security Operations Center) – this is more of an organizational unit that employs multiple tools (usually a SIEM, DLP, CASB) to fully handle the security of an organization.
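
As a toy illustration of the core FIM idea (not of any particular product), the sketch below hashes the files under a directory and reports changes against a previously stored baseline. The paths and the baseline format are made up; a real FIM tool would also watch in real time, track permissions and owners, and protect the baseline itself:

```java
import java.io.IOException;
import java.nio.file.*;
import java.security.MessageDigest;
import java.util.*;
import java.util.stream.Stream;

public class ToyFim {
    public static void main(String[] args) throws Exception {
        Path watched = Paths.get(args.length > 0 ? args[0] : ".");  // directory to monitor
        Path baselineFile = Paths.get("baseline.txt");              // "path<TAB>hash" lines

        // Load the previous baseline, if any
        Map<String, String> baseline = new HashMap<>();
        if (Files.exists(baselineFile)) {
            for (String line : Files.readAllLines(baselineFile)) {
                String[] parts = line.split("\t", 2);
                baseline.put(parts[0], parts[1]);
            }
        }

        // Hash the current state of the directory
        Map<String, String> current = new TreeMap<>();
        try (Stream<Path> files = Files.walk(watched)) {
            files.filter(Files::isRegularFile).forEach(p -> current.put(p.toString(), sha256(p)));
        }

        // Report anything new, modified or deleted
        current.forEach((path, hash) -> {
            String old = baseline.get(path);
            if (old == null) System.out.println("NEW      " + path);
            else if (!old.equals(hash)) System.out.println("MODIFIED " + path);
        });
        baseline.keySet().stream().filter(p -> !current.containsKey(p))
                .forEach(p -> System.out.println("DELETED  " + p));

        // Persist the new baseline for the next run
        List<String> lines = new ArrayList<>();
        current.forEach((p, h) -> lines.add(p + "\t" + h));
        Files.write(baselineFile, lines);
    }

    static String sha256(Path p) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(p));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b & 0xff));
            return sb.toString();
        } catch (IOException | java.security.NoSuchAlgorithmException e) {
            return "unreadable";
        }
    }
}
```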

Access proxies

  • CASB (Cloud Access Security Broker) – a proxy (usually) that organizations go through when connecting to cloud services, allowing them to enforce security policies and detect anomalies, e.g. regarding authentication and authorization, or the input and retrieval of sensitive data. CASBs may involve additional encryption options for the data being used.
  • CSG (Cloud Security Gateway) – effectively the same as CASB
  • SWG (Secure Web Gateway) – a proxy for accessing the web, includes filtering malicious websites, filtering potentially malicious downloads, limiting uploads
  • SASE (Secure Access Service Edge) – like CASB/CSG, but also providing additional bundled functionalities like a Firewall, SWG, VPN, DNS management, etc.

Firewalls

  • WAF (Web Application Firewall) – a firewall (working as a reverse proxy) that you put in front of web applications to protect them from typical web vulnerabilities that may not be addressed by the application developer – SQL injections, XSS, CSRF, etc.
  • NF (Network Firewall) – the typical firewall that allows you to allow or block traffic based on protocol, port, source/destination
  • NGFW (Next Generation Firewall) – a firewall that combines a network firewall and a (web) application firewall, and also analyzes the traffic, thus detecting potential anomalies/intrusions/data exfiltration

Data protection

  • DLP (Data Leak Prevention / Data Loss Prevention) – that’s a broad category of tools that aim at preventing data loss – mostly accidental, but sometimes malicious as well. Sometimes it involves installing an agent on each machine, in other cases it’s proxy-based. Many other solutions provide DLP functionality, like IPS/IDS, WAFs, CASBs, but DLPs are focused on inspecting user activities (including via UEBA/SUBA), network traffic (including via SWGs), communication (most often email) and publicly facing storage (e.g. FTP, S3) that may lead to leaking data. DLPs include discovering sensitive data in structured (databases) and unstructured (office documents) form. Other DLP features are encryption of data at rest and tokenization of sensitive data.
  • ILDP (Information Leak Detection and Prevention) – same as DLP
  • IPC (Information Protection and Control) – same as DLP
  • EPS (Extrusion Prevention System) – same as DLP, focused on monitoring outbound traffic for exfiltration attempts
  • CMF (Content Monitoring and Filtering) – part of DLP. May overlap with SWG functionalities.
  • CIP (Critical Information Protection) – part of DLP, focused on critical information, e.g. through encryption and tokenization
  • CDP (Continuous Data Protection) – basically incremental/real-time backup management, with retention settings and possibly encryption

Vulnerability testing

  • RASP (Runtime Application Self-protection) – tools (usually in the form of libraries that are included in the application runtime) that monitor in real-time the application usage and can block certain actions (at binary level) or even shut down the application if a cyber attack is detected.
  • IAST (Interactive Application Security Testing) – similar to RASP, the subtle difference being that IAST is usually used in pre-production environments while RASP is used in production
  • SAST (Static Application Security Testing) – tools that scan application source code for vulnerabilities
  • DAST (Dynamic Application Security Testing) – tools that scan web applications for vulnerabilities through their exposed HTTP endpoints
  • VA (Vulnerability assessment) – a process helped by many tools (including those above, and more) for finding, assessing and eliminating vulnerabilities

Identity and access

  • IAM (Identity and Access Management) – products that allow organizations to centralize authentication and enrollment of their users, providing single-sign-on capabilities, centralized monitoring of authentication activity, applying access policies (e.g. working hours), enforcing 2FA, etc.
  • SSO (Single Sign-On) – the ability to use the same credentials for logging into multiple (preferably all) applications in an organization.
  • WAM (Web Access Management) – the “older” version of IAM, lacking flexibility and some features like centralized user enrollment/provisioning
  • PAM (Privileged access management) – managing credentials of privileged users (e.g. system administrators). Instead of having admin credentials stored in local password managers (or worse – sticky notes or files on the desktop), credentials are stored in a centralized, protected vault and “released” for use only after a certain approval process for executing a given admin task, in some cases monitoring and logging the executed activities. The PAM handles regular password changes. It basically acts as a proxy (though not necessarily in the network sense) between a privileged user and a system that requires elevated privileges.

Endpoint protection

  • AV (Anti-Virus) – the good old antivirus software that gets malicious software signatures from a centrally managed blacklist and blocks programs that match those signatures
  • NGAV (Next Generation Anti-Virus) – going beyond signature matching, NGAV looks for suspicious activities (e.g. filesystem, memory, registry access/modification) and uses policies and rules to block such activity even from previously unknown and not yet blacklisted programs. Machine learning is usually said to be employed, but in many cases that’s mostly marketing.
  • EPP (Endpoint Protection Platform) – includes NGAV as well as a management layer that allows centrally provisioning and managing policies, reporting and workflows for remediation
  • EDR (Endpoint Detection and Response) – using an agent to collect endpoint (device) data, centralize it, combine it with network logs and analyze that in order to detect malicious activity. After suspected malicious activity is detected, allow centralized response, including blocking/shutting down/etc. Compared to NGAV, EDR makes use of the data across the organization, while NGAV usually focuses on individual machines, but that’s not universally true
  • ATP (Advanced threat protection) – same as EDR
  • ATD (Advanced threat detection) – same as above, with just monitoring and analytics capabilities

Coordination and automation

  • UTM (Unified Threat Management) – combining multiple monitoring and prevention tools in one suite (antivirus/NGAV/EDR, DLP, firewalls, VPNs, etc.). The benefit is that you purchase one thing rather than finding your way through the jungle described above. At least that’s on paper; in reality you still get different modules, sometimes not even properly integrated with each other.
  • SOAR (Security Orchestration, Automation and Response) – tools for centralizing security alerts and configuring automated actions in response. Alert fatigue is a real thing with many false positives generated by tools like SIEMs/DLPs/EDRs. Reducing those false alarms is often harder than just scripting the way they are handled. SOAR provides that – it ingests alerts and allows you to use pre-built or custom response “cookbooks” that include checking data (e.g. whether an IP is in some blacklist, are there attachments of certain content type in a flagged email, whether an employee is on holiday, etc.), creating tickets and alerting via multiple channels (email/sms/other type of push)
  • TIP (Threat Intelligence Platform) – threat intelligence is often part of other solutions like SIEMs, EDRs and DLPs and involves collecting information (intelligence) about certain resources like IP addresses, domain names, certificates. When these items are discovered in the collected logs, the TIP can enrich the event with what it knows about the given item and even act in order to block a request, if a threat threshold is reached. In short – scanning public and private databases to find information about malicious actors and their assets.

Email

  • SEG (Secure email gateway) – a proxy for all incoming and outgoing email that scans them for malicious attachments, potential phishing and in some cases data exfiltration attempts.
  • MFT (Managed File Transfer) – a tool that allows sharing files securely with someone by replacing attachments. Shared files can be tracked, monitored, audited and scanned for vulnerabilities, and access can be cut once the files have been downloaded by the recipient, reducing the risk of data leaks.

DDoS

  • DDoS mitigation/protection – services that hide your actual IP in an attempt to block malicious DDoS traffic before it reaches your network (where it would be too late). They usually rely on large global networks and data centers (called “scrubbing centers”) to send clean traffic to your servers.

Compliance

  • GRC (Governance, Risk and Compliance) – a management tool for handling all the policies, audits, risk assessments, workflows and reports regarding different aspects of compliance, including security compliance
  • IRM (Integrated Risk Management) – allegedly philosophically different, more modern and advanced; in reality, the same as GRC with some additional monitoring features

So let’s summarize the ways that all of these solutions work:

  • Monitoring logs and other events
  • Inspecting incoming traffic and finding malicious activities
  • Inspecting outgoing traffic and applying policies
  • Application vulnerability detection
  • Automating certain aspects of the alerting, investigation and response handling

Monitoring (which is central to most tools) is usually done via proxies, port mirroring, network taps or host-based interface listeners, each having its pros and cons. Enforcement is almost always done via proxies. Bypassing these proxies should not be possible, but for cloud services you can’t really block access if the service is accessed outside your corporate environment (unless the SaaS provider has an IP whitelist feature).

In most cases, even though machine learning/AI is advertised as “the new thing”, tools make decisions based on configured policies (rules). Organizations are drowned in complex policies that they have to keep up to date and synchronize across tools. Policy management, especially given there’s no industry standard for how policies should be defined, is a huge burden. In theory, it gives flexibility and should be there; in practice it may lead to a messy and hard to manage environment.

Monitoring is universally seen as the way to receive actionable intelligence from systems. This is much messier in reality than in demos and often leads to systems being left unmonitored and alerts being ignored. Alert fatigue, which follows from the complexity of policy management, is a big problem in information security. SOAR is a way to remedy that, but it sounds like a band-aid on a broken process rather than a true solution – false alarms should be reduced rather than being closed quasi-automatically. If handling an alert is automatable, then the tool that generates it should be able to know it’s not a real problem.

The complexity of the security landscape is obviously huge – product categories are defined based on multiple criteria – what problem they solve, how they solve it, or to what extent they solve it. Is a SIEM also a DLP if it uses UEBA to block certain traffic (next-gen SIEMs may be able to invoke blocking actions, even if they require another system to carry them out)? Is a DLP a CASB if it does encryption of data that’s stored in cloud services? Should you have an EPP and a SIEM, if the EPP gives you a good enough overview of the events being logged in your infrastructure? Is a CASB a WAF for SaaS? Is a SIEM a DAM if it supports native database audit logs? You can’t answer these questions at a category level; you have to look at particular products and how well they implement a certain feature.

Can you have a unified proxy (THE proxy) that monitors everything incoming and outgoing and collects that data, acting as a WAF, DLP, SIEM, CASB and SEG? Can you have just one agent that is both an EDR and a DLP? Well, certainly categories like SASE and UTM go in that direction, trying to ease the decision-making process.

I think it’s most important to start from the attack targets, rather than from the means to get there or from the means to prevent getting there. Unfortunately, enterprise security is often driven by “I need to have this type of product”. This leads to semi-abandoned and partially configured tools for which organizations pay millions, because there are never enough people to go into the intricate details of yet another security solution, and organizations rely on consultants to set things up.

I don’t have solutions to the problems stated above, but I hope I’ve given a good overview of the landscape. And I think we should focus less on “security products” and more on “security techniques” and on people who can implement them. You don’t need a billion-dollar corporation to sell you a silver bullet (which you can’t fire). You need trained experts. That’s hard. There aren’t enough of them. And the security team is often undervalued in the enterprise. Yes, cybersecurity is very important, but I’m not sure whether it will ever get enough visibility and be prioritized over purely business goals. And maybe it shouldn’t be, if risk is properly calculated.

All the products above are ways to buy some feeling of security. If used properly and in the right combination, they can provide more than a feeling. But too often a feeling is just good enough.


Encryption Overview [Webinar]

Post Syndicated from Bozho original https://techblog.bozho.net/encryption-overview-webinar/

“Encryption” has turned into a buzzword, especially after privacy standards and regulation vaguely mention it and vendors rush to provide “encryption”. But what does it mean in practice? I did a webinar (hosted by my company, LogSentinel) to explain the various aspects and pitfalls of encryption.

You can register to watch the webinar here; the slides are available as well.

Of course, encryption is a huge topic, worth a whole course rather than just a webinar, but I hope I’m providing good starting points. An interesting technique that we employ in our company is “searchable encryption”, which allows you to keep data encrypted and still search through it. There are many more very nice (and sometimes niche) applications of encryption and cryptography in general, as Bruce Schneier mentions in his recent interview. These applications can solve very specific problems with information security and privacy that we face today. We only need to make them mainstream, or at least increase awareness.


Seven Legacy Integration Patterns

Post Syndicated from Bozho original https://techblog.bozho.net/seven-legacy-integration-patterns/

If we have to integrate two (or more) systems nowadays, we know – we either use an API or, more rarely, some message queue.

Unfortunately, many systems in the world do not support API integration. And many more are being created as we speak that don’t have APIs. So when you inevitably have to integrate with them, you are left with imperfect choices to make. Below are seven patterns to integrate with legacy systems (or not-so-legacy systems that are built in legacy ways).

Initially I wanted to call these “bad integration patterns”. But if you don’t have other options, they are not bad – they are inevitable. What’s bad is the fact that so many systems continue to be built without integration in mind.

I’ve seen all of these, on more than one occasion. And I’ve heard many more stories about them. Unfortunately, they are not the exception (fortunately, they are also not the rule, at least not anymore).

  1. Files on FTP – one application uploads files (XML, CSV, other) to an FTP server (or another shared resource) and the other one reads them via a scheduled job, parses them and optionally produces a response – either on the same FTP server, or via email. Sharing files like that is certainly not ideal in terms of integration – you don’t get real-time status of your request, and other aspects are trickier to get right – versioning, high availability, authentication, security, traceability (audit trail).
  2. Shared database – two applications sharing the same database may sound like a recipe for disaster, but it’s not uncommon to see it in the wild. If you are lucky, one application will be read-only. But breaking changes to the database structure and security concerns are major issues. You can only use this type of integration if you expose your database directly to another application, which you normally don’t want to do.
  3. Full daily dump – instead of sharing an active database, some organizations do a full dump of their data every day or week and provide it to the other party for import. Obvious data privacy issues exist with that, as it’s a bad idea to have full dumps of your data flying around (in some cases on DVDs or portable HDDs), in addition to everything mentioned above – versioning, authentication, etc.
  4. Scraping – when an app has no API, it’s still possible to extract data from it or push data to it – via the user interface. With web applications that’s easier, as they “speak” HTML and HTTP. With desktop apps, screen scraping has emerged as an option. The so-called RPA software (Robotic Process Automation) relies on all types of scraping to integrate legacy systems. It’s very fragile and requires complicated (and sometimes expensive) tooling to get right. Not to mention the security aspect, which requires storing credentials in non-hashed form somewhere in order to let the scraper log in.
  5. Email – when the sending or receiving system doesn’t support other forms of integration, email comes as a last resort. If you can trigger something by checking a mailbox, or if an email is produced after some event happens in the sending system, that may be all you need to integrate. Obviously, email is a very bad means of integration – it’s unstructured, it can fail for many reasons, and it’s just not meant for software integration. You can attach structured data, if you want to get extra inventive, but if you can get both ends to support the same format, you can probably get them extended to support proper APIs.
  6. Adapters – you can develop a custom module that has access to the underlying database, but exposes a proper API (see the sketch after this list). That’s an almost acceptable solution, as you can have a properly written (sort-of) microservice independent of the original application, and other systems won’t know they are integrating with a legacy piece of software. It’s tricky to get right in some cases, however, as you have to understand the state space of the database well. Read-only is easy, writing is much harder or next to impossible.
  7. Paper – no, I’m not making this up. There are cases where one organization prints some data and then the other organization (or department) receives the paper documents (by mail or otherwise) and inputs them into their system. Expensive projects exist out there that aim to remove the paper component and introduce actual integration, as paper-based input is error-prone and slow. The few legitimate scenarios for a paper-based step are when you need extra security – the paper trail, combined with the fact that paper is effectively airgapped, may give you that. But even then it shouldn’t be the only transport layer.
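
To make pattern 6 concrete, here is a minimal read-only adapter sketch in Java: it exposes rows from a made-up legacy table over HTTP without touching the legacy application itself. The JDBC URL, credentials, table, port and field names are all illustrative (and a JDBC driver is assumed to be on the classpath); a real adapter would add authentication, pagination, proper JSON serialization and connection pooling:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.sql.*;

public class LegacyOrdersAdapter {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/orders", exchange -> {
            StringBuilder json = new StringBuilder("[");
            try (Connection c = DriverManager.getConnection(
                         "jdbc:postgresql://legacy-db/orders", "reader", "secret"); // illustrative
                 Statement st = c.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, status FROM orders")) {
                while (rs.next()) {
                    if (json.length() > 1) json.append(',');
                    json.append("{\"id\":").append(rs.getLong("id"))
                        .append(",\"status\":\"").append(rs.getString("status")).append("\"}");
                }
            } catch (SQLException e) {
                exchange.sendResponseHeaders(500, -1); // signal failure, don't leak details
                exchange.close();
                return;
            }
            byte[] body = json.append(']').toString().getBytes();
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start(); // the legacy database stays untouched; only this adapter is exposed
    }
}
```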

If you need to do any of the above, it’s usually because at least one of the systems is stuck and can’t be upgraded. It’s either too legacy to touch, or the vendor is gone, or adding an API is “not on their roadmap” and would be too expensive.

If you are building a system, always provide an API. Some other system will have to integrate with it, sooner or later. It’s not sustainable to build closed systems and delay the integration question until it’s needed. Assume it’s always needed.

Fancy ESBs may be able to patch things quickly with one of the approaches above and integrate the “unintegratable”, but heavy reliance on an ESB is an indicator of too many legacy or low-quality systems.

But simply having an API doesn’t cut it either. If you don’t support versioning and backward-compatible APIs, you’ll be in an even more fragile state, as you’ll be breaking existing integrations as you progress.

Enterprise integration is tricky. But, as with many things in software, it’s best handled in the applications that we build. If we build them right, things are much easier. Otherwise, organizations have to revert to the legacy approaches mentioned above and introduce complexity, fragility, security and privacy risks and a general feeling of low-quality that has to be supported by increasingly unhappy people.


Low-Code, Rapid Application Development and Digital Transformation

Post Syndicated from Bozho original https://techblog.bozho.net/low-code-rapid-application-development-and-digital-transformation/

Recently, many low-code/no-code solutions have gained traction in the enterprise, giving non-technical people the option to create simple applications. Analysts predict that the low-code industry will grow by 20+% each year. But what is low-code, why is it getting so popular and what are the issues with it?

Low-code is something that we’ve occasionally seen in the past decades – a drag-and-drop UI designer that allows you to create simple applications without coding skills. Products have matured and practically all offer similar features – the ability to design entity relationships in drag-and-drop entity-relationship diagrams, the ability to design UI via WYSIWYG, to design simple processes via BPMN-like notations, to call external services via importing web service definitions, to choose from pre-baked entity definitions and to fetch and store data in external databases and spreadsheets.

There are many tools in this domain – MS PowerApps, OutSystems, Appian, Mendix, Google’s recently acquired AppSheet, Ninox, WaveMaker, and many more. And they may differ slightly in their approach and in their feature set, but the whole point is to be able to easily create applications (web-based, mobile-based, hybrid) that solve some immediate pain that these users have, where going through a full-blown IT project with the associated development cost would be overkill.

And that sounds great – you don’t have to rely on the overly busy and often not too responsive IT department in your non-IT company; you just build something yourself to optimize your immediate problem and to digitize your paper processes. And it can do that. To be honest, I like the idea of such tools existing, because a large portion of digital transformation is not handled well by huge, centralized systems and years-long development projects. A lot of agility is required in a landscape where there’s more and more demand for digital transformation, while expensive developers have limited availability. We have to admit that professional developers can’t be everywhere and solve every problem that could be solved by information technology. And so such tools democratize digital transformation, allowing non-technical people to create software.

Or at least that’s the theory. In practice this is challenging in many ways:

  • Some technical knowledge required – it’s cool to be able to draw an entity-relationship diagram, but first you need to know what a data model is. It’s nice to be able to import a web service and be able to call it, but you have to know what a web service is. Integrating user directories implies you know what LDAP/AD is. I’m not sure non-technical people are able to make real use of these capabilities. Some tools still require simple code even to open a new dialog, and you have to copy-paste that code from somewhere.
  • Integration with on-premise systems – building something useful almost always requires integrating with an existing system. It’s fine to assume everything is in “the cloud”, but even the cloud can be seen as on-premise from the perspective of a 3rd-party SaaS. Many tools that I’ve seen integrate with databases by supplying an IP, username and password – something that is almost never how access is actually granted, and is a bad idea. The ability to connect to something within the organization’s infrastructure (even if that infrastructure is on Azure and you are using MS PowerApps) means permissions, network rules configuration, accounts, approvals. And you have to know what to request in order to get it.
  • Vendor lock-in – most low-code tools use proprietary formats for meta-descriptions (some go so far as to store a binary representation of their metadata in a local SQLite database, for example). Some providers allow you to run the application on your own cloud by installing their server-side application, but some are purely SaaS. Once you build an application with one tool, you can’t really switch to another. Nice exceptions are WaveMaker and Skyve, which generate actual Java code that you can download at any time. Is that lock-in bad? Well, yes – if it happens that you need a feature that’s not yet available, or an integration that’s not yet there, you are stuck.
  • Shadow IT – all IT in an organization is assumed to be managed and observed by the IT department. In terms of monitoring, security, compliance, data protection, lifecycle management, technical support, etc. With low-code these applications can exist without the IT department even knowing about them and that poses many risks (which I’ll discuss below)
  • Sustainability – what if a low-code company goes out of business, or gets acquired by someone who decides to sunset a product or a set of features? What happens when the employee who created the low-code app leaves and there’s nobody who knows how to “program” with the selected tool in order to support the app? What happens when the low-code app becomes legacy itself? Because of the vendor lock-in and lack of any standardization, it’s a big risk to take in terms of sustainability. Sure, you’ll solve an immediate problem somehow, but you may create many more down the line.
  • Security – on the one hand, using a PaaS/SaaS may be perceived as coming with built-in security. On the other hand, non-technical people can’t assess the security of a given platform. And security can’t be fully “built-in” – you have to make sure that authentication is required, that apps are not visible outside some whitelisted office locations, that data can’t be exported by unauthorized people, that you have protection against XSS, CSRF, SQLi and whatnot. Even if the platform gives these options, a non-technical person doesn’t know they have to take care of them in the first place.
  • Compliance – many industries are regulated and there are horizontal regulations like the data protection ones (GDPR, CCPA). Data breaches often happen because data was fetched from its original well-protected storage and kept in various places (like a low-code app) that the data protection officer doesn’t know about. Encryption, anonymization, data minimization, retention periods – most low-code solutions don’t support these things out of the box and it’s unlikely that an employee not familiar with the specifics will walk the extra mile to have the app compliant.
  • Bugs outside of your control – when you have a bug in your software, you can fix it. If it’s in a library, you can patch it. If it’s in a set of tools of a third-party platform, you can’t do anything. During my testing of several low-code solutions I stumbled upon many bugs and inconsistencies.

Some of these problems exist in regular projects as well. Developers may not know how to make an application GDPR-compliant, security is too often overlooked in many projects, and technologies are sometimes selected without taking sustainability into account. But these problems are aggravated by low-code solutions.

The interesting thing is that low-code is one part of a broader spectrum of technologies. They start with “rapid application development” (RAD) tools and frameworks and end with “no-code” (essentially less feature-rich low-code alternatives). Code-generation tools like Spring Roo are RAD, OpenXava is a RAD framework. Some low-code tools can actually be seen as RAD tools – the aforementioned WaveMaker can be used pretty easily by dev teams to quickly deliver simpler projects without sacrificing too much control (and I guess it is, having been acquired by a software development company).

Then there’s RPA – “robotic process automation”, which I’ll bluntly simplify as “low-code with screen scraping” – you automate some processes that involve legacy systems with which you can only extract information and perform actions by having “bots” press buttons on screens. RPA lies slightly outside the rapid application development spectrum, but it brings one important point – there are RPA developers. People that are not deeply technical and aren’t as qualified as developers, but who can still automate processes using the RPA tools. Same goes for low-code and some flavors of RAD – it is assumed that you don’t have to be technical to develop something, but actually there can be (and there are) dedicated experts that can build stuff with these tools. They are not actual developers, but if you can deliver a project with 5 low-code developers and one actual developer, this can dramatically cut costs.

Developers and large companies certainly have a “toolbox” which they reuse across projects – snippets from here and there, occasionally some library or microservice. But each new project involves a lot of boilerplate that even developers need to get rid of. I’m certainly not a big fan of code generation, but RAD has enough merit that we have to explore it in order to improve efficiency without sacrificing quality. And to be able to provide the simple tools that would otherwise be built with sub-optimal low-code approaches.

Blurring the line between “developer” and “non-developer” is already happening, though. And while focusing on our fancy frameworks, we shouldn’t lose sight of the tools for the less technically skilled; if only because there’s a risk we’ll have to support them in the long run. And because they will become part of a software ecosystem with which our software will have to interact.

Whether that’s the right and sustainable approach to digital transformation, I don’t know. I’d prefer, of course, everyone to be able to code at least simple things with the help of advanced RAD tools. And we’ll probably get there in a few decades. But as digital transformation needs to happen now, we may have to work with what we have and try to make it more secure, more compliant, with less vendor lock-in and more visibility for the IT departments. Otherwise we risk creating an even bigger mess than the one we are in now.


Am I a Real Expert?

Post Syndicated from Bozho original https://techblog.bozho.net/am-i-a-real-expert/

The other day I had a conversation with a scientist friend who said something along the lines of “yes, I work in that general field, but I’m not an expert in your question in particular”. IT is not science, of course, but I asked myself whether I am a real expert in the things that I do. And while it’s nearly impossible to hit exactly the fine line between impostor syndrome and boasting, this post is neither and has a point, so bear with me.

I’ve been doing a lot of things in the general IT field – from general purpose software engineering, IT architecture, information security, applications of cryptography, blockchain, e-government, algorithmic music composition, data analysis. And I’ve seen myself as having relatively expert knowledge. I even occasionally give TV and radio interviews, where I’m labelled as “Expert in X”. But…

  • Am I a real expert in software engineering and software architecture? I’ve been doing that for 15+ years, and I follow and sometimes define or clarify best practices, I’m familiar with different methodologies and have been part of teams that implemented some of them correctly and efficiently. I have taken part in the decision-making process of building large systems with their architectural implications. But I’ve never been formally assigned as an “architect” (not that I insist), my UML skills are rather basic and I’ve never had to integrate dozens of legacy systems. I’ve never used formal methods for assessing software, I’ve made mistakes in selecting technologies, I’ve never done proper TDD and I have only a basic understanding of networking. Maybe just the sheer number of years of experience positions me as an expert, maybe the variety of projects I’ve worked on.
  • Am I a real expert in cryptography? Almost certainly not. Yes, I’m using cryptographic building blocks regularly, I know what an initialization vector is and I’ve code reviewed a Merkle tree implementation. I’ve read dozens of papers on cryptography and understood many of them. But some papers are Greek to me – I have no clue about the math behind cryptography. Sure, RSA is easy, but I have just a basic understanding of how elliptic curves work. On the other hand, I probably know more than 99% of the software industry, where the average person barely differentiates symmetric from asymmetric cryptography, IV is a Roman numeral, and cryptography boils down to disabling TLS 1.0 on a web server.
  • Am I a real expert in information security? I’ve given talks on it, I’m in the infosec business, I know and follow best security practices, I know about sqlmap and I’ve even used Wireshark; I understand DEFCON talks and I’ve even decompiled several apps to find (and report) security vulnerabilities. But I’m no Mr. Robot-level hacker, nor am I a CISO in a large organization who has to plan and implement security measures on hundreds of systems. I haven’t been part of red-teaming exercises and I haven’t built or operated a security operations center (SOC). But maybe in an industry where even having heard of OWASP puts you in the top 10% and actively thinking about the security aspects of each new piece of code puts you in the top 1%, I’m an expert.
  • Am I a real blockchain expert? I know Bitcoin’s and Ethereum’s implementations, I have implemented something similar to bitcoin’s data structures, I know what a Patricia merkle tree is and I’ve built and pushed raw Ethereum transactions. But I’m no Vitalik Buterin, I can’t build something like Ethereum, I’m only vaguely familiar with distributed consensus algorithms and their pitfalls, and I haven’t written a smart contract more complicated than a tutorial example. I haven’t run a production deployment of Hyperledger (only a test one), and I largely ignore most of the new networks. You may say that one doesn’t need to be Vitalik or Satoshi to be an expert, and with most people seeing blockchain as “that thing that stores data in an unmodifiable way”, one could be an expert by just writing a Hello world smart contract.
  • Am I a real e-government expert? Sure, I’ve been an e-government advisor to a deputy prime minister, I’ve co-authored legislation and strategic documents and understand how and why e-government works in several EU countries, most notably Estonia, but do I have a holistic view? I have almost no idea of how the e-government is structured in South Korea, Singapore or UAE, for example, I haven’t written a single paper, and I haven’t measured the impact of legislative, organizational and technical measures that we proposed and applied. There are questions that I don’t know the answer to – e.g. how to make the pan-European eID framework actually work.

So the question is – what does it mean to be an expert anyway? We will always be somewhere on an “expert spectrum”. And in many cases our industry doesn’t apply even basic good practices, so even basic expertise can be very valuable. There are always people that know more than you on a given sub-sub-field, and there are always people that are better than you at most of the things that you do. The reputation of “expert” is something important, yet something so vague. Individually it’s good to know where one stands (Dunning-Kruger and everything), and to be aware of the limits of one’s knowledge and understanding. Knowing the things that you don’t know is a good start.

But in a broader context, who’s an expert? Imagine that after we recover from the COVID-19 crisis, there’s a cyber crisis. Who will be the IT experts advising governments on the measures to be taken? University professors? Senior Silicon Valley technical people? Who will be on TV to discuss the cyber crisis in the role of “expert” – a senior engineer at a big bank, a junior developer, or someone who took CS 101 in university and happens to know the host? Who will drive the agenda and public opinion?

The level of our expertise matters primarily for our careers, but it also has aspects outside of our immediate field. When a crisis hits (and even if it doesn’t), it’s important that we have real experts, that we listen to them and that we trust them. But also to realize that no expert knows everything about everything, and that many questions don’t have absolute answers, even for experts. That knowledge decays if not utilized, and “published a paper 30 years ago” may be irrelevant today.

I promised that the article would have a point. And it’s two-fold. First, make sure you know what you don’t know, so that you can explore it if needed. Second, we need to value expertise with its imperfections. There is no absolute expert in anything, there are only relative experts. And forgive me for going through my skills (or lack thereof), but that was the best way I could think of for illustrating my point in depth.

Finally, I hope there isn’t a global IT-related crisis. But as some consider it inevitable, we may think about the perception of expertise in our field and who can we trust with certain aspects. There is no “full stack” expert, as the field is too broad.


An AWS Elasticsearch Post-Mortem

Post Syndicated from Bozho original https://techblog.bozho.net/aws-elasticsearch-post-mortem/

So it happened that we had a production issue on the SaaS version of LogSentinel – our Elasticsearch stopped indexing new data. There was no data loss, as Elasticsearch is just a secondary storage, but it caused some issues for our customers (they could not see the real-time data on their dashboards). Below is a post-mortem analysis – what happened, why it happened, how we handled it and how we can prevent it.

Let me start with a background of how the system operates – we accept audit trail entries (logs) through a RESTful API (or syslog), and push them to a Kafka topic. Then the Kafka topic is consumed to store the data in the primary storage (Cassandra) and index it for better visualization and analysis in Elasticsearch. The managed AWS Elasticsearch service was chosen because it saves you all the overhead of cluster management, and as a startup we want to minimize our infrastructure management efforts. That’s a blessing and a curse, as we’ll see below.

We have alerting enabled on many elements, including the Elasticsearch storage space and the number of application errors in the log files. This allows us to respond quickly to issues. So the “high number of application errors” alarm triggered. Indexing was blocked due to FORBIDDEN/8/index write. We have a system call that enables it, so I tried to run it, but after less than a minute it was blocked again. This meant that our Kafka consumers failed to process the messages, which is fine, as we have a sufficient message retention period in Kafka, so no data can be lost.

I investigated the possible reasons for such a block. And there are two, according to Amazon – increased JVM memory pressure and low disk space. I checked the metrics and everything looked okay – JVM memory pressure was barely reaching 70% (and 75% is the threshold), and there was more than 200 GiB of free storage. There was only one WARN in the Elasticsearch application logs (it was “node failure”, but after that there were no issues reported).

There was another strange aspect of the issue – there were twice as many nodes as configured. This usually happens during upgrades, as AWS is using blue/green deployment for Elasticsearch, but we haven’t done any upgrade recently. These additional nodes usually go away after a short period of time (after the redeployment/upgrade is ready), but they wouldn’t go away in this case.

Being unable to SSH into the actual machine, unable to unblock the indexing through Elasticsearch means, and unable to shut down or restart the nodes, I raised a ticket with support. And after a few hours and a few exchanged messages, the problem was clear and resolved.

The main reason for the issue is two-fold. First, we had a configuration that didn’t reflect the cluster status – we had assumed a slightly higher number of nodes, and our shard and replica configuration meant we had unassigned replicas (more on shards and replicas here and here). The best practice is to have nodes > number of replicas, so that each node gets one replica (plus the main shard). Having unassigned shard replicas is not bad per se, and there are legitimate cases for it. Ours can probably be seen as a misconfiguration, but not one with immediate negative effects. We chose those settings in part because it’s not possible to change some settings in AWS after a cluster is created. And opening and closing indexes is not supported.

The second issue is the AWS Elasticsearch logic for calculating free storage in the circuit breaker that blocks indexing. So even though there were 200+ GiB of free space on each of the existing nodes, AWS Elasticsearch thought we were out of space and blocked indexing. There was no way for us to see that, as we only see the available storage, not what AWS thinks is available. The calculation takes the total number of shards + replicas and multiplies it by the per-shard storage. Which means unassigned replicas that do not take actual space are counted as if they take up space. That logic is counterintuitive (if not plain wrong), and there is hardly a way to predict it.
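
As a made-up numeric illustration of that logic (this mirrors my reading of the behavior, not AWS’s actual code):

```java
public class FreeSpaceIllustration {
    public static void main(String[] args) {
        // Hypothetical numbers only
        int primaryShards = 5;
        int replicasPerPrimary = 2;   // one replica set is unassigned in this scenario
        long perShardGib = 40;

        // What the circuit breaker appears to assume is needed:
        long assumedGib = (long) primaryShards * (1 + replicasPerPrimary) * perShardGib; // 600 GiB

        // What is actually on disk (primaries plus the one assigned replica set):
        long actualGib = (long) primaryShards * 2 * perShardGib;                         // 400 GiB

        System.out.println("Assumed: " + assumedGib + " GiB, actually used: " + actualGib + " GiB");
        // A node reporting 200 GiB free can therefore still be treated as "out of space".
    }
}
```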

This logic appears to be triggered when blue/green deployment occurs – so in normal operation the actual remaining storage space is checked, but during upgrades, the shard-based check is triggered. That has blocked the entire cluster. But what triggered the blue/green deployment process?

We occasionally need access to Kibana, and because of our strict security rules it is not accessible to anyone by default. So we temporarily change the access policy to allow access from our office IP(s). This change is not expected to trigger a new deployment, and has never led to that. AWS documentation, however, states:

In most cases, the following operations do not cause blue/green deployments: Changing access policy, Changing the automated snapshot hour, If your domain has dedicated master nodes, changing data instance count.
There are some exceptions. For example, if you haven’t reconfigured your domain since the launch of three Availability Zone support, Amazon ES might perform a one-time blue/green deployment to redistribute your dedicated master nodes across Availability Zones.

There are other exceptions, apparently, and one of them happened to us. That led to the blue/green deployment, which in turn, because of our flawed configuration, triggered the index block based on the odd logic that counts unassigned replicas as taking up storage space.

How we fixed it – we recreated the index with fewer replicas and started a reindex (it takes data from the primary source and indexes it in batches). That reduced the size taken, and AWS manually intervened to “unstick” the blue/green deployment. Once the problem was known, the fix was easy (and we had to recreate the index anyway due to other index configuration changes). It’s appropriate to (once again) say how good AWS support is, in both fixing the issue and communicating it.

As I said in the beginning, this did not mean data loss, because we have Kafka keep the messages for a sufficient amount of time. However, once the index was writable, we expected the consumer to continue from the last successful message – we had specifically written transactional behaviour that committed the offsets only after successfully storing in the primary storage and successfully indexing. Unfortunately, the Kafka client we are using had auto-commit turned on, which we had overlooked. So the consumer skipped past the failed messages. They are still in Kafka and we are processing them with a separate tool, but that showed us that our assumption was wrong and that the fact that the code calls "commit" doesn't necessarily mean anything.
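
For reference, a minimal sketch of the configuration we thought we had (a plain Java Kafka consumer; the topic name and the processing methods are placeholders): auto-commit is explicitly disabled and offsets are committed only after both storing and indexing succeed.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class SafeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "indexing-consumer");
        // The setting we had overlooked: if auto-commit is left on, offsets are
        // committed periodically regardless of whether processing succeeded
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("audit-log")); // placeholder topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    storeInPrimaryStorage(record.value()); // throws on failure
                    indexInElasticsearch(record.value());  // throws on failure
                }
                // Only now is it safe to advance the offsets; if anything above throws,
                // the messages will be re-delivered instead of being skipped
                consumer.commitSync();
            }
        }
    }

    private static void storeInPrimaryStorage(String message) { /* placeholder */ }
    private static void indexInElasticsearch(String message) { /* placeholder */ }
}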

So, the morals of the story:

  • Monitor everything. Bad things happen, it’s good to learn about them quickly.
  • Check your production configuration and make sure it's adequate for the current needs. Be it replicas, JVM sizes, disk space, number of retries, auto-scaling rules, etc.
  • Be careful with managed cloud services. They save a lot of effort but also take control away from you. And they may have issues for which your only choice is contacting support.
  • If providing managed services, make sure you show enough information about potential edge cases. An error console, an activity console, or something that would allow the customer to know what is happening.
  • Validate your assumptions about default settings of your libraries. (Ideally, libraries should warn you if you are doing something not expected in the current state of configuration)
  • Make sure your application is fault-tolerant, i.e. that failure in one component doesn’t stop the world and doesn’t lead to data loss.

To sum it up, a rare event unexpectedly triggered a blue/green deployment, where a combination of flawed configuration and flawed free space calculation resulted in an unwritable cluster. Fortunately, no data was lost, and at least I learned something.

The post An AWS Elasticsearch Post-Mortem appeared first on Bozho's tech blog.

Where Is This Coming From?

Post Syndicated from Bozho original https://techblog.bozho.net/where-is-this-coming-from/

In enterprise software the number one question you have to answer as a developer almost every day is "Where is this coming from?". When trying to fix bugs, when developing new features, when refactoring. You have to be able to trace the code flow and to figure out where a certain value is coming from.

And the bigger the codebase is, the more complicated it is to figure out where something (some value or combination of values) is coming from. In theory it's either from the user interface or from the database, but we all know it's always more complicated. I learned that very early in my career when I had to navigate a huge telecom codebase in order to implement features and fix bugs that were, in total, a few dozen lines of code.

Answering the question means navigating the code easily, debugging and tracing changes to the values passed around. And while that seems obvious, it isn’t so obvious in the grand scheme of things.

Frameworks, architectures, languages, coding styles and IDEs that obscure the answer to the question “where is this coming from?” make things much worse – for the individual developer and for the project in general. Let me give a few examples.

Scala, about which I have mixed feelings, gives you a lot of cool features. And some awful ones, like implicits. An implicit is something like a global variable, except there are nested implicit scopes. When you need one of those global variables, you just add the "implicit" keyword and you get the value from the innermost available scope that matches the type of the parameter you want to set. And in larger projects it's not trivial to chase where that implicit value has been set. It can take hours of debugging to figure out why something has a particular value, only to discover that some unrelated part of the code has touched the relevant implicits. That makes it really hard to trace where stuff is coming from and is therefore bad for enterprise codebases, at least for me.

Another Scala feature is partially applied functions. You have a function foo(a, b, c) (that's not the correct syntax, of course). You have one parameter known at some point, and the other two parameters known at a later point. So you can call the function partially and pass the resulting partially applied function to the next function, and so on until you have the other arguments available. So you can do bar(foo(a)), which means that in bar(..) you can call foo(b, c). Of course, at that point, the question "where did the value of a come from?" is harder to answer. The feature is really cool if used properly (I've used it, and was proud of it), but it should be limited to smaller parts of the codebase. If you start tossing partially applied functions all over the place, it becomes a mess. And unfortunately, I've seen that as well.

Enough about Scala; the microservices architecture (which I also have mixed feelings about) also complicates the ability of a developer to trace what's happening. If for a given request you invoke 3-4 external systems, which both return and manipulate data, it becomes much harder to debug your application. Instead of putting a breakpoint or doing a call hierarchy, you have to track the parameters of each interaction with each microservice. It's news to nobody that microservices are harder to debug, but I just wanted to put that in the context of answering the "where is this coming from" question.

Dynamic typing is another example. I've included it among my arguments for why I prefer static typing. Java IDEs have "Call hierarchy", which is the single most useful IDE feature for large enterprise software (for me, even more important than the refactoring functionality). You really can trace every bit of possible code flow, not only in your codebase, but also in your dependencies, which often hide the important details (chances are, you'll be putting breakpoints and inspecting 3rd party code rather often). Dynamic typing doesn't give you the ability to do that properly. doSomething called on an unknown-at-compile-time type can be any method with that name. And tracing where stuff is coming from becomes much harder.

Code generation is something that I’ve always avoided. It takes input from text files (in whatever language they are) and generates code, turning the question “where is this coming from” to “why has this been generated that way”.

Message queues and async programming in general – message passing obscures the source and destination of a given piece of data; a message queue adds complexity to the communication between modules. With microservices you at least have API calls; with queues, you have multiple abstractions between the sender and recipient (exchanges, topics, queues). And that's a general drawback of asynchronous programming – there's something in between the program flow that does "async magic" and spits something out on the other end – but is it transformed, is it delayed, is it lost and retried, is it still waiting?

By all these examples I'm not saying you should not use message queues, code generation, dynamic languages, microservices or Scala (though some of them I'd really advise against). All of these things have their strengths, and they have been chosen exactly for those strengths. A message queue was probably chosen because you want to really decouple producer and consumer. Scala was chosen for its expressiveness. Microservices were chosen because a monolith had become really hard to manage with multiple teams and multiple languages.

But we should try to minimize the "damage" of not being able to easily trace the program flow and not being able to quickly answer "where is this coming from". Impose a "no-implicits" rule on your Scala codebase. Use code generation for simpler components (e.g. DTOs for protobuf). Use message queues with predictable message/queue/topic/exchange names and some slightly verbose debug logging. Make sure your microservices have corresponding SDKs with consistent naming and that they can be run locally without additional effort, to ease debugging.

It is expected that the bigger and more complex a project is, the harder it will be to trace where stuff is going. But do try to make it as easy as possible, even if it costs a little extra effort in the design and coding phase. You’ll be designing and writing that feature for a week. And you (and others) will be supporting and expanding it for the next 10 years.

The post Where Is This Coming From? appeared first on Bozho's tech blog.

One-Month of Microsoft DKIM Failure and Thoughts on Technical Excellence

Post Syndicated from Bozho original https://techblog.bozho.net/one-month-of-microsoft-dkim-failure-and-thoughts-on-technical-excellence/

Last month I published an article about getting email settings right. I had recently configured DKIM on our company domain and was happy to share the experience. However, the next day I started receiving DMARC error reports saying that our Office365-backed emails were failing DKIM, which can mean they end up in spam.

That was quite surprising as I have followed every step in the Microsoft guide. Let me first share a bit about that as well, as it will contribute to my final point.

In order to enable DKIM on a normal email provider, you just have to get the DKIM selector value from some settings screen and copy it into your DNS admin dashboard (in a new ._domainkey. record). Microsoft, instead, makes you do the following:

  • Connect to Exchange Powershell
  • Oh, you have 2FA enabled? Sorry, that doesn’t work – follow this tutorial instead
  • Oh, that fails when downloading their custom application? An obscure stackoverflow answer tells you that you should not download it with Firefox or Chrome – it fails to run then. You should download it with Internet Explorer (or Edge) and then it works. Just Microsoft.
  • Now that you are connected, follow the DKIM guide
  • In addition to PowerShell, you should also go to the Exchange admin center in your Office365 admin and enable DKIM

Yes, you can imagine you can waste some time here. But it finally works, and you set the CNAME records and assume that what they wrote holds – namely, that the TXT records to which the CNAME records point are generated on Microsoft's end (for onmicrosoft.com), so anyone trying to validate a signature will pick up the right public key from there and everything will be perfect. Well, no. Here's an excerpt from a test mail I sent to my Gmail in December:

DKIM: ‘FAIL’ with domain logsentinel.com
ARC-Authentication-Results: i=2; mx.google.com;
dkim=temperror (no key for signature) [email protected] header.s=selector1

I immediately reported the issue in a way that I think is quite clear, although succinct:

Hello, I configured DKIM following your tutorial (https://docs.microsoft.com/en-us/microsoft-365/security/office-365-security/use-dkim-to-validate-outbound-email), however the TXT records at your end that are supposed to be generated are not generated – so me setting a CNAME record points to a missing TXT record. I tried disabling and reenabling, and rotating the keys, but the TXT record is still missing, which means my emails get in spam folders, because the DMARC policy expects a DKIM record. Can you please fix that and create the required TXT records?

What followed was 35 days of long phone calls and repetitive emails with me trying to explain what the issue is and Microsoft support not getting it. As a sidenote, Office365 support center doesn’t support tickets with so much communication – it doesn’t have pagination so I can’t see the original note online (only in my inbox).

On the 5th minute of my first-of-several 20-minute calls, my wife, who was around, already knew exactly what the issue was (she's quite technical, yes), but somehow Microsoft support didn't get it. We went through multiple requests for me to provide DMARC reports (and just the Gmail report wasn't enough, it had to be from 2 others as well), screenshots of my DNS administration screen, screenshots to prove the record is missing (but mxtoolbox is some third party tool they can't trust), even me sharing my computer for a remote desktop session (no way, company policy doesn't allow random semi-competent people to touch my machine). I ended up providing an nslookup (a Microsoft tool) screenshot to show that the record is missing.

In the meantime I tried key rotation again from the Exchange admin UI. That got stuck for a week, so I raised a separate ticket. While we were communicating on that ticket as well, rotation finished, but the keys didn't change. What changed was (as I later figured out) the active selector. We'll get to that later.

After the initial few days I executed Get-DkimSigningConfig | fl to get the DKIM record values and just created TXT records with them (instead of CNAMEs), so that our mails don't end up in spam. Of course, having DKIM fail doesn't automatically mean you'll end up in spam, even though our DMARC policy says so, but it's a risk one shouldn't take. With the TXT records set everything was working okay, so I didn't have an urgent problem anymore. That allowed me to continue slowly trying to resolve the issue with Microsoft support, for another month.

Two weeks ago they acknowledged something was missing on their end. Miracle! So the last few days of the horror were about me having to set the CNAME records back (and get rid of my TXT records) so that they could fix the missing records on their end. I know, that sounds ridiculous – you don't need a CNAME to point to your TXT in order to get that TXT to appear, but who knows what obscure and shitty programs and procedures they have, so I complied. What followed was a request for more screenshots and more DKIM reports.

No. You have nslookup. No screenshot of mine is better than an nslookup. The TTL for my entries is 300s, so there should be no issue with caching (and screenshots won't help with that either).

Then finally, after having spoken to or emailed probably 5-6 different people, there was an email that made sense:

Good Afternoon,

As per the public documentation you followed (https://docs.microsoft.com/en-us/microsoft-365/security/office-365-security/use-dkim-to-validate-outbound-email) it is mentioned to have TXT records created in Microsoft DNS server for both Selector CNAME Records. we agree with that.

Our product engineer team has confirmed that there is a design level change was rolled out and it is still not published in public documentation. To be precise, we don’t require TXT record for both CNAME Records published. The TXT record will be published only for the Active Selector.

LastChecked 2020-01-01 07:40:58Z
KeyCreationTime 2019-12-28 18:37:38Z
RotateOnDate 2020-01-01 18:37:38Z
SelectorBeforeRotateOnDate selector1
SelectorAfterRotateOnDate selector2

Above info confirms that the active selector is “Selector2” and we have the TXT record published. So we don’t have to wait for the TXT record for Selector1 in our scenario.

Also we require the Selector1 CNAME record in your DNS, whenever next rotate happen. Microsoft will publish the other TXT record. This shouldn’t affect your environment at any chance.

Alright. So, when I triggered key rotation, I fixed the result of the original bug that led to no records being present. The rotation made the active selector "selector2" and generated its TXT record. Selector1 is currently missing, but it's inactive, so that is not an issue. What "active selector" means is an interesting question. It's nowhere in the documentation, but there's this article from 2016 that sheds some light. Microsoft has to swap the selectors on each rotation. According to the article, rotation happens every week, but that's no longer the case – since my manual rotation there hasn't been any automatic one.

So to summarize – there was an initial bug that happened who knows why, then my stuck-for-a-week manually triggered rotation fixed it, but because they haven't documented the actual process, there was no way for me to know that the missing TXT record is fine and will (hopefully) be generated on the next rotation (which is my current pending question for them – when will that happen, so that I can check if it worked).

To highlight the issues here. First, a bug. Second, missing/outdated documentation. Third, incompetent (though friendly) support. Fourth, a convoluted procedure to activate a basic feature. And that’s for not getting emails in spam in an email product.

I can expand those issues to any Microsoft cloud offering. We've been doing some testing on Azure and it's a mess. At first an account didn't even work. No reason, it just failed. I had to use another account. The UI is buggy. Provisioning resources takes ages with no estimate. Documentation is far from perfect. Tools are meh.

And yet, Microsoft is growing its cloud business. Allegedly, they are the top cloud provider in the world. They aren't, obviously, but through bundling Office365 and Azure in the same reporting, they appear bigger than AWS. Azure is most likely smaller than AWS. But still, Satya Nadella has brought Microsoft to market success again. Because technical excellence doesn't matter in enterprise sales. You have your sales reps, nice conferences, probably some kickbacks, (alleged) easy integration with the legacy Windows stuff that matters to most enterprise customers. And for the decision maker that chooses Microsoft over Amazon or Google, the argument of occasional bugs, poor documentation or bad support is irrelevant.

But it is relevant for the end results. For the engineers and the products they build. Unfortunately, tech companies have prioritized alleged business value over technical excellence so much that they can no longer deliver the business value. If the outgoing email of an entire organization goes to spam because Microsoft has a bug in Office365 DKIM, where's the business value of having a managed email provider? If the people-hours saved from moving from on-prem to Azure are wasted because of poor technology, where's the business value? Probably decision makers don't factor that in, but they should.

I feel technical excellence has been in decline even in big tech companies, let alone smaller ones. And while that has led to projects going over budget, poor quality and poor security, the fact that the systems aren't that critical has made it possible for the tech sector to get away with it. And even grow.

But let me tell you about a big player in a different industry that stopped caring about technical excellence. Boeing. This article tells the story of shifting Boeing from an engineering organization to one that cuts on quality expenses and outsources core capabilities. That led to the death of people. Their plane crashed because of that shift in priorities. That led to a steep decline in sales, a moderate decline in stock, and a bailout.

If you de-prioritize technical excellence, and you are in a critical sector, you lose. If you are an IT vendor, you can get away with it, for a while. Maybe forever? I hope not, otherwise everything in tech will be broken for a long time. And digital transformation will be a painful process around bugs and poor tools.

The post One-Month of Microsoft DKIM Failure and Thoughts on Technical Excellence appeared first on Bozho's tech blog.

One-Time Passwords Do Not Provide Non-Repudiation

Post Syndicated from Bozho original https://techblog.bozho.net/one-time-passwords-do-not-provide-non-repudiation/

The title is obvious and it could’ve been a tweet rather than a blogpost. But let me expand.

OTP, or one-time password, used to be mainstream with online banking. You get a dedicated device that generates a 6-digit code which you enter into your online banking in order to log in or confirm a transaction. The device (token) is airgapped (with no internet connection), and has a preinstalled secret key that cannot be leaked. This key is symmetric and is used to (depending on the algorithm used) encrypt the current time, strip most of the ciphertext and turn the rest into 6 digits. The server owns the same secret key, does the same operation and compares the resulting 6 digits with the ones provided. If you want a more precise description of (T)OTP – check Wikipedia.
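
For illustration, here's a minimal sketch of the standard TOTP construction (RFC 6238) – it uses an HMAC over the current 30-second time step rather than literal encryption, and the secret below is a placeholder, not anything a real bank uses:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.time.Instant;

public class TotpSketch {

    // Generates a 6-digit TOTP code from a shared secret and the current time
    static String totp(byte[] sharedSecret, Instant now) throws Exception {
        long timeStep = now.getEpochSecond() / 30; // 30-second window, the common default

        Mac hmac = Mac.getInstance("HmacSHA1"); // classic TOTP uses HMAC-SHA1
        hmac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
        byte[] hash = hmac.doFinal(ByteBuffer.allocate(8).putLong(timeStep).array());

        // Dynamic truncation: take 4 bytes starting at an offset derived from the hash itself
        int offset = hash[hash.length - 1] & 0x0F;
        int binary = ((hash[offset] & 0x7F) << 24)
                | ((hash[offset + 1] & 0xFF) << 16)
                | ((hash[offset + 2] & 0xFF) << 8)
                | (hash[offset + 3] & 0xFF);

        return String.format("%06d", binary % 1_000_000);
    }

    public static void main(String[] args) throws Exception {
        byte[] sharedSecret = "a-placeholder-secret".getBytes(); // both sides hold this
        // The token and the server run exactly this computation with the same secret
        System.out.println(totp(sharedSecret, Instant.now()));
    }
}

The important point for the argument below is visible in the code: the server runs exactly the same computation, so whoever holds the shared secret can produce the same 6 digits.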

Non-repudiation is a property of, in this case, a cryptographic scheme, meaning the author of a message cannot deny the authorship. In the banking scenario, this means you cannot dispute that it was indeed you who entered those 6 digits (because, allegedly, only you possess the secret key, because only you possess the token).

Hardware tokens are going out of fashion and are being replaced by mobile apps that are much easier to use and still represent a hardware device. There is one major difference, though – the phone is connected to the internet, which introduces additional security risks. But if we try to ignore them, then it's fine to have the secret key on a smartphone, especially if it supports secure per-app storage not accessible by 3rd party apps, or even a hardware security module with encryption support, which the latest ones do.

Software “tokens” (mobile apps) often rely on OTPs as well, even though they don’t display the code to the user. This makes little sense, as OTPs are short precisely because people need to enter them. If you send an OTP via a RESTful API, why not just send the whole ciphertext? Or better – do a digital signature on a server-provided challenge.

But that's not the biggest problem with OTPs. They don't give the bank (or whoever decides to use them as a means to confirm transactions) non-repudiation, because OTPs rely on a shared secret. This means the bank, and any (curious) admin, knows the shared secret – especially if the shared secret is dynamically generated based on some transaction-related data, as I've seen in some cases. A bank data breach or a malicious insider can easily generate the same 6 digits that you generate with your secure token (dedicated hardware or mobile).

Of course, there are exceptions. Apparently there's ITU-T X.1156 – a non-repudiation framework for OTPs. It involves trusted 3rd parties, though, which is unlikely in a banking scenario. There are also HSMs (server-side hardware security modules) that can store secret keys without any option to leak them to the outside world and that support TOTP natively. So the device generates the next 6 digits. Someone with access to the HSM can still generate such a sequence, but hopefully the audit trail would indicate that this happened, and it is far less of an issue than just a database breach. Obviously (or, hopefully?) secrets are not stored in plaintext even outside of HSMs, but chances are they are wrapped with a master key in an HSM. Which means that someone can bulk-decrypt them and leak them.

And this "can" is very important. No matter how unlikely it is, due to internal security policies and whatnot, there is no cryptographic non-repudiation for OTPs. A customer can easily deny entering the code to confirm a transaction. Whether this will hold in court is a complicated legal issue – the processes in the bank can be audited to see if a breach is possible, whether any of the admins has a motive to abuse their privilege, whether there are logs of the user's activities that can be linked to their device based on network logs from internet carriers, and so on. So you can achieve some limited level of non-repudiation through proper procedures. But it is not much different than authorizing the transaction with your password (at least it is not a shared secret, hopefully).

But why bother with all of that if you have a mobile app with an internet connection? You can just create a digital signature on a unique per-transaction challenge and you get proper non-repudiation. The phone can generate a private key on enrollment and send its public key to the server, which can later be used to verify the signature. You don't have the bandwidth limitation of the human brain that led to the requirement of 6 digits, so the signature length can be arbitrarily long (even GPRS can handle digital signatures).
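
A minimal sketch of that approach, using plain JDK crypto (enrollment, transport and storage are omitted, and on a real phone the private key would live in the device's secure hardware rather than in memory):

import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;

public class ChallengeSignatureSketch {
    public static void main(String[] args) throws Exception {
        // On enrollment: the phone generates a key pair and sends only the public key to the server
        KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
        generator.initialize(256);
        KeyPair phoneKeys = generator.generateKeyPair();
        PublicKey registeredWithServer = phoneKeys.getPublic();

        // Per transaction: the server issues a unique challenge (made-up value here)
        byte[] challenge = "txn-42;amount=100.00;nonce=8f3a".getBytes(StandardCharsets.UTF_8);

        // The phone signs the challenge with its private key, which never leaves the device
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(phoneKeys.getPrivate());
        signer.update(challenge);
        byte[] signature = signer.sign();

        // The server verifies with the public key only; it cannot forge such a signature itself,
        // which is what gives the scheme non-repudiation, unlike a shared OTP secret
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(registeredWithServer);
        verifier.update(challenge);
        System.out.println("valid: " + verifier.verify(signature));
    }
}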

Using the right tool for the job doesn’t have just technical implications, it also has legal and business implications. If you can achieve non-repudiation with standard public-key cryptography, if your customers have phones with secure hardware modules and you are doing online mobile-app-based authentication and authorization anyway, just drop the OTP.

The post One-Time Passwords Do Not Provide Non-Repudiation appeared first on Bozho's tech blog.

Solving Problems Properly Is Often Not Viable

Post Syndicated from Bozho original https://techblog.bozho.net/solving-problems-properly-is-often-not-viable/

How many times have you, as a software expert, seen some software or process and thought "damn, this can be done so much better"? Yes, a lot. But why, since large organizations spend a lot of money on IT? Is it because software is too complex, is it because of organizational issues, is it legacy software, or just the way things are?

And if what we see is so bad, doesn't common market sense say that surely these companies will be disrupted and made obsolete by a new and better competitor? I won't give a definitive answer as to why software is bad (I have already blamed developers, though I admit developers are only a part of the problem). But I'll try to reason about why it's not getting replaced under market forces.

Peter Thiel in his book "Zero to One" says that a new product has to be at least 10 times better in order to be disruptive. Everything below that is just gradual improvement over the status quo. Google was 10 times better than AltaVista, Lycos and Yahoo as a way to find information on the web. Skype was 10 times better than whatever was out there before it. But most of the things that we see as broken don't get fixed because there's no 10x improvement that would kill them. Let's take a look at a few examples.

Banking. Banking is horrible. Online banking UX is usually on par with a legacy ERP, and bank transfers rely on a patchwork of national specifics and ages-old international standards like SWIFT. I can't really think of anyone who really likes their online banking. Mobile banking is even worse, as, of course, it's tedious to copy-paste IBANs on a phone. Banks are, in many cases, still running COBOL on mainframes, and it's really hard and expensive to get something changed or fixed. Why are they still around? Because the online banking experience is of marginal importance to the bank's bottom line.

Yes, Revolut and TransferWise are cool tech billion-dollar-valuation startups that are "disrupting" banking. But not really. They are just fancier banks. Yes, finally you can do something on your mobile, you can be onboarded without going to a physical office, support is better, the UI is sleek and the UX makes sense. Do banks care? Not much. Because these startups are just small banks that are currently losing money in order to get to more people and make them spend small amounts online. Nothing of that is a 10x improvement in the value provided. It surely is much better technically, but once you have to transfer money elsewhere, you hit SWIFT or ideally SEPA. Once you want to pay in store, you hit Visa or MasterCard. Nothing of the legacy infrastructure is replaced; we just have better UX for, in many cases, sending money to those legacy banks.

Digital Identity. Anything online where it's important to know who's on the other end requires digital identity. And yet we are nowhere near solving that problem. There are country-specific solutions where governments and/or consortia of banks issue some form of digital identity which others can trust. But it's fragmented, with varying levels of security, and integration isn't necessarily easy. At the same time we have KYC companies that basically let you scan your customer's passport or ID, optionally do a liveness-detection (automated or manual) video conference and then check the person against databases of terrorists or sanctioned individuals. So for each service you have to do that again, slightly differently, and if something changes (e.g. an address or name change), there's not really much you can do.

Ideally, digital identity is federated, uses a common standard and is easy to integrate, so that enrollment in sensitive online services that require some knowledge about a customer is straightforward. The identification process gives just as much information as you need. Identity can be managed by multiple entities, government or private, that can vouch that an individual is indeed who they claim to be. The way you identify, with the current state of technology, would be with a key securely stored in a mobile phone's secure storage, using standards like WebAuthn or SAML. There should be a way to re-issue the key in case a phone is lost, and then we'd have an awesome, reusable digital identity to solve all online authentication and enrollment woes.

Alas, we are nowhere near that. Because what we currently have is working. It's working terribly, expensively and with a lot of hacks and patches, and most importantly – users have to enroll for every service separately. But from the point of view of each individual company, it's easier to just get some service for passport verification and then issue a username and password, with optional 2FA, and you are done. The system I described above is not 10 times better than the status quo. A blockchain-based self-sovereign identity, by the way, is also not 10 times better. And it has usability issues that even make it inferior to the status quo.

Privacy. Literally every day there's a huge data breach or a huge privacy issue (mostly with Facebook). Facebook has been giving our data to 3rd parties, because why not. Companies are collecting whatever they can get ahold of, without much care for protecting it. Sensitive personal data is still stored unencrypted and unpseudonymized, in SQL databases with barely tracked admin access; applications still export bulks of sensitive data to Excel sheets sent over email or published to public S3 buckets. Passwords are still stored in plaintext or unsalted, and data is communicated over unencrypted connections.

We know how to fix all that; there are many best practices to address all of that. How many companies encrypt data at rest, which is the simplest and most obvious thing to do? How many encrypt data with separate keys to protect bulk exports? How many use pseudonymization when doing exports? Knowledge and best practices are out there but our software is still built as a CRUD application over a SQL datastore and the most important task is to get it done quickly, so let’s not bother with privacy protections.

And, from a business point of view, rightly so, unfortunately. Check the share price of a few recently breached companies. It drops for a few days and then bounces back up. And at the same time no privacy-preserving technology is 10 times better than what we have today, no company will be 10 times better if it protects personal data. So why bother?

Security. In the long run cybersecurity is very important. In the short term, it isn't. Nothing bad will happen. The other day I inspected the 2FA application of a bank. It's full of mistakes and yet, through multiple hoops and security-through-obscurity, it reduces the risk of something very bad happening. And if it happens, well, insurance will cover the losses. Data breaches happen not only because we don't care about personal data, but also because we don't care about security. Chief information security officers find it hard to convince boards that their role is needed, let alone that it needs budget. Security tools are a patchwork of barely working integrations. There was recently a series of rants from one infosec professional that rightly points out that our infosec tools are bad.

Are things improving? A little bit, mostly by introducing more secure protocols. Backward compatibility slows down the improvements, of course, but we’re getting there. But is there a security “fix” that makes things 10 times better, that changes the game, that cuts costs or risk so much? Nope.

These observations go for many more areas, both horizontal and vertical – social networks (where Facebook can't even get sharing right, let alone data protection), public sector software that is mostly still in the 90s, even online payments, which have not moved since PayPal at the beginning of the century – we still take our credit cards from our wallets and type numbers and CVC codes, with all the associated fraud.

In many cases when there's no visible improvement for years, some proactive governmental structure like the European Commission decides to step in and write some piece of legislation that tries to fix things. For banks, for example, there's PSD2 (the 2nd payment services directive), which mandates open banking – all data about a bank account should be accessible via APIs to third parties who can manage it, display it properly, analyze it. Wonderful in theory, a mess in practice so far – APIs barely work, there's no standardization and anyone who wants to offer a service on top of open banking APIs is suffering at the moment. SEPA, the single euro payment area standard, was introduced by a directive as well, more than a decade ago.

Regarding digital identity, there’s the eIDAS regulation which defines how governments should offer their eID cross border, so that, in theory, anyone can use any government or otherwise issued electronic identity to identify for any service across the EU. Things are not there yet at all, and the architecture is so complicated, I don’t think it will work as intended. The fragmented eID space will remain fragmented for all practical purposes.

For privacy there's, of course, GDPR. And while it resulted in more people trying to take care of personal data, it led to more documents being written than lines of code changed. Same for information security – the NIS directive tried to improve the security of providers of substantial services. It's often unclear, though, what a substantial service is to begin with. And then it's mostly organizational measures. Countries like the UK and Germany have good guidelines and frameworks to improve security, but we are yet to see the results.

Why is solving problems properly so hard? Legacy software, legacy standards and legacy thinking certainly don't help. But in many of these domains, solving problems properly does not bring sufficient business value. And we are not prepared to do things properly – while the knowledge on how to do things better exists, it's not mainstream. And it's not mainstream, because it's not perceived as required.

We have half-assed our technology because it kinda works sufficiently to support the bottom line and to make users happy. We have muddled through the myriad of issues with the minimum effort required. Because even the maximum effort would not have been a sufficient improvement to change the game. Even the perfect re-imagined banking, digital identity, social network would not be significantly different for the end user. Sure, a little cheaper and a little more convenient, but not groundbreaking. Not to mention hidden horizontal aspects like security and privacy.

And we will continue that way, pushed by standards and regulations when nothing else helps, towards a messy gradual improvement. But it won't be worth the investment to do things right – why would you invest millions in quality software development when it will only marginally affect your business? It's only worth getting things to barely work.

The post Solving Problems Properly Is Often Not Viable appeared first on Bozho's tech blog.

A Technical Guide to CCPA

Post Syndicated from Bozho original https://techblog.bozho.net/a-technical-guide-to-ccpa/

CCPA, or the California Consumer Privacy Act, is the upcoming "small GDPR" that applies to all companies that have users from California (i.e. it has extraterritorial application). It is not as massive as GDPR, but you may want to follow its general recommendations.

A few years ago I wrote a technical GDPR guide. Now I'd like to do the same with CCPA. GDPR is much more prescriptive about the fact that you should protect users' data, whereas CCPA seems to be mainly concerned with the rights of the users – to be informed, to opt out of having their data sold, and to be forgotten. That focus is mainly because other laws in California and the US have provisions about protecting the confidentiality of data and about data breaches; in that regard GDPR is a more holistic piece of legislation, whereas CCPA covers mostly the aspect of users' rights (or "consumers", which is the term used in CCPA). I'll use "user" as it's the term more often used in technical discussions.

I’ll list below some important points from CCPA – this is not an exhaustive list of requirements to a software system, but aims to highlight some important bits. And, obviously, I’m not a lawyer, but I’ve been doing data protection consultations and products (like SentinelDB) for the past several years, so I’m qualified to talk about the technical side of privacy regulations.

  • Right of access – you should be able to export (in a human-readable format, and preferably in a machine-readable one as well) all the data that you have collected about an individual. Their account details, their orders, their preferences, their posts and comments, etc.
  • Deletion – you should delete any data you hold about the user. Exceptions apply, of course, including data used for prevention of fraud, other legal reasons, needed for debugging, necessary to complete the business requirement, or anything that the user can reasonably expect. From a technical perspective, this means you most likely have to delete what’s in your database, but other places where you have personal data, like logs or analytics, can be skipped (provided you don’t use it to reconstruct user profiles, of course)
  • Notify 3rd party providers that received data from you – when data deletion is requested, you have to somehow send notifications to wherever you've sent personal data. This can be a SaaS like Mailchimp, Salesforce or Hubspot, or it can be someone you sold the data to (apparently that's a major thing in CCPA). So ideally you should know where data has been sent and invoke APIs for forgetting it. Fortunately, most of these companies are already compliant with GDPR anyway, so they have these endpoints exposed. You just have to add the logic. If your company sells data by posting dumps to S3 or sending Excel sheets via email, you have a bigger problem, as you have to keep track of those activities and send unstructured requests (e.g. emails).
  • Data lineage – this is not spelled out as a requirement, but it follows from multiple articles, including the one for deletion as well as the one for disclosing who data was sent to and where data came from in your system (in order to know if you can re-sell it, among other things). In order to avoid buying expensive data lineage solutions, you can either have a spreadsheet (in case of simpler processes), or come up with a meaningful way to tag your data. For example, using a separate table with columns (ID, table, sourceType, sourceId, sourceDetails), where ID and table identify a record of personal data in your database, sourceType is the way you have ingested the data (e.g. API call, S3, email) and sourceId is the identifier that you can use to track how it came into your system – API key, S3 bucket name, email "from", or even company registration ID (data might still be sent around on flash drives, I guess). A similar table works for the outgoing data (with targetType and targetId). It's a simplified implementation, but it might work in cases where a spreadsheet would be too cumbersome to take care of.
  • Age restriction – if you've had the opportunity to know the age of a person whose data you have, you should check it. That means not ignoring the age or date of birth field when you import data from 3rd parties, and also politely asking users about their age. You can't sell that data, so you need to know which records are automatically opted out. If you never ever sell data, well, it's still a good idea to keep it (per GDPR).
  • Don't discriminate if users have used their privacy rights – that's more of a business requirement, but as technical people we should know that we are not allowed to have logic based on users having used their CCPA (or GDPR) rights. From a data organization perspective, I'd put rights requests in a separate database from the actual data, to make it harder to fulfill such requirements. You can't just do a SQL query to check if someone should get a better price; you'd have to do cross-system integration, and that might dissuade product owners from breaking the law; furthermore, it will be a good sign in case of audits.
  • “Do Not Sell My Personal Information” – this should be on the homepage if you have to comply with CCPA. It’s a bit of a harsh requirement, but it should take users to a form where they can opt out of having their data sold. As mentioned in a previous point, this could be a different system to hold users’ CCPA preferences. It might be easier to just have a set of columns in the users’ table, of course.
  • Identifying users is an important aspect. CCPA speaks about "verifiable requests". So if someone drops you an email saying "I want my data deleted", you should be able to confirm it's really them. In an online system that can be a button in the user profile (for opting out, for deletion, or for data access) – if they know the password, it's fairly certain it's them. However, in some cases, users don't have accounts in the system. In that case there should be other ways to identify them. SSN sounds like one, and although it's a terrible thing to use for authentication, with the lack of universal digital identity, especially in the US, it's hard not to use it at least as part of the identifying information. But it can't be the only thing – it's not a password, it's an identifier. So users sharing their SSN (if you have it), their phone or address, passport or driving license might be some data points to collect for identifying them. Note that once you collect that data, you can't use it for other purposes, even if you are tempted to. CCPA also requires toll-free phone support, which is hardly applicable to non-US companies even if they have customers in California, but it poses the question of identifying people online based on real-world data rather than account credentials. And please don't ask users about their passwords over the phone; just initiate a request on their behalf in the system and direct them to log in and confirm it. There should be additional guidelines for identifying users as per 1798.185(a)(7).
  • Deidentification and aggregate consumer information – aggregated information, e.g. statistics, is not personal data, unless you are able to extract personal data based on it (e.g. if the statistics are split per town and age and you have only two users in a given town, you can easily see who is who). Aggregated data is differentiated from deidentified data, which is data that has its identifiers removed. Simply removing identifiers, though, might again not be sufficient to deidentify data – based on several other data points, like IP address (+ logs), physical address (+ snail mail history), phone (+ phone book), one can be uniquely identified. If you can't reasonably identify a person based on a set of data, it can be considered deidentified. Do make the mental exercise of thinking how to deidentify your data, as then it's much easier to share it (or sell it) to third parties. Probably nobody minds being part of aggregated statistics sold to someone, or an anonymized account used for trend analysis.
  • Pseudonymization is a measure to be taken in many scenarios to protect data. CCPA mentions it particularly in a research context, but I'd support a generic pseudonymization functionality. That means replacing the identifying information with a pseudonym that's not reversible unless a secret piece of data is used. Think of it (and you can do that quite literally) as encrypting the identifier(s) with a secret key to form the pseudonym (see the sketch after this list). You can then give that data to third parties to work with (e.g. to do market segmentation) and then give it back to you. You can then decrypt the pseudonyms and fill the obtained market segment(s) into your own database. The 3rd party doesn't get personal information, but you still get the relevant data.
  • Audit trail is not explicitly stated as a requirement, but since you have the obligation to handle users requests and track the use of their data in and outside of your system, it’s a good idea to have a form of audit trail – who did what with which data; who handled a particular user request; how was the user identified in order to perform the request, etc.
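
To illustrate the pseudonymization bullet above, here's a minimal sketch that encrypts an identifier with a secret key to form a reversible pseudonym (AES-GCM from the JDK; key management is out of scope and the identifier is a made-up example):

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

public class PseudonymizationSketch {

    // Turns an identifier into a pseudonym that only the key holder can reverse
    static String pseudonymize(String identifier, SecretKey key) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
        byte[] combined = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, combined, 0, iv.length);
        System.arraycopy(ciphertext, 0, combined, iv.length, ciphertext.length);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(combined);
    }

    // Reverses a pseudonym back to the original identifier, given the secret key
    static String depseudonymize(String pseudonym, SecretKey key) throws Exception {
        byte[] combined = Base64.getUrlDecoder().decode(pseudonym);
        byte[] iv = Arrays.copyOfRange(combined, 0, 12);
        byte[] ciphertext = Arrays.copyOfRange(combined, 12, combined.length);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey(); // in practice, kept in a KMS/HSM, not generated ad hoc
        String pseudonym = pseudonymize("customer-1234", key); // made-up identifier
        System.out.println(pseudonym + " -> " + depseudonymize(pseudonym, key));
    }
}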

As CCPA is not concerned with data confidentiality requirements, I won’t repeat my GDPR advice about using encryption whenever possible (notably, for backups), or about internal security measures for authentication.

CCPA is focused on the rights of your users and you should be able to handle them (and track how you handled them). You can have manual and spreadsheet based processes if you are not too big, and you should definitely check with your legal team if and to what extent CCPA applies to your company. But if you have implemented the GDPR data subject rights, it’s likely that you are already compliant with CCPA in terms of the overall system architecture, except for a few minor details.

The post A Technical Guide to CCPA appeared first on Bozho's tech blog.

Blockchain Overview – Types, Use-Cases, Security and Usability [slides]

Post Syndicated from Bozho original https://techblog.bozho.net/blockchain-overview-types-use-cases-security-and-usability-slides/

This week I gave a talk at a meetup about blockchain beyond the hype – its actual implementation issues and proper use-cases.

The slides can be found here:

The main takeaways are:

  • Think of blockchain in specifics, not in high-level “magic”
  • Tamper-evident data structures are cool and you should be familiar with them – Merkle trees, hash chains, etc. (see the sketch after this list). They are useful for other things as well, e.g. certificate transparency
  • Blockchain and its cryptography is perfect for protecting data integrity, which is part of the CIA triad of information security
  • Many proposed use-cases can be solved with centralized solutions + trusted timestamps instead
  • Usability is a major issue when it comes to wider adoption
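
For reference, here's a minimal sketch of one such tamper-evident structure – a hash chain, where each entry's hash covers the previous hash, so modifying any past entry changes every hash after it (plain JDK, heavily simplified; real systems also sign or externally anchor the head hash):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class HashChainSketch {

    private final List<String> entries = new ArrayList<>();
    private final List<byte[]> hashes = new ArrayList<>();

    // Each new hash covers the previous hash and the new entry,
    // so changing any past entry invalidates everything after it
    void append(String entry) throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] previous = hashes.isEmpty() ? new byte[32] : hashes.get(hashes.size() - 1);
        sha256.update(previous);
        sha256.update(entry.getBytes(StandardCharsets.UTF_8));
        entries.add(entry);
        hashes.add(sha256.digest());
    }

    String headHash() {
        return Base64.getEncoder().encodeToString(hashes.get(hashes.size() - 1));
    }

    public static void main(String[] args) throws Exception {
        HashChainSketch chain = new HashChainSketch();
        chain.append("alice pays bob 10"); // made-up entries
        chain.append("bob pays carol 5");
        System.out.println("head: " + chain.headHash());
        // Anyone holding the head hash can detect tampering by recomputing the chain
    }
}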

As with anything in technology – use the right tool for the job, as no solution solves every problem.

The post Blockchain Overview – Types, Use-Cases, Security and Usability [slides] appeared first on Bozho's tech blog.

The Personal Data Store Pattern

Post Syndicated from Bozho original https://techblog.bozho.net/the-personal-data-store-pattern/

With the recent trend towards data protection and privacy, as well as the requirements of data protection regulations like GDPR and CCPA, some organizations are trying to reorganize their personal data so that it has a higher level of protection.

One path that I’ve seen organizations take is to apply the (what I call) “Personal data store” pattern. That is, to extract all personal data from existing systems and store it in a single place, where it’s accessible via APIs (or in some cases directly through the database). The personal data store is well guarded, audited, has proper audit trail and anomaly detection, and offers privacy-preserving features.

It makes sense to focus one’s data protection efforts predominantly in one place rather than scatter it across dozens of systems. Of course it’s far from trivial to migrate so much data from legacy systems to a new module and then upgrade them to still be able to request and use it when needed. That’s why in some cases the pattern is applied only to sensitive data – medical, biometric, credit cards, etc.

For the sake of completeness, there's something else called "personal data stores", which means an architecture where the users themselves store their own data in order to be in control. While this is nice in theory, in practice very few users have the capacity to do so, and while I admire the Solid project, for example, I don't think it is a viable pattern for many organizations, as in many cases users don't directly interact with the company, but the company still processes large amounts of their personal data.

So, the personal data store pattern is an architectural approach to personal data protection. It can be implemented as a "personal data microservice", with CRUD operations on predefined data entities; an external service can be used (e.g. SentinelDB, a project of mine); or it can just be a centralized database that has some proxy in front of it to control the access patterns. You can imagine it as externalizing your application's "users" table and its related tables.

It sounds a little bit like a data warehouse for personal data, but the major difference is that it’s used for operational data, rather than (just) analysis and reporting. All (or most) of your other applications/microservices interact constantly with the personal data store whenever they need to access or update (or “forget”) personal data.

Some of the main features of such a personal data store, the combination of which protect against data breaches, in my view, include:

  • Easy to use interface (e.g. RESTful web services or simply SQL) – systems that integrate with the personal data store should be built in a way that a simple DAO layer implementation gets swapped, and then data that was previously accessed from a local database is now obtained from the personal data store (see the sketch after this list). This is not always easy, as ORM technologies add a layer of complexity.
  • High level of general security – servers protected with 2FA, access control, segregated networks, restricted physical access, firewalls, intrusion prevention systems, etc. The good thing is that it's easier to apply all the best practices to a single system instead of applying them (and keeping them applied) to every system.
  • Encryption – but not just "data at rest" encryption; especially sensitive data can and should be encrypted with well protected and rotated keys. That way the "honest but curious" admin won't be able to extract anything from the underlying database
  • Audit trail – all infosec and data protection standards and regulations focus on accountability and traceability. There should not be a way to extract or modify personal data without leaving a trace (and ideally, that trace should be protected as well)
  • Anomaly detection – checking if there is something strange/anomalous in the data access patterns. Such strange access patterns can mean a data breach is happening, and the personal data store can actively block it. There is a lot of software out there that does anomaly detection on network traffic, but it's much better if the rules (or machine learning) are domain-specific. "Monitor for increased traffic to those servers" is one thing, but it's much better to be able to say "monitor for out-of-the-ordinary accesses to personal data of such and such kind"
  • Pseudonymization – many systems that need the personal data don't actually need to know who it is about. That includes marketing, including outsourcing to 3rd parties, reporting functionalities, etc. So the personal data store can return data that does not allow a person to be identified, but a pseudo-ID instead. That way, when updates are made back to the personal data store, they can still refer to a particular person via the pseudonymous ID, but the application that extracted the data in the first place doesn't get to know who the data was about. This is useful in scenarios where data has to be (temporarily or not) stored in a database that lies outside the personal data store.
  • Authentication – if the company offers user authentication, this can be done via the personal data store. Passwords, two-factor authentication secrets and other means of authentication are personal data, and important data as well. An organization may use single sign-on internally (e.g. Active Directory), but it doesn't make sense to put customers there, too, so they are usually stored in a database. During authentication, the personal data store accepts all necessary credentials (username, password, 2FA code) and returns a token to be used for subsequent calls or as a session cookie.
  • GDPR (or CCPA or similar) functionalities – e.g. export of all data about a person, forgetting a person. That’s an often overlooked problem, but “give me all data about me that you have” is an enormous issue with large companies that have dozens of systems. It’s next to impossible to extract the data in a sensible way from all the systems. Tracking all these requests is itself a requirement, so the personal data store can keep track of them to present to auditors if needed.
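
A minimal sketch of the DAO swap mentioned in the first bullet of this list (the interface and class names are hypothetical; the remote implementation would call the personal data store's API instead of a local table):

import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// The rest of the application only ever sees this interface
interface PersonalDataStore {
    Optional<String> findEmail(String userId);
    void updateEmail(String userId, String email);
    void forget(String userId); // GDPR/CCPA "forget me"
}

// Original implementation: personal data lives in the application's own database
// (backed by a map here just to keep the sketch self-contained)
class LocalDbPersonalDataStore implements PersonalDataStore {
    private final Map<String, String> usersTable = new ConcurrentHashMap<>();

    public Optional<String> findEmail(String userId) { return Optional.ofNullable(usersTable.get(userId)); }
    public void updateEmail(String userId, String email) { usersTable.put(userId, email); }
    public void forget(String userId) { usersTable.remove(userId); }
}

// After the migration, this implementation is swapped in; it would issue
// authenticated calls to the centralized personal data store (endpoints are hypothetical)
class RemotePersonalDataStore implements PersonalDataStore {
    public Optional<String> findEmail(String userId) { throw new UnsupportedOperationException("GET /users/{id}"); }
    public void updateEmail(String userId, String email) { throw new UnsupportedOperationException("PUT /users/{id}"); }
    public void forget(String userId) { throw new UnsupportedOperationException("DELETE /users/{id}"); }
}

public class PersonalDataStoreSketch {
    public static void main(String[] args) {
        PersonalDataStore store = new LocalDbPersonalDataStore(); // swap for RemotePersonalDataStore later
        store.updateEmail("user-1", "user1@example.com");
        System.out.println(store.findEmail("user-1").orElse("forgotten"));
        store.forget("user-1");
        System.out.println(store.findEmail("user-1").orElse("forgotten"));
    }
}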

That’s all easier said than done. In organizations that have already many systems working alongside and processing personal data, migration can be costly. So it’s a good idea to introduce it as early as possible, and have a plan (even if it lasts for years) to move at least sensitive personal data to the well protected silo. This silo is a data engineering effort, a system refactoring effort and an organizational effort. The benefits, though, are reduced long-term cost and reduced risks for data breaches and non-compliance.

The post The Personal Data Store Pattern appeared first on Bozho's tech blog.

Cybersecurity Is Very Important

Post Syndicated from Bozho original https://techblog.bozho.net/cybersecurity-is-very-important/

A few months ago an essay titled “Cybersecurity is not very important” appeared. The essay is well written and interesting but I’d like to argue against its main point.

And that is actually hard – the essay has many good points, and although it has a contrarian feel, it actually isn’t saying anything outrageous. But I still don’t agree with the conclusion. I suggest reading it (or skimming it) first before continuing here, although this article is generally self-sufficient.

I agree with many things in the essay, most importantly that there is no 100% protection and it's all about minimizing the risk. I also agree that cybersecurity is a complex set of measures that span not only the digital world, but the physical one as well. And I agree that even though after watching a few videos from DEF CON, BlackHat or CCC one feels that everything is fundamentally broken and going to live in the mountains is the only sane strategy to survive an impending digital apocalypse, this is not the case – we have a somewhat okayish level of protection for the more important parts of the digital world. Certainly exploitable, but not trivially so.

There are, though, a few main claims that I’d like to address:

  • There has not been any catastrophic cybersecurity event – the author claims that the fact that there was no digital Pearl Harbor or 9/11 suggests that we've been investing just the right amount of effort in cybersecurity. I don't think that's a fair comparison. Catastrophic events like that cost human lives as an immediate result of a physical action. No digital event can cause immediate loss of human life. However, it can cause indirect loss of human life, and it probably has already – take a famous data breach in an extramarital affair dating site – do we know how many people were killed in Pakistan or Saudi Arabia because infidelity (or homosexuality) was exposed? How many people died because hospitals were victims of ransomware? How many people died when the Ukrainian power grid was attacked, leaving 20% of Kyiv without power and therefore without heat, light or emergency care? What about the (luckily unsuccessful) attempt to sabotage a Saudi Arabian petrochemical plant and cause an explosion? There are many more of these events, and they are already a reality. There are no visible explosions yet, which would make it easier to compare them to Pearl Harbor or 9/11, but they are serious and deadly nonetheless. And while natural disasters, road incidents and other issues claim more victims, there isn't a trivial way to calculate the "return of life on investment". And isn't a secure charity for improving hurricane protection in third world nations better than one that gets hacked and all of its funds get stolen?
  • People have not adopted easy security measures because they were minor inconveniences – for example 2-factor authentication has been around for ages, but only recently have we begun using it. That is true, of course, but the reason for that might not be that it has been mostly fine to not have 2FA so far, but that society hasn’t yet realized the risks. Humans are very bad at intuitively judging risk, especially when they don’t have enough information. Now that we have more information, we are slightly better at estimating that, yes, adding a second factor is important for some systems. Security measures get adopted when we realize the risk, not only when there is more of it. Another reason people have not adopted cybersecurity measures is that they don’t know about them. Because the area is relatively recent, expertise is rare. This discrepancy between the ubiquity of information technology and the lack of technical expertise (not to mention security expertise) has been an issue for a long time.
  • The digital world plays too small a role in our world when we put things in perspective – humans play a small role in the world if you put them in a big enough perspective, but that doesn’t mean we are not important. And the digital world is playing an increasingly important role in our world – we can’t that easily continue to claim that cybersecurity is not important. And overall, the claim that so far everything has been (almost) smooth sailing can’t be transformed into the argument that it is going to stay the same, only with gradual improvement over time. If IT is playing an exponentially more important role (and it is), then our focus on information security can’t grow linearly. I know you can’t plot these things on a graph without looking stupid, but you get the gist.
  • We have managed to muddle through without too much focus on cybersecurity – yes, we have. But we will find it increasingly harder to do so. Also, we have “successfully” muddled through many eras of human history while doing things wrong (for example, the Maya civilization collapsed partly because it mismanaged its environment). Generally, the fact that something hasn’t gone terribly wrong is a bad argument that we are doing fine. Systemic issues get even more entrenched while on the surface it may look like we are successfully muddling through. I’m not saying that is certainly the case for cybersecurity, but it might very well be.

While arguing against the author’s points is an interesting exercise, it doesn’t directly prove that cybersecurity is indeed important.

First, we don’t have good comparisons of estimates of the cost – to the economy and to human life – of investment in cybersecurity as opposed to other areas, so I don’t think we can claim cybersecurity is not important. There are, for example, estimates of the cost of a data breach, and it averages several million dollars. If you stand to directly and indirectly lose several million dollars with a likelihood of 30% (according to multiple reports), I guess you should invest a few hundred thousand.
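To make that concrete with rough, commonly cited figures (an illustration, not a precise estimate): at an average breach cost of around $4 million and a 30% likelihood of suffering one, the expected loss is about 0.3 × $4M ≈ $1.2M, so a security budget in the hundreds of thousands pays for itself on expected value alone.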

Second, it is harder to internalize the risk of incidents in the digital world compared to those in the physical world. We are generally bad at evaluating risk, and I think the indirection that the digital world brings contributes negatively to our ability to make risk-based decisions. The complexity of the software complicates things even further – even technical people can’t always imagine the whole complexity of the systems they are working with. So we may not feel cybersecurity is important even though facts and figures show otherwise.

But for me the most important reason for the importance of cybersecurity is that we are currently laying a shaky foundation for our future world. Legacy software, legacy protocols and legacy standards are extremely hard to get rid of once they are ubiquitous. And if they are insecure by design, because they are not built with security in mind, there is no way that software that relies on them can be secure.

If we don’t get cybersecurity right soon, everything that relies on the foundations that we build today will be broken. And no, you can’t simply replace your current set of systems with new, more secure ones. Organizations are stuck with old systems not because they don’t want to get new and better ones, but because it’s hard to do that – it involves migration, user training, making sure all edge cases are covered, informing customers, etc. Protocols and standards are even harder to change – see how long it took for TLS 1.3 to come along, for example. But network standards still have vulnerabilities that don’t have good mitigation (or didn’t have until recently) – take an SS7 attack on a mobile network, or ARP spoofing, or BGP hijacking.

If we don’t agree that cybersecurity is very important, future technology will be based on an insecure layer that it will try to fix with clumsy abstractions. And then at some point everything may collapse, at a moment when we are so dependent on it that the collapse will be a major disruption in the way humanity operates. That may sound futuristic, but with technology you have no option but to be futuristic. We must build systems today that will withstand the test of time. And this is already very hard – maybe because we didn’t think cybersecurity was important enough.

I’m not saying we should pour millions into cybersecurity starting tomorrow. But I’d be happy to see a security mindset in everyone that works with technology as well as in everyone that takes decisions that involve technology. Not paranoid, but security conscious. Not “100% secure or bust”, but taking all known protection measures.

Cybersecurity is important. And it will be even more important in the upcoming decades.

The post Cybersecurity Is Very Important appeared first on Bozho's tech blog.

Protecting JavaScript Files (From Magecart-Style Attacks)

Post Syndicated from Bozho original https://techblog.bozho.net/protecting-javascript-files-from-magecart-attacks/

Most web pages now consist of multiple JavaScript files that are included in various ways (via <script> tags or in some more dynamic fashion, bundled and minified or not). But since these scripts interact with everything on the page, they can be a security risk.

And Magecart showcased that risk – the group attacked multiple websites, including British Airways and Ticketmaster, and stole a few hundred thousand credit card numbers.

It is a simple attack: the attacker inserts a malicious javascript snippet into a trusted javascript file, collects credit card details entered into payment forms and sends them to an attacker-owned website. Obviously, the easy part is writing the malicious javascript; the hard part is getting it on the target website.

Many websites rely on externally hosted assets (including scripts) – be it a CDN, or a dedicated asset server (as in the case of British Airways). These externally hosted assets may be vulnerable in several ways:

  • Asset servers may be less protected than the actual server, because they are just static assets, what could go wrong?
  • Credentials to access CDN configuration may be leaked which can lead to an attacker replacing the original source scripts with their own
  • Man-in-the-middle attacks are possible if the asset server is misconfigured (e.g. allowing TLS downgrade attack)
  • The external service (e.g. CDN) that was previously trusted can go rogue – that’s unlikely with big providers, but smaller and cheaper ones are less predictable

Once the attackers have replaced the script, they are silently collecting data until they are caught. And this can be a long time.

So how to protect against those attacks? Typical advice is to introduce a content security policy, which only allows scripts from trusted domains to be executed. This is a good idea, but doesn’t help in the scenario where a trusted domain is compromised. There are several main approaches, and I’ll summarize them below:

  • Subresource integrity – this is a browser feature that lets you specify the hash of a script file and validates that hash when the page loads. If it doesn’t match the hash of the actually loaded script, the script is blocked (see the sketch after this list for how such a hash is computed). This sounds great, but has several practical implications. First, it means you need to complicate your build pipeline so that it calculates the hashes of minified and bundled resources and injects those hashes into the page templates. It’s a tedious process, but it’s doable. Then there are the dynamically loaded scripts where you can’t use this feature, and there are the browsers that don’t support it fully (Edge, IE and Safari on mobile). And finally, if you don’t have a good build pipeline (which many small websites don’t), a very small legitimate change in the script can break your entire website.
  • Don’t use external services – that sounds straightforward but it isn’t always. CDNs exist for a reason and optimize your site loading speeds and therefore ranking, internal policies may require using a dedicated asset server, sometimes plugins (e.g. for WordPress) may fetch external resources. An exception to this rule is allowed if you somehow sandbox the third party script (e.g. via iframe as explained in the link above)
  • Secure all external servers properly – if you can do that, that’s great – upgrade the supported cipher suites, monitor for 0days, use only highly trusted CDNs. Regardless of anything, you should obviously always strive to do that. But it requires expertise and resources, which may not be available to every company and every team.
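To illustrate the subresource integrity option above: the integrity value is just a hash of the script file, Base64-encoded. Here is a minimal, build-tool-agnostic sketch of computing it (the file path is a placeholder; in practice you’d typically let a bundler plugin do this):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Base64;

// Minimal sketch: compute the value for the integrity="sha384-..." attribute
// of a script tag, as used by subresource integrity.
public class SriHash {
    public static void main(String[] args) throws Exception {
        byte[] script = Files.readAllBytes(Path.of(args[0])); // e.g. app.min.js
        byte[] digest = MessageDigest.getInstance("SHA-384").digest(script);
        String integrity = "sha384-" + Base64.getEncoder().encodeToString(digest);
        System.out.println(integrity);
        // The page template would then reference the script roughly like:
        // <script src="https://cdn.example.com/app.min.js"
        //         integrity="sha384-..." crossorigin="anonymous"></script>
    }
}
```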

There is one more scenario that may sound strange – if an attacker hacks into your main application server(s), they can replace the scripts with whatever they want. It sounds strange at first, because if they have access to the server, it’s game over anyway. But it’s not always full access with RCE – it might be limited access. Credit card numbers are usually not stored in plain text in the database, so having access to the application server may not mean you have access to the credit card numbers. And changing the custom backend code to collect the data is much more unpredictable and time-consuming than just replacing the scripts with malicious ones. None of the options above protect against that (as in this case the attacker may be able to change the expected hash for the subresource integrity check).

Because of the limitations of the above approaches, at my company we decided to provide a tool to monitor your website for such attacks. It’s called Scriptinel.com (short for Script Sentinel) and is currently in early beta. It’s mainly targeted at small website owners who can’t implement any of the three approaches above, but it can be used for sophisticated websites as well.

What it does is straightforward – it scans a given URL, extracts all scripts from it (even the dynamic ones), and starts monitoring them for changes with periodic requests. If it discovers a change, it notifies the website owner so that they can react.
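The core of that monitoring loop can be illustrated with a small sketch (this is not Scriptinel’s actual code, just the general idea, assuming the script URLs are already known and scheduling is handled elsewhere): re-fetch each script, hash it, and compare against the previously seen hash.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

// Illustrative only: periodically re-fetch known script URLs and alert on changes.
public class ScriptChangeMonitor {

    private final HttpClient http = HttpClient.newHttpClient();
    private final Map<String, String> knownHashes = new HashMap<>();

    public void check(String scriptUrl) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(scriptUrl)).GET().build();
        byte[] body = http.send(request, HttpResponse.BodyHandlers.ofByteArray()).body();
        String hash = HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(body));

        String previous = knownHashes.put(scriptUrl, hash);
        if (previous != null && !previous.equals(hash)) {
            // a real tool would notify the site owner here (email, webhook, etc.)
            System.out.println("Script changed: " + scriptUrl);
        }
    }
}
```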

This means that the attacker may have a few minutes to collect data, but time is an important factor here – this is not a “SELECT *” data breach; it relies on customers using the website. So a few minutes minimizes the damage. And it doesn’t break your website (I guess we could also offer a script you include that blocks the page if Scriptinel has found discrepancies). It also doesn’t require changes in the build process to include hashes. Of course, such a reactive approach is not perfect, especially if there is nobody to react, but monitoring is a good idea regardless of whether other approaches are used.

There is the issue of protected pages and pages that are not directly accessible via a GET request – e.g. a payment page. For that you can enter your javascript files individually, rather than having the tool scan the page. We can add a more sophisticated user journey scan, with specifying credentials and steps to reach the protected pages, but for now that seems unnecessary.

How does it solve the “main server compromised” problem? Well, nothing solves that perfectly, as the attacker can make changes that serve the legitimate version of the script to your monitoring servers (identifying them by IP) and the modified scripts to everyone else. This can be done on compromised external asset servers as well (though not with leaked CDN credentials). However, this implies the attacker knows that Scriptinel is used, knows the IP addresses of our scanners, and has gained sufficient control to serve different versions based on IP. This raises the bar significantly, and can even be made impossible to pull off if we regularly change the IP addresses within a significantly large IP range.

Such functionality may be available in some enterprise security suites, though I’m not aware of it (if it exists somewhere, please let me know).

Overall, the problem is niche, but tough, and not solving it can lead to serious data breaches even if your database is perfectly protected. Scriptinel is a simple to use, good enough solution (and one that’s arguably better than the other options).

Good information security is the right combination of knowledge, implementation of best practices and tools to help you with that. And maybe Scriptinel is one such tool.

The post Protecting JavaScript Files (From Magecart-Style Attacks) appeared first on Bozho's tech blog.

Let’s Annotate Our Methods With The Features They Implement

Post Syndicated from Bozho original https://techblog.bozho.net/lets-annotate-our-methods-with-the-features-they-implement/

Writing software consists of very little actual “writing”, and much more thinking, designing, reading, “digging”, analyzing, debugging, refactoring, aligning and meeting others.

The reading and digging part is where you try to understand what has been implemented before, why it has been implemented, and how it works. In larger projects it becomes increasingly hard to find what is happening and why – there are so many classes involved, and so many methods participate in implementing a particular feature.

That’s probably because there is a mismatch between the programming units (classes, methods) and the business logic units (features). Product owners want a “password reset” feature, and they don’t care if it’s done using framework configuration, custom code split in three classes, or one monolithic controller method that does that job.

This mismatch is partially addressed by the so called BDD (behaviour driven development), as business people can define scenarios in a formalized language (although they rarely do, it’s still up to the QAs or developers to write the tests). But having your tests organized around features and behaviours doesn’t mean the code is, and BDD doesn’t help in making your way through the codebase in search of why and how something is implemented.

Another issue is linking a piece of code to the issue tracking system. Source control conventions and hooks allow for setting the issue tracker number as part of the commit, and then when browsing the code, you can annotate the file and see the issue number. However, due to the many changes, even a very strict team will end up with methods that are related to multiple issues, and you can’t easily tell which is the relevant one.

Yet another issue with the lack of a “feature” unit in programming languages is that you can’t trivially reuse existing projects to start a new one. We’ve all been there – you have a similar project and you want to get a skeleton to get things running faster. And while there are many tools to help with that (Spring Boot, Spring Roo, and other scaffolding utilities), they can rarely deliver what you need – you always have to tweak something, delete something, customize some configuration, as defaults are almost never practical.

And I have a simple proposal that will help with the issues above. As with any complex problem, simple ideas don’t solve everything, but are at least a step forward.

The proposal is in the title – let’s annotate our methods with the features they implement. Let’s have @Feature(name = "Forgotten password", issueTrackerCode="PROJ-123"). A method can implement multiple features, but that is generally discouraged by best practices (e.g. the single responsibility principle). The granularity of “feature” is something that has to be determined by each team and is the tricky part – sometimes an epic describes a feature, sometimes individual stories or even subtasks do. A definition of a feature should be agreed upon and every new team member should be told what to do and how to interpret it.
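A minimal version of such an annotation (the name and attributes follow the example above; the retention and target choices are my assumption) might look like this:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Marks a method (or class) with the business feature it implements.
// Runtime retention so that tools can later discover it via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface Feature {
    String name();
    String issueTrackerCode() default "";
}

// Usage, as in the example above:
// @Feature(name = "Forgotten password", issueTrackerCode = "PROJ-123")
// public void resetForgottenPassword(String email) { ... }
```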

There is of course a lot of complexity, e.g. for generic methods like DAO methods, utility methods, or methods that are reused in too many places. But they also represent features, it’s just that these features are horizontal. “Data access layer” is a feature – a more technical one indeed, but it counts, and maybe deserves a story in the issue tracker.

Your features can actually be listed in one or several enums, grouped by type – business, horizontal, performance, etc. That way you can even compose features – e.g. account creation contains the logic itself, database access, a security layer.
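A sketch of that enum-based variant (the grouping and names here are purely illustrative; the annotation above could then take the enum instead of a free-form string, which also prevents typos):

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: features listed in an enum, grouped by type.
public enum ProjectFeature {
    // business features
    ACCOUNT_CREATION(Type.BUSINESS),
    FORGOTTEN_PASSWORD(Type.BUSINESS),
    // horizontal / technical features
    DATA_ACCESS_LAYER(Type.HORIZONTAL),
    SECURITY_LAYER(Type.HORIZONTAL);

    // Composition example: "account creation" as the business logic plus the
    // horizontal features it relies on.
    public static final Set<ProjectFeature> ACCOUNT_CREATION_STACK =
            EnumSet.of(ACCOUNT_CREATION, DATA_ACCESS_LAYER, SECURITY_LAYER);

    public enum Type { BUSINESS, HORIZONTAL, PERFORMANCE }

    private final Type type;

    ProjectFeature(Type type) {
        this.type = type;
    }

    public Type type() {
        return type;
    }
}
```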

How does such a proposal help?

  • Raises consciousness about the single responsibility of methods and about keeping code readable
  • Provides a rationale for the existence of each method. Even if a proper comment is missing, the annotation will put a method (or a class) in context
  • Helps navigating code and fixing issues (if you can see all places where a feature is implemented, you are more likely to spot an issue)
  • Allows tools to analyze your features – amount, complexity, how chaotically a feature is spread across the code base, test coverage per feature, etc. (see the small reflection sketch after this list)
  • Allows tools to use existing projects for scaffolding for new ones – you specify the features you want to have, and they are automatically copied
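For example, with runtime retention even plain reflection is enough for a basic report. Here is a sketch that groups annotated methods by feature for an explicitly given set of classes, using the @Feature annotation sketched earlier (a classpath-scanning library would be used in practice; this class is hypothetical):

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: group annotated methods by feature name for a given set of classes.
public class FeatureReport {

    public static Map<String, List<String>> byFeature(Class<?>... classes) {
        Map<String, List<String>> result = new HashMap<>();
        for (Class<?> clazz : classes) {
            for (Method method : clazz.getDeclaredMethods()) {
                Feature feature = method.getAnnotation(Feature.class);
                if (feature != null) {
                    result.computeIfAbsent(feature.name(), k -> new ArrayList<>())
                          .add(clazz.getSimpleName() + "#" + method.getName());
                }
            }
        }
        return result;
    }
}
```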

At this point I’m supposed to give a link to a GitHub project for a feature annotation library. But it doesn’t make sense to have a single-annotation project. It could easily be part of Guava or something similar, or it can be manually created in each project. The complex part – the tools that will do the scanning and analysis – deserves separate projects, but unfortunately I don’t have time to write one.

But even without the tools, the concept of annotating methods with their high-level features is, I think, a useful one. Instead of trying to deduce why a method is there and what requirements it has to implement (and whether all necessary tests were written at the time), such an annotation can come in handy.

The post Let’s Annotate Our Methods With The Features They Implement appeared first on Bozho's tech blog.

Idea: A Generic P2P Network Client

Post Syndicated from Bozho original https://techblog.bozho.net/idea-a-generic-p2p-network-client/

Every now and then one has a half-baked idea about some project that they aren’t likely to be able to do because of lack of time. I’ve written about such random app ideas before, but they were mostly about small apps.

Here I’d like to share an idea for something a bit bigger (and therefore harder to spare time for) – a generic P2P network client. P2P networks are popular in various domains, most notably file sharing and cryptocurrencies. However, in theory they can be applied to many more problems – social networks, search engines, ride sharing, distributed AI, etc. All of these examples have been implemented in a p2p context, and they even work okay, but they lack popularity.

The popularity is actually the biggest issue with these applications – in order to get a service to be popular, in many cases you need a network effect – a p2p file sharing with 100 users doesn’t benefit from being p2p. A social network with 100 users is, well, useless. And it is hard to get traction with these p2p services because they require an additional step – installing software. You can’t just open a webpage and register, you have to install some custom software that will be used to join the p2p network.

P2P networks are distributed, i.e. there is no central node that has control over what happens. That control is held over the binary that gets installed – and it’s usually open source. And you need that binary in order to establish an overlay network. These networks reuse the internet’s transport layer, but do not rely on the world wide web standards, and most importantly, don’t rely heavily on DNS (except, they actually do when run for the first time in order to find a few known seed nodes). So once you are connected to the network, you don’t need to make HTTP or DNS queries, everything stays in the specifics of the particular protocol (e.g. bittorrent).

Not only do you have to trust and install some piece of software, you also have to be part of the network and exchange data regularly with peers. So it really doesn’t scale if you want to be part of dozens of p2p networks – each of them might be hungry for resources, and you have to keep dozens of applications running all the time (and launching on startup).

So here’s the idea – why don’t we have a generic p2p client? A piece of software that establishes the link to other peers and is agnostic about what data is going to be transferred. From what I’ve seen, the p2p layer is pretty similar in different products – you try to find peers in your immediate network; if none are found, you connect to a known seed node (first by DNS, which uses DNS round-robin, and then by a list of hardcoded IPs), and when you connect, the seed node gives you a list of peers to connect to. Each of those peers has knowledge of other peers, so you can quickly be connected to a significant number of peer nodes (I’m obviously simplifying the flow, but that’s roughly how it works).
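That bootstrap flow is generic enough to sketch in a few lines (simplified; the seed hostname and the hardcoded addresses are placeholders, and the actual peer-exchange protocol is network-specific):

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the generic bootstrap flow: resolve seed nodes via DNS
// (round-robin returns several addresses), fall back to hardcoded IPs, then
// ask a seed node for more peers.
public class Bootstrap {

    private static final List<String> HARDCODED_SEEDS =
            List.of("203.0.113.10", "203.0.113.11"); // placeholder addresses

    public static List<String> findInitialPeers(String seedHostname) {
        List<String> candidates = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(seedHostname)) {
                candidates.add(address.getHostAddress());
            }
        } catch (Exception e) {
            candidates.addAll(HARDCODED_SEEDS); // DNS failed, fall back to known IPs
        }
        // A real client would now connect to one of the candidates using the
        // network-specific protocol and request its list of known peers.
        return candidates;
    }
}
```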

Once you have an established list of peers, you start doing the application-specific stuff – sharing files, downloading a cryptocurrency ledger, sharing search indexes, sharing a social network profile database, etc. But the p2p network part can be, I think, generalized.

So we can have that generic client and make it pluggable – every application developer can write their own application on top of it. The client will not only be a single point for the user, but can also manage resources automatically – inbound and outbound traffic, CPU/GPU usage (e.g. in case of cryptocurrency mining). And all of these applications (i.e. plugins) can either be installed by downloading them from the vendor’s website (which is somewhat similar to the original problem), or can be downloaded from a marketplace available within the client itself. That would obviously mean a centralized marketplace, unless the marketplace itself is a p2p application that’s built into the client.
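The pluggability itself could be modelled the way many JVM applications already do it – a small service interface that each p2p application implements and that the client discovers at runtime. This is a sketch of one possible shape, not a specification; the interface and its methods are entirely hypothetical:

```java
import java.util.ServiceLoader;

// Hypothetical plugin contract: the generic client handles peers and transport,
// the plugin only sees opaque messages for its own protocol.
interface P2PApplication {
    String protocolId();                        // e.g. "file-sharing", "messaging"
    void onPeerMessage(String peerId, byte[] payload);
    byte[] pollOutgoingMessage(String peerId);  // null if there is nothing to send
}

class GenericClient {
    public static void main(String[] args) {
        // Plugins are discovered like any other Java service provider
        // (registered under META-INF/services); a real client would also
        // sandbox and resource-limit them.
        for (P2PApplication app : ServiceLoader.load(P2PApplication.class)) {
            System.out.println("Loaded plugin for protocol: " + app.protocolId());
        }
    }
}
```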

So, effectively, you’d be able to plug in your next file sharing solution, your next cryptocurrency, encrypted messaging, or your next distributed social network. And your users won’t have the barrier of installing yet another desktop app. That alone does not solve the network effect, as you still need enough users to add your plugin to their client (and for many to have the client to begin with), but it certainly makes it easier for them. Imagine if we didn’t have Android and Apple app stores and we had to find relevant apps by other means.

This generic client can possibly be even a browser plugin, so that it’s always on when you are online. It doesn’t have to be, but it might ease adoption. And while it will be complicated to write it as a plugin, it’s certainly possible – there are already p2p solutions working as browser plugins.

Actually, many people think of blockchain smart contracts as a way to do exactly that – have distributed applications. But they have an unnecessary limitation – they work on data that’s shared on a blockchain. And in some cases you don’t need that. You don’t need consensus in the cryptocurrency sense. For example, in file sharing, all you need to do is compute the hash of the file (and of its chunks) and start sending them to interested peers. No need to store the file on the blockchain. Same with instant messaging – you don’t need to store the message on a shared immutable database, you only need to send it to the recipients. So smart contracts are not as generic a solution as what I’m proposing.

Whether a generic client can accommodate an unlimited number of different protocols and use cases, what the communication protocol would look like, what programming languages it should support and what security implications that has for the client (e.g. what sandbox the client provides), what UI markup will be used – these are all important operational details, but they are beside the point of this post.

You may wonder whether there isn’t anything similar done already. Well, I couldn’t find one. But there’s a lot done that can support such a project: Telehash (a mesh protocol), JXTA (a p2p protocol) and its Chaupal implementation, libp2p and Chimera (p2p libraries), Kademlia (a distributed hash table).

Whether such a project is feasible – certainly. Whether its adoption is inevitable – not so certainly, as it would require immediate usefulness in order to be installed in the first place. So, as with every “platform”, it will face a chicken-and-egg problem – will people install it if there are no useful plugins, and will people write plugins if there are no user installations? That is solvable in a number of ways (e.g. paying developers initially to write plugins, or bundling some standard applications such as file sharing and instant messaging). It could be a business opportunity (monetized through the marketplace or subscriptions) as well as a community project.

I’m just sharing the idea, hoping that someone with more time and more knowledge of distributed networks might pick it up and make the best of it. If not, well, it’s always nice to think about what the future of the internet could look like. Centralization is inevitable, so I don’t see p2p getting rid of centralized services anytime soon (or ever), but some things arguably work better and safer when truly decentralized.

The post Idea: A Generic P2P Network Client appeared first on Bozho's tech blog.

First Thoughts About Facebook’s Libra Cryptocurrency

Post Syndicated from Bozho original https://techblog.bozho.net/first-thoughts-about-facebooks-libra-cryptocurrency/

Facebook announced today that by 2020 they will roll out Libra – their blockchain-based cryptocurrency. It is, of course, major news, as it has the potential to disrupt the payment and banking sector. If you want to read all the surrounding newsworthy details, you can read the TechCrunch article. I will instead focus on a few observations and thoughts about Libra – from a few perspectives – technical, legal/compliance, and possibly financial.

First, replacing banks and bank transfers and credit cards and payment providers and ATMs with just your smartphone sounds appealing. Why hasn’t anyone tried to do that so far? Well, many have tried, but you can’t just have the technology and move towards gradual adoption. You can’t even do it if you are Facebook. You can, however, do it if you are Facebook, backed by Visa, Mastercard, Uber, and many, many more big names on the market. So Facebook got that right – they made a huge coalition that can drive such a drastic change forward.

I have several reservations, though. And I’ll go through them one by one.

  • There is not much completed – there is a website and a technical paper and an open-source prototype. It’s not anywhere near production. The authors of the paper write that the Move language is still being designed (and it’s not the already existing language of the same name). The open-source prototype is still a prototype (and one that’s a bit hard to read, though that might be because of the choice of Rust). This means they will work with tight schedules to make this global payment system operational. The technical paper is a bit confusing and hard to follow. Maybe they rushed it out and didn’t have time to polish it, and maybe it’s just the beginning of a series of papers. Or maybe it’s my headache and the paper is fine.
  • The throughput sounds insufficient – the authors of the paper claim that they don’t have any performance results yet, but they expect to support around 1000 transactions per second. This section of the paper assumes that state channels will be used and most transactions won’t happen on-chain, but that’s a questionable assumption, as the use of state channels is not universally applicable. And 1000 TPS is too low compared to the current Visa + Mastercard transaction volume (estimated to be around 5000 per second). Add bank transfers, Western Union and payment providers and you get a much higher number. If from the start you are presumably capped at 1000, will it really scale to become a global payment network? Optimizing throughput is not something that just happens – other cryptocurrencies have struggled with that for years.
  • Single merkle tree – I’m curious whether that will work well in practice. A global merkle tree that is concurrently and deterministically updated is not an easy task (ordering matters, timestamps are tricky). Current blockchain implementations use one merkle tree per block, constructed from all transactions that fall within that block when the block is “completed” (see the small sketch after this list). In Libra there will be no blocks (so what does “blockchain” mean anyway?), so the merkle tree will always grow (similar to the certificate transparency log).
  • How to get money in? – Facebook advertises Libra as a way for people without bank accounts to do payments online. But it’s not clear how they will get Libra tokens in the first place. In third world countries, consumers’ interaction is with the telecoms, where they get their cheap smartphones and mobile data cards. The realistic way for people to get Libra tokens would be if Facebook had contracts with the telecoms, but how and whether that is going to be arranged is an open question.
  • Key management – key management has always been, and will probably always be, a usability issue for using blockchain by end users. Users don’t want to manage their keys, and they can’t do that well. Yes, there are key management services and there are seed phrases, but that’s still not as good as a centralized account that can be restored if you lose your password. And if you lose your keys, or your password (from which a key is derived to encrypt your keys and store them in the cloud), then you can lose access to your account forever. That’s a real risk and it’s highly undesirable for a payment system.
  • Privacy – Facebook claims it won’t associate Libra accounts with Facebook accounts in order to target ads. And even if we can trust them, there’s an inherent privacy concern – all payments will be visible to anyone who has access to the ledger. Yes, a person can have multiple accounts, but people will likely only use one. It’s not a matter of what can be done by tech-savvy users, but what will be done in reality. And ultimately, someone will have to know who those addresses (public keys) belong to (see the next point). A zcash-based system would be privacy preserving, but it can’t be compliant.
  • Regulatory compliance – if you are making a global payment system, you’d have to satisfy regulators around the world, especially the SEC, ESMA and the ECB. That doesn’t mean you can’t be disruptive, but you have to take into account real world problems, like KYC (know your customer) processes as well as other financial limitations (you can’t print money after all). Implementing KYC means people being able to prove who they are before they can open an account. This isn’t rocket science, but it isn’t trivial either, especially in third world countries. And money launderers are extremely inventive, so if Facebook (or the Libra association) doesn’t get its processes right, it risks becoming very lucrative for criminals and regime officials around the world. Facebook is continuously failing to stop fake news and disinformation on the social network despite the many urges from the public and legislators, so I’m not sure we can trust them with being “a global compliance officer”. Yes, partners like Visa and Mastercard can probably help here, but I’m not sure how involved they’ll be.
  • Oligarchy – while it has “blockchain” in the tagline, it’s not a democratizing solution. It’s a private blockchain based on trusted entities that can participate in it. If it’s going to be a global payment network, it will be a de-facto payment oligarchy. How is that worse than what we have now? Well, now at least banks are somewhat independent power players. The Libra association is one player with one goal. And then antitrust cases will follow (as they probably will immediately, as Facebook has long banned cryptocurrency ads on its website only to unveil its own cryptocurrency).
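For reference, the per-block construction that current blockchains use (mentioned in the “single merkle tree” point above) is simple to sketch – hash the block’s transaction hashes pairwise up to a single root. Assumptions in this sketch: SHA-256, at least one transaction, and the last hash duplicated on odd levels, as Bitcoin does; Libra’s actual tree works differently.

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Sketch: compute a per-block merkle root from the block's transaction hashes.
// Assumes at least one transaction hash; duplicates the last hash on odd levels.
public class MerkleRoot {

    public static byte[] compute(List<byte[]> transactionHashes) throws Exception {
        List<byte[]> level = new ArrayList<>(transactionHashes);
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                sha256.update(left);
                sha256.update(right);
                next.add(sha256.digest()); // digest() also resets the instance for reuse
            }
            level = next;
        }
        return level.get(0);
    }
}
```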

Facebook has the power to make e-commerce websites accept payments in Libra. But will consumers want it? Is it at least 10 times better than using your credit card, or only marginally better, so that nobody is bothered to learn yet another thing (and manage their keys)? Will third world countries really be able to benefit from such a payment system, or will others prove more practical (e.g. M-Pesa)? Will the technology be good enough or scalable enough? Will regulators allow it, or will Facebook be able to escape their grasp with a few large fines until they’ve conquered the market?

I don’t know the answers, but I’m skeptical about changing the money industry that easily. And given Facebook’s many privacy blunders, I’m not sure they will be able to cope well in a much more scrutinized domain. “Move fast and break things” can easily become “Move fast and sponsor terrorism”, and that’s only partly a joke.

I hope we get fast and easy digital payments, as bank transfers and even credit cards feel too outdated. But I’m sure it’s not easy to meet the complex requirements of reality.

The post First Thoughts About Facebook’s Libra Cryptocurrency appeared first on Bozho's tech blog.