Tag Archives: Developer tips

Certificate Transparency Verification in Java

Post Syndicated from Bozho original https://techblog.bozho.net/certificate-transparency-verification-in-java/

So I had this naive idea that it would be easy to do certificate transparency verification as part of each request in addition to certificate validity checks (in Java).

With half of the weekend sacrificed, I can attest it’s not that trivial. But what is certificate transparency? In short – it’s a publicly available log of all TLS certificates in the world (which are still called SSL certificates even though SSL is obsolete). You can check whether a certificate is published in that log, and if it’s not, then something is suspicious, as CAs have to push all of their issued certificates to the log. There are other use-cases, for example registering for notifications about new certificates for your domains to detect potentially hijacked DNS admin panels or CAs (Facebook offers such a tool for free).

What I wanted to do is the former – make each request from a Java application verify the other side’s certificate in the certificate transparency log. It seems this is not available out of the box (if it is, I couldn’t find it; in one discussion about JEP 244 the TLS extension related to certificate transparency seems to have been discussed, but I couldn’t find whether it was supported in the end).

I started by thinking you could simply get the certificate and check its inclusion in the log by the fingerprint of the certificate. That would’ve been too easy – the logs do allow checking by hash, however it’s not the fingerprint of the certificate, but a signed certificate timestamp – a signature issued by the log prior to inclusion. To quote the CT RFC:

The SCT (signed certificate timestamp) is the log’s promise to incorporate the certificate in the Merkle Tree

A Merkle tree is a very cool data structure that allows external actors to be convinced that something is within the log by providing an “inclusion proof” which is much shorter than the whole log (thus saving a lot of bandwidth). In fact, the coolness of Merkle trees is why I was interested in certificate transparency in the first place (as we use Merkle trees in my current log-oriented company).
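
To give a flavour of how such a proof is checked, here is a minimal sketch of RFC 6962-style inclusion proof verification, assuming SHA-256 and an audit path ordered from the leaf towards the root (the class and method names are mine, not from any particular library):

import java.security.MessageDigest;
import java.util.List;

public class MerkleInclusionProof {

    // leafHash is the hash of the entry, auditPath is the list of sibling hashes
    // returned by the log, expectedRoot is the root hash from the signed tree head
    public static boolean verify(byte[] leafHash, long leafIndex, long treeSize,
                                 List<byte[]> auditPath, byte[] expectedRoot) throws Exception {
        if (leafIndex >= treeSize) {
            return false;
        }
        long fn = leafIndex;
        long sn = treeSize - 1;
        byte[] r = leafHash;
        for (byte[] p : auditPath) {
            if (sn == 0) {
                return false;
            }
            if ((fn & 1) == 1 || fn == sn) {
                r = nodeHash(p, r); // sibling is on the left
                if ((fn & 1) == 0) {
                    // skip levels where our node is the unpaired, rightmost one
                    while (fn != 0 && (fn & 1) == 0) {
                        fn >>= 1;
                        sn >>= 1;
                    }
                }
            } else {
                r = nodeHash(r, p); // sibling is on the right
            }
            fn >>= 1;
            sn >>= 1;
        }
        return sn == 0 && MessageDigest.isEqual(r, expectedRoot);
    }

    private static byte[] nodeHash(byte[] left, byte[] right) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update((byte) 0x01); // interior node prefix, per RFC 6962
        md.update(left);
        md.update(right);
        return md.digest();
    }
}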

So, in order to check for inclusion, you have to somehow obtain the SCT. I initially thought the Certificate Transparency Java library would do that, but it turns out it can’t. Once you have the SCT, you can use the client to check it in the log, but obtaining it is the harder part. (Note: for server-side verification it’s fine to query the log via HTTP; browsers, however, use DNS queries in order to preserve the anonymity of users).

Obtaining the SCT can be done in three ways, depending on what the server and/or log and/or CA have chosen to support: the SCT can be included in the certificate, or it can be provided as a TLS extension during the TLS handshake, or can be included in the TLS stapling response, again during the handshake. Unfortunately, the few certificates that I checked didn’t have the SCT stored within them, so I had to go to a lower level and debug the TLS handshake.

I enabled TLS handshake verbose output, and lo and behold – there was nothing there. Google does include SCTs as a TLS extension (according to Qualys), but the Java output didn’t say anything about it.

Fortunately (?) Google has released Conscrypt – a Java security provider based on Google’s fork of OpenSSL. Things started to get messy… but I went for it, included Conscrypt and registered it as a security provider. I had to make a connection using the Conscrypt TrustManager (initialized with all the trusted certs in the JDK):

// ctx is an SSLContext obtained from the Conscrypt provider (provider registration
// and SSLContext creation are not shown); logStore is a CTLogStore implementation
KeyStore trustStore = KeyStore.getInstance("JKS");
trustStore.load(new FileInputStream(System.getenv("JAVA_HOME") + "/lib/security/cacerts"), "changeit".toCharArray());
ctx.init(null, new TrustManager[] {new TrustManagerImpl(trustStore,
    null, null, null, logStore, null,
    new StrictCTPolicy())}, new SecureRandom());

URL url = new URL("https://google.com");
HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
conn.setSSLSocketFactory(ctx.getSocketFactory());
conn.connect();
conn.getInputStream();
conn.disconnect();

And of course it didn’t work initially, because Conscrypt doesn’t provide implementations of some core interfaces needed – the CTLogStore and CTPolicy classes. The CTLogStore is actually the important bit that holds information about all the known logs (I still find it odd to call a “log provider” simply a “log”, but that’s the accepted terminology). There is a list of known logs, in JSON form, which is cool, except it took me a while to figure out (with external help) what exactly those public keys are. What are they – RSA, ECC? How are they encoded? You can’t find that in the RFC, nor in the documentation. It can be seen here that it’s the “DER encoding of the SubjectPublicKeyInfo ASN.1 structure”. Ugh.

BouncyCastle to the rescue. My relationship with BouncyCastle is a love-hate one. I hate how unintuitive it is and how convoluted its APIs are, but I love that it has (almost) everything cryptography-related that you may ever need. After some time wasted trying to figure out how exactly to get that public key converted to a PublicKey object, I found that using PublicKeyFactory.createKey(Base64.getDecoder().decode(base64Key)); gives you the parameters of whatever algorithm is used – it can return elliptic curve key parameters or RSA key parameters. You then have to wrap them in another class and pass them to another factory (typical BouncyCastle), and hurray, you have the public key.
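
For illustration, here’s a minimal sketch of that conversion, assuming the Base64 string from the known-logs JSON is indeed a DER-encoded SubjectPublicKeyInfo and that BouncyCastle is on the classpath (the class and method names are mine):

import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;

import org.bouncycastle.crypto.params.AsymmetricKeyParameter;
import org.bouncycastle.crypto.params.ECPublicKeyParameters;
import org.bouncycastle.crypto.util.PublicKeyFactory;

public class LogPublicKeys {

    // Converts the Base64 key of a log entry into a java.security.PublicKey
    public static PublicKey parseLogKey(String base64Key) throws Exception {
        byte[] der = Base64.getDecoder().decode(base64Key);
        // BouncyCastle tells us which algorithm the key uses
        AsymmetricKeyParameter params = PublicKeyFactory.createKey(der);
        String algorithm = (params instanceof ECPublicKeyParameters) ? "EC" : "RSA";
        // the DER bytes are already in X.509 SubjectPublicKeyInfo form,
        // so the standard KeyFactory can parse them once the algorithm is known
        return KeyFactory.getInstance(algorithm).generatePublic(new X509EncodedKeySpec(der));
    }
}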

Of course now Google’s Conscrypt didn’t work again, because after the transformations the publicKey’s encoded version was not identical to the original bytes, and so the log ID calculation was wrong. But I fixed that by some reflection, and finally, it worked – the certificate transparency log was queried and the certificate was shown to be valid and properly included in the log.

The whole code can be found here. And yes, it uses several security providers, some odd BouncyCastle APIs and some simple implementations that are missing in Google’s provider. Known certificates may be cached so that repeated calls to the log are not performed, but that’s beyond the scope of my experiment.

Certificate transparency seems like a thing that’s core to the internet nowadays. And yet, it’s so obscure and hard to work with.

Why is the type of public key in the list not documented (they should at least put an OID next to the public key, because as it turns out, not all logs use elliptic curves – two of them use RSA)? Probably there’s a good explanation, but why include the SCT in the log rather than the fingerprint of the certificate? Why not then mandate inclusion of the SCT in the certificate, which would require no additional configuration of the servers and clients, as opposed to including it in the TLS handshake, which does require upgrades?

As far as I know, the certificate transparency initiative is now facing scalability issues because of the millions of Let’s Encrypt certificates out there. Every log (provider) should serve the whole log to everyone who requests it. It is not a trivial thing to solve, and efforts are being put in that direction, but no obvious solution is available at the moment.

And finally, if Java doesn’t have an easy way to do that, with all the crypto libraries available, I wonder what’s the case for other languages. Do they support certificate transparency or do they need upgrades?

And maybe we’re all good because browsers support it, but browsers are not the only thing that makes HTTP requests. API calls are a massive use-case and if they can be hijacked, the damage can be even bigger than individual users being phished. So I think more effort should be put in two things:
1. improving the RFC and 2. improving the programming ecosystem. I hope this post contributes at least a little bit.

The post Certificate Transparency Verification in Java appeared first on Bozho's tech blog.

Integrating Applications As Heroku Add-Ons

Post Syndicated from Bozho original https://techblog.bozho.net/integrating-applications-as-heroku-add-ons/

Heroku is a popular Platform-as-a-Service provider and it offers vendors the option to have their services provided as add-ons. Add-ons can be used by Heroku customers in different ways, but a typical scenario would be “Start a database”, “Start an MQ”, or “Start a logging solution”. After you add the add-on to your account, you can connect to the chosen database, MQ, logging solution or whatever.

Integrating as a Heroku add-on is allegedly simple, and Heroku provides good documentation on how to do it. However, there are some pitfalls, so I’d like to share my experience in providing our services (Sentinel Trails and SentinelDB) as Heroku add-ons.

Both are SaaS (one is a logging solution, the other one – a cloud datastore), and so when a Heroku customer wants to add it to their account, we have to just create an account for them on our end.

In order to integrate with Heroku, you need to implement several endpoints:

  • provisioning – the initial creation of the resources (= account)
  • plan change – since Heroku supports multiple subscription plans, this should also be reflected on your end
  • deprovisioning – if a user stops using your service, you may want to free some resources
  • SSO – allows users to log in to your service by clicking an icon in the Heroku console.

Implementing these endpoints following the tutorial should be straightforward, but it isn’t exactly. Hence I’m sharing our Spring MVC controller that handles it – you can check it here.
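
To give a rough idea of the provisioning part, here’s a simplified sketch (the request/response field names may differ from the current Heroku add-on API – consult the official docs – and AccountService is a stand-in for your own account-creation logic):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/heroku")
public class HerokuAddonController {

    @Autowired
    private AccountService accountService; // hypothetical internal service

    // Heroku calls this endpoint when a customer adds the add-on to their app
    @PostMapping("/resources")
    public Map<String, Object> provision(@RequestBody Map<String, Object> request) {
        String resourceId = String.valueOf(request.get("uuid"));
        String plan = String.valueOf(request.get("plan"));

        // create the account on our end, keyed by Heroku's resource id
        String apiKey = accountService.createAccount(resourceId, plan);

        Map<String, Object> response = new HashMap<>();
        response.put("id", resourceId);
        // config vars that Heroku sets as environment variables for the customer's app
        response.put("config", Collections.singletonMap("SERVICE_API_KEY", apiKey));
        response.put("message", "Provisioned successfully");
        return response;
    }
}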

A few important bits:

  • You may choose not to obtain a token if you don’t plan to interact with the Heroku API further.
  • We are registering the user with a fake email in the form of <resourceId>@heroku.com. However, you may choose to use the token to fetch the emails of team members and collaborators, as described here.
  • The most important piece of data is the resource_id – store it in your users (or organizations) table and consider adding an index to be able to retrieve records by it quickly.
  • Return your keys and secrets in the response to the provisioning request. They will be set as environment variables in Heroku.
  • All of the requests are made from the Heroku servers to your server directly, except the SSO call. It is invoked in the browsers and so you should set the session cookie/token in the response. That way the user will be logged in your service.
  • When you generate your add-on manifest, make sure you update the endpoint URLs.

After you’re done, the alpha version appears in the marketplace (e.g. here and here). You should then get some alpha users to test the add-ons before they can become visible in the marketplace.

Integrating SaaS solutions with existing cloud providers is a good thing, and I’m happy that Heroku provides an automated way to do that. AWS, for example, also has a marketplace, but integration there feels a bit strange and unpolished (I’ve hit some issues that were manually resolved by the AWS team).

Since many companies are choosing IaaS or PaaS for their services, having the ability to easily integrate an add-on service is very useful. I’d even go further and propose some level of standardization for cloud add-ons, but I guess time will tell whether we really need it, or whether we can spare a few days per provider.

The post Integrating Applications As Heroku Add-Ons appeared first on Bozho's tech blog.

Types of Data Breaches and How To Prevent Them

Post Syndicated from Bozho original https://techblog.bozho.net/types-of-data-breaches-and-how-to-prevent-them/

Data breaches happen practically every day. Personal data, including financial and medical data, leaks to cyber criminals as well as intelligence agencies. Some notable breaches include the Equifax breach, where dozens of personal data fields were leaked, and the recent Marriott breach, where passports, credit cards and locations of people at a given time were breached.

I’ve been doing some data protection consultancy as well as working on a data protection product and decided to classify the types of data breaches and give recommendations on how they can be addressed. We don’t always get to know how exactly the breaches happen, but from what is published in news articles and post-mortems, we can have a good overview on the breach landscape.

Control over the target server – if an attacker is able to connect to a target server and gain full or partial control over it, they can do anything, including running SELECT * FROM ..., copying files, etc. How do attackers gain such control? In many ways, most notably RCE (remote code execution) vulnerabilities and weak admin authentication.

How to prevent it? Follow best security practices – regularly update libraries and software to get security patches, do not run native commands from within the application layer, open only necessary ports (80 and 443) to the outside world, configure 2-factor authentication for administrator login. Aim at having an intrusion detection / prevention system. Encrypt your data, and make the encryption as granular as possible for the most sensitive data (e.g. for SentinelDB we utilize per-record encryption) to avoid SELECT * breaches.

SQL injections – this is a rookie mistake that unfortunately still happens. It allows attackers to manipulate your SQL queries and inject custom bits in them that allow them to extract more data than they are supposed to.

How to prevent it? Use prepared statements for your queries. Never ever concatenate user input in order to construct queries. Run regular code reviews and use code inspection tools to catch such instances.
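
As a quick illustration in plain JDBC (assuming connection is an open java.sql.Connection and email is raw user input; table and column names are made up):

// Vulnerable: user input becomes part of the SQL text itself
// String sql = "SELECT id, email FROM users WHERE email = '" + email + "'";

// Safe: the input is bound as a parameter and never interpreted as SQL
try (PreparedStatement ps = connection.prepareStatement(
        "SELECT id, email FROM users WHERE email = ?")) {
    ps.setString(1, email);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // process rs.getLong("id") and rs.getString("email")
        }
    }
}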

Unencrypted backups – the main system may be well protected, but attackers are usually after the weak spots. Storing backups might be one such spot – if you store unencrypted backups that are accessible via weak authentication (e.g. over FTP via username/password), then someone may try to attack this weaker spot. Even if the backup is encrypted, the key may be placed alongside it, which makes the encryption practically useless.

How to prevent it? Encrypt your backups, store them in a way that’s as strongly protected as your servers (e.g. 2FA, internal-network/VPN only), and keep your decryption key in a hardware security module (or equivalent, e.g. AWS KMS).

Personal data in logs – another weak spot other than the backups may be your logs. They usually lie on separate servers, and are not as well guarded. That’s usually okay, since logs don’t contain personal information, but sometimes they do. I recently stumbled upon a large company’s website that had its directory structure unprotected and kept access log files alongside the static resources. In addition to that, they passed personal information as GET parameters, so you could get a lot of information by just getting the access logs. Needless to say, I did a responsible disclosure and the issue was fixed, but it was a potential breach.

How to prevent it? Don’t store personal information in logs. Avoid submitting forms with a GET method. Regularly review the code to check whether personal data is not logged. Make sure your logs are stored in a way as protected as your production servers and your backups. It could be a cloud service, it could be a local installation of an open source package, but don’t overlook the security of the log collection system.

Data pushed to unprotected storage – a recent Alteryx/Experian leak was just that – data placed on a (somewhat) public S3 bucket was breached. If you place personal data in weakly protected public stores (AWS S3, file sharing services, FTPs), then you are waiting for trouble to happen.

How to prevent it? Don’t put personal data in publicly accessible storage. To make sure that doesn’t happen, always review your S3 bucket and FTP server policies. Have internal procedures that disallow sharing personal data without protecting it with at least a password shared over a side channel (messenger/sms).

Unrestricted API calls – that’s what caused the Facebook-Cambridge Analytics issue. No matter how secure your servers are, if you expose the data through your API without access restriction, rate-limiting, fraud-detection, audit trail, then your security is no use – someone will “scrape” your data through the API.

How to prevent it? Do not expose too much personal data over public or easily accessible APIs. Vet API users and inform your users whenever their data is being shared with third parties, via API or otherwise.

Internal actor – all of the woes above can happen due to poor security or due to internal actors. Even if your network is well guarded, an admin can go rogue and leak the data, for many reasons, not excluding financial ones. A privileged internal actor has access to perform SELECT *, can decrypt the backups, can pretend to be a trusted API partner.

How to prevent it? Good operational security. A single sentence like that may sound easy, but it’s not. I don’t have a full list of things that have to be in place to guard against internal breaches – there are technical, organizational and legal measures to be taken. Have an unmodifiable audit trail. Have your intrusion prevention system (or logging solution) also detect anomalous internal behaviour. Have procedures that require two admins to work together in order to log in (e.g. split key) to the most sensitive systems. If the data is sensitive, do background checks on the privileged admins. And many more things that fall under the “operational security” umbrella.

Man-in-the-middle attacks – MITM can be used to extract data from active users only. It works on websites without HTTPS, or in case the attacker has somehow installed a root certificate on the target machine (and before you say that’s too unlikely – it happens way too often to be ignored). In case of a successful MITM attack, the attacker can extract all data that’s being transferred.

How to prevent it? First – use HTTPS. Always. Redirect HTTP to HTTPS. Use HSTS. Use certificate pinning if you control the updates of the application (e.g. through an app store). The root certificate attack unfortunately cannot be circumvented. Sorry, just hope that your users haven’t installed such shitty software. Fortunately, this won’t lead to massive breaches, only data of active users that are being targeted may leak.

JavaScript injection / XSS – if somehow an attacker can inject javascript into your website, they can collect data being entered. This is what happened in the recent British Airways breach. I remember a potential attack on the NSW (Australia) elections, where the Piwik analytics script was loaded from an external server that was vulnerable to a TLS downgrade attack, which allowed an attacker to replace the script and thus interfere with the election registration website.

How to prevent it? Follow the XSS protection cheat sheet by OWASP. Don’t include scripts from dodgy third party domains. Make sure third party domains, including CDNs, have a good security level (e.g. run Qualys SSL test).

Leaked passwords from other websites – one of the issues with incorrect storage of passwords is password reuse. Even if you store passwords properly, a random online store may not and if your users use the same email and password there, an attacker may try to steal their data from your site. Not all accounts will be compromised, but the more popular your service is, the more accounts will be affected.

How to avoid it? There’s not much you can do to make other websites store passwords correctly. But you can encourage the use of passphrases, you can encourage 2-factor authentication in case of sensitive data, or you can avoid having passwords at all and use an external OAuth/OpenID provider (this has its own issues, but they may be smaller than those of password reuse). Also have some rate-limiting in place so that a single IP (or an IP range) is not able to try and access many accounts consecutively.
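
A very rough sketch of such per-IP rate-limiting, kept in memory with a fixed one-minute window (a real setup would more likely use a shared store and a sliding window; the class is illustrative):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class LoginRateLimiter {

    private static final int MAX_ATTEMPTS_PER_MINUTE = 10;

    private final ConcurrentHashMap<String, AtomicInteger> attempts = new ConcurrentHashMap<>();
    private volatile long windowStart = System.currentTimeMillis();

    // returns false if the IP has made too many attempts in the current window
    public boolean allow(String ip) {
        long now = System.currentTimeMillis();
        if (now - windowStart > 60_000) { // reset the window every minute
            attempts.clear();
            windowStart = now;
        }
        return attempts.computeIfAbsent(ip, k -> new AtomicInteger())
                .incrementAndGet() <= MAX_ATTEMPTS_PER_MINUTE;
    }
}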

Employees sending emails with unprotected excel sheets – especially non-technical organizations and non-technical employees tend to just want to get their job done, so they may send large excel sheets with personal data to colleagues or partners in other companies. Then once someone’s email account or server is breached, the data gets breached as well.

How to prevent it? Have internal procedures against sending personal data in excel sheets, or at least have people zip them and send passwords through a side channel (messenger/sms). You can have organization-wide software that scans outgoing emails for attachments with excel sheets that contain personal data and have these emails blocked.


Data breaches are prevented by having good information security. And information security is hard. And it’s the right combination of security practices and security products that minimize the risk of incidents. Many organizations choose not to focus on infosec, as it’s not their core business or they estimate that the risk is worth it, viewing breaches, internal actors manipulating data and other incidents as something that can’t happen to them. Until it happens.

The post Types of Data Breaches and How To Prevent Them appeared first on Bozho's tech blog.

Resources on Distributed Hash Tables

Post Syndicated from Bozho original https://techblog.bozho.net/resources-on-distributed-hash-tables/

Distributed p2p technologies have always been fascinating to me. Bittorrent is cool not because you can download pirated content for free, but because it’s an amazing piece of technology.

At some point I read and researched a lot about how DHTs (distributed hash tables) work. DHTs are not part of the original BitTorrent protocol, but after trackers were increasingly under threat of being closed for copyright infringement, “trackerless” features were added to the protocol. A DHT is distributed among all peers and holds information about which peer holds what data. Once you are connected to a peer, you can query it for their knowledge on who has what.

During my research (which was with no particular purpose) I took note of many resources that I thought useful for understanding how DHTs work and possibly implementing something on top of them in the future. In fact, a DHT is a “shared database”, “just like” a blockchain. You can’t trust it as much, but proving digital events does not require a blockchain anyway. My point here is – there is a lot more cool stuff to distributed / p2p systems than blockchain. And maybe way more practical stuff.

It’s important to note that the DHT used in BitTorrent is Kademlia. You’ll see a lot about it below.

Anyway, the point of this post is to share the resources that I collected. For my own reference and for everyone who wants to start somewhere on the topic of DHTs.

I hope the list is interesting and useful. It’s not trivial to think of other uses of DHTs, but simply knowing about them and how they work is a good thing.

The post Resources on Distributed Hash Tables appeared first on Bozho's tech blog.

Automate Access Control for User-Specific Entities

Post Syndicated from Bozho original https://techblog.bozho.net/automate-access-control-for-user-specific-entities/

Practically every web application is supposed to have multiple users and each user has some data – posts, documents, messages, whatever. And the most obvious thing to do is to protect these entities from being obtained by users that are not the rightful owners of these resources.

Unfortunately, this is not the easiest thing to do. I don’t mean it’s hard, it’s just not as intuitive as simply returning the resources. When you implement your /record/{recordId} endpoint, a database query for the recordId is the immediate thing you do. Only then comes the concern of checking whether this record belongs to the currently authenticated user.

Frameworks don’t give you a hand here, because this access control and ownership logic is domain-specific. There’s no obvious generic way to define the ownership. It depends on the entity model and the relationships between entities. In some cases it can be pretty complex, involving a lookup in a join table (for many-to-many relationships).

But you should automate this, for two reasons. First, manually doing these checks on every endpoint/controller method is tedious and makes the code ugly. Second, it’s easier to forget to add these checks, especially if there are new developers.

You can do these checks in several places, all the way to the DAO, but in general you should fail as early as possible, so these checks should be on a controller (endpoint handler) level. In the case of Java and Spring, you can use annotations and a HandlerInterceptor to automate this. In case of any other language or framework, there are similar approaches available – some pluggable way to describe the ownership relationship to be checked.

Below is an example annotation to put on each controller method:

@Retention(RetentionPolicy.RUNTIME) // needed so the interceptor can read it via reflection
@Target(ElementType.METHOD)
public @interface VerifyEntityOwnership {
    String entityIdParam() default "id";
    Class<?> entityType();
}

Then you define the interceptor (which, of course, should be configured to be executed)

@Component
public class VerifyEntityOwnershipInterceptor extends HandlerInterceptorAdapter {

    private static final Logger logger = LoggerFactory.getLogger(VerifyEntityOwnershipInterceptor.class);
    
    @Autowired
    private OrganizationService organizationService;

    @Autowired
    private MessageService messageService;
    
    @Autowired
    private UserService userService;
    
    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {

        Authentication authentication = SecurityContextHolder.getContext().getAuthentication();
        // assuming spring-security with a custom authentication token type
        if (authentication instanceof ApiAuthenticationToken) {
            AuthenticationData authenticationData = ((ApiAuthenticationToken) authentication).getAuthenticationData();

            UUID clientId = authenticationData.getClientId();
            HandlerMethod handlerMethod = (HandlerMethod) handler;
            
            VerifyEntityOwnership annotation = handlerMethod.getMethodAnnotation(VerifyEntityOwnership.class);
            if (annotation == null) {
                logger.warn("No VerifyEntityOwnership annotation found on method {}", handlerMethod.getMethod().getName());
                return true;
            }
            
            String entityId = getParam(request, annotation.entityIdParam());
            if (entityId != null) {
                if (annotation.entityType() == User.class) {
                    User user = userService.get(entityId);
                    if (!user.getClientId().equals(clientId)) {
                       return false;
                    }
                } else if (annotation.entityType() == Message.class) {
                    Message message = messageService.get(entityId);
                    if (!message.getClientId().equals(clientId)) {
                        return false;
                    }
                } // .... more
            }
        }

        return true;
    }
    
    @SuppressWarnings("unchecked")
    private String getParam(HttpServletRequest request, String paramName) {
        String value = request.getParameter(paramName);
        if (value != null) {
            return value;
        }
        Map<String, String> pathVariables = (Map<String, String>) request.getAttribute(HandlerMapping.URI_TEMPLATE_VARIABLES_ATTRIBUTE);
        return pathVariables.get(paramName);
    }
}

You see that this presumes the need for custom logic per type. If your model is simple, you can make that generic – make all your entities implement some Owned interface with a getClientId() method that all of them define. Then simply have a dao.get(id, entityClass); and avoid having entity-specific logic.
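
A minimal sketch of that generic variant (the Owned interface and the dao are illustrative, not part of the code above):

// all ownable entities expose their owner
public interface Owned {
    java.util.UUID getClientId();
}

// ...and the interceptor no longer needs per-type branches (dao is assumed to be
// a generic DAO that can load any entity class by id):
Owned entity = (Owned) dao.get(entityId, annotation.entityType());
if (entity == null || !entity.getClientId().equals(clientId)) {
    return false;
}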

Notice the warning that gets printed when there is no annotation on a method – this is there to indicate that you might have forgotten to add one. Some endpoints may not require ownership check – for them you can have a special @IgnoreEntityOwnership annotation. The point is to make a conscious decision to not verify the ownership, rather than to forget about it and introduce a security issue.

What I’m saying might be obvious. But I’ve seen many examples of this omission, including production government projects. And as I said, frameworks don’t force you to consider that aspect, because they can’t do it in a generic way – web frameworks are usually not concerned with your entity model, and your ORM is not concerned with your controllers. There are comprehensive frameworks that handle all of these aspects, but even they don’t have generic mechanisms for that (at least not that I’m aware of).

Security includes applying a set of good practices and principles to a system. But it also includes procedures and automations that help developers and admins in not omitting something that they are generally aware of, but happen to forget every now and then. And the less tedious a security principle is to apply, the more likely it will be consistently applied.

The post Automate Access Control for User-Specific Entities appeared first on Bozho's tech blog.

Scaling Horizontally on AWS [talk]

Post Syndicated from Bozho original https://techblog.bozho.net/scaling-horizontally-on-aws-talk/

At a recent conference (HackConf) I gave a talk where I tried to summarize how to do deployment and horizontal scaling on AWS. It is an overview of AWS resources (instances, load balancers, auto-scaling groups, security groups) as well as how to use CloudFormation to script your stack.

It briefly mentions the application layer and how it should look (because another talk at the same conference was focused on that part). My point here is summarized as: “You cannot scale an unscalable application”. But the talk continues to discuss AWS-specific things, although many of them have their nearly identical counterparts in other IaaS providers (e.g. Google Cloud, Azure).

The video of the talk can be seen here:

And the slides are here:

As someone summarized on twitter: “That talk is approximately a year worth of learning experience with AWS in 40 minutes”. This is a benefit and a drawback, as it might be too condensed and too shallow, but I think I’ve covered important bits with enough depth for a starting point.

One of my points was that for simpler setups you don’t need fancy tools and platforms (docker, kubernetes, etc.) – as you’ll have to use bash anyway, you can go with just bash + CloudFormation and have a perfectly good, highly-available, blue-green deployment setup.

The other main points were: “think about your infrastructure as code”, and “consider all your resources dispensable, as they will surely die at some point”. Overall, I hope the talk is useful for everyone using or planning to use AWS, or any other IaaS provider.

The post Scaling Horizontally on AWS [talk] appeared first on Bozho's tech blog.

Typical Workarounds For Compliant Logs

Post Syndicated from Bozho original https://techblog.bozho.net/typical-workarounds-for-compliant-logs/

You may think you have logs. Chances are, you can rely on them only for tracing exceptions and debugging. But you can’t rely on them for compliance, forensics, or any legal matter. And that may be none of your concern as an engineer, but it is one of those important non-functional requirements that, if not met, are ultimately our fault.

Because of my audit trail startup, I’m obviously both biased and qualified to discuss logs (I’ve previously described what audit trail / audit logs are and how they can be maintained). And while they are only a part of the security aspects, and certainly not very exciting, they are important, especially from a legal standpoint. That’s why many standards and laws – including ISO 27001, PCI-DSS, HIPAA, SOC 2, PSD2, GDPR – require audit trail functionality. And most of them have very specific security requirements.

PCI-DSS has a bunch of sections with audit trail related requirements:

10.2.3
“Malicious users often attempt to alter audit logs to hide their actions, and a record of access allows an organization to trace any inconsistencies or potential tampering of the logs to an individual account. [..]”

10.5 Secure audit trails so they cannot be altered. Often a malicious individual who has entered the network will attempt to edit the audit logs in order to hide their activity. Without adequate protection of audit logs, their completeness, accuracy, and integrity cannot be guaranteed, and the audit logs can be rendered useless as an investigation tool after a compromise.

10.5.5
Use file-integrity monitoring or change-detection software on logs to ensure that existing log data cannot be changed without generating alerts (although new data being added should not cause an alert).

ISO 27001 Annex A also speaks about protecting the audit trail against tampering

A.12.4.2 Protection of log information
Logging facilities and log information shall be protected against tampering and unauthorized access

From my experience, sadly, logs are rarely fully compliant. However, auditors are mostly fine with that and certification is given, even though logs can be tampered with. I decided to collect and list the typical workarounds for the secure, tamper-evident/tamper-protected audit logs.

  • We don’t need them secured – you almost certainly do. If you need to be compliant – you do. If you don’t need to be compliant, but you have a high-value / high-impact system, you do. If it doesn’t have to be compliant and it’s low-value or low-impact, then yes – you don’t need much security for that anyway (but be careful not to underestimate the needed security)
  • We store it in multiple places with different access – this is based on the assumption that multiple administrators won’t conspire to tamper with the logs. And you can’t guarantee that, of course. But even if you are sure they won’t, you can’t prove that to external parties. Imagine someone sues you and you provide logs as evidence. If the logs are not tamper-evident, the other side can easily claim you have fabricated logs and make them inadmissible evidence.
  • Our system is so complicated, nobody knows how to modify data without breaking our integrity checks – this is the “security through obscurity” approach and it will probably work well … until it doesn’t.
  • We store it with external provider – external log providers usually claim they provide compliance. And they do provide many aspects of compliance, mainly operational security around the log collection systems. Besides, in that case you (or your admins) can’t easily modify the externally stored records. Some providers may give you the option to delete records, which isn’t great for audit trail. The problem with such services is that they keep the logs operational for relatively short periods of time and then export them in a form that can be tampered with. Furthermore, you can’t really be sure that the logs are not tampered with. Yes, the provider is unlikely to care about your logs, but having that as the main guarantee doesn’t sound perfect.
  • We are using trusted timestamping – and that’s great, it covers many aspects of the logs integrity. AWS is timestamping their CloudTrail log files and it’s certainly a good practice. However, it comes short in just one scenario – someone deleting an entire timestamped file. And because it’s a whole file, rather than record-by-record, you won’t know which record exactly was targeted. There’s another caveat – if the TSA is under your control, you can backdate timestamps and therefore can’t prove that you didn’t fabricate logs.

These approaches are valid and are somewhere on a non-zero point on the compliance spectrum. Is having four copies of the data accessible to different admins better than having just one? Yup. Is timestamping with a local TSA better than not timestamping? Sure. Is using an external service more secure than using a local one? Yes. Are they sufficient to get certified? Apparently yes. Is this the best that can be done? No. Do you need the best? Sometimes yes.

Most of these measures are organizational rather than technical, and organizational measures can be reversed or circumvented much more easily than technical ones. And I may be too paranoid, but when I was a government advisor, I had to be paranoid when it comes to security.

And what do I consider a non-workaround? There is a lot of research around tamper-evident logging, tamper-evident data structures, Merkle trees, hash chains. Here’s just one example. It should have been mainstream by now, but it isn’t. We’ve apparently settled on suboptimal procedures (even when it comes to cost), and we’ve interpreted the standards loosely, looking for the low-hanging fruit. And that’s fine for some organizations. I guess.
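
To show how simple the core idea is, here’s a minimal hash-chaining sketch (SHA-256, in-memory, no persistence – real implementations add signatures, timestamps and external anchoring; the class is illustrative):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class HashChainedLog {

    private String previousHash = ""; // hash of the last appended record

    // returns the chained hash that should be stored alongside the entry;
    // modifying or deleting any earlier entry breaks every subsequent hash
    public synchronized String append(String entry) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(previousHash.getBytes(StandardCharsets.UTF_8));
        digest.update(entry.getBytes(StandardCharsets.UTF_8));
        previousHash = Base64.getEncoder().encodeToString(digest.digest());
        return previousHash;
    }
}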

It takes time for a certain good practice to become mainstream, and it has to be obvious. Tamper-evident logging isn’t obvious. We had gradually become more aware about properly storing passwords, using TLS, multi-factor authentication. But security is rarely a business priority, as seen in reports about what drives security investments (it’s predominantly “compliance”).

As a practical conclusion – if you are going to settle for a workaround, at least choose a better one. Not having audit trail or not making any effort to protect it from tampering should be out of the question.

The post Typical Workarounds For Compliant Logs appeared first on Bozho's tech blog.

A Caveat With AWS Shared Resources

Post Syndicated from Bozho original https://techblog.bozho.net/a-caveat-with-aws-shared-resources/

Recently I’ve been releasing a new build, as usual utilizing a blue-green deployment by switching the DNS record to point to the load balancer of the previously “spare” group. But before I switched the DNS, I checked the logs of the newly launched version and noticed something strange – continuous HTTP errors from our web frameworks (Spring MVC) that a certain endpoint does not support the HTTP method.

The odd thing was – I didn’t have such an endpoint at all. I enabled further logging and it turned out that the request URL was not about my domain at all. The spare group, not yet having traffic directed at it, was receiving requests pointed at a completely different domain, which I didn’t own.

I messaged the domain owner, as well as AWS, to inform them of the issue. The domain owner said they have no idea what that is and that they don’t have any unused or forgotten AWS resources. AWS, however, responded as follows:

The ELB service scales dynamically as traffic demand changes, therefore when scaling occurs, the ELB service will take IP addresses from the AWS unused public IP address pool and assign them to the ELB nodes that are provisioned for you. The foreign domain name you see here in your case, likely belongs to another AWS customer who’s AWS resource is no longer using one of the IP addresses that your ELB node now has as it was released to the AWS unused IP pool at some stage, their web clients are very likely excessively caching DNS for these DNS names (not respecting DNS TTL), or their own DNS servers are configured with static entries and are therefore communicating with an IP address that now belongs to your ELB. The ELB adding and removing IPs from Route53 is briefly described in [Link 1] and the TTL attached to the DNS name is 60 seconds. Provided that clients respect the TTL, there should be no such issues.

I can simply ignore the traffic, but what happens if I’m in this role – after a burst my IP gets released, but some client (or some intermediate DNS resolver) has cached the information for longer than instructed. Then requests to my service, including passwords, API keys, etc. will be forwarded to someone else.

Using HTTPS might help in the case of browsers, as the certificate of the new load balancer will not match my domain, but in case of other tools that don’t perform this validation or have it cached, HTTPS won’t help, unless there’s certificate pinning implemented.

AWS say they can’t fix that at the load balancer, but they actually can, by keeping a mapping between IPs, owners and Host headers. It won’t be trivial, but it’s worth exploring in case my experience is not an exceptional scenario. Whether it’s worth fixing if HTTPS solves it – probably not.

So this is yet another reason to always use HTTPS and to force HTTPS if connection is made over HTTP. But also a reminder to not do clever client-side IP caching (let the DNS resolvers handle that) and to always verify the server certificate.

The post A Caveat With AWS Shared Resources appeared first on Bozho's tech blog.

Writing Big JSON Files With Jackson

Post Syndicated from Bozho original https://techblog.bozho.net/writing-big-json-files-with-jackson/

Sometimes you need to export a lot of data to JSON to a file. Maybe it’s “export all data to JSON”, or the GDPR “Right to portability”, where you effectively need to do the same.

And as with any big dataset, you can’t just fit it all in memory and write it to a file. It takes a while, it reads a lot of entries from the database and you need to be careful not to make such exports overload the entire system, or run out of memory.

Luckily, it’s fairly straightforward to do that, with the help of Jackson’s SequenceWriter and optionally of piped streams. Here’s how it would look:

    private ObjectMapper jsonMapper = new ObjectMapper();
    private ExecutorService executorService = Executors.newFixedThreadPool(5);

    @Async
    public ListenableFuture<Boolean> export(UUID customerId) {
        try (PipedInputStream in = new PipedInputStream();
                PipedOutputStream pipedOut = new PipedOutputStream(in);
                GZIPOutputStream out = new GZIPOutputStream(pipedOut)) {
        
            Stopwatch stopwatch = Stopwatch.createStarted();

            ObjectWriter writer = jsonMapper.writer().withDefaultPrettyPrinter();

            try(SequenceWriter sequenceWriter = writer.writeValues(out)) {
                sequenceWriter.init(true);
            
                Future<?> storageFuture = executorService.submit(() ->
                       storageProvider.storeFile(getFilePath(customerId), in));

                int batchCounter = 0;
                while (true) {
                    List<Record> batch = readDatabaseBatch(batchCounter++);
                    if (batch.isEmpty()) {
                        break; // no more records to export
                    }
                    for (Record record : batch) {
                        sequenceWriter.write(record);
                    }
                }

                // wait for storing to complete
                storageFuture.get();
            }  

            logger.info("Exporting took {} seconds", stopwatch.stop().elapsed(TimeUnit.SECONDS));

            return AsyncResult.forValue(true);
        } catch (Exception ex) {
            logger.error("Failed to export data", ex);
            return AsyncResult.forValue(false);
        }
    }

The code does a few things:

  • Uses a SequenceWriter to continuously write records. It is initialized with an OutputStream, to which everything is written. This could be a simple FileOutputStream, or a piped stream as discussed below. Note that the naming here is a bit misleading – writeValues(out) sounds like you are instructing the writer to write something now; instead it configures it to use the particular stream later.
  • The SequenceWriter is initialized with true, which means “wrap in array”. You are writing many identical records, so they should represent an array in the final JSON.
  • Uses PipedOutputStream and PipedInputStream to link the SequenceWriter to an InputStream which is then passed to a storage service. If we were explicitly working with files, there would be no need for that – simply passing a FileOutputStream would do. However, you may want to store the file differently, e.g. in Amazon S3, and there the putObject call requires an InputStream from which to read data and store it in S3. So, in effect, you are writing to an OutputStream which is directly written to an InputStream, which, when attempted to be read from, gets everything written to another OutputStream
  • Storing the file is invoked in a separate thread, so that writing to the file does not block the current thread, whose purpose is to read from the database. Again, this would not be needed if simple FileOutputStream was used.
  • The whole method is marked as @Async (spring) so that it doesn’t block execution – it gets invoked and finishes when ready (using an internal Spring executor service with a limited thread pool)
  • The database batch reading code is not shown here, as it varies depending on the database. The point is, you should fetch your data in batches, rather than SELECT * FROM X.
  • The OutputStream is wrapped in a GZIPOutputStream, as text files like JSON with repetitive elements benefit significantly from compression

The main work is done by Jackson’s SequenceWriter, and the (kind of obvious) point to take home is – don’t assume your data will fit in memory. It almost never does, so do everything in batches and incremental writes.

The post Writing Big JSON Files With Jackson appeared first on Bozho's tech blog.

Implementing White-Labelling

Post Syndicated from Bozho original https://techblog.bozho.net/implementing-white-labelling/

Sometimes (very often in my experience) you need to support white-labelling of your application. You may normally run it in a SaaS fashion, but some important or high profile clients may want either a dedicated deployment, or an on-premise deployment, or simply “their corner” on your cloud deployment.

White-labelling normally includes different CSS, different logos and other images, and different header and footer texts. The rest of the product stays the same. So how do we support white-labelling in the least invasive way possible? (I will use Spring MVC in my examples, but it’s pretty straightforward to port the logic to other frameworks)

First, let’s outline the three different ways white-labelling can be supported. You can (and probably should) implement all of them, as they are useful in different scenarios, and have much overlap.

  • White-labelled installation – change the styles of the whole deployment. Useful for on-premise or managed installations.
  • White-labelled subdomain – allow different styling if the service is accessed through a particular subdomain
  • White-labelled client(s) – allow specific customers, after logging in, to see the customized styles

To implement a full white-labelled installation, we have to configure a path on the filesystem where the customized css files and images will be placed, as well as the customized texts. Here’s an example from a .properties file passed to the application on startup:

styling.dir=/var/config/whitelabel
styling.footer=&copy;2018 Your Company
styling.logo=/images/logsentinel-logo.png
styling.css=/css/custom.css
styling.title=Your Company

In Spring / Spring Boot, you can serve files from the file system if a certain URL pattern is matched. For example:

@Component
@Configuration
public class WebMvcCustomization implements WebMvcConfigurer {
  @Value("${styling.dir}")
  private String whiteLabelDir;

  @Override
  public void addResourceHandlers(ResourceHandlerRegistry registry) {
    // note: filesystem locations need the "file:" prefix (and a trailing slash)
    registry.addResourceHandler("/whitelabel/**").addResourceLocations("file:" + whiteLabelDir + "/");
  }
}

And finally, you need to customize your HTML templates, but we’ll get to that at the end, when all the other options are implemented as well.

Next are white-labelled subdomains. For me this is the best option, as it allows you to have a single installation with multiple customers with specific styles. The style depends solely on the domain/subdomain the service is accessed through.

For that, we’d need to introduce an entity, WhitelabelStyling and a corresponding database table. We can make some admin UI to configure that, or configure it directly in the database. The entity may look something like this:

@Table("whitelabel_styling")
public class WhitelabelStyling {
    @PrimaryKey
    private String key;
    @Column
    private String title;
    @Column
    private String css;
    @Column
    @CassandraType(type = DataType.Name.BLOB)
    private byte[] logo;
    @Column
    private String footer;
    @Column
    private String domain;

   // getters and setters
}

The key is an arbitrary string you choose. It may be the same as the (sub)domain or some other business-meaningful string. The rest is mostly obvious. After we have this, we need to be able to serve the resources. For that we need a controller, which you can see here. The controller picks up a white-label key and tries to load the corresponding entry from the database, and then serves the result. The controller endpoints are in this case /whitelabel-resources/logo.png and /whitelabel-resources/style.css.
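
The linked controller is the authoritative version; a simplified sketch of the idea might look roughly like this (WhitelabelService is a stand-in for whatever loads the entity, and the fallback to the current user is only hinted at):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/whitelabel-resources")
public class WhitelabelResourceController {

    @Autowired
    private WhitelabelService whitelabelService; // hypothetical service loading WhitelabelStyling entries

    @GetMapping(value = "/style.css", produces = "text/css")
    public ResponseEntity<String> css(@RequestParam(value = "key", required = false) String key) {
        WhitelabelStyling styling = resolve(key);
        if (styling == null) {
            return ResponseEntity.notFound().build();
        }
        return ResponseEntity.ok(styling.getCss());
    }

    @GetMapping(value = "/logo.png", produces = MediaType.IMAGE_PNG_VALUE)
    public ResponseEntity<byte[]> logo(@RequestParam(value = "key", required = false) String key) {
        WhitelabelStyling styling = resolve(key);
        if (styling == null) {
            return ResponseEntity.notFound().build();
        }
        return ResponseEntity.ok(styling.getLogo());
    }

    private WhitelabelStyling resolve(String key) {
        if (key != null) {
            return whitelabelService.getByKey(key);
        }
        // the real controller also falls back to the currently authenticated user's styling
        return null;
    }
}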

In order to set the proper key for the particular subdomain, you need a per-request model attribute (i.e. a value that is set in the model of all pages being rendered). Something like this (which refreshes the white-label cache once a day; the cache is mandatory if you don’t want to hit the database on every request):

@ModelAttribute("domainWhitelabel")
public WhitelabelStyling perDomainStyling(HttpServletRequest request) {
    String serverName = request.getServerName();
    if (perDomainStylings.containsKey(serverName)) {
        return perDomainStylings.get(serverName);
    }
    return null;
}

@Scheduled(fixedRate = DateTimeConstants.MILLIS_PER_DAY)
public void refreshAllowedWhitelabelDomains() {
     perDomainStylings = whitelabelService.getWhitelabelStyles()
            .stream()
            .collect(Collectors.toMap(WhitelabelStyling::getDomain, Function.identity()));
}

And finally, per-customer white-labeling is achieved the same way as above, using the same controller, only the current key is not fetched based on request.getServerName() but on a property of the currently authenticated user. An admin (through a UI or directly in the database) can assign a whitelabel key to each user, and then, after login, that user sees the customized styling.

We’ve seen how the Java part of the solution looks, but we need to modify the HTML templates in order to pick the customisations. A simple approach would look like this (using pebble templating):

{% if domainWhitelabel != null %}
  <link href="/whitelabel-resources/style.css?key={{ domainWhitelabel.key }}" rel="stylesheet">
{% elseif user.whitelabelStyling != null and user.whitelabelStyling.css != '' %}
  <link href="/whitelabel-resources/style.css" rel="stylesheet">
{% elseif beans.environment.getProperty('styling.dir') != '' and beans.environment.getProperty('styling.css.enabled') == true %}
  <link href="{{'/whitelabel/'+  beans.environment.getProperty('styling.css')}}" rel="stylesheet">
{% else %}
  <link href="{{ beans.environment.getProperty('styling.css')}}" rel="stylesheet">
{% endif %}

It’s pretty straightforward – if there’s a domain-level white-labelling configured, use that; if not, check if the current user has specific white-label assigned; if not, check if global installation white-labelling is configured; if not, use the default. This snippet makes use of the WhitelabelController above (in the former two cases) and of the custom resource handler in the penultimate case.

Overall, this is a flexible and easy solution that shouldn’t take more than a few days to implement and test even on existing systems. I’ll once again voice my preference for the domain-based styles, as they allow having the same multi-tenant installation used with many different styles and logos. Of course, your web server/load balancer/domain should be configured properly to allow subdomains and let you easily manage them, but that’s offtopic.

I think white-labelling is a good approach for many products. Obviously, don’t implement it until the business needs it, but keep in mind that it might come down the line and that it’s relatively easy to implement.

The post Implementing White-Labelling appeared first on Bozho's tech blog.

5 Features Eclipse Should Copy From IntelliJ IDEA

Post Syndicated from Bozho original https://techblog.bozho.net/5-features-eclipse-should-copy-from-intellij-idea/

Eclipse Photon was released a few days ago, and I decided to do yet another comparison with IntelliJ IDEA. Last time I explained why I still prefer Eclipse, but because my current project had problems with Java 9 in Eclipse initially, I’ve been using IntelliJ IDEA for the past half a year. (Still using Eclipse for everything else; partly because of the lack of “multiple projects in one workspace” in IDEA).

This time, though, the comparison will be the other way around – what IDEA features I’d really like to have in Eclipse; features that make work much easier and way more efficient. (Btw, what’s the proper short version to use – IntelliJ? IDEA?)

Isn’t that a departure from my stance “Eclipse is better”? No – I don’t believe there’s a perfect IDE (or perfect anything, for that matter), so any product can try to get the best aspects of the competition. Here I’ll focus on five features of IDEA where Eclipse lags behind.

First, the “Find in path” dialog. The interactivity of the dialog, the fact that you see all the results while typing and being able to navigate the results with the arrows is huge. Compare that to Eclipse’s clunky Search dialog, which (while pretty powerful), has a million tabs (rarely focused on the one you need) and then you actually click “Search” to get a list of results in a search panel, where you double-click in order to see the context…it’s just bad compared to IDEA.

Second is suggesting static imports. Static imports are not used too often, except in tests. Mockito, Hamcrest, test utility methods – in every class you need dozens of static imports. And Eclipse feels miserable with those – you manually go and import the methods you need, then organize imports and suddenly you need another one, and the .* you naively added has been changed to particular imports, so once again, you have to go and manually import. In contrast, IDEA just suggest the most relevant static import in the autocomplete pop-up and handles that for you.

Third is autocomplete. IDEA autocomplete triggers automatically when you start typing; in Eclipse it only triggers after a dot – otherwise you have to CTRL+space. And yes, I know there’s auto-activation setting where you can configure symbols that trigger the auto-complete, but as I’ve previously complained about IDEA’s defaults, it’s Eclipse’s turn. And it’s not even a checkbox – you have to actively type the entire alphabet, lower and upper case, in order to get it working – that’s just bad design. In what scenario would I need autocomplete on a,b,c but not on d,e,f??

Fourth is lambda simplification. You sometimes end up with a pretty long chain of calls on a stream and they may not be the best way to express what you want. IDEA can suggest improvements so that it is more readable and easier to understand while achieving the same result. As a bonus, you eventually start doing these simplifications yourself.

Fifth – parameter labels. When you call a method foo.bar("Some string", 0, true) it’s not exactly obvious what the parameters are. And while you can rightly argue that this is a bad method signature, primitive (+String) parameters where you just pass a value happen every now and then, and it’s useful to see the name of the parameter at the point of method invocation. IDEA nicely shows that.

There are certainly more things that each of the IDEs can copy from the other one. Hopefully this competition will continue and result in improving both.

The post 5 Features Eclipse Should Copy From IntelliJ IDEA appeared first on Bozho's tech blog.

Electronic Signatures Using The Browser

Post Syndicated from Bozho original https://techblog.bozho.net/electronic-signatures-using-the-browser/

Sometimes, especially in government or enterprise context, you need to sign a document in the browser using a smartcard (some may call it “crypto token”). It’s rare, but many people have asked me, in private messages and emails, how to do it. Maybe they’ve seen some of my articles from several years ago, but failed to make it work. And my articles show the evolution (or devolution) of in-browser electronic signing.

First it was possible with javascript, then I even created a library to make things easier. Then CAPICOM and window.crypto were deprecated, so the only option was to use a Java applet. Then Java applets were deprecated and we were out of options. We got the Web Crypto API, but it explicitly didn’t support hardware tokens.

For that reason, I wrote a “plea” for smartcard support in browsers, but it hasn’t happened yet and probably won’t in the near future. So what can we do now, that all previous options are deprecated?

A good approach is to have a one-time installation of some custom software (it could be a Java Web Start application or a Java-independent application), which runs a local service that listens on a particular port, and then a javascript library that sends the data to be signed to http://localhost:1234/sign and gets the response. There are such solutions available, notably NexU (thanks to the efforts put into the DSS package). There are other attempts, such as this one, using Java Web Start (it’s currently not in English).
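
To illustrate the idea (this is not NexU or any real product – the port, the /sign path and the signing method are made up), a minimal sketch of such a local service with the JDK’s built-in HTTP server could look like the code below. A real one would do the actual smartcard signing via PKCS#11 and deal with the HTTPS requirement discussed below:

import com.sun.net.httpserver.HttpServer;

import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class LocalSigningService {

    public static void main(String[] args) throws Exception {
        // bind only to localhost so the service is not exposed to the network
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 1234), 0);
        server.createContext("/sign", exchange -> {
            byte[] dataToSign = exchange.getRequestBody().readAllBytes();
            byte[] signature = signWithSmartcard(dataToSign);
            // allow the page served from your web application to call the local service
            exchange.getResponseHeaders().add("Access-Control-Allow-Origin", "*");
            exchange.sendResponseHeaders(200, signature.length);
            exchange.getResponseBody().write(signature);
            exchange.close();
        });
        server.start();
    }

    private static byte[] signWithSmartcard(byte[] data) {
        // placeholder – in reality this would use the SunPKCS11 provider and the token's private key
        return "dummy-signature".getBytes(StandardCharsets.UTF_8);
    }
}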

You can try NexU’s demo here. It’s also included in the dss-demo-webapp project.

It has some tricky bits that have only recently been resolved in browsers – namely, in order to send an XMLHttpRequest to the local service, it has to run on HTTPS, and therefore you have to package a private key in your application (which goes against the requirements of many Certificate Authorities). Now, as far as I know, localhost is exempt from that requirement.

I hope I don’t have to write yet another article in two years explaining that this approach is superseded by yet another hacky approach.

The post Electronic Signatures Using The Browser appeared first on Bozho's tech blog.

Storing Encrypted Credentials In Git

Post Syndicated from Bozho original https://techblog.bozho.net/storing-encrypted-credentials-in-git/

We all know that we should not commit any passwords or keys to the repo with our code (no matter if public or private). Yet, thousands of production passwords can be found on GitHub (and probably thousands more in internal company repositories). Some have tried to fix that by removing the passwords (once they learned it’s not a good idea to store them publicly), but passwords have remained in the git history.

Knowing what not to do is the first and very important step. But how do we store production credentials? Database credentials, system secrets (e.g. for HMACs), access keys for 3rd party services like payment providers or social networks – there doesn’t seem to be an agreed-upon solution.

I’ve previously argued with the 12-factor app recommendation to use environment variables – if you have a few, that might be okay, but when the number of variables grows (as in any real application), it becomes impractical. And yes, you can set environment variables via a bash script, but you’d have to store that script somewhere. In fact, even separate environment variables should be stored somewhere.

This somewhere could be a local directory (risky), a shared storage, e.g. FTP or S3 bucket with limited access, or a separate git repository. I think I prefer the git repository as it allows versioning (Note: S3 also does, but is provider-specific). So you can store all your environment-specific properties files with all their credentials and environment-specific configurations in a git repo with limited access (only Ops people). And that’s not bad, as long as it’s not the same repo as the source code.

Such a repo would look like this:

project
└─── production
|   |   application.properties
|   |   keystore.jks
└─── staging
|   |   application.properties
|   |   keystore.jks
└─── on-premise-client1
|   |   application.properties
|   |   keystore.jks
└─── on-premise-client2
|   |   application.properties
|   |   keystore.jks

Since many companies are using GitHub or BitBucket for their repositories, storing production credentials with a public provider may still be risky. That’s why it’s a good idea to encrypt the files in the repository. A good way to do it is via git-crypt. The encryption is “transparent” – it supports diff and encrypts/decrypts on the fly, so once you set it up, you continue working with the repo as if it weren’t encrypted. There’s even a fork that works on Windows.

You simply run git-crypt init (after you’ve put the git-crypt binary on your OS path), which generates a key. Then you specify your .gitattributes, e.g. like this:

secretfile filter=git-crypt diff=git-crypt
*.key filter=git-crypt diff=git-crypt
*.properties filter=git-crypt diff=git-crypt
*.jks filter=git-crypt diff=git-crypt

And you’re done. Well, almost. If this is a fresh repo, everything is good. If it is an existing repo, you’d have to clean up your history which contains the unencrypted files. Following these steps will get you there, with one addition – before calling git commit, you should call git-crypt status -f so that the existing files are actually encrypted.

You’re almost done. We should somehow share and back up the keys. For the sharing part, it’s not a big issue to have a team of 2-3 Ops people share the same key, but you could also use the GPG option of git-crypt (as documented in the README). What’s left is to back up your secret key (it’s generated in the .git/git-crypt directory). You can store it (password-protected) in some other storage, be it a company shared folder, Dropbox/Google Drive, or even your email. Just make sure your computer is not the only place where it’s present and that it’s protected. I don’t think key rotation is necessary, but you can devise some rotation procedure.

The git-crypt author claims it shines when it comes to encrypting just a few files in an otherwise public repo, and recommends looking at git-remote-gcrypt otherwise. But since environment-specific configurations often have non-sensitive parts as well, you may not want to encrypt everything, and I think it’s perfectly fine to use git-crypt even in the separate-repo scenario. And even though encryption is an okay approach to protect credentials in your source code repo, it’s still not necessarily a good idea to have the environment configurations in the same repo – especially given that different people/teams manage these credentials. Even in small companies, maybe not all members have production access.

The outstanding question in this case is: how do you sync the properties with code changes? Sometimes the code adds new properties that should be reflected in the environment configurations. There are two scenarios here – first, properties that could vary across environments but can have default values (e.g. scheduled job periods), and second, properties that require explicit configuration (e.g. database credentials). The former can have the default values bundled in the code repo and therefore in the release artifact, allowing external files to override them. The latter should be announced to the people who do the deployment so that they can set the proper values.
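
As a rough sketch of the “defaults bundled, overrides external” approach (the file names and the config.dir system property are assumptions, not a prescription):

Properties props = new Properties();
// defaults shipped inside the release artifact (jar/war)
try (InputStream defaults = Thread.currentThread().getContextClassLoader()
        .getResourceAsStream("application-defaults.properties")) {
    props.load(defaults);
}
// environment-specific overrides, deployed from the (limited-access, encrypted) config repo
Path overrides = Paths.get(System.getProperty("config.dir", "/etc/myapp"), "application.properties");
if (Files.exists(overrides)) {
    try (InputStream in = Files.newInputStream(overrides)) {
        props.load(in); // values here win over the bundled defaults
    }
}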

The whole process of having versioned environment-specific configurations is actually quite simple and logical, even with the encryption added to the picture. And I think it’s a good security practice we should try to follow.

The post Storing Encrypted Credentials In Git appeared first on Bozho's tech blog.

Audit Trail Overview

Post Syndicated from Bozho original https://techblog.bozho.net/audit-trail-overview/

As part of my current project (secure audit trail) I decided to make a survey about the use of audit trail “in the wild”.

I haven’t written in detail about this project of mine (unlike with some other projects), mostly because it’s commercial and I don’t want to use my blog as a direct promotion channel (though I am doing that at the moment, ironically). But the aim of this post is to shed some light on how audit trail is used.

The survey can be found here. The questions are basically: does your current project have audit trail functionality, and if yes, is it protected from tampering? If not – do you think you should have such functionality?

The results are interesting (although with only around 50 respondents).

So more than half of the systems (on which respondents are working) don’t have an audit trail. While audit trail is recommended by information security and related standards, it may not find a place in the “busy schedule” of a software project, even though it’s fairly easy to provide a trivial implementation (e.g. I’ve written how to quickly set one up with Hibernate and Spring).

A trivial implementation might do in many cases but if the audit log is critical (e.g. access to sensitive data, performing financial operations etc.), then relying on a trivial implementation might not be enough. In other words – if the sysadmin can access the database and delete or modify the audit trail, then it doesn’t serve much purpose. Hence the next question – how is the audit trail protected from tampering:

And apparently, from the less than 50% of projects with audit trail, around 50% don’t have technical guarantees that the audit trail can’t be tampered with. My guess is it’s more, because people have different understanding of what technical measures are sufficient. E.g. someone may think that digitally signing your log files (or log records) is sufficient, but in fact it isn’t, as whole files (or records) can be deleted (or fully replaced) without a way to detect that. Timestamping can help (and a good audit trail solution should have that), but it doesn’t guarantee the order of events or prevent a malicious actor from deleting or inserting fake ones. And if timestamping is done on a log file level, then any not-yet-timestamped log file is vulnerable to manipulation.

I’ve written about event logs before and their two flavours – event sourcing and audit trail. An event log can effectively be considered audit trail, but you’d need additional security to avoid the problems mentioned above.

So, let’s see what various levels of security and usefulness of audit logs would look like. There are many papers on the topic (e.g. this and this), and they often go into the intricate details of how logging should be implemented. I’ll try to give an overview of the approaches:

  • Regular logs – rely on regular INFO log statements in the production logs to look for hints of what has happened. This may be okay, but it’s harder to look for evidence (as there is non-auditable data in those log files as well), and it’s not very secure – usually logs are collected centrally (e.g. with Graylog), and whoever has access to the log collector’s database (or search engine in the case of Graylog) can manipulate the data and not be caught
  • Designated audit trail – whether it’s stored in the database or in log files. It has the proper business-event-level granularity, but again doesn’t prevent or detect tampering. For lower-risk systems that may be perfectly okay.
  • Timestamped logs – whether it’s log files or (harder to implement) database records. Timestamping is good, but if it’s not an external service, a malicious actor can get access to the local timestamping service and issue fake timestamps to re-timestamp tampered files. Even if the timestamping is not compromised, whole entries can be deleted. The fact that they are missing can sometimes be deduced based on other factors (e.g. hour of rotation), but regularly verifying that is extra effort and may not always be feasible.
  • Hash chaining – each entry (or sequence of log files) could be chained (just as blockchain transactions are) – the next one including the hash of the previous one (see the sketch after this list). This is a good solution (whether it’s local, external or 3rd party), but it carries the risk of someone modifying or deleting a record, taking your entire chain and re-hashing it. All the checks will pass, but the data will not be correct.
  • Hash chaining with anchoring – the head of the chain (the hash of the last entry/block) could be “anchored” to an external service that is outside the capabilities of a malicious actor. Ideally, a public blockchain, alternatively – paper, a public service (twitter), email, etc. That way a malicious actor can’t just rehash the whole chain, because any check against the external service would fail.
  • WORM storage (write once, read many). You could send your audit logs almost directly to WORM storage, where it’s impossible to replace data. However, that is not ideal, as WORM storage can be slow and expensive. AWS Glacier, for example, has rather long retrieval times, which makes searching through recent data impractical (it’s actually cheaper than S3, and you can have expiration policies). And having to support your own WORM storage is expensive. It is a good idea to eventually send the logs to WORM storage, but “fresh” audit trail should probably not be “archived”, so that it remains searchable and some actionable insight can be gained from it.
  • All-in-one – applying all of the above “just in case” may be unnecessary for every project out there, but that’s what I decided to do at LogSentinel. Business-event granularity with timestamping, hash chaining, anchoring, and eventually putting to WORM storage – I think that provides both security guarantees and flexibility.
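
To make the hash chaining option above a bit more concrete, here’s a minimal sketch (in-memory only – a real implementation would also persist the entries, use a canonical encoding and periodically anchor the head externally):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class HashChainedLog {

    private String previousHash = "GENESIS";

    public synchronized String append(String logEntry) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        // each entry's hash covers the previous hash, so modifying or removing
        // an older entry invalidates every hash that comes after it
        digest.update(previousHash.getBytes(StandardCharsets.UTF_8));
        digest.update(logEntry.getBytes(StandardCharsets.UTF_8));
        previousHash = toHex(digest.digest());
        return previousHash; // the current head – this is what gets "anchored" externally
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}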

I hope the overview is useful and the results from the survey shed some light on how this aspect of information security is underestimated.

The post Audit Trail Overview appeared first on Bozho's tech blog.

User Authentication Best Practices Checklist

Post Syndicated from Bozho original https://techblog.bozho.net/user-authentication-best-practices-checklist/

User authentication is the functionality that every web application shares. We should have perfected it a long time ago, having implemented it so many times. And yet there are so many mistakes made all the time.

Part of the reason for that is that the list of things that can go wrong is long. You can store passwords incorrectly, you can have a vulnerable password reset functionality, you can expose your session to a CSRF attack, your session can be hijacked, etc. So I’ll try to compile a list of best practices regarding user authentication. OWASP Top 10 is always something you should read, every year. But that might not be enough.

So, let’s start. I’ll try to be concise, but I’ll include as many of the related pitfalls as I can cover – e.g. what could go wrong with the user session after they log in:

  • Store passwords with bcrypt/scrypt/PBKDF2. No MD5 or SHA, as they are not good for storing passwords. A long salt (per user) is mandatory (the aforementioned algorithms have it built in). If you don’t do this and someone gets hold of your database, they’ll be able to extract the passwords of all your users, and then try these passwords on other websites.
  • Use HTTPS. Period. (Otherwise user credentials can leak through unprotected networks). Force HTTPS if user opens a plain-text version.
  • Mark cookies as secure. Makes cookie theft harder.
  • Use CSRF protection (e.g. CSRF one-time tokens that are verified with each request). Frameworks have such functionality built-in.
  • Disallow framing (X-Frame-Options: DENY). Otherwise your website may be included in another website in a hidden iframe and “abused” through javascript.
  • Have a same-origin policy
  • Logout – let your users logout by deleting all cookies and invalidating the session. This makes usage of shared computers safer (yes, users should ideally use private browsing sessions, but not all of them are that savvy)
  • Session expiry – don’t have forever-lasting sessions. If the user closes your website, their session should expire after a while. “A while” may still be a big number depending on the service provided. For ajax-heavy websites you can have regular ajax polling that keeps the session alive while the page stays open.
  • Remember me – implementing “remember me” (on this machine) functionality is actually hard due to the risks of a stolen persistent cookie. Spring-security uses this approach, which I think should be followed if you wish to implement more persistent logins.
  • Forgotten password flow – the forgotten password flow should rely on sending a one-time (or expiring) link to the user and asking for a new password when it’s opened. Auth0 explains it in this post and Postmark gives some best practices. How the link is formed is a separate discussion and there are several approaches: store a password-reset token in the user profile table and then send it as a parameter in the link, or do not store anything in the database and instead send a few params: userId:expiresTimestamp:hmac(userId+expiresTimestamp). That way you have expiring links (rather than one-time links); see the sketch after this list. The HMAC relies on a secret key, so the links can’t be spoofed. It seems there’s no consensus, as the OWASP guide has a somewhat different approach
  • One-time login links – this is an option used by Slack, which sends one-time login links instead of asking users for passwords. It relies on the fact that your email is well guarded and you have access to it all the time. If your service is not accessed too often, you can have that approach instead of (rather than in addition to) passwords.
  • Limit login attempts – brute-force through a web UI should not be possible; therefore you should block login attempts if they become too many. One approach is to just block them based on IP. The other one is to block them based on account attempted. (Spring example here). Which one is better – I don’t know. Both can actually be combined. Instead of fully blocking the attempts, you may add a captcha after, say, the 5th attempt. But don’t add the captcha for the first attempt – it is bad user experience.
  • Don’t leak information through error messages – you shouldn’t allow attackers to figure out if an email is registered or not. If an email is not found, upon login report just “Incorrect credentials”. On passwords reset, it may be something like “If your email is registered, you should have received a password reset email”. This is often at odds with usability – people don’t often remember the email they used to register, and the ability to check a number of them before getting in might be important. So this rule is not absolute, though it’s desirable, especially for more critical systems.
  • Make sure you use JWT only if it’s really necessary and be careful of the pitfalls.
  • Consider using a 3rd party authentication – OpenID Connect, OAuth by Google/Facebook/Twitter (but be careful with OAuth flaws as well). There’s an associated risk with relying on a 3rd party identity provider, and you still have to manage cookies, logout, etc., but some of the authentication aspects are simplified.
  • For high-risk or sensitive applications use 2-factor authentication. There’s a caveat with Google Authenticator though – if you lose your phone, you lose your accounts (unless there’s a manual process to restore it). That’s why Authy seems like a good solution for storing 2FA keys.
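
To make the stateless expiring-link option from the “forgotten password flow” point above more concrete, here’s a minimal sketch (class and method names are made up; it assumes user IDs don’t contain a colon):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class PasswordResetLinks {

    private final byte[] secretKey; // server-side secret, never sent to the client

    public PasswordResetLinks(byte[] secretKey) {
        this.secretKey = secretKey;
    }

    public String createToken(String userId, long validityMillis) throws Exception {
        long expiresTimestamp = System.currentTimeMillis() + validityMillis;
        return userId + ":" + expiresTimestamp + ":" + hmac(userId + ":" + expiresTimestamp);
    }

    public boolean isValid(String token) throws Exception {
        String[] parts = token.split(":");
        if (parts.length != 3 || Long.parseLong(parts[1]) < System.currentTimeMillis()) {
            return false; // malformed or expired
        }
        // recompute the HMAC – only the holder of the secret key can produce a matching one
        byte[] expected = hmac(parts[0] + ":" + parts[1]).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, parts[2].getBytes(StandardCharsets.UTF_8));
    }

    private String hmac(String data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(mac.doFinal(data.getBytes(StandardCharsets.UTF_8)));
    }
}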

I’m sure I’m missing something. And you see it’s complicated. Sadly we’re still at the point where the most common functionality – authenticating users – is so tricky and cumbersome, that you almost always get at least some of it wrong.

The post User Authentication Best Practices Checklist appeared first on Bozho's tech blog.

Setting Up Cassandra With Priam

Post Syndicated from Bozho original https://techblog.bozho.net/setting-cassandra-priam/

I’ve previously explained how to setup Cassandra in AWS. The described setup works, but in some cases it may not be sufficient. E.g. it doesn’t give you an easy way to make and restore backups, and adding new nodes relies on a custom python script that randomly selects a seed.

So now I’m going to explain how to setup Priam, a Cassandra helper tool by Netflix.

My main reason for setting it up is the backup/restore functionality that it offers. All other ways to do backups are very tedious, and Priam happens to have implemented the important bits – the snapshotting and the incremental backups.

Priam is a bit tricky to get running, though. The setup guide is not too detailed and not easy to find (it’s the last, not immediately visible item in the wiki). First, it has one branch per Cassandra version, so you have to check out the proper branch and build it. I immediately hit an issue there, as their naming doesn’t allow Eclipse to import the Gradle project. Within 24 hours I reported 3 issues, which isn’t ideal. Priam doesn’t support dynamic SimpleDB names, and it doesn’t let you override bundled properties via the command line. I hope there aren’t bigger issues. The ones that I encountered, I fixed and made a pull request.

What does the setup look like?

  • Append a javaagent to the JVM options
  • Run the Priam web
  • It automatically replaces most of cassandra.yaml, including the seed provider (i.e. how the node finds other nodes in the cluster)
  • Run Cassandra
  • It fetches seed information (which is stored in AWS SimpleDB) and connects to a cluster

I decided to run the war file with a standalone jetty-runner, rather than installing Tomcat. In terms of shell scripts, the core bits look like this (in addition to the shell script in the original post that is run on initialization of the node):

# Get the Priam war file and jar file
aws s3 cp s3://$BUCKET_NAME/priam-web-3.12.0-SNAPSHOT.war ~/
aws s3 cp s3://$BUCKET_NAME/priam-cass-extensions-3.12.0-SNAPSHOT.jar /usr/share/cassandra/lib/priam-cass-extensions.jar
# Set the Priam agent
echo "-javaagent:/usr/share/cassandra/lib/priam-cass-extensions.jar" >> /etc/cassandra/conf/jvm.options

# Download jetty-runner to be able to run the Priam war file from the command line
wget http://central.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.8.v20171121/jetty-runner-9.4.8.v20171121.jar
nohup java -Dpriam.clustername=LogSentinelCluster -Dpriam.sdb.instanceIdentity.region=$EC2_REGION -Dpriam.s3.bucket=$BACKUP_BUCKET \
-Dpriam.sdb.instanceidentity.domain=$INSTANCE_IDENTITY_DOMAIN -Dpriam.sdb.properties.domain=$PROPERTIES_DOMAIN \
-Dpriam.client.sslEnabled=true -Dpriam.internodeEncryption=all -Dpriam.rpc.server.type=sync \
-Dpriam.partitioner=org.apache.cassandra.dht.Murmur3Partitioner -Dpriam.backup.retention.days=7 \
-Dpriam.backup.hour=$BACKUP_HOUR -Dpriam.vnodes.numTokens=256 -Dpriam.thrift.enabled=false \
-jar jetty-runner-9.4.8.v20171121.jar --path /Priam ~/priam-web-3.12.0-SNAPSHOT.war &

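# wait until the Priam web app responds on port 8080 before starting Cassandra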
while ! echo exit | nc $BIND_IP 8080; do sleep 10; done

echo "Started Priam web package"

service cassandra start
chkconfig cassandra on

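# wait until Cassandra accepts CQL connections on port 9042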
while ! echo exit | nc $BIND_IP 9042; do sleep 10; done

BACKUP_BUCKET, PROPERTIES_DOMAIN and INSTANCE_IDENTITY_DOMAIN are supplied via a CloudFormation script (as we can’t know the exact names in advance – especially for SimpleDB). Note that these properties won’t work in the main repo – I added them in my pull request.

In order for that to work, you need to have the two SimpleDB domains created (e.g. by CloudFormation). It is possible that you could replace SimpleDB with some other data storage (and not rely on AWS), but that’s out of scope for now.

The result of running Priam would be that you have your Cassandra nodes in SimpleDB (you can browse it using this chrome extension as AWS doesn’t offer any UI) and, of course, backups will be automatically created in the backup S3 Bucket.

You can then restore a backup by logging to each node and executing:

curl "http://localhost:8080/Priam/REST/v1/restore?daterange=201803180000,201803191200&region=eu-west-1&keyspaces=your_keyspace"

You specify the time range for the restore. Still not ideal, as one would hope to have a one-click restore, but much better than rolling out your own backup & restore infrastructure.

One very important note here – vnodes are not supported. My original cluster had a default of 256 vnodes per machine and now it has just 1, because Priam doesn’t support anything other than 1. That’s a pity, since vnodes are the recommended way to set up Cassandra. Apparently Netflix don’t use them, however. There’s a work-in-progress branch for that, but it was abandoned 5 years ago. Fortunately, there’s a fresh pull request with vnode support that can be used in conjunction with my pull request from this branch.

Priam replaces some Cassandra defaults with other values so you might want to compare your current setup and the newly generated cassandra.yaml. Overall it doesn’t feel super-production ready, but apparently it is, as Netflix is using it in production.

The post Setting Up Cassandra With Priam appeared first on Bozho's tech blog.

Using JWT For Sessions

Post Syndicated from Bozho original https://techblog.bozho.net/using-jwt-sessions/

The topic has been discussed many times, on hacker news, reddit, blogs. And the consensus is – DON’T USE JWT (for user sessions).

And I largely agree with the criticism of the typical arguments for JWT, the typical “but I can make it work…” explanations, and the flaws of the JWT standard.

I won’t repeat everything here, so please go and read those articles. You can really shoot yourself in the foot with JWT, it’s complex to get to know well, and it has few benefits for most use-cases. I guess for API calls it makes sense, especially if you reuse the same API in a single-page application and for your RESTful clients, but I’ll focus on the user session use-case.

Despite all this criticism, I’ve gone against what the articles above recommend and use JWT, navigating through their arguments and claiming I’m in a sweet spot. I can very well be wrong.

I store the user ID in a JWT token stored as a cookie. Not local storage, as that’s problematic. Not the whole state, as I don’t need it and it may lead to the problems pointed out in the linked articles. In fact, I don’t have any session state apart from the user data, which I think is a good practice.

What I want to avoid in my setup is sharing sessions across nodes. And this is a very compelling reason not to use the session mechanism of your web server/framework. No, you don’t need to have millions of users in order to need your application to run on more than one node. In fact, it should almost always run on (at least) two nodes, because nodes die and you don’t want downtime. Sticky sessions at the load balancer are a solution to that problem, but you are just outsourcing the centralized session storage to the load balancer (and some load balancers might not support it). A shared session cache (e.g. memcached, ElastiCache, Hazelcast) is also an option, and many web servers (at least in Java) support pluggable session replication mechanisms, but that introduces another component to the architecture, another part of the stack to be supported that can possibly break. It is not necessarily bad, but if there’s a simple way to avoid it, I’d go for it.

In order to avoid shared session storage, you need either the whole session state to be passed in the request/response cycle (as a cookie, request parameter or header), or to receive a userId and load the user from the database or a cache. As we’ve learned, the former might be a bad choice. Despite the fact that frameworks like ASP.NET and JSF dump the whole state in the HTML of the page, it doesn’t intuitively sound good.

As for the latter – you may say “ok, if you are going to load the user from the database on every request this is going to be slow and if you use a cache, then why not use the cache for the sessions themselves?”. Well, the cache can be local. Remember we have just a few application nodes. Each node can have a local, in-memory cache for the currently active users. The fact that all nodes will have the same user loaded (after a few requests are routed to them by the load balancer in a round-robin fashion) is not important, as that cache is small. But you won’t have to take any care for replicating it across nodes, taking care of new nodes coming and going from the cluster, dealing with network issues between the nodes, etc. Each application node will be an island not caring about any other application node.
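
Such a local cache is just a few lines – here’s a sketch assuming the Caffeine caching library and some UserRepository with a findById method (any local cache with expiry would do; the names are illustrative):

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

import java.util.concurrent.TimeUnit;

public class LocalUserCache {

    private final LoadingCache<String, User> users;

    public LocalUserCache(UserRepository repository) {
        this.users = Caffeine.newBuilder()
                .maximumSize(10_000)                     // the set of currently active users is small
                .expireAfterAccess(30, TimeUnit.MINUTES) // drop users that haven't made requests recently
                .build(repository::findById);            // loaded from the database on a cache miss
    }

    public User getUser(String userId) {
        return users.get(userId);
    }
}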

So here goes my first objection to the linked articles – just storing the user identifier in a JWT token is not pointless, as it saves you from session replication.

What about the criticism of the JWT standard and the security implications of its cryptography? Entirely correct – it’s easy to shoot yourself in the foot. That’s why I’m using JWT only with a MAC, and only with a particular algorithm that I verify upon receiving the token, thus (allegedly) avoiding all the pitfalls. In all fairness, I’m willing to use the alternative proposed in one of the articles – PASETO – but it doesn’t have a Java library and it would take some time to implement one (I might do that in the future). To summarize – if there were another easy-to-use way for authenticated encryption of cookies, I’d use it.
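
For illustration, here’s roughly what “JWT with only a MAC and a pinned algorithm” boils down to, sketched with just the JDK (a real setup would use a vetted library; expiration checking of the “exp” claim is omitted for brevity):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SessionTokens {

    // the algorithm is fixed; anything else in the header is rejected on verification
    private static final String HEADER = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";

    private final byte[] key;

    public SessionTokens(byte[] key) {
        this.key = key;
    }

    public String issue(String userId, long expiresAtMillis) throws Exception {
        String payload = "{\"sub\":\"" + userId + "\",\"exp\":" + (expiresAtMillis / 1000) + "}";
        String signingInput = base64Url(HEADER) + "." + base64Url(payload);
        return signingInput + "." + hmacSha256(signingInput);
    }

    public boolean verify(String token) throws Exception {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        // pin the algorithm: the header must be exactly the one we issue,
        // which rules out "alg": "none" and algorithm-confusion tricks
        if (!base64Url(HEADER).equals(parts[0])) {
            return false;
        }
        byte[] expected = hmacSha256(parts[0] + "." + parts[1]).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, parts[2].getBytes(StandardCharsets.UTF_8));
    }

    private String hmacSha256(String data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(mac.doFinal(data.getBytes(StandardCharsets.UTF_8)));
    }

    private static String base64Url(String value) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }
}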

So I’m basically using JWT in “PASETO mode”, with only one operation and only one algorithm. And that should be fine as a general approach – the articles don’t criticize the idea of having a user identifier in a token (and a stateless application node); they criticize the complexity and vulnerabilities of the standard. This is sort of my second objection – “Don’t use JWT” is widely understood to mean “Don’t use tokens”, which is not the case.

Have I introduced some vulnerability in my quest for architectural simplicity and lack of shared state? I hope not.

The post Using JWT For Sessions appeared first on Bozho's tech blog.

Adding Visible Electronic Signatures To PDFs

Post Syndicated from Bozho original https://techblog.bozho.net/adding-visible-electronic-signatures-pdf/

I’m aware this is going to be a very niche topic. Electronically signing PDFs is far from a mainstream use-case. However, I’ll write about it for two reasons – first, I think it will be very useful for the few who actually need it, and second, I think it will become more and more common as the eIDAS regulation gains popularity – it basically says that electronic signatures are recognized everywhere in Europe (now, that’s not exactly true, because of some boring legal details, but anyway).

So, what is the use-case? First, you have to electronically sign the PDF with a digital signature (the legal term is “electronic signature”, so I’ll use the two interchangeably, although they don’t fully match – e.g. any electronic data applied to other data can be seen as an electronic signature, whereas a digital signature is the PKI-based kind).

Second, you may want to actually display the signature on the pages, rather than have the PDF reader recognize it and show it in some side-panel. Why is that? Because people are used to seeing signatures on pages and some may insist on having the signature visible (true story – I’ve got a comment that a detached signature “is not a REAL electronic signature, because it’s not visible on the page”).

Now, notice that I wrote “pages”, not “page”. Yes, an electronic document doesn’t have pages – it’s a stream of bytes. So having the signature just on the last page is okay. But, again, people are used to signing all pages, so they’d prefer the electronic signature to be visible on all of them.

And that makes the task tricky – PDF is good with having a digital signature box on the last page, but having multiple such boxes doesn’t work well. Therefore one has to add other types of annotations that look like a signature box and when clicked open the signature panel (just like an actual signature box).

I have to introduce here DSS – a wonderful set of components by the European Commission that can be used to sign and validate all sorts of electronic signatures. It’s open source, and you can use it any way you like – deploy the demo application, use only the libraries, whatever. It includes the signing functionality out of the box – just check the PAdESService or the PDFBoxSignatureService. It even includes the option to visualize the signature once (on a particular page).

However, it doesn’t have the option to show “stamps” (images) on multiple pages, which is why I forked it and implemented the functionality. Most of my changes are in the PDFBoxSignatureService, in the loadAndStampDocument(..) method. If you want to use that functionality, you can just build a jar from my fork and use it (by passing the appropriate SignatureImageParameters to PAdESService.sign(..) to define what the signature will look like).

Why is this needed in the first place? Because when a document is signed, you cannot modify it anymore, as you would change the hash. However, PDFs have incremental updates, which allow appending to the document and thus having a newer version without modifying anything in the original version. That way the signature is still valid (the originally signed content is not modified), but new stuff is added. In our case, this new stuff is some “annotations”, which represent an image and a clickable area that opens the signature panel (in Adobe Reader at least). And while they are added before the signature box is added, if there is more than one signer, then the 2nd signer’s annotations are added after the first signature.

Sadly, PDFBox doesn’t support that out of the box. Well, it almost does – the piece of code below looks hacky, and it took a while to figure out what exactly should be called and when, but it works with just a single reflection call:

    for (PDPage page : pdDocument.getPages()) {
        // reset existing annotations (needed in order to have the stamps added)
        page.setAnnotations(null);
    }
    // reset document outline (needed in order to have the stamps added)
    pdDocument.getDocumentCatalog().setDocumentOutline(null);
    List<PDAnnotation> annotations = addStamps(pdDocument, parameters);
			
    setDocumentId(parameters, pdDocument);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (COSWriter writer = new COSWriter(baos, new RandomAccessBuffer(pdfBytes))) {
        // force-add the annotations (wouldn't be saved in incremental updates otherwise)
        annotations.forEach(ann -> addObjectToWrite(writer, ann.getCOSObject()));
				
        // technically the same as saveIncremental but with more control
        writer.write(pdDocument);
    }
    pdDocument.close();
    pdDocument = PDDocument.load(baos.toByteArray());
    ...
}

private void addObjectToWrite(COSWriter writer, COSDictionary cosObject) {
    // the COSWriter does not expose the addObjectToWrite method, so we need reflection to add the annotations
    try {
        Method method = writer.getClass().getDeclaredMethod("addObjectToWrite", COSBase.class);
        method.setAccessible(true);
        method.invoke(writer, cosObject);
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }
}

What it does is load the original PDF, clear some internal catalogs, add the annotations (images) to all pages, and then “force-add the annotations”, because they “wouldn’t be saved in incremental updates otherwise”. I hope PDFBox makes this a little more straightforward in the future, but for the time being this works, and it doesn’t invalidate the existing signatures.

I hope this post introduces you to:

  • the existence of legally binding electronic signatures
  • the existence of the DSS utilities
  • the PAdES standard for PDF signing
  • how to place more than just one signature box in a PDF document

And I hope this article becomes more and more popular over time, as more and more businesses realize they could make use of electronic signatures.

The post Adding Visible Electronic Signatures To PDFs appeared first on Bozho's tech blog.

Integration With Zapier

Post Syndicated from Bozho original https://techblog.bozho.net/integration-with-zapier/

Integration is boring. And also inevitable. But I won’t be writing about enterprise integration patterns. Instead, I’ll explain how to create an app for integration with Zapier.

What is Zapier? It is a service that allows you to connect two (or more) otherwise unconnected services via their APIs (or protocols). You can do stuff like “Create a Trello task from an Evernote note”, “publish new RSS items to Facebook”, “append new emails to a spreadsheet”, “post approaching calendar meetings to Slack”, “save big email attachments to Dropbox”, “tweet all instagrams above a certain likes threshold”, and so on. In fact, it seems to cover mostly the same use-cases as another famous service that I really like – IFTTT (if this then that), with my favourite use-case being “Get a notification when the international space station passes over your house”. And all of those interactions can be configured via a UI.

Now that’s good for end users but what does it have to do with software development and integration? Zapier (unlike IFTTT, unfortunately), allows custom 3rd party services to be included. So if you have a service of your own, you can create an “app” and allow users to integrate your service with all the other 3rd party services. IFTTT offers a way to invoke web endpoints (including RESTful services), but it doesn’t allow setting headers, so that makes it quite limited for actual APIs.

In this post I’ll briefly explain how to write a custom Zapier app and then will discuss where services like Zapier stand from an architecture perspective.

The thing that I needed it for – to be able to integrate LogSentinel with any of the third parties available through Zapier, i.e. to store audit logs for events that happen in all those 3rd party systems. So how do I do that? There’s a tutorial that makes it look simple. And it is, with a few catches.

First, there are two tutorials – one on GitHub and one on Zapier’s website. And they differ slightly, which becomes tricky in some cases.

I initially followed the GitHub tutorial and had my build fail. It claimed the zapier platform dependency is missing. After I compared it with the example apps, I found out there’s a caret in front of the zapier platform dependency. Removing it just yielded another error – that my node version should be exactly 6.10.2. Why?

The Zapier CLI requires you have exactly version 6.10.2 installed. You’ll see errors and will be unable to proceed otherwise.

It appears that they are using AWS Lambda which is stuck on Node 6.10.2 (actually – it’s 6.10.3 when you check). The current major release is 8, so minus points for choosing … javascript for a command-line tool and for building sandboxed apps. Maybe other decisions had their downsides as well, I won’t be speculating. Maybe it’s just my dislike for dynamic languages.

So, after you make sure you have the correct old version of node, you call zapier init and make sure there are no carets, then npm install and then zapier test. So far so good, you have a dummy app. Now how do you make a RESTful call to your service?

Zapier splits the programmable entities in two – “triggers” and “creates”. A trigger is the event that triggers the whole app, and a “create” is what happens as a result. In my case, my app doesn’t publish any triggers, it only accepts input, so I won’t be mentioning triggers (though they seem easy). You configure all of the elements in index.js (e.g. this one):

const log = require('./creates/log');
....
creates: {
    [log.key]: log,
}

The log.js file itself is the interesting bit – there you specify all the parameters that should be passed to your API call, as well as making the API call itself:

const log = (z, bundle) => {
  const responsePromise = z.request({
    method: 'POST',
    url: `https://api.logsentinel.com/api/log/${bundle.inputData.actorId}/${bundle.inputData.action}`,
    body: bundle.inputData.details,
	headers: {
		'Accept': 'application/json'
	}
  });
  return responsePromise
    .then(response => JSON.parse(response.content));
};

module.exports = {
  key: 'log-entry',
  noun: 'Log entry',

  display: {
    label: 'Log',
    description: 'Log an audit trail entry'
  },

  operation: {
    inputFields: [
      {key: 'actorId', label:'ActorID', required: true},
      {key: 'action', label:'Action', required: true},
      {key: 'details', label:'Details', required: false}
    ],
    perform: log
  }
};

You can pass the input parameters to your API call, and it’s as simple as that. The user can then specify which parameters from the source (“trigger”) should be mapped to each of your parameters. In an example zap, I used an email trigger and passed the sender as actorId, the subject as “action” and the body of the email as details.

There’s one more thing – authentication. Authentication can be done in many ways. Some services offer OAuth, others – HTTP Basic or other custom forms of authentication. There is a section in the documentation about all the options. In my case it was (almost) an HTTP Basic auth. My initial thought was to just supply the credentials as parameters (which you just hardcode rather than map to trigger parameters). That may work, but it’s not the canonical way. You should configure “authentication”, as it triggers a friendly UI for the user.

You include authentication.js (which has the fields your authentication requires) and then pre-process requests by adding a header (in index.js):

const authentication = require('./authentication');

const includeAuthHeaders = (request, z, bundle) => {
  if (bundle.authData.organizationId) {
	request.headers = request.headers || {};
	request.headers['Application-Id'] = bundle.authData.applicationId
	const basicHash = Buffer(`${bundle.authData.organizationId}:${bundle.authData.apiSecret}`).toString('base64');
	request.headers['Authorization'] = `Basic ${basicHash}`;
  }
  return request;
};

const App = {
  // This is just shorthand to reference the installed dependencies you have. Zapier will
  // need to know these before we can upload
  version: require('./package.json').version,
  platformVersion: require('zapier-platform-core').version,
  authentication: authentication,
  
  // beforeRequest & afterResponse are optional hooks into the provided HTTP client
  beforeRequest: [
	includeAuthHeaders
  ]
...
}

And then you zapier push your app and you can test it. It doesn’t automatically go live, as you have to invite people to try it and use it first, but in many cases that’s sufficient (i.e. when using Zapier to do an integration with a particular client).

Can Zapier be used for any integration problem? Unlikely – it’s pretty limited and simple, but that’s also a strength. You can, in half a day, make your service integrate with thousands of others for the most typical use-cases. And note that although it’s meant for integrating public services rather than for enterprise integration (where you make multiple internal systems talk to each other), as an increasing number of systems rely on 3rd party services, it could find a home in an enterprise architecture, replacing some functions of an ESB.

Effectively, such services (Zapier, IFTTT) are “simple ESB-as-a-service”. You go to a UI, fill in a bunch of fields, and you get systems talking to each other without touching the systems themselves. I’m not a big fan of ESBs, mostly because they become harder to support with time. But minimalist, external ones might be applicable in certain situations. And while such services are primarily aimed at end users, they could be a useful bit in an enterprise architecture that relies on 3rd party services.

Whether it could process the required load, and whether an organization is willing to let its data flow through a 3rd party provider (which may store the intermediate parameters), are questions that should be answered on a case-by-case basis. I wouldn’t recommend it as a general solution, but it’s certainly an option to consider.

The post Integration With Zapier appeared first on Bozho's tech blog.

GDPR for Developers [presentation]

Post Syndicated from Bozho original https://techblog.bozho.net/gdpr-developers-presentation/

On a recent meetup in Amsterdam I talked about GDPR from a technical point of view, effectively turning my “GDPR – a practical guide for developers” article into a talk.

You can see the slides here:

If you’re interested, you can also join a webinar on the same topic, organized by AxonIQ, where I will join Frans Vanbuul. You can find more information about the webinar here.

The interesting thing that I can share after the meetup and after meeting with potential clients is that everyone (maybe unsurprisingly) has a very specific question that doesn’t get an immediate answer even after you follow the general guidelines. That is maybe a problem on the Regulation’s side, as it has not brought sufficient clarity to businesses.

As I said during the presentation – in technology we’re used to binary questions. In law and legal compliance, an answer is somewhere on a scale between 1 and 10. “Do I have to encrypt my data at rest?” Well, I guess yes, but in terms of compliance I’d say “6 out of 10”, as it is not strict and depends on multiple people’s interpretation of the sensitivity of the data and on other factors like access control.

So the communication between legal and technical people is key to understanding exactly what implementation changes are needed.

The post GDPR for Developers [presentation] appeared first on Bozho's tech blog.