Abstract: Risk-based Authentication (RBA) is an adaptive security measure to strengthen password-based authentication. RBA monitors additional features during login, and when observed feature values differ significantly from previously seen ones, users have to provide additional authentication factors such as a verification code. RBA has the potential to offer more usable authentication, but the usability and the security perceptions of RBA are not studied well.
We present the results of a between-group lab study (n=65) to evaluate usability and security perceptions of two RBA variants, one 2FA variant, and password-only authentication. Our study shows with significant results that RBA is considered to be more usable than the studied 2FA variants, while it is perceived as more secure than password-only authentication in general and comparably se-cure to 2FA in a variety of application types. We also observed RBA usability problems and provide recommendations for mitigation.Our contribution provides a first deeper understanding of the users’perception of RBA and helps to improve RBA implementations for a broader user acceptance.
SIM hijacking — or SIM swapping — is an attack where a fraudster contacts your cell phone provider and convincesthem to switch your account to a phone that they control. Since your smartphone often serves as a security measure or backup verification system, this allows the fraudster to take over other accounts of yours. Sometimes this involves peopleinside the phone companies.
Phone companies have added security measures since this attack became popular and public, but a new study (news article) shows that the measures aren’t helping:
We examined the authentication procedures used by five pre-paid wireless carriers when a customer attempted to change their SIM card. These procedures are an important line of defense against attackers who seek to hijack victims’ phone numbers by posing as the victim and calling the carrier to request that service be transferred to a SIM card the attacker possesses. We found that all five carriers used insecure authentication challenges that could be easily subverted by attackers.We also found that attackers generally only needed to target the most vulnerable authentication challenges, because the rest could be bypassed.
It’s a classic security vs. usability trade-off. The phone companies want to provide easy customer service for their legitimate customers, and that system is what’s being exploited by the SIM hijackers. Companies could make the fraud harder, but it would necessarily also make it harder for legitimate customers to modify their accounts.
Apple is rolling out an iOS security usability feature called Security code AutoFill. The basic idea is that the OS scans incoming SMS messages for security codes and suggests them in AutoFill, so that people can use them without having to memorize or type them.
Sounds like a really good idea, but Andreas Gutmann points out an application where this could become a vulnerability: when authenticating transactions:
Transaction authentication, as opposed to user authentication, is used to attest the correctness of the intention of an action rather than just the identity of a user. It is most widely known from online banking, where it is an essential tool to defend against sophisticated attacks. For example, an adversary can try to trick a victim into transferring money to a different account than the one intended. To achieve this the adversary might use social engineering techniques such as phishing and vishing and/or tools such as Man-in-the-Browser malware.
Transaction authentication is used to defend against these adversaries. Different methods exist but in the one of relevance here — which is among the most common methods currently used — the bank will summarise the salient information of any transaction request, augment this summary with a TAN tailored to that information, and send this data to the registered phone number via SMS. The user, or bank customer in this case, should verify the summary and, if this summary matches with his or her intentions, copy the TAN from the SMS message into the webpage.
This new iOS feature creates problems for the use of SMS in transaction authentication. Applied to 2FA, the user would no longer need to open and read the SMS from which the code has already been conveniently extracted and presented. Unless this feature can reliably distinguish between OTPs in 2FA and TANs in transaction authentication, we can expect that users will also have their TANs extracted and presented without context of the salient information, e.g. amount and destination of the transaction. Yet, precisely the verification of this salient information is essential for security. Examples of where this scenario could apply include a Man-in-the-Middle attack on the user accessing online banking from their mobile browser, or where a malicious website or app on the user’s phone accesses the bank’s legitimate online banking service.
This is an interesting interaction between two security systems. Security code AutoFill eliminates the need for the user to view the SMS or memorize the one-time code. Transaction authentication assumes the user read and approved the additional information in the SMS message before using the one-time code.
User authentication is the functionality that every web application shared. We should have perfected that a long time ago, having implemented it so many times. And yet there are so many mistakes made all the time.
Part of the reason for that is that the list of things that can go wrong is long. You can store passwords incorrectly, you can have a vulnerably password reset functionality, you can expose your session to a CSRF attack, your session can be hijacked, etc. So I’ll try to compile a list of best practices regarding user authentication. OWASP top 10 is always something you should read, every year. But that might not be enough.
So, let’s start. I’ll try to be concise, but I’ll include as much of the related pitfalls as I can cover – e.g. what could go wrong with the user session after they login:
Store passwords with bcrypt/scrypt/PBKDF2. No MD5 or SHA, as they are not good for password storing. Long salt (per user) is mandatory (the aforementioned algorithms have it built in). If you don’t and someone gets hold of your database, they’ll be able to extract the passwords of all your users. And then try these passwords on other websites.
Use HTTPS. Period. (Otherwise user credentials can leak through unprotected networks). Force HTTPS if user opens a plain-text version.
Mark cookies as secure. Makes cookie theft harder.
Use CSRF protection (e.g. CSRF one-time tokens that are verified with each request). Frameworks have such functionality built-in.
Logout – let your users logout by deleting all cookies and invalidating the session. This makes usage of shared computers safer (yes, users should ideally use private browsing sessions, but not all of them are that savvy)
Session expiry – don’t have forever-lasting sessions. If the user closes your website, their session should expire after a while. “A while” may still be a big number depending on the service provided. For ajax-heavy website you can have regular ajax-polling that keeps the session alive while the page stays open.
Remember me – implementing “remember me” (on this machine) functionality is actually hard due to the risks of a stolen persistent cookie. Spring-security uses this approach, which I think should be followed if you wish to implement more persistent logins.
Forgotten password flow – the forgotten password flow should rely on sending a one-time (or expiring) link to the user and asking for a new password when it’s opened. 0Auth explain it in this post and Postmark gives some best pracitces. How the link is formed is a separate discussion and there are several approaches. Store a password-reset token in the user profile table and then send it as parameter in the link. Or do not store anything in the database, but send a few params: userId:expiresTimestamp:hmac(userId+expiresTimestamp). That way you have expiring links (rather than one-time links). The HMAC relies on a secret key, so the links can’t be spoofed. It seems there’s no consensus, as the OWASP guide has a bit different approach
One-time login links – this is an option used by Slack, which sends one-time login links instead of asking users for passwords. It relies on the fact that your email is well guarded and you have access to it all the time. If your service is not accessed to often, you can have that approach instead of (rather than in addition to) passwords.
Limit login attempts – brute-force through a web UI should not be possible; therefore you should block login attempts if they become too many. One approach is to just block them based on IP. The other one is to block them based on account attempted. (Spring example here). Which one is better – I don’t know. Both can actually be combined. Instead of fully blocking the attempts, you may add a captcha after, say, the 5th attempt. But don’t add the captcha for the first attempt – it is bad user experience.
Don’t leak information through error messages – you shouldn’t allow attackers to figure out if an email is registered or not. If an email is not found, upon login report just “Incorrect credentials”. On passwords reset, it may be something like “If your email is registered, you should have received a password reset email”. This is often at odds with usability – people don’t often remember the email they used to register, and the ability to check a number of them before getting in might be important. So this rule is not absolute, though it’s desirable, especially for more critical systems.
Consider using a 3rd party authentication – OpenID Connect, OAuth by Google/Facebook/Twitter (but be careful with OAuth flaws as well). There’s an associated risk with relying on a 3rd party identity provider, and you still have to manage cookies, logout, etc., but some of the authentication aspects are simplified.
For high-risk or sensitive applications use 2-factor authentication. There’s a caveat with Google Authenticator though – if you lose your phone, you lose your accounts (unless there’s a manual process to restore it). That’s why Authy seems like a good solution for storing 2FA keys.
I’m sure I’m missing something. And you see it’s complicated. Sadly we’re still at the point where the most common functionality – authenticating users – is so tricky and cumbersome, that you almost always get at least some of it wrong.
GDPR is the new data protection regulation, as you probably already know. I’ve given a detailed practical advice for what it means for developers (and product owners). However, there’s one thing missing there – cookies. The elephant in the room.
Previously I’ve stated that cookies are subject to another piece of legislation – the ePrivacy directive, which is getting updated and its new version will be in force a few years from now. And while that’s technically correct, cookies seem to be affected by GDPR as well. In a way I’ve underestimated that effect.
When you do a Google search on “GDPR cookies”, you’ll pretty quickly realize that a) there’s not too much information and b) there’s not much technical understanding of the issue.
What appears to be the consensus is that GDPR does change the way cookies are handled. More specifically – tracking cookies. Here’s recital 30:
(30) Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.
How tracking cookies work – a 3rd party (usually an ad network) gives you a code snippet that you place on your website, for example to display ads. That code snippet, however, calls “home” (makes a request to the 3rd party domain). If the 3rd party has previously been used on your computer, it has created a cookie. In the example of Facebook, they have the cookie with your Facebook identifier because you’ve logged in to Facebook. So this cookie (with your identifier) is sent with the request. The request also contains all the details from the page. In effect, you are uniquely identified by an identifier (in the case of Facebook and Google – fully identified, rather than some random anonymous identifier as with other ad networks).
Your behaviour on the website is personal data. It gets associated with your identifier, which in turn is associated with your profile. And all of that is personal data. Who is responsible for collecting the website behaviour data, i.e. who is the “controller”? Is it Facebook (or any other 3rd party) that technically does the collection? No, it’s the website owner, as the behaviour data is obtained on their website, and they have put the tracking piece of code there. So they bear responsibility.
For the data collected by tracking cookies you have two options – “consent” and “legitimate interest”. Legitimate interest will be hard to prove – it is not something that a user reasonably expects, it is not necessary for you to provide the service. If your lawyers can get that option to fly, good for them, but I’m not convinced regulators will be happy with that.
The other option is “consent”. You have to ask your users explicitly – that means “with a checkbox” – to let you use tracking cookies. That has two serious implications – from technical and usability point of view.
The usability aspect is the bigger issue – while you could neatly tuck a cookie warning at the bottom, you’d now have to have a serious, “stop the world” popup that asks for consent if you want anyone to click it. You can, of course, just add a checkbox to the existing cookie warning, but don’t expect anyone to click it.
These aspects pose a significant questions: is it worth it to have tracking cookies? Is developing new functionality worth it, is interrupting the user worth it, and is implementing new functionality just so that users never clicks a hidden checkbox worth it? Especially given that Firefox now blocks all tracking cookies and possibly other browsers will follow?
That by itself is an interesting topic – Firefox has basically implemented the most strict form of requirements of the upcoming ePrivacy directive update (that would turn it into an ePrivacy regulation). Other browsers will have to follow, even though Google may not be happy to block their own tracking cookies. I hope other browsers follow Firefox in tracking protection and the issue will be gone automatically.
To me it seems that it will be increasingly not worthy to have tracking cookies on your website. They add regulatory obligations for you and give you very little benefit (yes, you could track engagement from ads, but you can do that in other ways, arguably by less additional code than supporting the cookie consents). And yes, the cookie consent will be “outsourced” to browsers after the ePrivacy regulation is passed, but we can’t be sure at the moment whether there won’t be technical whack-a-mole between browsers and advertisers and whether you wouldn’t still need additional effort to have dynamic consent for tracking cookies. (For example there are reported issues that Firefox used to make Facebook login fail if tracking protection is enabled. Which could be a simple bug, or could become a strategy by big vendors in the future to force browsers into a less strict tracking protection).
Okay, we’ve decided it’s not worth it managing tracking cookies. But do you have a choice as a website owner? Can you stop your ad network from using them? (Remember – you are liable if users’ data is collected by visiting your website). And currently the answer is no – you can’t disable that. You can’t have “just the ads”. This is part of the “deal” – you get money for the ads you place, but you participate in a big “surveillance” network. Users have a way to opt out (e.g. Google AdWords gives them that option). You, as a website owner, don’t.
And sometimes you don’t want to serve ads, just track user behaviour and measure conversion. But even if you ask for consent for that and conditionally insert the plugin/snippet, do you actually know what data it sends? And what it’s used for? Because you have to know in order to inform your users. “Do you agree to use tracking cookies that Facebook has inserted in order to collect data about your behaviour on our website” doesn’t sound compelling.
So, what to do? The easiest thing is just not to use any 3rd party ad-related plugins. But that’s obviously not an option, as ad revenue is important, especially in the publishing industry. I don’t have a good answer, apart from “Regulators should pressure ad networks to provide opt-outs and clearly document their data usage”. They have to do that under GDPR, and while website owners are responsible for their users’ data, the ad networks that are in the role of processors in this case (as you delegate the data collection for your visitors to them) also have obligation to assist you in fulfilling your obligations. So ask Facebook – what should I do with your tracking cookies? And when the regulator comes after a privacy-aware customer files a complaint, you could prove that you’ve tried.
The ethical debate whether it’s wrong to collect data about peoples’ behaviour without their informed consent is an easy one. And that’s why I don’t put blame on the regulators – they are putting the ethical consensus in law. It gets more complicated if not allowing tracking means some internet services are no longer profitable and therefore can’t exist. Can we have the cake and eat it too?
Over on the Red Hat Developer Program blog, David Malcolm describes a number of usability improvements that he has made for the upcoming GCC 8 release. Malcolm has made a number of the C/C++ compiler error messages much more helpful, including adding hints for integrated development environments (IDEs) and other tools to suggest fixes for syntax and other kinds of errors. “[…] the code is fine, but, as is common with fragments of code seen on random websites, it’s missing #include directives. If you simply copy this into a new file and try to compile it as-is, it fails.
This can be frustrating when copying and pasting examples – off the top of your head, which header files are needed by the above? – so for gcc 8 I’ve added hints telling you which header files are missing (for the most common cases).” He has various examples showing what the new error messages and hints look like in the blog post.
Whether you want to review a Security, Compliance, & Identity track session you attended at AWS re:Invent 2017, or you want to experience a session for the first time, videos and slide decks from the Security, Compliance, & Identity track are now available.
SID201: IAM for Enterprises: How Vanguard Strikes the Balance Between Agility, Governance, and Security
Today, the following Security, Compliance, & Identity sessions, workshops, and chalk talks will be presented at AWS re:Invent 2017 in Las Vegas. All sessions are in the MGM Grand and all times are local. See the re:Invent Session Catalog for complete information about every session. You can also download the AWS re:Invent 2017 Mobile App for the latest updates and information.
If you are not attending re:Invent 2017, keep in mind that all videos of and slide decks from these sessions will be made available next week. We will publish a post on the Security Blog next week that links to all available videos and slide decks.
You’ve probably heard about GDPR. The new European data protection regulation that applies practically to everyone. Especially if you are working in a big company, it’s most likely that there’s already a process for gettign your systems in compliance with the regulation.
The regulation is basically a law that must be followed in all European countries (but also applies to non-EU companies that have users in the EU). In this particular case, it applies to companies that are not registered in Europe, but are having European customers. So that’s most companies. I will not go into yet another “12 facts about GDPR” or “7 myths about GDPR” posts/whitepapers, as they are often aimed at managers or legal people. Instead, I’ll focus on what GDPR means for developers.
Why am I qualified to do that? A few reasons – I was advisor to the deputy prime minister of a EU country, and because of that I’ve been both exposed and myself wrote some legislation. I’m familiar with the “legalese” and how the regulatory framework operates in general. I’m also a privacy advocate and I’ve been writing about GDPR-related stuff in the past, i.e. “before it was cool” (protecting sensitive data, the right to be forgotten). And finally, I’m currently working on a project that (among other things) aims to help with covering some GDPR aspects.
I’ll try to be a bit more comprehensive this time and cover as many aspects of the regulation that concern developers as I can. And while developers will mostly be concerned about how the systems they are working on have to change, it’s not unlikely that a less informed manager storms in in late spring, realizing GDPR is going to be in force tomorrow, asking “what should we do to get our system/website compliant”.
The rights of the user/client (referred to as “data subject” in the regulation) that I think are relevant for developers are: the right to erasure (the right to be forgotten/deleted from the system), right to restriction of processing (you still keep the data, but mark it as “restricted” and don’t touch it without further consent by the user), the right to data portability (the ability to export one’s data), the right to rectification (the ability to get personal data fixed), the right to be informed (getting human-readable information, rather than long terms and conditions), the right of access (the user should be able to see all the data you have about them), the right to data portability (the user should be able to get a machine-readable dump of their data).
Additionally, the relevant basic principles are: data minimization (one should not collect more data than necessary), integrity and confidentiality (all security measures to protect data that you can think of + measures to guarantee that the data has not been inappropriately modified).
Even further, the regulation requires certain processes to be in place within an organization (of more than 250 employees or if a significant amount of data is processed), and those include keeping a record of all types of processing activities carried out, including transfers to processors (3rd parties), which includes cloud service providers. None of the other requirements of the regulation have an exception depending on the organization size, so “I’m small, GDPR does not concern me” is a myth.
It is important to know what “personal data” is. Basically, it’s every piece of data that can be used to uniquely identify a person or data that is about an already identified person. It’s data that the user has explicitly provided, but also data that you have collected about them from either 3rd parties or based on their activities on the site (what they’ve been looking at, what they’ve purchased, etc.)
Having said that, I’ll list a number of features that will have to be implemented and some hints on how to do that, followed by some do’s and don’t’s.
“Forget me” – you should have a method that takes a userId and deletes all personal data about that user (in case they have been collected on the basis of consent, and not due to contract enforcement or legal obligation). It is actually useful for integration tests to have that feature (to cleanup after the test), but it may be hard to implement depending on the data model. In a regular data model, deleting a record may be easy, but some foreign keys may be violated. That means you have two options – either make sure you allow nullable foreign keys (for example an order usually has a reference to the user that made it, but when the user requests his data be deleted, you can set the userId to null), or make sure you delete all related data (e.g. via cascades). This may not be desirable, e.g. if the order is used to track available quantities or for accounting purposes. It’s a bit trickier for event-sourcing data models, or in extreme cases, ones that include some sort of blcokchain/hash chain/tamper-evident data structure. With event sourcing you should be able to remove a past event and re-generate intermediate snapshots. For blockchain-like structures – be careful what you put in there and avoid putting personal data of users. There is an option to use a chameleon hash function, but that’s suboptimal. Overall, you must constantly think of how you can delete the personal data. And “our data model doesn’t allow it” isn’t an excuse.
Notify 3rd parties for erasure – deleting things from your system may be one thing, but you are also obligated to inform all third parties that you have pushed that data to. So if you have sent personal data to, say, Salesforce, Hubspot, twitter, or any cloud service provider, you should call an API of theirs that allows for the deletion of personal data. If you are such a provider, obviously, your “forget me” endpoint should be exposed. Calling the 3rd party APIs to remove data is not the full story, though. You also have to make sure the information does not appear in search results. Now, that’s tricky, as Google doesn’t have an API for removal, only a manual process. Fortunately, it’s only about public profile pages that are crawlable by Google (and other search engines, okay…), but you still have to take measures. Ideally, you should make the personal data page return a 404 HTTP status, so that it can be removed.
Restrict processing – in your admin panel where there’s a list of users, there should be a button “restrict processing”. The user settings page should also have that button. When clicked (after reading the appropriate information), it should mark the profile as restricted. That means it should no longer be visible to the backoffice staff, or publicly. You can implement that with a simple “restricted” flag in the users table and a few if-clasues here and there.
Export data – there should be another button – “export data”. When clicked, the user should receive all the data that you hold about them. What exactly is that data – depends on the particular usecase. Usually it’s at least the data that you delete with the “forget me” functionality, but may include additional data (e.g. the orders the user has made may not be delete, but should be included in the dump). The structure of the dump is not strictly defined, but my recommendation would be to reuse schema.org definitions as much as possible, for either JSON or XML. If the data is simple enough, a CSV/XLS export would also be fine. Sometimes data export can take a long time, so the button can trigger a background process, which would then notify the user via email when his data is ready (twitter, for example, does that already – you can request all your tweets and you get them after a while).
Allow users to edit their profile – this seems an obvious rule, but it isn’t always followed. Users must be able to fix all data about them, including data that you have collected from other sources (e.g. using a “login with facebook” you may have fetched their name and address). Rule of thumb – all the fields in your “users” table should be editable via the UI. Technically, rectification can be done via a manual support process, but that’s normally more expensive for a business than just having the form to do it. There is one other scenario, however, when you’ve obtained the data from other sources (i.e. the user hasn’t provided their details to you directly). In that case there should still be a page where they can identify somehow (via email and/or sms confirmation) and get access to the data about them.
Consent checkboxes – this is in my opinion the biggest change that the regulation brings. “I accept the terms and conditions” would no longer be sufficient to claim that the user has given their consent for processing their data. So, for each particular processing activity there should be a separate checkbox on the registration (or user profile) screen. You should keep these consent checkboxes in separate columns in the database, and let the users withdraw their consent (by unchecking these checkboxes from their profile page – see the previous point). Ideally, these checkboxes should come directly from the register of processing activities (if you keep one). Note that the checkboxes should not be preselected, as this does not count as “consent”.
Re-request consent – if the consent users have given was not clear (e.g. if they simply agreed to terms & conditions), you’d have to re-obtain that consent. So prepare a functionality for mass-emailing your users to ask them to go to their profile page and check all the checkboxes for the personal data processing activities that you have.
“See all my data” – this is very similar to the “Export” button, except data should be displayed in the regular UI of the application rather than an XML/JSON format. For example, Google Maps shows you your location history – all the places that you’ve been to. It is a good implementation of the right to access. (Though Google is very far from perfect when privacy is concerned)
Age checks – you should ask for the user’s age, and if the user is a child (below 16), you should ask for parent permission. There’s no clear way how to do that, but my suggestion is to introduce a flow, where the child should specify the email of a parent, who can then confirm. Obviosuly, children will just cheat with their birthdate, or provide a fake parent email, but you will most likely have done your job according to the regulation (this is one of the “wishful thinking” aspects of the regulation).
Encrypt the data in transit. That means that communication between your application layer and your database (or your message queue, or whatever component you have) should be over TLS. The certificates could be self-signed (and possibly pinned), or you could have an internal CA. Different databases have different configurations, just google “X encrypted connections. Some databases need gossiping among the nodes – that should also be configured to use encryption
Encrypt the data at rest – this again depends on the database (some offer table-level encryption), but can also be done on machine-level. E.g. using LUKS. The private key can be stored in your infrastructure, or in some cloud service like AWS KMS.
Encrypt your backups – kind of obvious
Implement pseudonymisation – the most obvious use-case is when you want to use production data for the test/staging servers. You should change the personal data to some “pseudonym”, so that the people cannot be identified. When you push data for machine learning purposes (to third parties or not), you can also do that. Technically, that could mean that your User object can have a “pseudonymize” method which applies hash+salt/bcrypt/PBKDF2 for some of the data that can be used to identify a person
Protect data integrity – this is a very broad thing, and could simply mean “have authentication mechanisms for modifying data”. But you can do something more, even as simple as a checksum, or a more complicated solution (like the one I’m working on). It depends on the stakes, on the way data is accessed, on the particular system, etc. The checksum can be in the form of a hash of all the data in a given database record, which should be updated each time the record is updated through the application. It isn’t a strong guarantee, but it is at least something.
Have your GDPR register of processing activities in something other than Excel – Article 30 says that you should keep a record of all the types of activities that you use personal data for. That sounds like bureaucracy, but it may be useful – you will be able to link certain aspects of your application with that register (e.g. the consent checkboxes, or your audit trail records). It wouldn’t take much time to implement a simple register, but the business requirements for that should come from whoever is responsible for the GDPR compliance. But you can advise them that having it in Excel won’t make it easy for you as a developer (imagine having to fetch the excel file internally, so that you can parse it and implement a feature). Such a register could be a microservice/small application deployed separately in your infrastructure.
Log access to personal data – every read operation on a personal data record should be logged, so that you know who accessed what and for what purpose
Register all API consumers – you shouldn’t allow anonymous API access to personal data. I’d say you should request the organization name and contact person for each API user upon registration, and add those to the data processing register. Note: some have treated article 30 as a requirement to keep an audit log. I don’t think it is saying that – instead it requires 250+ companies to keep a register of the types of processing activities (i.e. what you use the data for). There are other articles in the regulation that imply that keeping an audit log is a best practice (for protecting the integrity of the data as well as to make sure it hasn’t been processed without a valid reason)
Finally, some “don’t’s”.
Don’t use data for purposes that the user hasn’t agreed with – that’s supposed to be the spirit of the regulation. If you want to expose a new API to a new type of clients, or you want to use the data for some machine learning, or you decide to add ads to your site based on users’ behaviour, or sell your database to a 3rd party – think twice. I would imagine your register of processing activities could have a button to send notification emails to users to ask them for permission when a new processing activity is added (or if you use a 3rd party register, it should probably give you an API). So upon adding a new processing activity (and adding that to your register), mass email all users from whom you’d like consent.
Don’t log personal data – getting rid of the personal data from log files (especially if they are shipped to a 3rd party service) can be tedious or even impossible. So log just identifiers if needed. And make sure old logs files are cleaned up, just in case
Don’t put fields on the registration/profile form that you don’t need – it’s always tempting to just throw as many fields as the usability person/designer agrees on, but unless you absolutely need the data for delivering your service, you shouldn’t collect it. Names you should probably always collect, but unless you are delivering something, a home address or phone is unnecessary.
Don’t assume 3rd parties are compliant – you are responsible if there’s a data breach in one of the 3rd parties (e.g. “processors”) to which you send personal data. So before you send data via an API to another service, make sure they have at least a basic level of data protection. If they don’t, raise a flag with management.
Don’t assume having ISO XXX makes you compliant – information security standards and even personal data standards are a good start and they will probably 70% of what the regulation requires, but they are not sufficient – most of the things listed above are not covered in any of those standards
Overall, the purpose of the regulation is to make you take conscious decisions when processing personal data. It imposes best practices in a legal way. If you follow the above advice and design your data model, storage, data flow , API calls with data protection in mind, then you shouldn’t worry about the huge fines that the regulation prescribes – they are for extreme cases, like Equifax for example. Regulators (data protection authorities) will most likely have some checklists into which you’d have to somehow fit, but if you follow best practices, that shouldn’t be an issue.
I think all of the above features can be implemented in a few weeks by a small team. Be suspicious when a big vendor offers you a generic plug-and-play “GDPR compliance” solution. GDPR is not just about the technical aspects listed above – it does have organizational/process implications. But also be suspicious if a consultant claims GDPR is complicated. It’s not – it relies on a few basic principles that are in fact best practices anyway. Just don’t ignore them.
Many of us are visiting parents/relatives this Thanksgiving/Christmas, and will have an opportunity to help our them with cybersecurity issues. I thought I’d write up a quick guide of the most important things.
1. Stop them from reusing passwords
By far the biggest threat to average people is that they re-use the same password across many websites, so that when one website gets hacked, all their accounts get hacked.
To demonstrate the problem, go to haveibeenpwned.com and enter the email address of your relatives. This will show them a number of sites where their password has already been stolen, like LinkedIn, Adobe, etc. That should convince them of the severity of the problem.
They don’t need a separate password for every site. You don’t care about the majority of website whether you get hacked. Use a common password for all the meaningless sites. You only need unique passwords for important accounts, like email, Facebook, and Twitter.
Write down passwords and store them in a safe place. Sure, it’s a common joke that people in offices write passwords on Post-It notes stuck on their monitors or under their keyboards. This is a common security mistake, but that’s only because the office environment is widely accessible. Your home isn’t, and there’s plenty of places to store written passwords securely, such as in a home safe. Even if it’s just a desk drawer, such passwords are safe from hackers, because they aren’t on a computer.
Write them down, with pen and paper. Don’t put them in a MyPasswords.doc, because when a hacker breaks in, they’ll easily find that document and easily hack your accounts.
You might help them out with getting a password manager, or two-factor authentication (2FA). Good 2FA like YubiKey will stop a lot of phishing threats. But this is difficult technology to learn, and of course, you’ll be on the hook for support issues, such as when they lose the device. Thus, while 2FA is best, I’m only recommending pen-and-paper to store passwords. (AccessNow has a guide, though I think YubiKey/U2F keys for Facebook and GMail are the best).
2. Lock their phone (passcode, fingerprint, faceprint)
You’ll lose your phone at some point. It has the keys all all your accounts, like email and so on. With your email, phones thieves can then reset passwords on all your other accounts. Thus, it’s incredibly important to lock the phone.
Apple has made this especially easy with fingerprints (and now faceprints), so there’s little excuse not to lock the phone.
Note that Apple iPhones are the most secure. I give my mother my old iPhones so that they will have something secure.
My mom demonstrates a problem you’ll have with the older generation: she doesn’t reliably have her phone with her, and charged. She’s the opposite of my dad who religiously slaved to his phone. Even a small change to make her lock her phone means it’ll be even more likely she won’t have it with her when you need to call her.
3. WiFi (WPA)
Make sure their home WiFi is WPA encrypted. It probably already is, but it’s worthwhile checking.
The password should be written down on the same piece of paper as all the other passwords. This is importance. My parents just moved, Comcast installed a WiFi access point for them, and they promptly lost the piece of paper. When I wanted to debug some thing on their network today, they didn’t know the password, and couldn’t find the paper. Get that password written down in a place it won’t get lost!
Discourage them from extra security features like “SSID hiding” and/or “MAC address filtering”. They provide no security benefit, and actually make security worse. It means a phone has to advertise the SSID when away from home, and it makes MAC address randomization harder, both of which allows your privacy to be tracked.
If they have a really old home router, you should probably replace it, or at least update the firmware. A lot of old routers have hacks that allow hackers (like me masscaning the Internet) to easily break in.
4. Ad blockers or Brave
Most of the online tricks that will confuse your older parents will come via advertising, such as popups claiming “You are infected with a virus, click here to clean it”. Installing an ad blocker in the browser, such as uBlock Origin, stops most all this nonsense.
For example, here’s a screenshot of going to the “Speedtest” website to test the speed of my connection (I took this on the plane on the way home for Thanksgiving). Ignore the error (plane’s firewall Speedtest) — but instead look at the advertising banner across the top of the page insisting you need to download a browser extension. This is tricking you into installing malware — the ad appears as if it’s a message from Speedtest, it’s not. Speedtest is just selling advertising and has no clue what the banner says. This sort of thing needs to be blocked — it fools even the technologically competent.
uBlock Origin for Chrome is the one I use. Another option is to replace their browser with Brave, a browser that blocks ads, but at the same time, allows micropayments to support websites you want to support. I use Brave on my iPhone.
A side benefit of ad blockers or Brave is that web surfing becomes much faster, since you aren’t downloading all this advertising. The smallest NYtimes story is 15 megabytes in size due to all the advertisements, for example.
5. Cloud Backups
Do backups, in the cloud. It’s a good idea in general, especially with the threat of ransomware these days.
In particular, consider your photos. Over time, they will be lost, because people make no effort to keep track of them. All hard drives will eventually crash, deleting your photos. Sure, a few key ones are backed up on Facebook for life, but the rest aren’t.
There are so many excellent online backup services out there, like DropBox and Backblaze. Or, you can use the iCloud feature that Apple provides. My favorite is Microsoft’s: I already pay $99 a year for Office 365 subscription, and it comes with 1-terabyte of online storage.
6. Separate email accounts
You should have three email accounts: work, personal, and financial.
First, you really need to separate your work account from personal. The IT department is already getting misdirected emails with your spouse/lover that they don’t want to see. Any conflict with your work, such as getting fired, gives your private correspondence to their lawyers.
Second, you need a wholly separate account for financial stuff, like Amazon.com, your bank, PayPal, and so on. That prevents confusion with phishing attacks.
Consider this warning today:
Phishing warning! Fake emails are being sent out pretending to be from the US Postal Service, claiming that you requested your mail be held this week. Don’t click on the attachment OR the links.
If you had split accounts, you could safely ignore this. The USPS would only know your financial email account, which gets no phishing attacks, because it’s not widely known. When your receive the phishing attack on your personal email, you ignore it, because you know the USPS doesn’t know your personal email account.
Phishing emails are so sophisticated that even experts can’t tell the difference. Splitting financial from personal emails makes it so you don’t have to tell the difference — anything financial sent to personal email can safely be ignored.
7. Deauth those apps!
Twitter user @tompcoleman comments that we also need deauth apps.
Social media sites like Facebook, Twitter, and Google encourage you to enable “apps” that work their platforms, often demanding privileges to generate messages on your behalf. The typical scenario is that you use them only once or twice and forget about them.
A lot of them are hostile. For example, my niece’s twitter account would occasional send out advertisements, and she didn’t know why. It’s because a long time ago, she enabled an app with the permission to send tweets for her. I had to sit down and get rid of most of her apps.
Now would be a good time to go through your relatives Facebook, Twitter, and Google/GMail and disable those apps. Don’t be a afraid to be ruthless — they probably weren’t using them anyway. Some will still be necessary. For example, Twitter for iPhone shows up in the list of Twitter apps. The URL for editing these apps for Twitter is https://twitter.com/settings/applications. Google link is here (thanks @spextr). I don’t know of simple URLs for Facebook, but you should find it somewhere under privacy/security settings.
You should install the latest OS (Windows 10, macOS High Sierra), and also turn on automatic patching.
But remember it may not be worth the huge effort involved. I want my parents to be secure — but no so secure I have to deal with issues.
For example, when my parents updated their HP Print software, the icon on the desktop my mom usually uses to scan things in from the printer disappeared, and needed me to spend 15 minutes with her helping find the new way to access the software.
However, I did get my mom a new netbook to travel with instead of the old WinXP one. I want to get her a Chromebook, but she doesn’t want one.
For iOS, you can probably make sure their phones have the latest version without having these usability problems.
You can’t solve every problem for your relatives, but these are the more critical ones.
Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging
Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.
However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.
If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.
What is event-driven computing?
Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfil these workflows. Amazon SNS is an event-driven computing hub, in the AWS Cloud, that has native integration with several AWS publisher and subscriber services.
Which AWS services publish events to SNS natively?
Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services, as depicted below.
Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster.As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination).
AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers.As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.
Elastic Load Balancing:Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses.You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.
Amazon S3:Object storage built to store and retrieve any amount of data.You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF).In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket.
Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud.You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically.Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.
Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup.You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS helps you eliminate polling your Amazon Glacier vault to check whether your job has been completed, or not. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.
AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data.You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers.As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via Web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.
Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud.RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints.As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window.
Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud.ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect.To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.
Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools.Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a ready-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold.At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.
AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises MS SQL database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.
Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources.You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached, and the payment gateway is no longer considered healthy, then CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants.
AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.
In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, including:
Many AWS services publish events to CloudWatch. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.
Amazon SNS is a pub/sub messaging service that can be used as an event-driven computing hub to AWS customers worldwide. By capturing events natively triggered by AWS services, such as EC2, S3 and RDS, you can automate and optimize all kinds of workflows, namely scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. Business use cases presented in this post ranged from recruiting websites, to scientific research, geographic systems, social networks, retail websites, and news portals.
Over the years I’ve observed an inevitable shift from Eclipse to IntelliJ IDEA. Last year they were almost equal in usage, and I have the feeling things are swaying even more towards IDEA.
IDEA is like the iPhone of IDEs – its users tell you that “you will feel how much better it is once you get used to it”, “are you STILL using Eclipse??”, “IDEA is so much better, I thought everyone has switched”, etc.
I’ve been using mostly Eclipse for the past 12 years, but in some cases I did use IDEA – when I was writing Scala, when I was writing Android, and most recently – when Eclipse failed to be ready for the Java 9 release, so after half a day of trying to get it working, I just switched to IDEA until Eclipse finally gets a working Java 9 version (with Maven and the rest of the stuff).
But I will get back to Eclipse again, soon. And I still prefer it. Not just because of all the key combinations I’ve internalized (you can reuse those in IDEA), but because there are still things I find worse in IDEA. Of course, IDEA has so much more cool features like code improvement suggestions and actually working plugins for everything. But at least some of the problems I see have to do with the more basic development workflow and experience. And you can’t compensate for those with sugarcoating. So here they are:
Projects are not automatically built (by default), so you can end up with compilation errors that you don’t see until you open a non-compiling file or run a build. And turning the autobild on makes my machine crawl. I know I need an upgrade, but that’s not the point – not having “build on change” was a huge surprise to me the first time I tried IDEA. I recently complained about that on twitter and it turns out “it’s a feature”. The rationale seems to be that if you use refactoring, that shouldn’t happen. Well, there are dozens of cases when it does happen. Refactoring by adding a method parameter, by changing the type of a parameter, by removing a parameter (where the IDE can’t infer which parameter is removed based on the types), by changing return types. Also, a change in maven/gradle dependencies may introduces compilation issues that you don’t get to see. This is not a reasonable default at all, and I think the performance issues are the only reason it’s still the default. I think this makes the experience much worse.
You can have only one project per screen. Maybe there are those small companies with greenfield projects where you only need one. But I’ve never been in a situation, where you don’t at least occasionally need a separate project. Be it an “experiments” one, a “tools” one, or whatever. And no, multi-module maven projects (which IDEA handles well) are not sufficient. So each time you need to step out of your main project, you launch another screen. Apart from the bad usability, it’s double the memory, double the fun.
Speaking of memory, It seems to be taking more memory than Eclipse. I don’t have representative benchmarks of that, and I know that my 8 GB RAM home machine is way to small for development nowadays, but still.
It feels less responsive and clunky. There is some minor delay that I can’t define well, but “I feel it”. I read somewhere that they were excessively repainting the screen elements, so that might be the explanation. Eclipse feels smoother (I know that’s not a proper argument, but I can’t be more precise)
Due to some extra cleverness, I have “unused methods” and “never assigned fields” all around the project. It uses spring, so these methods and fields are controller methods and autowired fields. Maybe some spring plugin would take care of that, but spring is not the only framework that uses reflection. Even getters and setters on POJOs get the unused warnings. What’s the problem with those warnings? That warnings are devalued. They don’t mean anything now. There isn’t a “yellow” indicator on the class either, so you don’t actually see the amount of warnings you have. Eclipse displays warnings better, and the false positives are much less.
The call hierarchy is slightly worse. But since that’s the most important IDE feature for me (alongside refactoring), it matters. It doesn’t give you the call hierarchy of default constructors that are not explicitly defined. Also, from what I’ve seen IDEA users don’t often use the call hierarchy feature. “Find usage” I think predates the call hierarchy, and is also much more visible through the UI, so some of the IDEA users don’t even know what a call hierarchy is. And repeatedly do “find usage”. That’s only partly the IDE’s fault.
No search in the output console. Come one, why I do I have an IDE, where I have to copy the output and paste it in a text editor in order to search. Now, to clarify, the console does have search. But when I run my (spring-boot) application, it outputs stuff in a panel at the bottom that is not the console and doesn’t have search.
CTRL+arrows by default jumps over whole words, and not camel cased words. This is configurable, but is yet another odd default. You almost always want to be able to traverse your variables word by word (in camel case), rather than skipping over the whole variable (method/class) name.
A few years ago when I used it for Scala, the project never actually compiled. But I guess that’s more Scala’s fault than of the IDE
Apart from the first two, the rest are not major issues, I agree. But they add up. Ultimately, it’s a matter of personal choice whether you can turn a blind eye to these issues. But I’m getting back to Eclipse again. At some point I will propose improvements in the IntelliJ IDEA backlog and will check it again in a few years, I guess.
The recent addition of Xilinx FPGAs to AWS Cloud compute offerings is one way that AWS is enabling global growth in the areas of advanced analytics, deep learning and AI. The customized F1 servers use pooled accelerators, enabling interconnectivity of up to 8 FPGAs, each one including 64 GiB DDR4 ECC protected memory, with a dedicated PCIe x16 connection. That makes this a powerful engine with the capacity to process advanced analytical applications at scale, at a significantly faster rate. For example, AWS commercial partner Edico Genome is able to achieve an approximately 30X speedup in analyzing whole genome sequencing datasets using their DRAGEN platform powered with F1 instances.
While the availability of FPGA F1 compute on-demand provides clear accessibility and cost advantages, many mainstream users are still finding that the “threshold to entry” in developing or running FPGA-accelerated simulations is too high. Researchers at the UC Berkeley RISE Lab have developed “FireSim”, powered by Amazon FPGA F1 instances as an open-source resource, FireSim lowers that entry bar and makes it easier for everyone to leverage the power of an FPGA-accelerated compute environment. Whether you are part of a small start-up development team or working at a large datacenter scale, hardware-software co-design enables faster time-to-deployment, lower costs, and more predictable performance. We are excited to feature FireSim in this post from Sagar Karandikar and his colleagues at UC-Berkeley.
―Mia Champion, Sr. Data Scientist, AWS
Mapping an 8-node FireSim cluster simulation to Amazon EC2 F1
As traditional hardware scaling nears its end, the data centers of tomorrow are trending towards heterogeneity, employing custom hardware accelerators and increasingly high-performance interconnects. Prototyping new hardware at scale has traditionally been either extremely expensive, or very slow. In this post, I introduce FireSim, a new hardware simulation platform under development in the computer architecture research group at UC Berkeley that enables fast, scalable hardware simulation using Amazon EC2 F1 instances.
FireSim benefits both hardware and software developers working on new rack-scale systems: software developers can use the simulated nodes with new hardware features as they would use a real machine, while hardware developers have full control over the hardware being simulated and can run real software stacks while hardware is still under development. In conjunction with this post, we’re releasing the first public demo of FireSim, which lets you deploy your own 8-node simulated cluster on an F1 Instance and run benchmarks against it. This demo simulates a pre-built “vanilla” cluster, but demonstrates FireSim’s high performance and usability.
Why FireSim + F1?
FPGA-accelerated hardware simulation is by no means a new concept. However, previous attempts to use FPGAs for simulation have been fraught with usability, scalability, and cost issues. FireSim takes advantage of EC2 F1 and open-source hardware to address the traditional problems with FPGA-accelerated simulation: Problem #1: FPGA-based simulations have traditionally been expensive, difficult to deploy, and difficult to reproduce. FireSim uses public-cloud infrastructure like F1, which means no upfront cost to purchase and deploy FPGAs. Developers and researchers can distribute pre-built AMIs and AFIs, as in this public demo (more details later in this post), to make experiments easy to reproduce. FireSim also automates most of the work involved in deploying an FPGA simulation, essentially enabling one-click conversion from new RTL to deploying on an FPGA cluster.
Problem #2: FPGA-based simulations have traditionally been difficult (and expensive) to scale. Because FireSim uses F1, users can scale out experiments by spinning up additional EC2 instances, rather than spending hundreds of thousands of dollars on large FPGA clusters.
Problem #3: Finding open hardware to simulate has traditionally been difficult.Finding open hardware that can run real software stacks is even harder. FireSim simulates RocketChip, an open, silicon-proven, RISC-V-based processor platform, and adds peripherals like a NIC and disk device to build up a realistic system. Processors that implement RISC-V automatically support real operating systems (such as Linux) and even support applications like Apache and Memcached. We provide a custom Buildroot-based FireSim Linux distribution that runs on our simulated nodes and includes many popular developer tools.
Problem #4: Writing hardware in traditional HDLs is time-consuming. Both FireSim and RocketChip use the Chisel HDL, which brings modern programming paradigms to hardware description languages. Chisel greatly simplifies the process of building large, highly parameterized hardware components.
How to use FireSim for hardware/software co-design
FireSim drastically improves the process of co-designing hardware and software by acting as a push-button interface for collaboration between hardware developers and systems software developers. The following diagram describes the workflows that hardware and software developers use when working with FireSim.
Figure 2. The FireSim custom hardware development workflow.
The hardware developer’s view:
Write custom RTL for your accelerator, peripheral, or processor modification in a productive language like Chisel.
Run a software simulation of your hardware design in standard gate-level simulation tools for early-stage debugging.
Run FireSim build scripts, which automatically build your simulation, run it through the Vivado toolchain/AWS shell scripts, and publish an AFI.
Deploy your simulation on EC2 F1 using the generated simulation driver and AFI
Run real software builds released by software developers to benchmark your hardware
The software developer’s view:
Deploy the AMI/AFI generated by the hardware developer on an F1 instance to simulate a cluster of nodes (or scale out to many F1 nodes for larger simulated core-counts).
Connect using SSH into the simulated nodes in the cluster and boot the Linux distribution included with FireSim. This distribution is easy to customize, and already supports many standard software packages.
Directly prototype your software using the same exact interfaces that the software will see when deployed on the real future system you’re prototyping, with the same performance characteristics as observed from software, even at scale.
FireSim demo v1.0
Figure 3. Cluster topology simulated by FireSim demo v1.0.
This first public demo of FireSim focuses on the aforementioned “software-developer’s view” of the custom hardware development cycle. The demo simulates a cluster of 1 to 8 RocketChip-based nodes, interconnected by a functional network simulation. The simulated nodes work just like “real” machines: they boot Linux, you can connect to them using SSH, and you can run real applications on top. The nodes can see each other (and the EC2 F1 instance on which they’re deployed) on the network and communicate with one another. While the demo currently simulates a pre-built “vanilla” cluster, the entire hardware configuration of these simulated nodes can be modified after FireSim is open-sourced.
In this post, I walk through bringing up a single-node FireSim simulation for experienced EC2 F1 users. For more detailed instructions for new users and instructions for running a larger 8-node simulation, see FireSim Demo v1.0 on Amazon EC2 F1. Both demos walk you through setting up an instance from a demo AMI/AFI and booting Linux on the simulated nodes. The full demo instructions also walk you through an example workload, running Memcached on the simulated nodes, with YCSB as a load generator to demonstrate network functionality.
Deploying the demo on F1
In this release, we provide pre-built binaries for driving simulation from the host and a pre-built AFI that contains the FPGA infrastructure necessary to simulate a RocketChip-based node.
Starting your F1 instances
First, launch an instance using the free FireSim Demo v1.0 product available on the AWS Marketplace on an f1.2xlarge instance. After your instance has booted, log in using the user name centos. On the first login, you should see the message “FireSim network config completed.” This sets up the necessary tap interfaces and bridge on the EC2 instance to enable communicating with the simulated nodes.
The AMI contains a variety of tools to help you run simulations and build software for RISC-V systems, including the riscv64 toolchain, a Buildroot-based Linux distribution that runs on the simulated nodes, and the simulation driver program. For more details, see the AMI Contents section on the FireSim website.
First, you need to flash the FPGA with the FireSim AFI. To do so, run:
This automatically calls the simulation driver, telling it to load the Linux kernel image and root filesystem for the Linux distro. This produces output similar to the following:
Simulations Started. You can use the UART console of each simulated node by attaching to the following screens:
There is a screen on:
1 Socket in /var/run/screen/S-centos.
You could connect to the simulated UART console by connecting to this screen, but instead opt to use SSH to access the node instead.
First, ping the node to make sure it has come online. This is currently required because nodes may get stuck at Linux boot if the NIC does not receive any network traffic. For more information, see Troubleshooting/Errata. The node is always assigned the IP address 192.168.1.10:
This should eventually produce the following output:
PING 192.168.1.10 (192.168.1.10) 56(84) bytes of data.
From 192.168.1.1 icmp_seq=1 Destination Host Unreachable
64 bytes from 192.168.1.10: icmp_seq=1 ttl=64 time=2017 ms
64 bytes from 192.168.1.10: icmp_seq=2 ttl=64 time=1018 ms
64 bytes from 192.168.1.10: icmp_seq=3 ttl=64 time=19.0 ms
At this point, you know that the simulated node is online. You can connect to it using SSH with the user name root and password firesim. It is also convenient to make sure that your TERM variable is set correctly. In this case, the simulation expects TERM=linux, so provide that:
At this point, you’re connected to the simulated node. Run uname -a as an example. You should see the following output, indicating that you’re connected to a RISC-V system:
# uname -a
Linux buildroot 4.12.0-rc2 #1 Fri Aug 4 03:44:55 UTC 2017 riscv64 GNU/Linux
Now you can run programs on the simulated node, as you would with a real machine. For an example workload (running YCSB against Memcached on the simulated node) or to run a larger 8-node simulation, see the full FireSim Demo v1.0 on Amazon EC2 F1 demo instructions.
Finally, when you are finished, you can shut down the simulated node by running the following command from within the simulated node:
You can confirm that the simulation has ended by running screen -ls, which should now report that there are no detached screens.
At Berkeley, we’re planning to keep improving the FireSim platform to enable our own research in future data center architectures, like FireBox. The FireSim platform will eventually support more sophisticated processors, custom accelerators (such as Hwacha), network models, and peripherals, in addition to scaling to larger numbers of FPGAs. In the future, we’ll open source the entire platform, including Midas, the tool used to transform RTL into FPGA simulators, allowing users to modify any part of the hardware/software stack. Follow @firesimproject on Twitter to stay tuned to future FireSim updates.
FireSim is the joint work of many students and faculty at Berkeley: Sagar Karandikar, Donggyu Kim, Howard Mao, David Biancolin, Jack Koenig, Jonathan Bachrach, and Krste Asanović. This work is partially funded by AWS through the RISE Lab, by the Intel Science and Technology Center for Agile HW Design, and by ASPIRE Lab sponsors and affiliates Intel, Google, HPE, Huawei, NVIDIA, and SK hynix.
Now that you can reserve seating in AWS re:Invent 2017 breakout sessions, workshops, chalk talks, and other events, the time is right to review the list of introductory, advanced, and expert content being offered this year. To learn more about breakout content types and levels, see Breakout Content.
SID202 – Deep dive about how Capital One automates the delivery of directory services across AWS accounts Traditional solutions for using Microsoft Active Directory across on-premises and AWS Cloud Windows workloads can require complex networking or syncing identities across multiple systems. AWS Directory Service for Microsoft Active Directory, also known as AWS Managed AD, offers you actual Microsoft Active Directory in the AWS Cloud as a managed service. In this session, you will learn how Capital One uses AWS Managed AD to provide highly available authentication and authorization services for its Windows workloads, such as Amazon RDS for SQL Server.
SID205 – Building the Largest Repo for Serverless Compliance-as-Code When you use the cloud to enable speed and agility, how do you know if you’ve done it correctly? We are on a mission to help builders follow industry best practices within security guardrails by creating the largest compliance-as-code repository, available to all. Compliance-as-code is the idea to translate best practices, guardrails, policies, and standards into codified unit testing. Apply this to your AWS environment to provide insights about what can or must be improved. Learn why compliance-as-code matters to gain speed (by getting developers, architects, and security pros on the same page), how it is currently used (demo), and how to start using it or being part of building it.
SID206 – Best Practices for Managing Security Operation on AWS To help prevent unexpected access to your AWS resources, it is critical to maintain strong identity and access policies and track, detect, and react to changes. In this session, you will learn how to use AWS Identity and Access Management (IAM) to control access to AWS resources and integrate your existing authentication system with IAM. We will cover how to deploy and control AWS infrastructure using code templates, including change management policies with AWS CloudFormation.
SID207 – Feedback Security in the Cloud Like many security teams, Riot has been challenged by new paradigms that came with the move to the cloud. We discuss how our security team has developed a security culture based on feedback and self-service to best thrive in the cloud. We detail how the team assessed the security gaps and challenges in our move to AWS, and then describe how the team works within Riot’s unique feedback culture.
SID208 – Less (Privilege) Is More: Getting Least Privilege Right in AWS AWS services are designed to enable control through AWS Identity and Access Management (IAM) and Amazon Virtual Private Cloud (VPC). Join us in this chalk talk to learn how to apply these toward the security principal of least privilege for applications and data and how to practically integrate them in your security operations.
SID209 – Designing and Deploying an AWS Account Factory AWS customers start off with one AWS account, but quickly realize the benefits of having multiple AWS accounts. A common learning curve for customers is how to securely baseline and set up new accounts at scale. This talk helps you understand how to use AWS Organizations, AWS Identity and Access Management (IAM), AWS CloudFormation, and other tools to baseline new accounts, set them up for federation, and make a secure and repeatable account factory to create new AWS accounts. Walk away with demos and tools to use in your own environment.
SID210 – A CISO’s Journey at Vonage: Achieving Unified Security at Scale Making sense of the risks of IT deployments that sit in hybrid environments and span multiple countries is a major challenge. When you add in multiple toolsets and global compliance requirements, including GDPR, it can get overwhelming. Listen to Vonage’s Chief Information Security Officer, Johan Hybinette, share his experiences tackling these challenges.
SID212 – Maximizing Your Move to AWS – Five Key Lessons from Vanguard and Cloud Technology Partners CTP’s Robert Christiansen and Mike Kavis describe how to maximize the value of your AWS initiative. From building a Minimum Viable Cloud to establishing a cloud robust security and compliance posture, we walk through key client success stories and lessons learned. We also explore how CTP has helped Vanguard, the leading provider of investor communications and technology, take advantage of AWS to delight customers, drive new revenue streams, and transform their business.
SID213 – Managing Regulator Expectations – Lessons Learned on Positioning AWS Services from an Audit Perspective Cloud migration in highly regulated industries can stall without a solid understanding of how (and when) to address regulatory expectations. This session provides a guide to explaining the aspects of AWS services that are most frequently the subject of an internal or regulatory audit. Because regulatory agencies and internal auditors might not share a common understanding of the cloud, this session is designed to help you to help them, regardless of their level of technical fluency.
SID214 – Best Security Practices in the Intelligence Community Executives from the Intelligence community discuss cloud security best practices in a field where security is imperative to operations. CIA security cloud chief John Nicely and NGA security cloud chief Scot Kaplan share success stories of migrating mass data to the cloud from a security perspective. Hear how they migrated their IT portfolios while managing their organizations’ unique blend of constraints, budget issues, politics, culture, and security pressures. Learn how these institutions overcame barriers to migration, and ask these panelists what actions you can take to better prepare yourself for the journey of mass migration to the cloud.
SID216 – Defending Diverse Applications Against Common Threats In this session, you learn how to adapt application defenses and operational responses based on your unique requirements. You also hear directly from customers about how they architected their applications on AWS to protect their applications. There are many ways to build secure, high-availability applications in the cloud. Services such as Amazon API Gateway, Amazon VPC, ALB, ELB, and Amazon EC2 are the basic building blocks that enable you to address a wide range of use cases. Best practices for defending your applications against Distributed Denial of Service (DDoS) attacks, exploitation attempts, and bad bots can vary with your choices in architecture.
SID301 – Using AWS Lambda as a Security Team Operating a security practice on AWS brings many new challenges that haven’t been faced in data center environments. The dynamic nature of infrastructure, the relationship between development team members and their applications, and the architecture paradigms have all changed as a result of building software on top of AWS. In this session, learn how your security team can leverage AWS Lambda as a tool to monitor, audit, and enforce your security policies within an AWS environment.
SID302 – Force Multiply Your Security Team with Automation and Alexa Adversaries automate. Who says the good guys can’t as well? By combining AWS offerings like AWS CloudTrail, Amazon Cloudwatch, AWS Config, and AWS Lambda with the power of Amazon Alexa, you can do more security tasks faster, with fewer resources. Force multiplying your security team is all about automation! Last year, we showed off penetration testing at the push of an (AWS IoT) button, and surprise-previewed how to ask Alexa to run Inspector as-needed. Want to see other ways to ask Alexa to be your cloud security sidekick? We have crazy new demos at the ready to show security geeks how to sling security automation solutions for their AWS environments (and impress and help your boss, too).
SID303 – How You Can Use AWS’s Identity Services to be Successful on Your AWS Cloud Journey Every journey to the AWS Cloud is unique. Some customers are migrating existing applications, while others are building new applications using cloud-native services. Along each of these journeys, identity and access management helps customers protect their applications and resources. In this session, you will learn how AWS’s identity services provide you a secure, flexible, and easy solution for managing identities and access on the AWS Cloud. With AWS’’s Identity Services, you do not have to adapt to AWS. Instead, you have a choice of services designed to meet you anywhere along your journey to the AWS Cloud. Every journey to the AWS Cloud is unique. Some customers are migrating existing applications, while others are building new applications using cloud-native services.
SID304 – SecOps 2021 Today: Using AWS Services to Deliver SecOps This talk dives deep on how to build end-to-end security capabilities using AWS. Our goal is orchestrating AWS Security services with other AWS building blocks to deliver enhanced security. We cover working with AWS CloudWatch Events as a queueing mechanism for processing security events, using Amazon DynamoDB to provide a stateful layer to provide tailored response to events and other ancillary functions, using DynamoDB as an attack signature engine, and the use of analytics to derive tailored signatures for detection with AWS Lambda.
SID306 – How Chick-fil-A Embraces DevSecOps on AWS As Chick-fil-A became a cloud-first organization, their security team didn’t want to become the bottleneck for agility. But the security team also wanted to raise the bar for their security posture on AWS. Robert Davis, security architect at Chick-fil-A, provides an overview about how he and his team recognized that writing code was the best way for their security policies to scale across the many AWS accounts that Chick-fil-A operates.
SID307 – Serverless for Security Officers: Paradigm Walkthrough and Comprehensive Security Best Practices For security practitioners, serverless represents a context switch from the familiar servers and networks to a decentralized set of code snippets and AWS platform constructs. This new ecosystem also represents new operational teams, data flows, security tooling, and faster-then-ever change velocity. In this talk, we perform live demos and provide code samples for a wide array of security best practices aligned to industry standards such as NIST 800-53 and ISO 27001.
SID308 – Multi-Account Strategies We will explore a multi-account architecture and how to approach the design/thought process around it. This chalk talk will allow attendees to dive deep into the topic and discuss the nuances of the architecture as well as provide feedback around the approach.
SID309 – Credentials, Credentials, Credentials, Oh My! For new and experienced customers alike, understanding the various credential forms and exchange mechanisms within AWS can be a daunting exercise. In this chalk talk, we clear up the confusion by performing a cartography exercise. We visually depict the right source credentials (for example, enterprise user name and password, IAM keys, AWS STS tokens, and so on) and transformation mechanisms (for example, AssumeRole and so on) to use depending on what you’re trying to do and where you’re coming from.
SID310 – Moving from the Shadows to the Throne What do you do when leadership embraces what was called “shadow IT” as the new path forward? How do you onboard new accounts while simultaneously pushing policy to secure all existing accounts? This session walks through Cisco’s journey consolidating over 700 existing accounts in the Cisco organization, while building and applying Cisco’s new cloud policies.
SID311 – Designing Security and Governance Across a Multi-Account Strategy When organizations plan their journey to cloud adoption at scale, they quickly encounter questions such as: How many accounts do we need? How do we share resources? How do we integrate with existing identity solutions? In this workshop, we present best practices and give you the hands-on opportunity to test and develop best practices. You will work in teams to set up and create an AWS environment that is enterprise-ready for application deployment and integration into existing operations, security, and procurement processes. You will get hands-on experience with cross-account roles, consolidated logging, account governance and other challenges to solve.
SID312 – DevSecOps Capture the Flag In this Capture the Flag workshop, we divide groups into teams and work on AWS CloudFormation DevSecOps. The AWS Red Team supplies an AWS DevSecOps Policy that needs to be enforced via CloudFormation static analysis. Participant Blue Teams are provided with an AWS Lambda-based reference architecture to be used to inspect CloudFormation templates against that policy. Interesting items need to be logged, and made visible via ChatOps. Dangerous items need to be logged, and recorded accurately as a template fail. The secondary challenge is building a CloudFormation template to thwart the controls being created by the other Blue teams.
SID313 – Continuous Compliance on AWS at Scale In cloud migrations, the cloud’s elastic nature is often touted as a critical capability in delivering on key business initiatives. However, you must account for it in your security and compliance plans or face some real challenges. Always counting on a virtual host to be running, for example, causes issues when that host is rebooted or retired. Managing security and compliance in the cloud is continuous, requiring forethought and automation. Learn how a leading, next generation managed cloud provider uses automation and cloud expertise to manage security and compliance at scale in an ever-changing environment.
SID314 – IAM Policy Ninja Are you interested in learning how to control access to your AWS resources? Have you wondered how to best scope permissions to achieve least-privilege permissions access control? If your answer is “yes,” this session is for you. We look at the AWS Identity and Access Management (IAM) policy language, starting with the basics of the policy language and how to create and attach policies to IAM users, groups, and roles. We explore policy variables, conditions, and tools to help you author least privilege policies. We cover common use cases, such as granting a user secure access to an Amazon S3 bucket or to launch an Amazon EC2 instance of a specific type.
SID315 – Security and DevOps: Agility and Teamwork In this session, you learn pragmatic steps to integrate security controls into DevOps processes in your AWS environment at scale. Cybersecurity expert and founder of Alert Logic Misha Govshteyn shares insights from high performing teams who are embracing the reality that an agile security program can enable faster and more secure workload deployments. Joining Misha is Joey Peloquin, Director of Cloud Security Operations at Citrix, who discusses Citrix’s DevOps experiences and how they manage their cybersecurity posture within the AWS Cloud. Session sponsored by Alert Logic.
SID316 – Using Access Advisor to Strike the Balance Between Security and Usability AWS provides a killer feature for security operations teams: Access Advisor. In this session, we discuss how Access Advisor shows the services to which an IAM policy grants access and provides a timestamp for the last time that the role authenticated against that service. At Netflix, we use this valuable data to automatically remove permissions that are no longer used. By continually removing excess permissions, we can achieve a balance of empowering developers and maintaining a best-practice, secure environment.
SID317 – Automating Security and Compliance Testing of Infrastructure-as-Code for DevSecOps Infrastructure-as-Code (IaC) has emerged as an essential element of organizational DevOps practices. Tools such as AWS CloudFormation and Terraform allow software-defined infrastructure to be deployed quickly and repeatably to AWS. But the agility of CI/CD pipelines also creates new challenges in infrastructure security hardening. This session provides a foundation for how to bring proven software hardening practices into the world of infrastructure deployment. We discuss how to build security and compliance tests for infrastructure analogous to unit tests for application code, and showcase how security, compliance and governance testing fit in a modern CI/CD pipeline.
SID318 – From Obstacle to Advantage: The Changing Role of Security & Compliance in Your Organization A surprising trend is starting to emerge among organizations who are progressing through the cloud maturity lifecycle: major improvements in revenue growth, customer satisfaction, and mission success are being directly attributed to improvements in security and compliance. At one time thought of as speed bumps in the path to deployment, security and compliance are now seen as critical ingredients that help organizations differentiate their offerings in the market, win more deals, and achieve mission-critical goals faster. This session explores how organizations like Jive Software and the National Geospatial Agency use the Evident Security Platform, AWS, and AWS Quick Starts to automate security and compliance processes in their organization to accomplish more, do it faster, and deliver better results.
SID319 – Incident Response in the Cloud In this session, we walk you through a hypothetical incident response managed on AWS. Learn how to apply existing best practices as well as how to leverage the unique security visibility, control, and automation that AWS provides. We cover how to set up your AWS environment to prevent a security event and how to build a cloud-specific incident response plan so that your organization is prepared before a security event occurs. This session also covers specific environment recovery steps available on AWS.
SID320 – Fraud Prevention, Detection, Lessons Learned, and Best Practices Fighting fraud means countering human actors that quickly adapt to whatever you do to stop them. In this presentation, we discuss the key components of a fraud prevention program in the cloud. Additionally, we provide techniques for detecting known and unknown fraud activity and explore different strategies for effectively preventing detected patterns. Finally, we discuss lessons learned from our own prevention activities as well as the best practices that you can apply to manage risk.
SID322 – The AWS Philosophy of Security AWS distinguished engineer Eric Brandwine speaks with hundreds of customers each year, and noticed one question coming up more than any other, “How does AWS operationalize its own security?” In this session, Eric details both strategic and tactical considerations, along with an insider’s look at AWS tooling and processes.
SID324 – Automating DDoS Response in the Cloud If left unmitigated, Distributed Denial of Service (DDoS) attacks have the potential to harm application availability or impair application performance. DDoS attacks can also act as a smoke screen for intrusion attempts or as a harbinger for attacks against non-cloud infrastructure. Accordingly, it’s crucial that developers architect for DDoS resiliency and maintain robust operational capabilities that allow for rapid detection and engagement during high-severity events. In this session, you learn how to build a DDoS-resilient application and how to use services like AWS Shield and Amazon CloudWatch to defend against DDoS attacks and automate response to attacks in progress.
SID325 – Amazon Macie: Data Visibility Powered by Machine Learning for Security and Compliance Workloads In this session, Edmunds discusses how they create workflows to manage their regulated workloads with Amazon Macie, a newly-released security and compliance management service that leverages machine learning to classify your sensitive data and business-critical information. Amazon Macie uses recurrent neural networks (RNN) to identify and alert potential misuse of intellectual property. They do a deep dive into machine learning within the security ecosystem.
SID326 – AWS Security State of the Union Steve Schmidt, chief information security officer at AWS, addresses the current state of security in the cloud, with a particular focus on feature updates, the AWS internal “secret sauce,” and what’s on horizon in terms of security, identity, and compliance tooling.
SID327 – How Zocdoc Achieved Security and Compliance at Scale With Infrastructure as Code In less than 12 months, Zocdoc became a cloud-first organization, diversifying their tech stack and liberating data to help drive rapid product innovation. Brian Lozada, CISO at Zocdoc, and Zhen Wang, Director of Engineering, provide an overview on how their teams recognized that infrastructure as code was the most effective approach for their security policies to scale across their AWS infrastructure. They leveraged tools such as AWS CloudFormation, hardened AMIs, and hardened containers. The use of DevSecOps within Zocdoc has enhanced data protection with the use of AWS services such as AWS KMS and AWS CloudHSM and auditing capabilities, and event-based policy enforcement with Amazon Elasticsearch Service and Amazon CloudWatch, all built on top of AWS.
SID328 – Cloud Adoption in Regulated Financial Services Macquarie, a global provider of financial services, identified early on that it would require strong partnership between its business, technology and risk teams to enable the rapid adoption of AWS cloud technologies. As a result, Macquarie built a Cloud Governance Platform to enable its risk functions to move as quickly as its development teams. This platform has been the backbone of Macquarie’s adoption of AWS over the past two years and has enabled Macquarie to accelerate its use of cloud technologies for the benefit of clients across multiple global markets. This talk will outline the strategy that Macquarie embarked on, describe the platform they built, and provide examples of other organizations who are on a similar journey.
SID329 – A Deep Dive into AWS Encryption Services AWS Encryption Services provide an easy and cost-effective way to protect your data in AWS. In this session, you learn about leveraging the latest encryption management features to minimize risk for your data.
SID330 – Best Practices for Implementing Your Encryption Strategy Using AWS Key Management Service AWS Key Management Service (KMS) is a managed service that makes it easy for you to create and manage the encryption keys used to encrypt your data. In this session, we will dive deep into best practices learned by implementing AWS KMS at AWS’s largest enterprise clients. We will review the different capabilities described in the AWS Cloud Adoption Framework (CAF) Security Perspective and how to implement these recommendations using AWS KMS. In addition to sharing recommendations, we will also provide examples that will help you protect sensitive information on the AWS Cloud.
SID331 – Architecting Security and Governance Across a Multi-Account Strategy Whether it is per business unit or per application, many AWS customers use multiple accounts to meet their infrastructure isolation, separation of duties, and billing requirements. In this session, we discuss considerations, limitations, and security patterns when building out a multi-account strategy. We explore topics such as identity federation, cross-account roles, consolidated logging, and account governance. Thomson Reuters shared their journey and their approach to a multi-account strategy. At the end of the session, we present an enterprise-ready, multi-account architecture that you can start leveraging today.
SID332 – Identity Management for Your Users and Apps: A Deep Dive on Amazon Cognito Learn how to set up an end-user directory, secure sign-up and sign-in, manage user profiles, authenticate and authorize your APIs, federate from enterprise and social identity providers, and use OAuth to integrate with your app—all without any server setup or code. With clear blueprints, we show you how to leverage Amazon Cognito to administer and secure your end users and enable identity for the applied patterns of mobile, web, and enterprise apps.
SID335 – Implementing Security and Governance across a Multi-Account Strategy As existing or new organizations expand their AWS footprint, managing multiple accounts while maintaining security quickly becomes a challenge. In this chalk talk, we will demonstrate how AWS Organizations, IAM roles, identity federation, and cross-account manager can be combined to build a scalable multi-account management platform. By the end of this session, attendees will have the understanding and deployment patterns to bring a secure, flexible and automated multi-account management platform to their organizations.
SID336 – Use AWS to Effectively Manage GDPR Compliance The General Data Protection Regulation (GDPR) is considered to be the most stringent privacy regulation ever enacted. Complying with GDPR could be a challenge for organizations, and AWS services can help get you ahead of the May 2018 enforcement deadline. In this chalk talk, the Legal and Compliance GDPR leadership at AWS discusses what enforcement of GDPR might mean for you and your customer’s compliance programs.
SID337 – Best Practices for Managing Access to AWS Resources Using IAM Roles In this chalk talk, we discuss why using temporary security credentials to manage access to your AWS resources is an AWS Identity and Access Management (IAM) best practice. IAM roles help you follow this best practice by delivering and rotating temporary credentials automatically. We discuss the different types of IAM roles, the assume role functionality, and how to author fine-grained trust and access policies that limit the scope of IAM roles. We then show you how to attach IAM roles to your AWS resources, such as Amazon EC2 instances and AWS Lambda functions. We also discuss migrating applications that use long-term AWS access keys to temporary credentials managed by IAM roles.
SID338 – [email protected] Once a customer achieves success with using AWS in a few pilot projects, most look to rapidly adopt an “all-in” enterprise migration strategy. Along this journey, several new challenges emerge that quickly become blockers and slow down migrations if they are not addressed properly. At this scale, customers will deal with the governance of hundreds of accounts, as well as thousands of IT resources residing within those accounts. Humans and traditional IT management processes cannot scale at the same pace and inevitably challenging questions emerge. In this session, we discuss those questions about governance at scale.
SID339 – Deep Dive on AWS CloudHSM Organizations building applications that handle confidential or sensitive data are subject to many types of regulatory requirements and often rely on hardware security modules (HSMs) to provide validated control of encryption keys and cryptographic operations. AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys on the AWS Cloud using FIPS 140-2 Level 3 validated HSMs. This chalk talk will provide you a deep-dive on CloudHSM, and demonstrate how you can quickly and easily use CloudHSM to help secure your data and meet your compliance requirements.
SID340 – Using Infrastructure as Code to Inject Security Best Practices as Part of the Software Deployment Lifecycle A proactive approach to security is key to securing your applications as part of software deployment. In this chalk talk, T. Rowe Price, a financial asset management institution, outlines how they built their security automation process in enabling their numerous developer teams to rapidly and securely build and deploy applications at scale on AWS. Learn how they use services like AWS Identity and Access Management (IAM), HashiCorp tools, Terraform for automation, and Vault for secrets management, and incorporate certificate management and monitoring as part of the deployment process. T. Rowe Price discusses lessons learned and best practices to move from a tightly controlled legacy environment to an agile, automated software development process on AWS.
SID341 – Using AWS CloudTrail Logs for Scalable, Automated Anomaly Detection This workshop gives you an opportunity to develop a solution that can continuously monitor for and detect a realistic threat by analyzing AWS CloudTrail log data. Participants are provided with a CloudTrail data source and some clues to get started. Then you have to design a system that can process the logs, detect the threat, and trigger an alarm. You can make use of any AWS services that can assist in this endeavor, such as AWS Lambda for serverless detection logic, Amazon CloudWatch or Amazon SNS for alarming and notification, Amazon S3 for data and configuration storage, and more.
SID342 – Protect Your Web Applications from Common Attack Vectors Using AWS WAF As attacks and attempts to exploit vulnerabilities in web applications become more sophisticated, having an effective web request filtering solution becomes key to keeping your users’ data safe. In this workshop, discover how the OWASP Top 10 list of application security risks can help you secure your web applications. Learn how to use AWS services, such as AWS WAF, to mitigate vulnerabilities. This session includes hands-on labs to help you build a solution. Key learning goals include understanding the breadth and complexity of vulnerabilities customers need to protect from, understanding the AWS tools and capabilities that can help mitigate vulnerabilities, and learning how to configure effective HTTP request filtering rules using AWS WAF.
SID343 – User Management and App Authentication with Amazon Cognito Are you curious about how to authenticate and authorize your applications on AWS? Have you thought about how to integrate AWS Identity and Access Management (IAM) with your app authentication? Have you tried to integrate third-party SAML providers with your app authentication? Look no further. This workshop walks you through step by step to configure and create Amazon Cognito user pools and identity pools. This workshop presents you with the framework to build an application using Java, .NET, and serverless. You choose the stack and build the app with local users. See the service being used not only with mobile applications but with other stacks that normally don’t include Amazon Cognito.
SID344 – Soup to Nuts: Identity Federation for AWS AWS offers customers multiple solutions for federating identities on the AWS Cloud. In this session, we will embark on a tour of these solutions and the use cases they support. Along the way, we will dive deep with demonstrations and best practices to help you be successful managing identities on the AWS Cloud. We will cover how and when to use Security Assertion Markup Language 2.0 (SAML), OpenID Connect (OIDC), and other AWS native federation mechanisms. You will learn how these solutions enable federated access to the AWS Management Console, APIs, and CLI, AWS Infrastructure and Managed Services, your web and mobile applications running on the AWS Cloud, and much more.
SID345 – AWS Encryption SDK: The Busy Engineer’s Guide to Client-Side Encryption You know you want client-side encryption for your service but you don’t know exactly where to start. Join us for a hands-on workshop where we review some of your client-side encryption options and explore implementing client-side encryption using the AWS Encryption SDK. In this session, we cover the basics of client-side encryption, perform encrypt and decrypt operations using AWS KMS and the AWS Encryption SDK, and discuss security and performance considerations when implementing client-side encryption in your service.
SID401 – Let’s Dive Deep Together: Advancing Web Application Security Beginning with a recap of best practices in CloudFront, AWS WAF, Route 53, and Amazon VPC security, we break into small teams to work together on improving the security of a typical web application. How can we creatively use the services? What additional features would help us? This technically advanced chalk talk requires certification at the solutions architect associate level or greater.
SID402 – An AWS Security Odyssey: Implementing Security Controls in the World of Internet, Big Data, IoT and E-Commerce Platforms This workshop will give participants the opportunity to take a security-focused journey across various AWS services and implement automated controls along the way. You will learn how to apply AWS security controls to services such as Amazon EC2, Amazon S3, AWS Lambda, and Amazon VPC. In short, you will learn how to use the cloud to protect the cloud. We will talk about how to: Adopt a workload-centric approach to your security strategy, Address security issues in a cost-effective manner Automate your security responses to promote maturity and auditability. In order to complete this workshop, attendees will need a laptop with wireless access, an AWS account and an IAM user that has full administrative privileges within their account. AWS credits will be provided as attendees depart the session to cover the cost of running the workshop in their own account.
SID404 – Amazon Inspector – Automating the “Sec” in DevSecOps Adopting DevSecOps can be challenging using traditional security tools that are designed for on-premises infrastructure. Amazon Inspector is an automated security assessment service that helps you adopt DevSecOps by integrating security assessments directly into the development process of applications running on Amazon EC2. We dive deep on how to use Inspector to automate host security assessments. We show you how to integrate Inspector with other AWS Cloud services to provide automated security assessments throughout your development process. We demo installing the AWS agent, setting up assessment targets and templates, and running assessments. We review the findings and discuss how you can automate the management and remediation of those findings with your available AWS services.
SID405 – Five New Security Automation Improvements You Can Make by Using Amazon CloudWatch Events and AWS Config Rules This presentation will include a deep dive into the code behind multiple security automation and remediation functions. This session will consider potential use cases, as well as feature a demonstration of a proposed script, and then walk through the code set to explain the various challenges and solutions of the intended script. All examples of code will be previously unreleased and will feature integration with services such as Trusted Advisor and Macie. All code will be released as OSS after re:Invent.
For those who aren’t cryptocurrency-savvy: Ethereum is a cryptocurrency project, based around the coin Ether. It has the support of many big banks, big hedge funds and some states (Russia, China etc). Among the cryptocurrencies, it is second only to Bitcoin – and might even overtake it with the time. (Especially if Bitcoin doesn’t finally move and fix some of its problems.)
Ethereum offers some abilities that few other cryptocurrencies do. The most important one is the support for “smart projects” – kind of electronic contracts that can easily be executed and enforced with little to no human participation. This post however is dedicated to another of its traits – the Proof of Stake.
To work and exist, every cryptocurrency depends on some proof. Most of them use Proof-of-Work scheme. In it, one has to put some work – eg. calculating checksums – behind its participation in the network and its decision, and receive newly generated coins for it. This however results in huge amount of work done only to prove that, well, you can do it and deserve to be in and receive some of the newly squeezed juice.
As of August 2017, Ethereum uses this scheme too. However, they plan to switch to a Proof-of-Stake algorithm named Casper. In it, you prove yourself not by doing work, but by proving to own Ether. As this requires practically no work, it is much more technically effective than the Proof-of-Work schemes.
Technically, Caspar is an amazing design. I congratulate the Ethereum team for it. However, economically its usage appears to have an important weakness. It is described below.
A polarized system
With Casper, the Ether generated by the Ethereum network and the decision power in it are distributed to these who already own Ether. As a consequence, most of both go to those who own most Ether. (There might be attempts to limit that, but these are easily defeatable. For example, limiting the amount distributed to an address can be circumvented by a Sybil attack.)
Such a distribution will create with the time a financial ecosystem where most money and vote are held by a small minority of the participants. The big majority will have little to no of both – it will summarily hold less money and vote than the minority of “haves”. Giving the speed with which the cryptocurrency systems evolve, it is realistic to expect this development in ten, maybe even in five or less years after introducing Casper.
The “middle class”
Economists love to repeat how important is to have a strong middle class. Why, and how that translates to the situation in a cryptocurrency-based financial system?
In systemic terms, “middle class” denotes in a financial system the set of entities that control each a noticeable but not very big amount of resources.
Game theory shows that in a financial system, entities with different clout usually have different interests. These interests usually reflect the amount of resources they control. Entities with little to no resources tend to have interests opposing to these with biggest resources – especially in systems where the total amount of resources changes slowly and the economics is close to a zero-sum game. (For example, in most cryptocurrency systems.) The “middle class” entities interests are in most aspects in the middle.
For an economics to work, there must be a balance of interests that creates incentive for all of its members to participate. In financial systems, where “haves” interests are mostly opposing to “have-nots” interests, creating such a balance depends on the presence and influence of a “middle class”. Its interests are usually the closest to a compromise that satisfies all, and its influence is the key to achieving that compromise within the system.
If the system state is not acceptable for all entities, these who do not accept it eventually leave. (Usually their participation is required for the system survival, so this brings the system down.) If these entities cannot leave the system, they ultimately reject its rules and try to change it by force. If that is impossible too, they usually resort to denying the system what makes them useful for it, thus decreasing its competitiveness to other systems.
The most reliable way to have acceptable compromise enforced in a system is to have in it a “middle class” that summarily controls more resources than any other segment of entities, preferably at least 51% of the system resources. (This assumes that the “middle class” is able and willing to protect their interests. If some of these entities are controlled into defending someone else’s interests – eg. botnets in computer networks, manipulated voters during elections, etc – these numbers apply to the non-controlled among them.)
A system that doesn’t have a non-controlled “middle class” that controls a decisive amount of resources, usually does not have an influential set of interests that are an acceptable compromise between the interests poles. For this reason, it can be called a polarized system.
The limitation on development
In a polarized system, the incentive for development is minimized. (Development is potentially disruptive, and the majority of the financial abilities and the decision power there has only to lose from a disruption. When factoring in the expected profits from development, the situation always becomes a zero-sum game.) The system becomes static (thus cementing the zero-sum game situation in it) and is under threat of being overtaken by a competing financial system. When that happens, it is usually destroyed together with all stakes in it.
Also, almost any initiative in such a financial system is bound to turn into a cartel, oligopoly or monopoly, due to the small number of participants with resources to start and support an initiative. That effectively destroys its markets, contributing to the weakness of the system and limiting further its ability to develop.
Another problem that stems from this is that the incentive during an interaction to violate the rules and to push the contragent into a loss is greater than the incentive to compete by giving a better offer. This in turn removes the incentive to increase productivity, which is a key incentive for development.)
Yet another problem of the concentration of most resources into few entities is the increased gain from attacking one of them and appropriating their resources, and thus the incentive to do it. Since good defensive capabilities are usually an excellent offense base, this pulls the “haves” into an “arms race”, redirecting more and more of their resources into defense. This also leaves the development outside the arms race increasingly resource-strapped. (The “arms race” itself generates development, but the race situation prevents that into trickling into “non-military” applications.)
These are only a part of the constraints on development in a polarized system. Listing all of them will make a long read.
Trickle-up and trickle-down
In theory, every economical system involves two processes: trickle-down and trickle-up. So, any concentration of resources on the top should be decreased by an automatically increased trickle-down. However, a better understanding how these processes work shows that this logic is faulty.
Any financial exchange in a system consists of two parts. One of them covers the actual production cost of whatever resource is being exchanged against the finances. The other part is the profit of the entity that obtains the finances. From the viewpoint of that entity, the first part vs. the resource given is zero-sum – its incentive to participate in this exchange is the second part, the profit. That second part is effectively the trickle in the system, as it is the only resource really gained.
The direction and the size of the trickle ultimately depends on the balance of many factors, some of them random, others constant. On the long run, it is the constant factors that determine the size and the direction of the trickle sum.
The most important constant factor is the benefit of scale (BOS). It dictates that the bigger entities are able to pull the balance to their side more strongly than the smaller ones. Some miss that chance, but others use it. It makes the trickle-up stronger than the trickle-down. In a system where the transaction outcome is close to a zero-sum game, this concentrates all resources at the top with a speed depending on the financial interactions volume per an unit of time.
(Actually the formula is a bit more complex. All dynamic entities – eg. living organisms, active companies etc – have an “existence maintenance” expense, which they cannot avoid. However, the amount of resources in a system above the summary existence maintenance follows the simple rule above. And these are the only resources that are available for investing in anything, eg. development.)
In the real-life systems the BOS power is limited. There are many different random factors that compete with and influence one another, some of them outweighing BOS. Also, in every moment some factors lose importance and / or cease to exist, while others appear and / or gain importance. The complexity of this system makes any attempt by an entity or entities pool to take control over it hard and slow. This gives the other entities time and ways to react and try to block the takeover attempt. Also, the real-life systems have many built-in constraints against scale-based takeovers – anti-trust laws, separation of the government powers, enforced financial trickle-down through taxes on the rich and benefits for the poor, etc. All these together manage to prevent most takeover attempts, or to limit them into only a segment of the system.
How a Proof-of-Stake based cryptocurrency fares at these?
A POS-based cryptocurrency financial system has no constraints against scale-based takeovers. It has only one kind of clout – the amount of resources controlled by an entity. This kind of clout is built in it, has all the importance in it and cannot lose that or disappear. It has no other types of resources, and has no slowing due to complexity. It is not segmented – who has these resources has it all. There are no built-in constraints against scale-based takeovers, or mechanisms to strengthen resource trickle-down. In short, it is the ideal ground for creating a polarized financial system.
So, it would be only logical to expect that a Proof-of-Stake based Ether financial system will suffer by the problems a polarized system presents. Despite all of its technical ingenuity, its longer-term financial usability is limited, and the participation in it may be dangerous to any entity smaller than eg. a big bank, a big hedge fund or a big authoritarian state.
All fixes for this problem I could think of by now would be easily beaten by simple attacks. I am not sure if it is possible to have a reliable solution to it at all.
Do smart contracts and secondary tokens change this?
Unhappily, no. Smart contracts are based on having Ether, and need Ether to exist and act. Thus, they are bound to the financial situation of the Ether financial system, and are influenced by it. The bigger is the scope of the smart contract, the bigger is its dependence on the Ether situation.
Due to this, smart contracts of meaningful size will find themselves hampered and maybe even endangered by a polarization in the financial system powered by POS-based Ethereum. It is technically possible to migrate these contracts to a competing underlying system, but it won’t be easy – probably even when the competing system is technically a clone of Ethereum, like Ethereum Classic. The migration cost might exceed the migration benefits at any given stage of the contract project development, even if the total migration benefits are far larger than this cost.
Eventually this problem might become public knowledge and most projects in need of a smart contract might start avoiding Ethereum. This will lead to decreased interest in participation in the Ethereum ecosystem, to a loss of market cap, and eventually maybe even to the demise of this technically great project.
There is a danger that the “haves” minority in a polarized system might start actively investing resources in creating other systems that suffer from the same problem (as they benefit from it), or in modifying existing systems in this direction. This might decrease the potential for development globally. As some of the backers of Ethereum are entities with enormous clout worldwide, that negative influence on the global system might be significant.
In the world of data science, users must often sacrifice cluster set-up time to allow for complex usability scenarios. Amazon EMR allows data scientists to spin up complex cluster configurations easily, and to be up and running with complex queries in a matter of minutes.
Data scientists often use scheduling applications such as Oozie to run jobs overnight. However, Oozie can be difficult to configure when you are trying to use popular Python packages (such as “pandas,” “numpy,” and “statsmodels”), which are not included by default.
One such popular platform that contains these types of packages (and more) is Anaconda. This post focuses on setting up an Anaconda platform on EMR, with an intent to use its packages with Oozie. I describe how to run jobs using a popular open source scheduler like Oozie.
For this post, you walk through the following tasks:
Create an EMR cluster.
Download Anaconda on your master node.
Test the steps.
Create an EMR cluster
Spin up an Amazon EMR cluster using the console or the AWS CLI. Use the latest release, and include Apache Hadoop, Apache Spark, Apache Hive, and Oozie.
To create a three-node cluster in the us-east-1 region, issue an AWS CLI command such as the following. This command must be typed as one line, as shown below. It is shown here separated for readability purposes only.
At the time of publication, Anaconda 4.4 is the most current version available. For the download link location for the latest Python 2.7 version (Python 3.6 may encounter issues), see https://www.continuum.io/downloads. Open the context (right-click) menu for the Python 2.7 download link, choose Copy Link Location, and use this value in the previous wget command.
This post used the Anaconda 4.4 installation. If you have a later version, it is shown in the downloaded filename: “anaconda2-<version number>-Linux-x86_64.sh”.
Run this downloaded script and follow the on-screen installer prompts.
For an installation directory, select somewhere with enough space on your cluster, such as “/mnt/anaconda/”.
The process should take approximately 1–2 minutes to install. When prompted if you “wish the installer to prepend the Anaconda2 install location”, select the default option of [no].
After you are done, export the PATH to include this new Anaconda installation:
Zip up the Anaconda installation:
zip -r anaconda.zip .
The zip process may take 4–5 minutes to complete.
(Optional) Upload this anaconda.zip file to your S3 bucket for easier inclusion into future EMR clusters. This removes the need to repeat the previous steps for future EMR clusters.
Next, you configure Oozie to use Pyspark and the Anaconda platform.
Get the location of your Oozie sharelibupdate folder. Issue the following command and take note of the “sharelibDirNew” value:
oozie admin -sharelibupdate
For this post, this value is “hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136”.
Pass in the required Pyspark files into Oozies sharelibupdate location. The following files are required for Oozie to be able to run Pyspark commands:
These are located on the EMR master instance in the location “/usr/lib/spark/python/lib/”, and must be put into the Oozie sharelib spark directory. This location is the value of the sharelibDirNew parameter value (shown above) with “/spark/” appended, that is, “hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136/spark/”.
Put the “myPysparkProgram.py” into the location mentioned between the “<jar>xxxxx</jar>” tags in your workflow.xml. In this example, the location is “hdfs:///user/oozie/apps/”. Use the following command to move the “myPysparkProgram.py” file to the correct location:
This is a sentence. 12
So is this 12
This is also a sentence 12
The myPysparkProgram.py has successfully imported the numpy library from the Anaconda platform and has produced some output with it. If you tried to run this using standard Python, you’d encounter the following error:
Now when your Python job runs in Oozie, any imported packages that are implicitly imported by your Pyspark script are imported into your job within Oozie directly from the Anaconda platform. Simple!
If you have questions or suggestions, please leave a comment below.
John Ohle is an AWS BigData Cloud Support Engineer II for the BigData team in Dublin. He works to provide advice and solutions to our customers on their Big Data projects and workflows on AWS. In his spare time, he likes to play music, learn, develop tools and write documentation to further help others – both colleagues and customers alike.
The so-called (and marketing-branded) “blockchain technology” is promised to revolutionize every industry. Anything, they say, will become decentralized, free from middle men or government control. Services will thrive on various installments of the blockchain, and smart contracts will automatically enforce any logic that is related to the particular domain.
I don’t mind having another technological leap (after the internet), and given that I’m technically familiar with the blockchain, I may even be part of it. But I’m not convinced it will happen, and I’m not convinced it’s going to be the next internet.
If we strip the hype, the technology behind Bitcoin is indeed a technical masterpiece. It combines existing techniques (likes hash chains and merkle trees) with a very good proof-of-work based consensus algorithm. And it creates a digital currency, which ontop of being worth billions now, is simply cool.
But will this technology be mass-adopted, and will mass adoption allow it to retain the technological benefits it has?
First, I’d like to nitpick a little bit – if anyone is speaking about “decentralized software” when referring to “the blockchain”, be suspicious. Bitcon and other peer-to-peer overlay networks are in fact “distributed” (see the pictures here). “Decentralized” means having multiple providers, but doesn’t mean each user will be full-featured nodes on the network. This nitpicking is actually part of another argument, but we’ll get to that.
If blockchain-based applications want to reach mass adoption, they have to be user-friendly. I know I’m being captain obvious here (and fortunately some of the people in the area have realized that), but with the current state of the technology, it’s impossible for end users to even get it, let alone use it.
My first serious concern is usability. To begin with, you need to download the whole blockchain on your machine. When I got my first bitcoin several years ago (when it was still 10 euro), the blockchain was kind of small and I didn’t notice that problem. Nowadays both the Bitcoin and Ethereum blockchains take ages to download. I still haven’t managed to download the ethereum one – after several bugs and reinstalls of the client, I’m still at 15%. And we are just at the beginning. A user just will not wait for days to download something in order to be able to start using a piece of technology.
I recently proposed downloading snapshots of the blockchain via bittorrent to be included in the Ethereum protocol itself. I know that snapshots of the Bitcoin blockchain have been distributed that way, but it has been a manual process. If a client can quickly download the huge file up to a recent point, and then only donwload the latest ones in the the traditional way, starting up may be easier. Of course, the whole chain would have to be verified, but maybe that can be a background process that doesn’t stop you from using whatever is built ontop of the particular blockchain. (I’m not sure if that will be secure enough, and that, say potential Sybil attacks on the bittorrent part won’t make it undesirable, it’s just an idea).
But even if such an approach works and is adopted, that would still mean that for every service you’d have to download a separate blockchain. Of course, projects like Ethereum may seem like the “one stop shop” for cool blockchain-based applications, but fragmentation is already happening – there are alt-coins bundled with various services like file storage, DNS, etc. That will not be workable for end-users. And it’s certainly not an option for mobile, which is the dominant client now. If instead of downloading the entire chain, something like consistent hashing is used to distribute the content in small portions among clients, it might be workable. But how will trust work in that case, I don’t know. Maybe it’s possible, maybe not.
And yes, I know that you don’t necessarily have to install a wallet/client in order to make use of a given blockchain – you can just have a cloud-based wallet. Which is fairly convenient, but that gets me to my nitpicking from a few paragraphs above and to may second concern – this effectively turns a distributed system into a decentralized one – a limited number of cloud providers hold most of the data (just as a limited number of miners hold most of the processing power). And then, even though the underlying technology allows for a distributed deployment, we’ll end-up again with simply decentralized or even de-facto cenetralized, if mergers and acquisitions lead us there (and they probably will). And in order to be able to access our wallets/accounts from multiple devices, we’d use a convenient cloud service where we’d login with our username and password (because the private key is just too technical and hard for regular users). And that seems to defeat the whole idea.
Not only that, but there is an inevitable centralization of decisions (who decides on the size of the block, who has commit rights to the client repository) as well as a hidden centralization of power – how much GPU power does the Chinese mining “farms” control and can they influence the network significantly? And will the average user ever know that or care (as they don’t care that Google is centralized). I think that overall, distributed technologies will follow the power law, and the majority of data/processing power/decision power will be controller by a minority of actors. And so our distributed utopia will not happen in its purest form we dream of.
My third concern is incentive. Distributed technologies that have been successful so far have a pretty narrow set of incentives. The internet was promoted by large public institutions, including government agencies and big universitives. Bittorrent was successful mainly because it allowed free movies and songs with 2 clicks of the mouse. And Bitcoin was successful because it offered financial benefits. I’m oversimplifying of course, but “government effort”, “free & easy” and “source of more money” seem to have been the successful incentives. On the other side of the fence there are dozens of failed distributed technologies. I’ve tried many of them – alternative search engines, alternative file storage, alternative ride-sharings, alternative social networks, alternative “internets” even. None have gained traction. Because they are not easier to use than their free competitors and you can’t make money out of them (and no government bothers promoting them).
Will blockchain-based services have sufficient incentives to drive customers? Will centralized competitors just easily crush the distributed alternatives by being cheaper, more-user friendly, having sales departments that can target more than hardcore geeks who have no problem syncing their blockchain via the command line? The utopian slogans seem very cool to idealists and futurists, but don’t sell. “Free from centralized control, full control over your data” – we’d have to go through a long process of cultural change before these things make sense to more than a handful of people.
Speaking of services, often examples include “the sharing economy”, where one stranger offers a service to another stranger. Blockchain technology seems like a good fit here indeed – the services are by nature distributed, why should the technology be centralized? Here comes my fourth concern – identity. While for the cryptocurrencies it’s actually beneficial to be anonymous, for most of the real-world services (i.e. the industries that ought to be revolutionized) this is not an option. You can’t just go in the car of publicKey=5389BC989A342…. “But there are already distributed reputation systems”, you may say. Yes, and they are based on technical, not real-world identities. That doesn’t build trust. I don’t trust that publicKey=5389BC989A342… is the same person that got the high reputation. There may be five people behind that private key. The private key may have been stolen (e.g. in a cloud-provider breach).
The values of companies like Uber and AirBNB is that they serve as trust brokers. They verify and vouch for their drivers and hosts (and passengers and guests). They verify their identity through government-issued documents, skype calls, selfies, compare pictures to documents, get access to government databases, credit records, etc. Can a fully distributed service do that? No. You’d need a centralized provider to do it. And how would the blockchain make any difference then? Well, I may not be entirely correct here. I’ve actually been thinking quite a lot about decentralized identity. E.g. a way to predictably generate a private key based on, say biometrics+password+government-issued-documents, and use the corresponding public key as your identifier, which is then fed into reputation schemes and ultimately – real-world services. But we’re not there yet.
And that is part of my fifth concern – the technology itself. We are not there yet. There are bugs, there are thefts and leaks. There are hard-forks. There isn’t sufficient understanding of the technology (I confess I don’t fully grasp all the implementation details, and they are always the key). Often the technology is advertised as “just working”, but it isn’t. The other day I read an article (lost the link) that clarifies a common misconception about smart contracts – they cannot interact with the outside world – they can’t call APIs (e.g. stock market prices, bank APIs), they can’t push or fetch data from anywhere but the blockchain. That mandates the need, again, for a centralized service that pushes the relevant information before smart contracts can pick it up. I’m pretty sure that all cool-sounding applications are not possible without extensive research. And even if/when they are, writing distributed code is hard. Debugging a smart contract is hard. Yes, hard is cool, but that doesn’t drive economic value.
I have mostly been referring to public blockchains so far. Private blockchains may have their practical application, but there’s one catch – they are not exactly the cool distributed technology that the Bitcoin uses. They may be called “blockchains” because they…chain blocks, but they usually centralize trust. For example the Hyperledger project uses PKI, with all its benefits and risks. In these cases, a centralized authority issues the identity “tokens”, and then nodes communicate and form a shared ledger. That’s a bit easier problem to solve, and the nodes would usually be on actual servers in real datacenters, and not on your uncle’s Windows XP.
That said, hash chaining has been around for quite a long time. I did research on the matter because of a side-project of mine and it seems providing a tamper-proof/tamper-evident log/database on semi-trusted machines has been discussed in many computer science papers since the 90s. That alone is not “the magic blockchain” that will solve all of our problems, no matter what gossip protocols you sprinkle ontop. I’m not saying that’s bad, on the contrary – any variation and combinations of the building blocks of the blockchain (the hash chain, the consensus algorithm, the proof-of-work (or stake), possibly smart contracts), has potential for making useful products.
I know I sound like the a naysayer here, but I hope I’ve pointed out particular issues, rather than aimlessly ranting at the hype (though that’s tempting as well). I’m confident that blockchain-like technologies will have their practical applications, and we will see some successful, widely-adopted services and solutions based on that, just as pointed out in this detailed report. But I’m not convinced it will be revolutionizing.
I hope I’m proven wrong, though, because watching a revolutionizing technology closely and even being part of it would be quite cool.
Slashdot asks if password masking — replacing password characters with asterisks as you type them — is on the way out. I don’t know if that’s true, but I would be happy to see it go. Shoulder surfing, the threat is defends against, is largely nonexistent. And it is becoming harder to type in passwords on small screens and annoying interfaces. The IoT will only exacerbate this problem, and when passwords are harder to type in, users choose weaker ones.
The collective thoughts of the interwebz
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.