Tag Archives: machinelearning

Identifying Programmers by their Coding Style

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/08/identifying_pro.html

Fascinating research de-anonymizing code — from either source code or compiled code:

Rachel Greenstadt, an associate professor of computer science at Drexel University, and Aylin Caliskan, Greenstadt’s former PhD student and now an assistant professor at George Washington University, have found that code, like other forms of stylistic expression, are not anonymous. At the DefCon hacking conference Friday, the pair will present a number of studies they’ve conducted using machine learning techniques to de-anonymize the authors of code samples. Their work could be useful in a plagiarism dispute, for instance, but it also has privacy implications, especially for the thousands of developers who contribute open source code to the world.

Detecting Phishing Sites with Machine Learning

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/08/detecting_phish.html

Really interesting article:

A trained eye (or even a not-so-trained one) can discern when something phishy is going on with a domain or subdomain name. There are search tools, such as Censys.io, that allow humans to specifically search through the massive pile of certificate log entries for sites that spoof certain brands or functions common to identity-processing sites. But it’s not something humans can do in real time very well — which is where machine learning steps in.

StreamingPhish and the other tools apply a set of rules against the names within certificate log entries. In StreamingPhish’s case, these rules are the result of guided learning — a corpus of known good and bad domain names is processed and turned into a “classifier,” which (based on my anecdotal experience) can then fairly reliably identify potentially evil websites.