Black Hat & DEF CON 2013 – Privacy, Security, and AI

I attended Black Hat and DEF CON USA 2013 this year in Las Vegas.  These two computer security conferences were both founded by hacker Jeff Moss aka The Dark Tangent.  The Dark Tangent sold Black Hat for $14 million in 2005, but retained control of DEF CON.  He chairs the Black Hat conference, sits on the DHS Security Advisory Council, and is the chief security officer for ICANN, so he is pretty hardcore.  Black Hat, owned by UBM, costs thousands to attend and is supposedly more corporate, while DEF CON costs only $180, has cooler badges, and is more, uh, cultural.  This was my second year attending BH/DC, but next year I might skip Black Hat and try out the even more underground security conference, BSides.

Of course, I am no hacker; I am just a sysadmin, but I like to see what hijinks the hackers are up to these days.  Whether we are talking about the builders or the breakers, hackers are having more and more impact.  Software is eating the world, after all.  Look at China siphoning off intellectual property from US companies.  Look at the way hacking has branched out into organized crime.  If Snowden is to be believed, NSA analysts can hack anyone at will.  It’s also sobering to consider the impact that hacking will have on implantable medical devices.[1]  If I have learned anything from my interest in computer security, it is that many, if not most, of the electronic systems we rely on today were not designed with security in mind.  (I’m looking at you, IP spoofing.)  This is true of internet protocols, industrial control systems, and yes, medical devices as well.

I am a computer consultant, so I do attend these things in that capacity as well, but I won’t bore everyone with how depressing it is to see my poor Windows systems continuing to get pwned by pass-the-hash and other exploits.  STILL!  After all these years.  Ugh!

One of the first talks that I attended was by Matthew Cole, who talked about a case in which the Italians convicted a bunch of CIA agents for kidnapping a Muslim cleric (aka extraordinary rendition) in Italy.  I had never heard of this case or of Cole, but this event from a couple of years ago is relevant today because, ironically, the Italians also used cell phone metadata[2] to piece together their case against the CIA.  Cell phone metadata is the stuff that the NSA is gathering on each and every one of us Americans right now.  Also, Hezbollah supposedly used metadata analysis to arrest some CIA operatives back in 2011.  So Cole is calling out the CIA for sloppy tradecraft (spying) and failure to learn from past mistakes.  But it’s interesting to see some specific examples of how this supposedly innocuous metadata can be used against Americans.  This whole Third Party Doctrine thing needs to get reined in.

I noticed that several of the presentations at Black Hat and DEF CON this year focused on machine learning algorithms.  One interesting project called CrowdSource was even funded by DARPA.  Their goal was to apply machine learning to the problem of malware analysis.  As any coder knows, Stack Overflow is one of the most useful forums for finding answers, and the creators of CrowdSource reasoned that malware authors are no different.  So they downloaded Stack Overflow, yes, the entire site, and used it to create semantic mappings between function calls and their natural language descriptions.  They then applied some machine learning math to help them predict just what their decompiled malware was trying to do.  I love this approach.  As the authors point out, it will stay up-to-date as long as Stack Overflow stays relevant, and they can even link back to the relevant Stack Overflow pages to show how conclusions were reached.  Clever.

Another example of machine learning being applied in the computer security domain was presented by Brazilian security expert Alexandre Pinto.  One problem that many companies face in computer security is realizing when they have been hacked.  Gone are the days of flashy hacker vandals making their exploits known to the world.  Malicious actors these days strive for stealth, and it is remarkably difficult to separate their footprints from the riotous chaos that constitutes “normal” network behavior.  Alex Pinto started out by lamenting that these SIEM systems that corporations use to log activity on their networks are incredibly difficult to configure and are remarkably ineffective.[3]

So Pinto went on sabbatical and started brushing up on machine learning.  He figured that the only way to address this big data problem was to enlist the help of robots.  He whipped up a neat little proof of concept example using a support vector machine to cluster IP addresses in his firewall logs.  This is sort of a trivial example, since IP blacklists are widely available and frontal attacks on firewalls don’t pose as much of a threat as the users with their browsers.  Nonetheless, the technique Pinto demonstrated could be adapted to cluster all manner of logged events on a network.  If he threw in some heuristics (rules of thumb) such as the “kill chain” event grouping suggested by John “Four” Flynn at Black Hat last year, it would add some codified human intelligence into the machine learning process and contribute to stronger computer security.
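To make the support vector machine idea a little more concrete, here is a minimal sketch in plain Python.  It is not Pinto's code: the two per-IP features, the labels, and the training parameters are all made up for illustration.  Also note that an SVM is normally a supervised classifier, so this sketch learns to separate known-bad from known-good addresses rather than clustering them.  In practice you would probably reach for a library such as scikit-learn instead of hand-rolling the training loop.

```python
# Hypothetical per-IP features pulled from firewall logs:
# (fraction of connections blocked, fraction of destination ports that were distinct).
# Labels: +1 = known bad (say, from a blacklist), -1 = known good.
TRAINING = [
    ((0.0, 0.1), -1),
    ((0.1, 0.0), -1),
    ((0.1, 0.2), -1),
    ((0.2, 0.1), -1),
    ((0.9, 0.9), +1),
    ((0.8, 0.9), +1),
    ((0.9, 0.7), +1),
    ((0.7, 0.8), +1),
]


def train_linear_svm(samples, epochs=200, lr=0.1, lam=0.01):
    """Fit weights w and bias b by subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Misclassified or inside the margin: step toward the sample.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with room to spare: only apply weight decay.
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (looks bad) or -1 (looks fine) for a feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    w, b = train_linear_svm(TRAINING)
    print(predict(w, b, (0.85, 0.85)))  # a noisy, mostly blocked address
    print(predict(w, b, (0.1, 0.1)))    # a quiet, mostly allowed address
```

The interesting work in a real deployment is choosing the features and getting trustworthy labels, which is where heuristics like the kill chain grouping could come in.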

This is interesting because we are starting to see rudimentary AI being publicly discussed in the realm of computer security.  I assume of course that the NSA has had plenty of computer science PhDs working on more advanced AI-based computer attacks and defenses for some time.  Yep, attacks.  Where the story gets more interesting is with this presentation at DEF CON by Soen Vanned:

Evolving Exploits Through Genetic Algorithms

This talk will discuss the next logical step from dumb fuzzing to breeding exploits via machine learning and evolution. Using genetic algorithms, this talk will take simple SQL exploits and breed them into precision tactical weapons. Stop looking at SQL error messages and carefully crafting injections, let genetic algorithms take over, and create lethal exploits to PWN sites for you!

Genetic algorithms basically try to mimic evolution by interbreeding and mutating potential solutions to evolve the fittest specimens.  In Soen’s case, of course, the “solutions” were SQL injection attack strings used to compromise web applications.[4]
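For anyone who has not played with genetic algorithms, the breed, mutate, and select loop can be sketched in a few lines of Python.  This is a harmless toy and not Soen's code: the "solutions" here are random strings and the fitness function simply counts how many characters match a target phrase.  The target, population size, mutation rate, and function names are all assumptions picked for illustration.

```python
import random
import string

# Hypothetical toy problem: evolve random strings until one matches TARGET.
TARGET = "hello world"
ALPHABET = string.ascii_lowercase + " "


def fitness(candidate):
    """Number of positions where the candidate matches the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def crossover(mom, dad, rng):
    """Single-point crossover: prefix from one parent, suffix from the other."""
    cut = rng.randrange(1, len(TARGET))
    return mom[:cut] + dad[cut:]


def mutate(candidate, rng, rate=0.05):
    """Replace each character with a random one with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in candidate
    )


def evolve(pop_size=200, max_generations=500, seed=0):
    """Return (best string found, generation at which the search stopped)."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)
    ]
    for generation in range(max_generations):
        population.sort(key=fitness, reverse=True)
        if population[0] == TARGET:
            return population[0], generation
        parents = population[: pop_size // 5]  # the fittest 20% get to breed
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children  # parents survive unchanged (elitism)
    population.sort(key=fitness, reverse=True)
    return population[0], max_generations


if __name__ == "__main__":
    best, generations = evolve()
    print(best, generations)
```

Swap in a different fitness function and a different encoding of the candidates and the same loop will optimize for whatever you are measuring, which is exactly what makes the approach attractive to attackers and defenders alike.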

So we have machine learning on the defense side trying to identify and analyze attacks, and we have it on the offense side trying to evolve exploits to bypass signature-based filters.  This is starting to look like high frequency trading.  Have we had dueling AIs going at each other behind the scenes between nation states for years already?  Maybe Peter Rothman is right and the Singularity already happened.

With the recent Snowden revelations, there was much talk about privacy.  Hackers are way ahead of the curve in these matters.  NSA whistleblower William Binney revealed details about NSA spying programs targeting Americans last year at DEF CON 2012.  This year, DEF CON featured a presentation by some folks from Montana who are working on privacy legislation at the state level.  Eric Fulton, a computer security specialist, worked with Montana state representative Daniel Zolnikov to prepare privacy bill HB 400 that ultimately died in a Montana state legislature committee.  But they are not giving up.  This bill was killed a few months prior to the Snowden revelations, so the public was less aware of privacy concerns at that time.  Fulton and Zolnikov plan to revise and break up HB 400 into smaller privacy bills that can be introduced in the future.

Here are the main points of HB 400:

(a) data subjects must be given notice when their personal information is being collected;
(b) personal information may be used only for the purpose stated and not for any other purposes;
(c) personal information may not be collected or disclosed without the data subject’s consent;
(d) personal information that is collected must be kept secure from any potential abuses;
(e) data subjects must be informed as to who is collecting personal information;
(f) data subjects must be allowed to access their personal information and make corrections to any inaccurate data;
(g) data subjects must have a method available to them to hold data collectors accountable for following the principles contained in this section.

These all seem fairly reasonable to me.  Maybe it’s a good idea to have states start enforcing privacy rights.

This question of privacy and who owns your personal data has been on my mind for some time.  Some guy promoting this Open-Source Everything idea gave a rambling, disjointed talk about hacking capitalism, which was disappointingly bad.  He reiterated Lanier’s idea that people should own the data they create.  The problem is that most interesting data is created by interacting with services.  So you don’t in fact own your data, because the service providers control it on their servers.  But I would go further and say you shouldn’t assert full ownership of this data as intellectual property, because it wouldn’t exist without the service you interacted with.  If phone services didn’t exist, phone call metadata wouldn’t exist.  So that’s a problem I hadn’t really thought through before.  Aside from the fact that it’s incredibly difficult to assert ownership of data in the first place, we can’t really claim exclusive ownership of so-called “personal” data even in theory.  So we should go create something without using a service and assert ownership of that.

On the other hand, it would be nice to have something akin to privacy continue to exist in this world.  Noah Schiffman and SkyDog gave a talk called the Dark Arts of OSINT (OSINT = Open Source Intelligence), in which they showed how math can be applied to harvest publicly available data about anyone.  It is really amazing how much can currently be learned about you with only a couple of pieces of information.  SkyDog highlighted some of his favorite tools, such as Maltego, Search Diggity, and even Recorded Future.[5]  Schiffman then went on to lay out the math that can be used to do deep correlation between disconnected sets of information.  He cited the simple example of US Census data:

87% of the US population can be uniquely identified by gender, ZIP code, and full date of birth.
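That kind of uniqueness figure is easy to compute for any data set you happen to have.  Here is a small sketch with made-up records (the field layout and the `unique_fraction` helper are my own, not something from the talk) showing how quickly people become unique as you combine a few innocuous fields.

```python
from collections import Counter

# Made-up records for illustration: (gender, ZIP code, date of birth).
records = [
    ("F", "89109", "1980-02-14"),
    ("F", "89109", "1980-02-14"),  # same three values as the record above
    ("F", "89109", "1983-09-21"),
    ("M", "89109", "1975-07-01"),
    ("M", "59715", "1990-11-30"),
    ("M", "59715", "1962-05-09"),
    ("F", "59715", "1962-05-09"),
    ("M", "89109", "1975-07-02"),
]


def unique_fraction(rows, columns):
    """Fraction of rows whose values in `columns` appear exactly once."""
    keys = [tuple(row[i] for i in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


if __name__ == "__main__":
    print(unique_fraction(records, (0,)))       # gender alone -> 0.0
    print(unique_fraction(records, (0, 1)))     # gender + ZIP -> 0.125
    print(unique_fraction(records, (0, 1, 2)))  # gender + ZIP + birth date -> 0.75
```

No single field gives anyone away, but three of them together single out most of this toy population.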

So that’s a tough attack to protect against.  Privacy really is dead.  The only solution that seemed even remotely plausible for maintaining anonymity was to spread misinformation about yourself to increase the noise-to-signal ratio and make it harder for malicious actors to build a profile of you.  I think Vinge talks about a service to provide this in Rainbows End.  Also, some guy at DEF CON told me about a service that is starting up to provide false information to various service providers on your behalf, but I think I lost his card.  I will look into that more.  If anyone knows about a service that does this, please post in the comments.

Information systems are becoming more and more important in the real world every day.  Bits are taking control of atoms.  The people that can actually access and control these systems wield incredible power.  Some hackers do sell their work to oppressive governments and criminals.  However, hackers are inherently defiant and unpredictable.  I actually take consolation from this.  If a global information police state does emerge, we can trust that there will always be some hacker out there to throw a wrench in the works.  If for no other reason than the lulz.


  1. RIP to Barnaby Jack, who was slated to speak on this topic at Black Hat this year.  I did not know him personally, but he seemed to have been a pretty cool guy.
  2. Ironically, the Italians supposedly did this with Analyst’s Notebook software, which they received from the US to help with intelligence analysis after 9/11.  Be careful sharing your toys there, fellas.
  3. A Mandiant report from 2012 suggested that only 6% of intrusions are detected by internal processes.
  4. His Forced Evolution project is up on GitHub if you can read Python:
  5. This is an interesting future prediction service that I should probably write more about.

2 thoughts on “Black Hat & DEF CON 2013 – Privacy, Security, and AI”

  1. Hi, Scott. This was a very interesting post and I agree with a lot that you have said, especially the “high-frequency trading” scenario. The fact is that defense nowadays is criminally behind offense, and my belief is that a large part of it comes from being unable to handle the firehose of data we could be using, hence the research.

    I am continuing to explore the subjects I touched on in the presentation, and I am hoping to have some new breakthroughs soon. Feel free to subscribe to the blog on the MLSec Project site if you are interested in the development.

    And BTW, I was unfamiliar with John Flynn’s work and I will chase it down to review it. Sounds very interesting, especially coming from a place as data rich as the company he was at at the time.

