Birdwatcher started out as a collection of small scripts to generate a classic weighted word cloud of Tweets from a group of users. As I thought about what else I could do with data from Twitter, I decided to rewrite the scripts into a full-fledged, module-based console framework with a ton more functionality.
If you have any experience working with other frameworks such as Metasploit or Recon-ng, you will feel right at home with Birdwatcher as it's heavily inspired by these frameworks and has many of the same concepts and commands.
Just like Metasploit and Recon-ng, Birdwatcher supports the concept of Workspaces. Workspaces enable you to segment and manage users and data stored in the underlying database. You can use workspaces to create logical separation between different users. For example, you may want to create a workspace for a company, a department or for a specific topic.
The command prompt will always show the currently active workspace inside the square brackets. Birdwatcher will always have a default workspace which might be all you need if you intend to use Birdwatcher on a single group of users. If you plan to use it on several different groups, it is recommended to create a workspace for each of them, to prevent cross contamination.
The core of the Birdwatcher framework is its commands, and one of the most important ones is the `help` command, which simply lists all available commands with short descriptions of what they do.

Executing the help command.
Again, just like Metasploit and Recon-ng, Birdwatcher ships with a bunch of modules that either enrich the raw Twitter data harvested by the commands or somehow present the data in interesting and useful ways. Here are some of the things the modules can currently do:
- Retrieve users' Klout scores, Tweet topics and influence graphs
- Generate weighted word clouds based on users' Tweets
- List the most shared URLs
- Generate graphical social graphs between users
- Crawl shared URLs to retrieve HTTP status codes, content types and page titles
- Generate KML files with geo-enabled Tweets to be viewed in Google Earth
- Generate Punchcard-style plots of when users are most engaged with Twitter
- Calculate the sentiment score of Tweets (positive, neutral or negative)
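To give a flavor of that last point, a naive lexicon-based sentiment scorer can be sketched in a few lines of Ruby. The word lists below are purely illustrative; this is not how Birdwatcher actually computes its sentiment scores:

```ruby
# Illustrative only: a naive lexicon-based sentiment scorer.
# The word lists are made up for this example.
POSITIVE_WORDS = %w[good great awesome excellent love nice win].freeze
NEGATIVE_WORDS = %w[bad terrible awful hate fail broken worst].freeze

def sentiment(tweet)
  words = tweet.downcase.scan(/[a-z']+/)
  score = words.count { |w| POSITIVE_WORDS.include?(w) } -
          words.count { |w| NEGATIVE_WORDS.include?(w) }
  if score > 0 then :positive
  elsif score < 0 then :negative
  else :neutral
  end
end
```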
Birdwatcher's code is designed to make it pretty simple for anyone with a bit of Ruby knowledge to extend Birdwatcher with new modules. How to create one is out of scope for this blog post, but have a look at this Wiki article if you are interested in finding out more.
If you have been following the news around the Snowden documents, you might have heard of a program by the UK intelligence agency GCHQ called LOVELY HORSE. The program simply monitored a small group of security-related Twitter accounts to keep tabs on what was being said, and possibly more.
To demonstrate the capabilities and usage of Birdwatcher, I thought it would be fun to go through how we can create our own LOVELY HORSE program...
Creating a new workspace
Instead of using the default workspace, let's create a dedicated one for our lovely horses to keep things neat and tidy:
Creating a new workspace.

The `workspace add` command created our new workspace and automatically made it the currently active one, as can be seen in the square brackets of the command prompt.
Adding users to the workspace
Now that we have our workspace we need to add some users to it so we have something to work with. The leaked PDF contains a list of 37 Twitter accounts that we will use for this example:
0xcharlie alexsotirov anon_central anon_operations anonops anonymousirc bradarkin CeRTFi danchodanchev daveaitel dinodaizovi diocyde egyp7 GoVCeRT_NL halvarflake hdmoore hernano JaNeTCSiRT kevinmitnick lennyzeltser lulzsec mdowd mikko msftsecresponse operationleaks owasp pusscat Shadowserver snowfl0w taosecurity taviso teamcymru thegrugq TheHackersNews tinman2k VuPeN WTFuzz
One way to add the users would be to execute `user add 0xcharlie alexsotirov ... WTFuzz`, but that would be a lot of typing and I don't really like that. Instead we can make use of our first module to easily import them into the workspace. We copy the usernames, save them to a file and load the User Importer module:
The User Importer module.
The `use` command loads a module by its path. The path is determined simply by how the module files are placed in the directory structure. Modules live inside at least one directory, which can be seen as a namespace for the type of object they work on. In this case the User Importer lives in the `users/` namespace, which makes pretty good sense. When a module is loaded it is also indicated in the command prompt with another set of square brackets containing the module's path in red text.
After loading the module we type `show info` to get a bit more information on what the module does. All modules have additional information that can be seen with the `show info` command. The `show` command can also display any options a module might have:
Options for the User Importer module.
The module is very basic and has only one option, which tells the module which file to read usernames from. The table tells us that the option is required and that its current value is empty. Let's configure the module and run it:
The module fetched basic user information from the Twitter API and saved it to the underlying database. We can see the users in the current workspace at any time with the `user list` command:
Paging through users in the workspace.
Now that we have imported our lovely horses we can fetch their Tweets from the Twitter API and have them saved to the database for analysis:
Fetching Tweets from users.
The `status fetch` command will fetch up to 1,000 Tweets from each user and save them to the database. The command also extracts entities such as URLs, mentions and hashtags and saves them to separate database tables. The command might take a bit of time to finish the first time because of all the Tweets it needs to fetch and process; on subsequent runs it only fetches and processes Tweets the users have posted since the last run.
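The entity extraction step can be approximated with a few regular expressions. This is a simplification for illustration; the Twitter API can return entities pre-parsed alongside each Tweet, so a regex pass like this is only a rough sketch:

```ruby
# Simplified sketch of pulling entities out of a Tweet's text.
# A rough approximation, not Birdwatcher's actual extraction code.
def extract_entities(text)
  {
    mentions: text.scan(/@(\w+)/).flatten,
    hashtags: text.scan(/#(\w+)/).flatten,
    urls:     text.scan(%r{https?://\S+})
  }
end
```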
Now that we have fetched the Tweets we can page through them with the `status list` command:
Listing Tweets from users.
With the `status search` command we can find Tweets containing a specific word or phrase, for example lovelyhorse:
Searching for Tweets mentioning lovelyhorse.
After the Tweets have been fetched and processed we also have a pretty large collection of URLs that might point to interesting or valuable information. Right now we only know the URLs that were shared, which can be pretty hard to process. To get a better idea of which links might be interesting we can use the URL Crawler module:
Loading the URL Crawler module.
As the module information says, it enriches the collected URLs with their HTTP status codes, content types and, potentially, page titles if the URL points to an HTML page with a title. The module also follows redirects, so if a URL is obfuscated or shortened we learn the actual destination too.
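The enrichment step can be sketched as a small function over the crawler's HTTP responses. The helper name and record layout here are invented for the example and are not Birdwatcher's internal API:

```ruby
# Illustrative sketch of the enrichment the crawler performs per URL.
# The helper name and record layout are made up for this example.
def enrich(status_code, content_type, body)
  record = { status: status_code, content_type: content_type }
  if content_type.to_s.include?("text/html")
    # Grab the contents of the first <title> element, if any.
    title = body[%r{<title[^>]*>(.*?)</title>}im, 1]
    record[:title] = title.strip if title
  end
  record
end
```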
The module also warns us that it might not be safe to blindly visit all the shared URLs, as they could point at places you don't want to request from your own IP. Let's check the module's options to see what we can do:
Viewing options for the URL Crawler module.
This module has a few more options than the User Importer. None of them are required, but the proxy options, including PROXY_PORT, are definitely a good idea to configure. They instruct the module to request all URLs through an HTTP proxy to hide the origin of the requests, for your own safety and OPSEC. I personally have Tor installed and its SOCKS proxy exposed as an HTTP proxy with Polipo. Check out this guide if you want to know how it's done.
Crawling URLs for more information.
We configure the module to use a proxy and run it. It will steadily crunch through the URLs, but it might take a while to finish depending on your connection speed, proxy, THREADS setting and the number of URLs to crawl. The first time you run this module it can take quite a long time, as it needs to process a lot of URLs.
Getting Klout information
The Klout API can give us a lot of valuable information on users, such as their Klout score, which can be used to find the users with the most reach and influence; the general topics they Tweet about; and an influence graph, which can tell us who each user is influencing and who they are being influenced by.
The first module we need to run simply retrieves each user's Klout ID, which is needed for all the other Klout-related modules:
Retrieving user's Klout ID.
Next we run the `users/klout_topics` module, which retrieves the general topics each user is Tweeting about, such as Technology, Hacking, Marketing, Information Security, etc. Each topic is saved in a table and referenced through a join table to users, to make it easy to retrieve users who Tweet, or don't Tweet, about a specific topic.
Retrieving user's Klout topics.
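Conceptually this is just a many-to-many relation between users and topics. Ignoring the actual database schema, the lookup might look like this in plain Ruby (the sample data below is made up):

```ruby
# Illustrative many-to-many lookup between users and topics.
# Sample data is invented; Birdwatcher stores this in database tables.
USER_TOPICS = {
  "hdmoore"     => ["Technology", "Information Security"],
  "mikko"       => ["Information Security", "Privacy"],
  "taosecurity" => ["Information Security"]
}.freeze

def users_tweeting_about(topic)
  USER_TOPICS.select { |_, topics| topics.include?(topic) }.keys
end
```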
If we want to know how influential each user is, we can use the `users/klout_score` module to retrieve their Klout score. The score is calculated by Klout and is explained here, but the higher the score, the more influential the user is:
Retrieving user's Klout scores.
From the output we can see that mikko is the most influential, followed by hdmoore and thegrugq. The Klout score will of course also be saved to the database to make querying based on Klout scores possible.
Lastly we will run the `users/klout_influence` module to retrieve information about who our users are being influenced by and who they are influencing:
Retrieving user's Klout influence.
Making a word cloud
A great way to get a quick sense of what the users are talking about is to use the `statuses/word_cloud` module. The module can generate a classic weighted word cloud based on Tweets from all users, or a smaller selection, within a window of time. The module has quite a lot of options for customizing the result:
Viewing options for the Word Cloud module.
We configure the module with a file destination for the generated image and set INCLUDE_PAGE_TITLES to true in order to mix in the page titles we previously retrieved with the `urls/crawl` module. This gives an even better idea of the topics our users have been talking about over the last seven days:
Generating a word cloud from Tweets.
The result is a pretty word cloud that tells us what has been on our lovely horses' minds over the last seven days:
The result of the word cloud module.
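Under the hood, the weighting boils down to counting word frequencies after filtering out common stop words. A minimal sketch, with an illustrative stop word list rather than Birdwatcher's actual one:

```ruby
# Minimal sketch of the frequency counting behind a weighted word cloud.
# The stop word list is illustrative only.
STOP_WORDS = %w[the a an and or of to in is it for on with rt].freeze

def word_weights(tweets)
  tweets.join(" ").downcase.scan(/[a-z0-9']+/)
        .reject { |w| STOP_WORDS.include?(w) || w.size < 3 }
        .tally                            # word => occurrence count
        .sort_by { |_, count| -count }    # heaviest words first
end
```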
Generating an influence graph
The raw influence data we retrieved earlier with `users/klout_influence` can be visualized and examined as a graph with another module:
Generating a visual influence graph.
The result is a directional graph showing who is influencing who according to Klout:
The influence graph visualized.
Generating a social graph
Another type of graph we can generate is a social graph that doesn't use Klout's influence data but instead finds social connections by analyzing each user's Tweets for mentions of other users:
The social graph between users.
The resulting graph is a bit different from the influence graph and shows a very tightly coupled cluster between some users. The edge weight between users is calculated simply by counting how many times they mention each other in Tweets. The thicker the line, the stronger the connection between two users:
The social graph visualized.
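The mention counting behind those edge weights can be sketched like this (a hypothetical helper, not Birdwatcher's actual code):

```ruby
# Sketch of building weighted social-graph edges by counting
# cross-mentions in Tweets. Input: { username => [tweet texts] }.
def mention_edges(tweets_by_user)
  usernames = tweets_by_user.keys
  edges = Hash.new(0)
  tweets_by_user.each do |author, tweets|
    tweets.each do |text|
      text.scan(/@(\w+)/).flatten.each do |mentioned|
        # Only count mentions of other users in our workspace.
        next unless usernames.include?(mentioned) && mentioned != author
        edges[[author, mentioned]] += 1
      end
    end
  end
  edges
end
```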
Plotting a user's Twitter engagement
Another question you might ask is on what day and at what time a user is most engaged with Twitter. This can be useful for finding the time a user is most likely to engage with you on Twitter. We can use the `users/activity_plot` module to get an idea of this:
Generating an activity plot for halvarflake.
The resulting plot tells us that halvarflake is generally very engaged with Twitter on Fridays at around 8AM and Tuesdays & Wednesdays at around 7PM:
halvarflake's activity plot.
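The bucketing behind a punchcard plot like this is simply a count of Tweets per weekday/hour cell. A minimal sketch, with made-up timestamps:

```ruby
require "time"

# Sketch of punchcard bucketing: count Tweets per [weekday, hour] cell.
# The timestamps used here are illustrative.
def punchcard(timestamps)
  counts = Hash.new(0)
  timestamps.each do |ts|
    t = Time.parse(ts)
    counts[[t.strftime("%A"), t.hour]] += 1
  end
  counts
end
```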
Listing shared URLs
The last module I want to demonstrate in this blog post is the `urls/most_shared` module. The module simply lists URLs shared within a specific window of time, ordered from most to least shared. If a URL has been shared by several users, that is a pretty good indicator that it holds something of interest:
Paging through the shared URLs.
Because we ran the `urls/crawl` module earlier, we also see the page title, content type and HTTP status code, which is very convenient. Since I used Tor as an HTTP proxy, we also ran into a CloudFlare CAPTCHA wall.
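The ranking itself is just grouping and counting shares within the time window. A hypothetical sketch, using plain integers as timestamps for brevity:

```ruby
# Sketch of ranking shared URLs, most shared first.
# Input: array of { url:, shared_at: } hashes; `since` limits the window.
# Field names are invented for this example.
def most_shared(shares, since:)
  shares.select { |s| s[:shared_at] >= since }
        .group_by { |s| s[:url] }
        .map { |url, group| [url, group.size] }
        .sort_by { |_, count| -count }
end
```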
This concludes my first post on Birdwatcher. I hope you enjoyed it and hope you will include it in your OSINT toolbox. Feel free to file any bugs on GitHub or give me ideas for new modules.
In the next blog post I will go over some of the more advanced functionality of Birdwatcher, such as querying the underlying database, interacting with Birdwatcher's code through the interactive Ruby shell, and writing new modules.