This post was originally published on https://www.mdsec.co.uk/blog/ while I was employed by MDSec Consulting Limited in the United Kingdom. It is mirrored here on my own blog for archival purposes.
A key step in an adversary simulation is the reconnaissance phase, which almost always requires obtaining e-mail addresses for employees within the target organisation. LinkedIn is probably one of the most widely used sources for reliably profiling employees.
Although LinkedIn is a great source of information, few publicly available tools can scrape it and produce a list of e-mail addresses. Existing tools either relied on the LinkedIn API or had stopped working because of the numerous user interface (UI) updates over the past year.
The ActiveBreach team found a reliable scraping method developed by Danny Chrastil (@DisK0nn3cT), which was publicly available. We modified and improved this tool to meet our requirements and streamline the collection process.
The original scraper by Danny Chrastil was modified in the following ways to improve its performance and suit ActiveBreach's operational requirements for scraping LinkedIn:
Fixed to work with the latest LinkedIn UI.
Changed the search query to use LinkedIn's company filter after automatically discovering the company ID.
Automated e-mail prefix detection (i.e. the naming format of addresses) for a given company domain name. This is used in engagements where we do not know the client's e-mail format.
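Once the e-mail prefix for a domain is known, turning scraped LinkedIn names into candidate addresses is mostly string handling. The following is a minimal sketch, not LinkedInt's actual code: it assumes the prefix is expressed with Hunter.io-style placeholders (`{first}`, `{last}`, `{f}`, `{l}`), and the helper names `apply_pattern` and `generate_addresses` are hypothetical.

```python
import re
import unicodedata
from typing import Iterable, List, Optional


def normalise(part: str) -> str:
    """Fold accents to ASCII, lower-case, and keep only a-z and 0-9."""
    folded = unicodedata.normalize("NFKD", part).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", folded.lower())


def apply_pattern(pattern: str, full_name: str, domain: str) -> Optional[str]:
    """Build one address from a name, assuming Hunter.io-style placeholders.

    Returns None when the name does not contain both a first and a last name.
    Unknown placeholders in the pattern raise KeyError from str.format.
    """
    # Drop suffixes such as ", CISSP" that people often append to LinkedIn names.
    name = full_name.split(",")[0]
    parts = [p for p in (normalise(x) for x in name.split()) if p]
    if len(parts) < 2:
        return None
    first, last = parts[0], parts[-1]
    local = pattern.format(first=first, last=last, f=first[0], l=last[0])
    return f"{local}@{domain}"


def generate_addresses(pattern: str, names: Iterable[str], domain: str) -> List[str]:
    """Apply the pattern to every scraped name and return sorted, de-duplicated addresses."""
    addresses = {apply_pattern(pattern, n, domain) for n in names}
    addresses.discard(None)
    return sorted(addresses)
```

For example, `apply_pattern("{f}{last}", "Jane O'Neil", "example.com")` gives `joneil@example.com`. Taking only the first and last tokens is a deliberate simplification; middle names, double-barrelled surnames and nicknames will still need manual review.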
To prepare, create a new LinkedIn account and connect it to an active account that you already use. This allows the new account to see your network up to 3rd-degree connections. If the account cannot see many people at the target company, we suggest connecting to a few key members of that company who are likely to have many contacts, such as HR staff.
You will also need a Hunter.io API key, which is used for the automated e-mail prefix detection. You can register for one at https://hunter.io.
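A minimal sketch of the prefix lookup, assuming Hunter.io's v2 Domain Search endpoint, which reports the most common address format for a domain in its `data.pattern` field (for example `{first}.{last}`). The function names are hypothetical and this is not LinkedInt's implementation.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint: Hunter.io API v2 Domain Search.
HUNTER_DOMAIN_SEARCH = "https://api.hunter.io/v2/domain-search"


def build_domain_search_url(domain: str, api_key: str) -> str:
    """Build the Domain Search request URL with the key passed as a query parameter."""
    query = urllib.parse.urlencode({"domain": domain, "api_key": api_key})
    return f"{HUNTER_DOMAIN_SEARCH}?{query}"


def extract_pattern(payload: dict) -> Optional[str]:
    """Return data.pattern from a Domain Search response, or None if it is absent or null."""
    return (payload.get("data") or {}).get("pattern")


def fetch_pattern(domain: str, api_key: str, timeout: float = 10.0) -> Optional[str]:
    """Query Hunter.io for a domain and return its detected e-mail pattern (network call)."""
    url = build_domain_search_url(domain, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return extract_pattern(payload)
```

Usage might look like `fetch_pattern("example.com", os.environ["HUNTER_API_KEY"])`, where `HUNTER_API_KEY` is an assumed environment variable name. When Hunter.io has too little data for a domain the pattern can come back empty, in which case the operator would need to choose a format manually.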
Ideally, the operator would only have to enter a company name, and all intelligence gathering and scraping would then be performed automatically, producing a list of e-mail addresses at the other end. LinkedInt is not at this level yet.
Currently, the operator must work through a number of choices and options within the tool. The following video shows an example of its usage:
In the future, we hope to develop LinkedInt to the point where only the company name is required and everything else is performed automatically, with no operator intervention. Further desired features include support for horizontal scraping and the ability to map many company names to their domains in bulk, then convert these into e-mail prefixes. We would also like to add Natural Language Processing (NLP) to identify roles and departments, allowing us to group employees by department and quickly visualise the relationships between them.