If a Tool Doesn’t Exist, Build it Yourself!
When looking for social data, we at Kahena use a plethora of tools – like Social Crawlytics for share data and Followerwonk for Twitter data. Followerwonk is an incredible tool for analyzing specific users and their follower/following/tweet data. But what if I wanted to scrape the users in a Twitter list? Followerwonk doesn’t have this functionality, and I found myself at a loss.
The founders of the SearchLove conference series, Distilled, recently created an awesome Twitter list of all attendees and speakers for SearchLove London 2013: https://twitter.com/distilled/searchlove-london-2013/members . This got me excited for the conference and allowed for a bit of pre-conference networking. But how do you prioritize?
Well – the perfect time to procrastinate on my to-do list and packing! I built a quick, *very* rough Google Docs tool to scrape a Twitter list 🙂
Here’s the result [read-only access – FILE / MAKE A COPY to edit it]: https://docs.google.com/a/kahenadigital.com/spreadsheet/ccc?key=0AreQmeK8xd0NdGpRQkVja0xjLXl4dVdmdzhBNUZwZWc&usp=drive_web#gid=2
Screenshot:
How To:
With the help of the good ol’ ImportXML function in Google Docs, you can scrape pretty much anything on the web. However, Twitter uses endless scrolling and dynamically loaded content (AJAX, etc.) that isn’t present in the initial HTML page load, so you can’t just point IMPORTXML at the list URL. Here’s how I went about it:
- Scroll to the absolute bottom of the Twitter list so all members are loaded. (This limits you to relatively small lists – once you get into the 1,000s of members, this would get super annoying.)
- File / SAVE AS an HTML version. Upload it to a web server so you can access it via your web browser.
- Crawl it with Screaming Frog SEO Spider or Xenu to get all external links in a CSV, then filter for Twitter username URLs only (or pull them out with a quick script like the first sketch after this list).
- Copy/paste all of the Twitter user URLs into Column A.
- I analyzed each element I wanted to scrape by right-clicking in Chrome and choosing ‘Inspect Element’, in order to plug that XPath into the IMPORTXML() function. The A2 in the following formulas represents a Twitter user URL (like http://twitter.com/AriNahmani/ ); the second sketch after this list shows the same lookups in script form:
- Full Name: =importXML(A2,"//h1[@class='fullname editable-group']")
- Followers: =importxml(A2,"//a[@data-nav='followers']/strong")
- Following: =importxml(A2,"//a[@data-element-term='following_stats']/strong")
- Tweets: =importxml(A2,"//a[@data-element-term='tweet_stats']/strong")
- Description: =importXML(A2,"//p[@class='bio profile-field']")
- Because Google Docs only allows 50 ImportXML calls per sheet, the idea is to run fewer than 50 at a time, then copy the cells and paste them back over themselves as *values only*. Then drag the formulas down further.
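If you’d rather skip the Screaming Frog / Xenu step, here’s a minimal Python sketch that pulls the profile URLs straight out of the saved list HTML. The file name, the href pattern, and the list of non-profile paths are all assumptions based on Twitter’s 2013 markup – adjust them to whatever your saved page actually contains.

import re

# Read the list page you saved after scrolling to the bottom (file name is hypothetical).
with open("searchlove-london-2013-members.html", encoding="utf-8") as f:
    page = f.read()

# Assumes member profile links appear as relative hrefs like href="/AriNahmani".
usernames = sorted(set(re.findall(r'href="/([A-Za-z0-9_]{1,15})"', page)))

# Crude filter for obvious non-profile paths; extend as needed.
not_profiles = {"search", "i", "about", "signup", "login", "who_to_follow"}
profile_urls = ["https://twitter.com/" + u for u in usernames if u.lower() not in not_profiles]

for url in profile_urls:
    print(url)  # paste this output into Column A of the sheet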
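And for anyone who prefers scripting the whole thing, here’s a rough Python equivalent of the IMPORTXML formulas above, using requests and lxml. The XPath expressions mirror the formulas and target Twitter’s 2013 profile markup; today’s twitter.com renders profiles with JavaScript, so treat this purely as an illustration of the technique rather than something that will still work as-is.

import requests
from lxml import html

# Same lookups as the spreadsheet formulas, keyed by the column they'd fill.
XPATHS = {
    "full_name":   "//h1[@class='fullname editable-group']//text()",
    "followers":   "//a[@data-nav='followers']/strong/text()",
    "following":   "//a[@data-element-term='following_stats']/strong/text()",
    "tweets":      "//a[@data-element-term='tweet_stats']/strong/text()",
    "description": "//p[@class='bio profile-field']//text()",
}

def scrape_profile(url):
    doc = html.fromstring(requests.get(url, timeout=10).text)
    # Each XPath returns a list of text nodes; join and strip them into one value.
    return {name: " ".join(t.strip() for t in doc.xpath(xp) if t.strip())
            for name, xp in XPATHS.items()}

print(scrape_profile("http://twitter.com/AriNahmani/"))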
Lessons Learned / TL;DR
In 10 minutes I had all the SearchLove London 2013 attendees’ Twitter data and could prioritize who I followed, who I wanted to meet, and who I wanted to stay away from at the conference (the tweets/follower ratios clearly indicate some of you have bought some followers – you cheeky marketers, you! … look mom, I speak British English). All jokes aside, this tool could be used for outreach, link building, etc. Hopefully, as with most quick tools built in Google Docs, a proper toolset will eventually adopt the functionality.
In the time you’d spend looking for a tool that does exactly what you want, with some basic knowledge of HTML and XPath you can build a quick, agile tool that completes the task *and* saves you, your colleagues – heck, even the whole inbound marketing community – some time in the future. While there are a *ton* of tools out there, many of them free, there’s nothing like slapping something together quickly to get exactly what you want.