Daniel - October 19, 2019
Do you intend to scrape data from Twitter? In this post, we will show you how to do so.
Twitter is one of the most popular social networking websites. It is a microblogging service where users post status updates (tweets) of no more than 280 characters.
Twitter users can post just about anything and share ideas and feelings with other users through the mobile app or the web app.
This tutorial focuses on using data scrapers to mine data from Twitter. The data you can mine includes usernames, follower counts, hashtags, photos and profile pictures, links, geolocations, sign-up dates, etc.
Twitter is a massive source of information useful to marketers. With Twitter scraping tools, marketers can:
Connect with key market influencers
Monitor their competitors effectively
Perform sentiment analysis
Study customer behavior
Target their market audience with relevant tweets
Monitor brands
Scraping data from Twitter is also useful to researchers who want to study and understand what is happening online.
Researchers can use data scraping tools to:
Monitor the popularity of tweets and people on Twitter
Gather information about Twitter users, such as friends, followers, profile pictures, sign-up dates, etc.
Know who gets mentioned using ‘@’ usernames
Survey how trends develop and change with time
Examine other Twitter networks and communities
Follow up on the influence of your tweets on people
Using the Twitter API (Application Programming Interface) to scrape data is legal and authorized by Twitter for third-party use, so you can collect data this way without running into trouble with Twitter.
Twitter does not permit you to scrape more data than the API allows. For this reason, many people who scrape Twitter use third-party web scrapers or develop scrapers of their own. Doing so may or may not get you into trouble, depending on your purpose for collecting the data.
How to scrape data from Twitter
There are a variety of tools for scraping Twitter that do not require programming knowledge. Such tools make gathering data from Twitter easy.
Some of the popular tools and how to use them are discussed below.
Octoparse is an excellent tool for scraping data from social media sites.
Follow the guide below to use Octoparse
Download and install the latest version of Octoparse on your system.
Your system must meet the following requirements
Windows 7, 8, or 10
Microsoft .NET Framework 3.5 SP1
Register an account with Octoparse
Getting your Twitter URL
Copy the URL of your Twitter search result
Paste the copied URL in the ‘Extraction URL’ box and save
Get more data
From the ‘Advanced Options,’ select ‘Scroll Down.’
Set suitable values for ‘Scroll times’ and ‘Interval.’
Select ‘Scroll down for one screen’ as the ‘Scroll way’ and click ‘OK.’
Loop over the data in each tweet
To extract the same data from every tweet, create a ‘Loop Item.’
Select the data you want to extract from the webpage. The selected area is highlighted.
Click on ‘Select all’ >> ‘Extract text from the selected elements’ in the ‘Action Tips’ panel.
You can rename the ‘Field Name’ column if you need to.
Use regular expressions to reformat data
You can skip this step if you are happy with the result.
You can use a regular expression to delete words like ‘Retweet,’ ‘Like,’ ‘Reply,’ etc. To do so:
Click on the ‘Reply’ row and select ‘Customize data field.’
Click on ‘Refine extracted data’ and select ‘Add step.’
Click the ‘Replace’ button. In the ‘Replace’ box, paste the text you want to remove exactly as it appears in the extracted data, including its spaces (for example, the ‘Reply ’ part of ‘Reply 856’).
Click ‘OK’
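The same cleanup can be sketched outside Octoparse with a regular expression. This is only an illustration in Python, not Octoparse's own mechanism; the function name strip_action_label and the sample values are hypothetical.

```python
import re

# Labels that appear in front of the counters, taken from the step above.
ACTION_LABELS = ("Reply", "Retweet", "Like")

# Match one label at the start of the value, followed by any spaces.
# \b keeps a longer word that merely starts with a label from matching.
_LABEL_RE = re.compile(r"^(?:%s)\b\s*" % "|".join(ACTION_LABELS))


def strip_action_label(value):
    """Return the extracted value without its leading action label."""
    return _LABEL_RE.sub("", value.strip())


if __name__ == "__main__":
    for raw in ("Reply 856", "Retweet 1,204", "Like 33"):
        print(raw, "->", strip_action_label(raw))
```

Anchoring the pattern at the start of the value means a tweet that happens to contain the word ‘Reply’ elsewhere is left alone.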
Extracting data
Click on ‘Start Extraction’ >> ‘Local Extraction’
Click ‘Export’ to export the scraped data
ScrapeStorm is an AI-based web scraping tool. It supports Windows, Mac, and Linux.
To use ScrapeStorm to scrape data, follow the guide below
Download and install ScrapeStorm on your system
Register an account with ScrapeStorm and log in.
Create a task
To create a task, copy the URL of your Twitter search result.
Create a ‘New smart mode task.’ You can also create a task by importing the task rules.
Open the ‘URL edit’ window.
Paste the URL in the opened window.
Set the scraping rules
Smart mode recognizes the fields on your search result page and creates them automatically.
You can rename, add, delete, or modify any field by right-clicking on it.
Set up your scraping task
You can set a schedule, IP rotation and delay, automatic export, speed boost, image download, etc.
Data scraping starts automatically after a short while
Export your data
Click on the ‘Export’ button to export the scraped data.
Choose the file format for the exported data. The available formats include Excel, CSV, HTML, text, and database.
Professional plan subscribers can export data files directly to WordPress.
Web Scraper is a useful tool for scraping historical data from Twitter. By using the right filters, you can scrape advanced search results from Twitter. Such data can be quite valuable for market analysis.
To use Web Scraper to scrape data from Twitter, follow the steps below
Download and install the Web Scraper Chrome extension from the Chrome Web Store
Right-click anywhere on a page and select ‘Inspect.’
The developer console opens
Click on the ‘Web Scraper’ tab and click on ‘Create a new sitemap.’
Click on ‘Import sitemap’ and paste your sitemap parameters into the ‘Sitemap JSON’ box. The sitemap tells the scraper how to navigate the site and which data to extract.
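To give an idea of what goes into the ‘Sitemap JSON’ box, here is a rough sketch that builds a small sitemap in Python and prints it as JSON. The key names follow the layout commonly seen in sitemaps exported by the extension and should be compared with one exported from your own installation. The sitemap ID, the start URL, and both CSS selectors are placeholders (assumptions), not values taken from Twitter's real markup.

```python
import json


def build_sitemap(sitemap_id, start_url, tweet_selector, text_selector):
    """Build a sitemap as a plain dict that json.dumps can serialize."""
    return {
        "_id": sitemap_id,
        "startUrl": [start_url],
        "selectors": [
            {
                # Parent selector: one element per tweet, scrolling for more.
                "id": "tweet",
                "type": "SelectorElementScroll",
                "parentSelectors": ["_root"],
                "selector": tweet_selector,
                "multiple": True,
                "delay": 2000,
            },
            {
                # Child selector: the text inside each tweet element.
                "id": "text",
                "type": "SelectorText",
                "parentSelectors": ["tweet"],
                "selector": text_selector,
                "multiple": False,
                "regex": "",
                "delay": 0,
            },
        ],
    }


if __name__ == "__main__":
    sitemap = build_sitemap(
        "twitter-search",                       # hypothetical sitemap name
        "https://twitter.com/search?q=python",  # placeholder search URL
        "li.stream-item",                       # placeholder CSS selector
        "p.tweet-text",                         # placeholder CSS selector
    )
    print(json.dumps(sitemap, indent=2))
```

The printed JSON is what you would paste into the box; the start URL can be replaced later through ‘Edit metadata’.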
Finding historical tweets with Twitter Advanced Search
The Twitter Advanced Search is a tool for finding historical tweets that you can filter using parameters like Words, People, and Dates.
Visit https://twitter.com/search-advanced?lang=en. Filter based on your needs.
Do a search
Copy the search result URL from the address bar
On the Web Scraper toolbar, click on the ‘Sitemap’ button and click on ‘Edit metadata’
Paste the search URL from Twitter’s advanced search page.
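The search URL copied in the steps above can also be assembled by hand, which helps when you need many similar searches. The sketch below assumes the result page accepts a q parameter containing the search operators from:, since:, and until:. The function name build_search_url and the example account are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(words, from_user=None, since=None, until=None):
    """Build a Twitter search URL from advanced-search style filters.

    Dates are strings in YYYY-MM-DD form.
    """
    parts = [words]
    if from_user:
        parts.append("from:" + from_user.lstrip("@"))
    if since:
        parts.append("since:" + since)
    if until:
        parts.append("until:" + until)
    query = " ".join(parts)
    # urlencode percent-encodes '#' and ':' and turns spaces into '+'.
    return "https://twitter.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_search_url("#python", from_user="@nasa",
                           since="2019-01-01", until="2019-10-19"))
```

Compare the output with a URL copied from the address bar after a manual search before relying on it.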
To start scraping:
Open the sitemap drop-down menu and click ‘Scrape’
A new Chrome tab opens, in which the extension crawls the pages and scrapes the data.
Once scraping is complete, the tab closes and you receive a notification.
Downloading the scraped data
Go to the sitemap drop-down
Click on ‘Export as CSV’
Select ‘Download Now’
A CSV file with all the scraped data starts downloading.
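Once the CSV file is downloaded, it can be inspected with a few lines of Python. This is a minimal sketch using only the standard library. The column names (username, text) and the sample rows are made up for illustration; your export will use the selector IDs from your own sitemap.

```python
import csv
import io
from collections import Counter


def load_rows(csv_file):
    """Read an open CSV file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(csv_file))


def count_by_column(rows, column):
    """Count how many rows share each value of the given column."""
    return Counter(row.get(column, "") for row in rows)


if __name__ == "__main__":
    # In-memory stand-in for the downloaded file (hypothetical columns).
    sample = io.StringIO(
        "username,text\n"
        "alice,First tweet\n"
        "bob,Second tweet\n"
        "alice,Third tweet\n"
    )
    rows = load_rows(sample)
    print(count_by_column(rows, "username").most_common())
```

For a real export, open the file with open(path, newline="", encoding="utf-8") and pass the file object to load_rows.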
The PhantomBuster Twitter API is a data scraping tool for extracting the profiles of key followers. Such a list is useful for building audiences for Twitter ads or as part of a strategy to gain more followers.
Follow the steps below to set up and use the PhantomBuster Twitter API
Create a PhantomBuster account
Connect your PhantomBuster account to your Twitter account.
Click on the configure menu icon in the ‘Console.’
Using Google Sheets, create a spreadsheet of the Twitter URLs you want to extract from.
Paste the URLs into the spreadsheet, one per row.
Copy the spreadsheet’s URL into PhantomBuster.
Authentication
Download the PhantomBuster browser extension for Google Chrome or Firefox.
The extension lets PhantomBuster authenticate itself using your session cookies.
Click on ‘Launch’ to start your data scraping automation
You can schedule repeated launches of PhantomBuster to work around rate limits, mine more data, and spread workflows over days, weeks, or months. You can change the schedule using the settings buttons on your dashboard.
Select the frequency of repetitive launches.
Output file
PhantomBuster outputs a CSV or JSON file with the following fields.
Profile URL of each follower of the specified Twitter account
Name, bio, location, User ID, etc.
Tweepy is a Python library for the Twitter API, commonly used to gather hashtags, usernames, tweets, etc. from Twitter.
To use Tweepy, you will need:
A valid Twitter account
Python installed on your system. You can use Python 2.7 or Python 3
The Anaconda distribution, which provides the terminal used below
Visit the Twitter application page and log in with your Twitter account to generate the access codes that permit you to scrape data from Twitter.
Give your application a name.
Enter a description in the ‘Description’ field. E.g., I want to scrape tweets via hashtags.
Enter a placeholder URL in the ‘Website’ field. E.g., http://placeholder.com
Tick the developer agreement checkbox
Click on ‘Create your Twitter application’ to create your application.
Select the ‘Keys and access tokens’ tab. You will need to take note of four different codes.
Copy the codes to a text file. The first code is the ‘Consumer Key’ (API Key). The second code is the ‘Consumer Secret’ (API Secret).
Scroll down and click on ‘Create my access token.’ Then copy the ‘Access Token’ and ‘Access Token Secret’ codes. Keep all four codes safe.
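One way to keep the four codes safe is to leave them out of your script and read them from environment variables instead. The sketch below is a suggestion, not part of the Twitter setup itself. The variable names (TWITTER_CONSUMER_KEY and so on) and the function load_credentials are hypothetical, so pick whatever names suit you.

```python
import os

# Hypothetical environment variable names for the four codes.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=None):
    """Return the four codes as a dict, or raise if any of them is missing."""
    if environ is None:
        environ = os.environ
    missing = sorted(
        var for var in CREDENTIAL_VARS.values() if not environ.get(var)
    )
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}
```

Calling load_credentials() at the top of a script fails early with a clear message if a code was not set, instead of failing later with an authentication error.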
Download Tweepy, which is a Python library
Launch the Anaconda terminal. You will see a prompt such as ‘C:\users\Ritvik>’
Type pip install tweepy. Tweepy is an interface between Python and Twitter that has a lot of built-in functions
Press the Enter key.
pip connects to the internet and downloads everything needed to install Tweepy 3.6.0
In your Tweepy script, you need to specify the following:
The hashtags you want to scrape data from
The consumer key
The consumer secret, access token, and access token secret obtained from the Twitter application website.
The first thing the hashtag function does is create an authentication object called ‘auth’ from the four access codes. The ‘auth’ object validates you as an authentic user.
The next thing the function does is create an API object, which you will use to request data from Twitter.
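The two steps just described could look roughly like the sketch below. It is written against Tweepy 3.x, the version mentioned in this post, and is not necessarily the exact script used here. The function name make_api is hypothetical, and newer Tweepy releases may prefer a different handler class.

```python
def make_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Create an authenticated Tweepy API object from the four codes."""
    # Imported here so this file can be loaded before Tweepy is installed.
    import tweepy

    # Step 1: the 'auth' object built from the four access codes.
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    # Step 2: the API object used to request data from Twitter.
    # wait_on_rate_limit makes Tweepy pause when a rate limit is reached
    # instead of raising an error.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Passing the codes in as arguments, rather than hard-coding them, keeps them out of any file you might share.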
Type in a name for your spreadsheet in the ‘name’ field. It helps to name each spreadsheet after its hashtag.
Name the header row with the fields you want to fill in on the spreadsheet. The headers can be the timestamp of the tweet, the text of the tweet, the username of the person who tweeted, the hashtags, and the number of followers.
Specify the filters you want to apply and the number of tweets you want to analyze in the items section.
Fill in the codes obtained from the Twitter application website
Fill in the hashtag phrase
Open your working directory
Open the spreadsheet. It will contain columns for the timestamp, the tweet text, the username, all hashtags, and the number of followers.
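The spreadsheet described above can be produced with Python's csv module. The sketch below assumes each tweet is a Tweepy status object with the usual created_at, text, user, and entities attributes. The header names and the functions status_to_row and write_statuses are hypothetical.

```python
import csv

# Header row matching the columns listed above (names are illustrative).
HEADER = ["timestamp", "tweet_text", "username", "all_hashtags", "followers_count"]


def status_to_row(status):
    """Turn one status object into a spreadsheet row."""
    hashtags = [tag["text"] for tag in status.entities.get("hashtags", [])]
    return [
        status.created_at,
        status.text,
        status.user.screen_name,
        ", ".join(hashtags),
        status.user.followers_count,
    ]


def write_statuses(statuses, csv_file):
    """Write the header plus one row per status; return the number of tweets."""
    writer = csv.writer(csv_file)
    writer.writerow(HEADER)
    count = 0
    for status in statuses:
        writer.writerow(status_to_row(status))
        count += 1
    return count
```

With Tweepy 3.x, the statuses would typically come from tweepy.Cursor(api.search, q=hashtag_phrase).items(limit), where limit is the number of tweets to analyze; later Tweepy releases renamed that method to search_tweets. Open the output file with open(name + ".csv", "w", newline="", encoding="utf-8") so that rows are not separated by blank lines on Windows.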
With web scraping tools, you can collect huge volumes of data from Twitter. The data can be used for research, market analysis, and many other applications. You can also set your own parameters and filters to narrow down the data you scrape.