Trustpilot Reviews Scraping, Keyword Extraction and Analysis
Trustpilot hosts customer reviews for many companies, which lets us look into the issues customers are having with them. These reviews are a great source of information if you can download the text and analyze it.
The process can be divided into three parts:
- Review scraping
- Keyword Extraction
- Analysis and visualization (Word Clouds, Race Bar Charts, etc.)
Part 1: Scraping Reviews from Trustpilot Website
- Import required libraries
- Create a function to get the total number of reviews for the company
- Create a function to get the Beautiful Soup output for every page of the company's reviews. Trustpilot may block requests from your system after a certain number of requests, so this step handles the resulting errors with an infinite loop that keeps retrying after sleeping for short durations.
- Select a company and extract its reviews into a dataframe. Here I have used “www.fitbit.com” as an example.
Install BS4 (Beautiful Soup 4) before running: https://www.crummy.com/software/BeautifulSoup/
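The retry step in the list above can be sketched roughly as follows. This is only an illustration, not the original code: the helper names `pages_needed` and `fetch_with_retry`, the page size of 20 reviews, and the sleep bounds are all assumptions made for the example. The sketch uses only the standard library and takes the actual page download as a function argument, so it can wrap whatever request-and-parse call you use.

```python
import math
import random
import time


def pages_needed(total_reviews, reviews_per_page=20):
    """Number of review pages to request.

    reviews_per_page=20 is an assumed page size; check it against the site.
    """
    return math.ceil(total_reviews / reviews_per_page)


def fetch_with_retry(fetch, min_sleep=1.0, max_sleep=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds, sleeping briefly after each failure.

    Returns a (result, attempts) tuple. Like the loop described above,
    this retries indefinitely, so interrupt it manually if the block
    does not lift.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return fetch(), attempts
        except Exception:
            # Request failed or was blocked: wait a short random interval
            # (hypothetical bounds) before trying again.
            sleep(random.uniform(min_sleep, max_sleep))
```

To use it with Beautiful Soup, pass a function that downloads one page and returns the parsed soup. That function has to raise an exception when the request is blocked (with the `requests` library, for example, by calling `response.raise_for_status()`), otherwise the loop has nothing to retry on.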
And there you have it. All the reviews for Fitbit have been saved in the dataframe “df”.
The reviews cannot be used directly: there are too many of them, and the information is buried in free text. The next step is therefore to extract keywords from them.
Part 2: Keyword Extraction
There are multiple ways of extracting keywords from text. In my opinion, YAKE does this job really well, and because the algorithm is independent of the length of the text, the language, and the number of items in the corpus, it is a great tool for this purpose. It can be installed as per the instructions on the official GitHub page: …