I am trying to scrape data from the following page using requests and Scrapy's Selector.
import requests
from scrapy import Selector
url = "https://seekingalpha.com/article/4312816-exxon-mobil-dividend-problems"
headers = {'user-agent': 'AppleWebKit/537.36'}
req = requests.get(url, headers=headers)
sel = Selector(text=req.text)
I could extract the body text, but when I tried to write an XPath for the comments,
I noticed that the HTML returned by requests
differs from what the browser inspector shows. As a result, selecting the elements with class='b-b',
like this:
sel.xpath("//div[@class='b-b']")
returns an empty list in Python. It seems that either I'm missing something or part of the HTML is hidden from bots.
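One rough way to check whether the problem is the XPath or the response itself is to count how many elements in the raw HTML carry the class at all, alongside how many script tags the page ships. This is only a sketch: the helper name `count_class_and_scripts` and the sample HTML are made up for illustration, and in practice `req.text` would be passed in place of `sample_html`.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count elements carrying a given class, and count <script> tags."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = 0
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.matches += 1


def count_class_and_scripts(html, class_name):
    """Return (elements with class_name, number of script tags)."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.matches, parser.scripts


# Hypothetical sample standing in for req.text
sample_html = """
<html><body>
  <div class="article">Article text</div>
  <div id="comments"></div>
  <script src="/app.js"></script>
</body></html>
"""

matches, scripts = count_class_and_scripts(sample_html, "b-b")
print(matches, scripts)
```

If the match count is zero for the real response while the inspector shows the elements, that would suggest the comments are added after the initial page load (for example by JavaScript) rather than that the XPath is wrong.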
After calling view(response),
I found that the following is rendered:
My Questions
- Why can't the same HTML be seen in the HTTP response?
- How can I get the comments data from this page using XPath expressions?