Channel: Active questions tagged html - Stack Overflow

Parsing the "Further reading" section of a Wikipedia page with Selenium (Python)


I need to parse the text of the "Further reading" section of a Wikipedia article. My code opens Google, enters a search query (for example, 'Bill Gates'), and finds the URL of the Wikipedia page. Now I need to extract the text from the "Further reading" section, but I don't know how. Here is the code:

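from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

URL = "https://www.google.com/"
address = input()  # search query, for example: Bill Gates

def main():
    driver = webdriver.Chrome()
    driver.get(URL)
    # find_element_by_name() is deprecated in Selenium 4 (later removed); use find_element(By...)
    element = driver.find_element(By.NAME, "q")
    # Send ENTER directly; ARROW_DOWN would pick an autocomplete suggestion instead of the query
    element.send_keys(address, Keys.ENTER)
    # Wait for the results page instead of reading the DOM immediately after ENTER.
    # ".r [href]" matches Google's result markup at the time of writing and may change.
    elems = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".r [href]"))
    )
    link = [elem.get_attribute('href') for elem in elems]
    url = link[0]    # Wikipedia page's link (assumed to be the first result)
    driver.get(url)  # open the Wikipedia page so its sections can be parsed


if __name__ == "__main__":
    main()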
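The code above takes `link[0]` and assumes it is the Wikipedia result, which depends on Google's ranking. A minimal sketch that instead picks the first link whose host is a Wikipedia domain; the helper name `first_wikipedia_link` is hypothetical, and it assumes the hrefs are direct URLs rather than Google redirect links (`/url?q=...`):

```python
from urllib.parse import urlparse


def first_wikipedia_link(links):
    """Return the first href whose host is wikipedia.org or a subdomain of it.

    Hypothetical helper: `links` is assumed to be a list of href strings
    (or None values, which get_attribute() can return).
    """
    for href in links:
        if not href:
            continue  # skip elements without an href
        host = urlparse(href).hostname or ""
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            return href
    return None  # no Wikipedia result among the links
```

In `main()` this would replace the index lookup: `url = first_wikipedia_link(link)`, followed by a check for `None`.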

And here is the relevant HTML from the Wikipedia page:

<h2>
<span class="mw-headline" id="Further_reading">Further reading</span>
</h2>
<ul>
<li>...</li>
<li>...</li>
<li>...</li>
<li>...</li>
...
</ul>
<h3>
<span class="mw-headline" id="Primary_sources">Primary sources</span>
</h3>
<ul>
<li>...</li>
<li>...</li>
<li>...</li>
...
</ul>
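Because the list items sit after the heading whose `<span>` has `id="Further_reading"` and before the next `<h2>`, the section can also be extracted from `driver.page_source` with Python's standard-library `html.parser`. This is a sketch that has not been run against the live page; the names `FurtherReadingParser` and `further_reading` are hypothetical, and it assumes the section ends at the next `<h2>` (so the "Primary sources" `<h3>` subsection is included). Text of a nested `<li>` is merged into its parent item.

```python
from html.parser import HTMLParser


class FurtherReadingParser(HTMLParser):
    """Collect the text of <li> items after the heading span with the given id,
    stopping at the next <h2> (hypothetical helper, not a Wikipedia API)."""

    def __init__(self, section_id="Further_reading"):
        super().__init__()
        self.section_id = section_id
        self.in_section = False  # True once the section heading was seen
        self.done = False        # True once the next <h2> was reached
        self.li_depth = 0        # nesting depth of <li> inside the section
        self.buffer = []         # text pieces of the current top-level <li>
        self.items = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "span" and dict(attrs).get("id") == self.section_id:
            self.in_section = True
        elif tag == "h2" and self.in_section:
            # The next top-level section starts here.
            self.in_section = False
            self.done = True
        elif tag == "li" and self.in_section:
            if self.li_depth == 0:
                self.buffer = []
            self.li_depth += 1

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1
            if self.li_depth == 0:
                # Collapse runs of whitespace into single spaces.
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.items.append(text)

    def handle_data(self, data):
        if self.li_depth > 0:
            self.buffer.append(data)


def further_reading(html):
    """Return the list-item texts of the "Further reading" section."""
    parser = FurtherReadingParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

With Selenium, after `driver.get(url)`, the call would be `items = further_reading(driver.page_source)`.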

Example page: https://en.wikipedia.org/wiki/Bill_Gates
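With the browser already on the Wikipedia page, Selenium can select the same items directly with XPath. A sketch under these assumptions: the heading markup matches the HTML shown above, the section ends at the next `<h2>`, and the lists are siblings of the heading (if they are wrapped in another element, the trailing `//li` step still reaches them). The names `FURTHER_READING_XPATH` and `further_reading_items` are hypothetical, and the XPath has not been checked against the live page. Nested `<li>` elements are returned separately, so their text can appear twice.

```python
# Every <li> inside the siblings that follow the "Further reading" <h2> and
# whose nearest preceding <h2> is that heading (i.e. before the next <h2>).
FURTHER_READING_XPATH = (
    "//h2[span[@id='Further_reading']]"
    "/following-sibling::*[preceding-sibling::h2[1][span[@id='Further_reading']]]"
    "//li"
)


def further_reading_items(driver):
    """Return the visible text of each "Further reading" list item on the page
    that `driver` (a Selenium WebDriver) is currently showing."""
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH.
    elements = driver.find_elements("xpath", FURTHER_READING_XPATH)
    return [element.text for element in elements]
```

In the question's `main()`, this would be called right after `driver.get(url)`: `print(further_reading_items(driver))`.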

