Channel: Active questions tagged html - Stack Overflow

Scraping text from a <span> within another tag


EDIT: I noticed I had mixed up the code from one script with the output from another. Here is the correct code with its matching output.

<div class="ingredient-list single-column"><div class="ingredient-list__part"><ul aria-labelledby="ingredients-title"><li><span class="ingredient"><span class="ingredient__product">aardappel (vastkokend)</span><span class="ingredient__unit">1 kg</span></span></li><li><span class="ingredient"><span class="ingredient__product">sjalot</span><span class="ingredient__unit">1</span></span></li><li> ...

I'm trying to extract the text inside the spans with class ingredient__product and ingredient__unit separately.
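As a point of reference, here is a minimal sketch of pulling the two spans apart with BeautifulSoup. It runs against only the first two complete <li> items from the snippet above, with the closing tags added (the rest of the list is elided in the question). It uses the CSS-selector methods select and select_one, which assumes a bs4 version with soupsieve support (4.7 or later); the helper name extract_ingredients is hypothetical.

```python
from bs4 import BeautifulSoup

# The first two <li> items from the question's snippet, with closing tags
# added so the fragment is complete (the remaining items are elided there).
html = (
    '<ul aria-labelledby="ingredients-title">'
    '<li><span class="ingredient">'
    '<span class="ingredient__product">aardappel (vastkokend)</span>'
    '<span class="ingredient__unit">1 kg</span></span></li>'
    '<li><span class="ingredient">'
    '<span class="ingredient__product">sjalot</span>'
    '<span class="ingredient__unit">1</span></span></li>'
    '</ul>'
)


def extract_ingredients(page_soup):
    """Return a (product, unit) pair for each span.ingredient (hypothetical helper)."""
    pairs = []
    # ".ingredient" matches the class token exactly, so it does not also
    # match the inner ingredient__product / ingredient__unit spans.
    for item in page_soup.select("span.ingredient"):
        product = item.select_one("span.ingredient__product")
        unit = item.select_one("span.ingredient__unit")
        pairs.append((
            product.get_text(strip=True) if product else "",
            unit.get_text(strip=True) if unit else "",
        ))
    return pairs


page_soup = BeautifulSoup(html, "html.parser")
for product, unit in extract_ingredients(page_soup):
    print(f"{product}: {unit}")
```

Looking each inner span up relative to its own span.ingredient keeps a product paired with its unit, even if one of them is missing for some ingredient.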

The code I have written goes as follows:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://dagelijksekost.een.be/gerechten/makreel-met-aardappelen-in-de-schil-en-rode-biet"

# open the connection and grab the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# parse the HTML
page_soup = soup(page_html, "html.parser")

# find the ingredient list and all of its <li> items
ingredients = page_soup.find("ul", {"aria-labelledby": "ingredients-title"})
ingredient = ingredients.find_all("li")

# print the text of each <li>
for item in ingredient:
    print(item.text.strip())

This was my first attempt, and it returns this output:

  • 1 kg aardappel
  • 1 sjalot
  • ...

I want to get the information in the two span tags separately, so I tried modifying my code as follows:

ingredients = page_soup.find_all("span", {"class": "ingredient"})

print(ingredients)

This only prints an empty list. It seems like I can't "access" the information between the span tags.

What am I doing wrong?
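For what it's worth, find_all("span", {"class": "ingredient"}) does match the outer spans in the snippet above, so one possibility (an assumption, not something the question confirms) is that the HTML returned by urlopen differs from what the browser's inspector shows, for example because the page's markup is changed by JavaScript after loading. The first attempt printing the unit before the product, while the snippet lists the product first, would be consistent with that. A minimal sketch for checking is to count which span class names actually occur inside the ingredients list of the downloaded HTML; the helper name span_classes_in_list is hypothetical.

```python
from collections import Counter

from bs4 import BeautifulSoup


def span_classes_in_list(page_html):
    """Count class names on <span> tags inside the ingredients <ul> (hypothetical helper)."""
    page_soup = BeautifulSoup(page_html, "html.parser")
    ingredients = page_soup.find("ul", {"aria-labelledby": "ingredients-title"})
    counts = Counter()
    if ingredients is None:
        return counts  # the list itself is missing from this HTML
    for span in ingredients.find_all("span"):
        # bs4 returns the class attribute as a list of individual class names
        counts.update(span.get("class", []))
    return counts


# Offline check against the first item of the question's snippet
# (closing tags added); with the real page, pass page_html instead.
sample = (
    '<ul aria-labelledby="ingredients-title"><li><span class="ingredient">'
    '<span class="ingredient__product">aardappel (vastkokend)</span>'
    '<span class="ingredient__unit">1 kg</span></span></li></ul>'
)
print(span_classes_in_list(sample))
```

Calling print(span_classes_in_list(page_html)) with the page_html from the question would show which class names the downloaded page really uses; if ingredient is not among them, the find_all call has nothing to match.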

Once this step is solved, the next step would be to cycle through multiple recipes on this site. Any tips on how to loop over URLs where the part after gerechten/ varies are welcome as well.
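One way to handle the variable part is to keep a list of recipe slugs (the text after gerechten/) and build each URL from a fixed base. Below is a minimal sketch using only the standard library. Where the slugs come from is left open (collecting them from links on an overview page would be one option, assuming the site has one); the only slug used is the one from the question, and the names recipe_url and fetch_all are hypothetical.

```python
import time
from urllib.parse import quote, urljoin
from urllib.request import urlopen

BASE_URL = "https://dagelijksekost.een.be/gerechten/"


def recipe_url(slug):
    """Build the recipe URL from the variable part after gerechten/."""
    # quote() percent-encodes characters such as accented letters in a slug
    return urljoin(BASE_URL, quote(slug))


def fetch_all(slugs, delay=1.0):
    """Download each recipe page, pausing between requests to go easy on the server."""
    pages = {}
    for slug in slugs:
        with urlopen(recipe_url(slug)) as response:
            pages[slug] = response.read()
        time.sleep(delay)
    return pages


# Only builds the URL here; calling fetch_all(...) performs the network requests.
print(recipe_url("makreel-met-aardappelen-in-de-schil-en-rode-biet"))
```

Each downloaded page in the dictionary returned by fetch_all can then be parsed with BeautifulSoup in the same way as in the question.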

