EDIT: I noticed I had mixed up the code from one script with the output from another. Here is the correct code with the matching output.
```html
<div class="ingredient-list single-column"><div class="ingredient-list__part"><ul aria-labelledby="ingredients-title"><li><span class="ingredient"><span class="ingredient__product">aardappel (vastkokend)</span><span class="ingredient__unit">1 kg</span></span></li><li><span class="ingredient"><span class="ingredient__product">sjalot</span><span class="ingredient__unit">1</span></span></li><li> ...
```
I'm trying to extract the text of the `ingredient__product` span and the `ingredient__unit` span separately.
Here is the code I have written:
```python
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://dagelijksekost.een.be/gerechten/makreel-met-aardappelen-in-de-schil-en-rode-biet"

# open the connection and grab the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# parse the HTML
page_soup = soup(page_html, "html.parser")
ingredients = page_soup.find("ul", {"aria-labelledby": "ingredients-title"})
ingredient = ingredients.find_all("li")
for item in ingredient:
    print(item.text.strip())
```
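Building on this first attempt, one way to keep product and unit apart is to look up the two child spans inside each `<li>` instead of taking the whole `li.text`. The sketch below runs against the HTML snippet from the question, pasted in as a string and trimmed to its two complete items, so it works without a network request. It assumes the live page uses the same `ingredient__product` / `ingredient__unit` class names; `extract_ingredients` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# The snippet from the question, trimmed to its two complete <li> items
# (assumption: the live page uses the same markup).
HTML = """
<div class="ingredient-list single-column"><div class="ingredient-list__part">
<ul aria-labelledby="ingredients-title">
<li><span class="ingredient"><span class="ingredient__product">aardappel (vastkokend)</span><span class="ingredient__unit">1 kg</span></span></li>
<li><span class="ingredient"><span class="ingredient__product">sjalot</span><span class="ingredient__unit">1</span></span></li>
</ul></div></div>
"""


def extract_ingredients(html):
    """Return a list of (product, unit) tuples, one per <li> in the ingredient list."""
    page_soup = BeautifulSoup(html, "html.parser")
    ingredient_list = page_soup.find("ul", {"aria-labelledby": "ingredients-title"})
    if ingredient_list is None:
        return []  # the list is not present in this HTML
    pairs = []
    for li in ingredient_list.find_all("li"):
        product = li.find("span", class_="ingredient__product")
        unit = li.find("span", class_="ingredient__unit")
        # Some items may lack a unit; fall back to an empty string.
        pairs.append((
            product.get_text(strip=True) if product else "",
            unit.get_text(strip=True) if unit else "",
        ))
    return pairs


for product, unit in extract_ingredients(HTML):
    print(f"{product}: {unit}")
# prints:
# aardappel (vastkokend): 1 kg
# sjalot: 1
```

Returning tuples keeps the two fields separate, so they can be written to a CSV or a dict later without re-splitting strings.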
This was my first attempt, and it produces this output:
- 1 kg aardappel
- 1 sjalot
- ...
I want to get the product and the unit as separate values, so I modified my code as follows:
```python
ingredients = page_soup.find_all("span", {"class": "ingredient"})
print(ingredients)
```
This only prints an empty list (`[]`). It seems I can't "access" the content inside the span tags.
What am I doing wrong?
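The `find_all` call itself looks correct: run against the snippet from the question, the same selector does find the span. One plausible explanation, not confirmed here, is that the HTML `urlopen` receives differs from what the browser's inspector shows (for example because the list is rendered or rewritten by JavaScript); the first attempt's output, which puts the unit before the product, hints that the served markup is different. A quick check is to search the raw response for the class name. The helpers `count_ingredient_spans` and `diagnose` are hypothetical names.

```python
from bs4 import BeautifulSoup

SNIPPET = (
    '<ul aria-labelledby="ingredients-title"><li><span class="ingredient">'
    '<span class="ingredient__product">sjalot</span>'
    '<span class="ingredient__unit">1</span></span></li></ul>'
)


def count_ingredient_spans(html):
    """Count <span class="ingredient"> elements using the question's selector."""
    page_soup = BeautifulSoup(html, "html.parser")
    # Class matching is per class token, so "ingredient__product" does not match "ingredient".
    return len(page_soup.find_all("span", {"class": "ingredient"}))


def diagnose(html):
    """Report whether the class name occurs in the raw HTML and how many spans match."""
    text = html.decode("utf-8", errors="replace") if isinstance(html, bytes) else html
    return {
        "raw_text_contains_class": "ingredient__product" in text,
        "spans_found": count_ingredient_spans(text),
    }


print(diagnose(SNIPPET))  # {'raw_text_contains_class': True, 'spans_found': 1}
# For the real page, call diagnose(page_html) with the bytes from uClient.read().
```

If `raw_text_contains_class` is `False` for the downloaded page, the spans are not in the HTML the server sends, and a plain HTTP request will not see them. If it is `True` but no spans are found, the surrounding tag or class name differs from the snippet.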
Once this step works, the next step is to loop over multiple recipes on this site. Any tips on iterating over URLs where the part after `gerechten/` varies are welcome as well.
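For the follow-up question, a minimal sketch of looping over recipes: the variable part after `gerechten/` acts as a slug, so each URL can be built from a list of slugs, fetched, and handed to a parsing function. Only the slug in `my_url` comes from the question; where further slugs come from is an assumption (for example, links on an overview page). `recipe_url`, `extract_recipe_slugs`, and `scrape_recipes` are hypothetical helpers, and the link extraction assumes recipe links have an `href` whose path starts with `/gerechten/`, which has not been verified against the site.

```python
import time
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup

BASE_URL = "https://dagelijksekost.een.be/gerechten/"


def recipe_url(slug):
    """Build a recipe URL from the variable part after 'gerechten/'."""
    return urljoin(BASE_URL, slug)


def extract_recipe_slugs(html):
    """Collect unique slugs from links whose path starts with /gerechten/ (assumed link format)."""
    page_soup = BeautifulSoup(html, "html.parser")
    slugs = []
    for a in page_soup.find_all("a", href=True):
        path = urlparse(a["href"]).path  # works for relative and absolute hrefs
        if path.startswith("/gerechten/"):
            slug = path[len("/gerechten/"):].strip("/")
            if slug and slug not in slugs:
                slugs.append(slug)
    return slugs


def scrape_recipes(slugs, parse, delay=1.0):
    """Fetch each recipe page, apply `parse` to its HTML bytes, and pause between requests."""
    results = {}
    for slug in slugs:
        with urlopen(recipe_url(slug)) as response:
            results[slug] = parse(response.read())
        time.sleep(delay)  # be polite to the server
    return results


slugs = ["makreel-met-aardappelen-in-de-schil-en-rode-biet"]
print(recipe_url(slugs[0]))
# results = scrape_recipes(slugs, parse=some_parser)  # needs network access; some_parser is hypothetical
```

Separating URL building, link discovery, and fetching keeps the network-dependent part small, so the parsing logic can be tested on saved HTML before running it against many pages.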