Quantcast
Channel: Active questions tagged html - Stack Overflow
Viewing all articles
Browse latest Browse all 74793

Extract text between two P's

$
0
0

I am trying to extract the data between two elements "Executives" and "Analysts", example, but i don't know how to proceed. My html is:

<div class="content_part hid" id="article_participants">
<p>Wabash National Corporation (NYSE:<a title="" href="http://seekingalpha.com/symbol/wnc">WNC</a>)</p><p>Q4 2014 <span class="transcript-search-span" style="background-color: yellow;">Earnings</span> Conference <span class="transcript-search-span" style="background-color: rgb(243, 134, 134);">Call</span></p><p>February 04, 2015 10:00 AM ET</p>
<p><strong>Executives</strong></p>
<p>Mike Pettit - Vice President of Finance and Investor Relations</p>
<p>Richard Giromini - President and Chief Executive Officer</p>
<p>Jeffery Taylor - Senior Vice President and Chief Financial Officer</p>
<p><strong>Analysts</strong></p>

I want to do this for a whole bunch of files my code till this far is:

from bs4 import BeautifulSoup
import requests
import textwrap
import os
from lxml import html
import csv

directory ='C:/Research syntheses - Meta analysis/SeekingAlpha'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory,filename)
        with open(fname, 'r') as f:
            page=f.read()
            soup = BeautifulSoup(f.read(),'html.parser')
            match = soup.find('div',class_='content_part hid', id='article_participants')
    print(match)

I am a newby in Python, so bear with me.


Viewing all articles
Browse latest Browse all 74793

Latest Images

Trending Articles



Latest Images

<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>