Python html parsing of div data using bs4

I had a PDF from which I needed to extract the text, so I used Tika to parse it. Since Tika could not split the text page by page, I used Beautiful Soup to do that (code snippet below). Now I want to remove the header and footer from each page of the HTML that Tika outputs. I have found that the header and footer appear as the last two lines of each div. How can I extract all the data from a div except the last two lines? The HTML looks like this:

<div class="page"><p />
<p></p>
<p>First line required
</p>
<p>Second line required
</p>
<p>Third line required
</p>
<p>Line 1 not required
</p>
<p>Line 2 not required
</p>
<p></p>
</div>
<div class="page"><p />
<p>line required 1
</p>
<p></p>
<p>line required 2
</p>
<p>line required 3
</p>
<p></p>
<p>line required 4
</p>
<p>line required 5
</p>
<p>line required 6
</p>
<p>Line 1 not required
</p>
<p>Line 2 not required
<p />
</div>
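The final `<p>` in each sample div is empty (`<p></p>` or `<p />`), so simply slicing `page.find_all('p')[:-2]` would cut the wrong elements: in the first div it would remove the empty paragraph and "Line 2 not required" but keep "Line 1 not required". A minimal sketch, assuming the header and footer are always the last two non-empty `<p>` elements of a page div, as in the sample above: collect the stripped text of each paragraph, skip the blank ones, then slice off the last two. The helper name `required_lines` and its `n_trailing` parameter are hypothetical, not part of bs4.

```python
from bs4 import BeautifulSoup


def required_lines(page, n_trailing=2):
    """Return the non-empty <p> texts of one page div, minus the last n_trailing.

    Assumption (from the sample HTML): the header and footer are the last
    n_trailing non-empty paragraphs of the div.
    """
    lines = [p.get_text(strip=True) for p in page.find_all('p')]
    lines = [line for line in lines if line]  # skip <p></p> and <p /> placeholders
    # lines[:-0] would be empty, so only slice when there is something to drop
    return lines[:-n_trailing] if n_trailing else lines


sample = """<div class="page"><p />
<p></p>
<p>First line required
</p>
<p>Second line required
</p>
<p>Third line required
</p>
<p>Line 1 not required
</p>
<p>Line 2 not required
</p>
<p></p>
</div>
<div class="page"><p />
<p>line required 1
</p>
<p></p>
<p>line required 2
</p>
<p>line required 3
</p>
<p></p>
<p>line required 4
</p>
<p>line required 5
</p>
<p>line required 6
</p>
<p>Line 1 not required
</p>
<p>Line 2 not required
<p />
</div>"""

soup = BeautifulSoup(sample, 'html.parser')
for page in soup.select('.page'):
    print(required_lines(page))
```

With `html.parser`, the stray `<p />` after "Line 2 not required" in the second div ends up as an empty paragraph nested inside the unclosed one; because it has no text, the blank-line filter discards it and the outer paragraph still counts as a single line.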

My existing code is below:

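from bs4 import BeautifulSoup
from tika import parser

# xmlContent=True makes Tika return XHTML with one <div class="page"> per PDF page
raw = parser.from_file('pdfpath', xmlContent=True)
file_content = raw["content"]
soup = BeautifulSoup(file_content, 'html.parser')
for num, page in enumerate(soup.select('.page'), 1):
    # all text of page `num` joined into one line, still including header and footer
    content = page.get_text(strip=True, separator=' ').replace("\n", " ")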
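One way to fit this into the existing loop is to work on text fragments instead of `<p>` elements. `page.stripped_strings` yields every non-blank text node already stripped, so empty `<p></p>` and `<p />` tags never show up and the last two items are the header and footer. This is a sketch under the assumption that each header/footer line is a single text node with no inline tags (such as `<b>`) inside it, since an inline tag would split a line into several fragments. `page_texts` and `n_trailing` are hypothetical names; the function takes the `file_content` string that Tika returns.

```python
from bs4 import BeautifulSoup


def page_texts(file_content, n_trailing=2):
    """Map page number -> page text with the last n_trailing lines removed.

    Assumption: every header/footer line is one non-blank text node,
    as in the sample HTML from the question.
    """
    soup = BeautifulSoup(file_content, 'html.parser')
    result = {}
    for num, page in enumerate(soup.select('.page'), 1):
        lines = list(page.stripped_strings)  # blank <p></p> / <p /> yield nothing
        if n_trailing:
            lines = lines[:-n_trailing]  # drop header and footer
        # same single-line format as the original get_text(..., separator=' ') call
        result[num] = ' '.join(line.replace("\n", " ") for line in lines)
    return result
```

Usage would be `pages = page_texts(raw["content"])`, after which `pages[1]` holds the text of the first page without its last two lines. Keying by page number mirrors the `enumerate(..., 1)` in the original loop.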
