I have some messy html that uses repeated blockquote tags to display lines of poetry. Example:
<blockquote><blockquote>roses are red</blockquote></blockquote><br>
<blockquote><blockquote><blockquote>violets are blue</blockquote></blockquote></blockquote><br>
<blockquote><blockquote>this is another line</blockquote></blockquote><br>
<blockquote><blockquote><blockquote>and this is too</blockquote></blockquote></blockquote><br>
For lines of free verse, you’ll see as many as 7-8 block quote tags wrapping a line of text. I want to replace them with p or span tags and give them a class such as “indent-7” or “indent-8.”
There is unpredictable white space between the blockquote tags. Some have spaces between them, some are separated by new lines. I’m thinking python’s beautifulsoup is the way to handle this task.
How can I replace the nested blockquote tags with a single blockquote tag with a class of “n” where n is the number of tags that were nested?
Thanks!