Channel: Active questions tagged html - Stack Overflow

Extract content between two sets of tags

I'm trying to extract some content and put it into a table in Excel, with the countries in one column and the measures each country is implementing against the coronavirus in the second. Here is what the HTML looks like:

<strong>AUSTRALIA</strong> - published 11.02.2020<br />
1. Passengers who have transited through or have been in China (People's Rep.) on or after 1 February 2020, will not be allowed to transit or enter Australia.<br />
- This does not apply to nationals of Australia. They will be required to self-isolate for a period of 14 days from their arrival into Australia.<br />
- This does not apply to permanent residents of Australia and their immediate family members. They will be required to self-isolate for a period of 14 days from their arrival into Australia.<br />
- This does not apply to airline crew.<br />
2. Nationals of Australia who have transited through or have been in China (People's Rep.) on or after 1 February 2020 will be required to self-isolate for a period of 14 days from their arrival into Australia.<br />
3. Permanent residents of Australia and their immediate family members who have transited through or have been in China (People's Rep.) on or after the 1 February 2020 will be required to self-isolate for a period of 14 days from their arrival into Australia.<br />
<br />
<strong>AZERBAIJAN</strong> - published 06.02.2020

So there is no real structure to speak of. Extracting the countries into one column is easy, since each name sits between <strong> tags. The second column, the text that belongs to each country, is harder, because no element wraps it. The only approach I can think of is to have VBA loop from one set of strong tags to the next and take everything in between as the second column, but I'm not sure how to do that. The code I've found so far extracts the list of countries and not much else:

Sub Test()
    ' Requires references to "Microsoft Internet Controls" (SHDocVw)
    ' and "Microsoft HTML Object Library" (MSHTML).
    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLAs As MSHTML.IHTMLElementCollection
    Dim HTMLA As MSHTML.IHTMLElement

    IE.Visible = True
    IE.navigate "https://www.iatatravelcentre.com/international-travel-document-news/1580226297.htm"

    ' Wait for the page to finish loading; DoEvents keeps Excel responsive
    ' instead of spinning in a tight loop.
    Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set HTMLDoc = IE.Document

    ' ProcessHTMLPage is defined elsewhere; it is not shown here.
    ProcessHTMLPage HTMLDoc

    ' Print the text of every <strong> element (the country names).
    Set HTMLAs = HTMLDoc.getElementsByTagName("strong")

    For Each HTMLA In HTMLAs
        Debug.Print HTMLA.innerText
        If HTMLA.getAttribute("href") = "http://x-rates.com/table/" And HTMLA.getAttribute("rel") = "ratestable" Then
            HTMLA.Click
            'I don't understand why, but the previous line of code is essential to making this work. Otherwise I only get the first country
            Exit For
        End If
    Next HTMLA
End Sub
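One way to get both columns, offered as an untested sketch rather than a definitive answer: instead of walking the DOM, take the raw HTML of the block that holds the country list and split it on the <strong> tag, so that each piece starts with a country name and runs up to the next heading. ParseCountryBlocks and WriteCountryMeasures are hypothetical names, not part of any library. The sketch assumes that every heading is a bare <strong> tag with no attributes, that no other <strong> elements appear in the HTML you pass in, and that the late-bound VBScript.RegExp object is available (Windows). Only a handful of common HTML entities are decoded.

```vba
Option Explicit

' Hypothetical helper, not part of the question: splits the raw HTML of the
' country list into a 2-column array of (country, measures text).
' Assumes each country heading is a bare <strong>...</strong> tag.
' Uses late-bound VBScript.RegExp, so no extra reference is needed (Windows only).
Function ParseCountryBlocks(ByVal html As String) As Variant
    Dim re As Object
    Dim chunks() As String, result() As String
    Dim i As Long, n As Long, closePos As Long
    Dim body As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True

    ' IE's innerHTML may report tags in upper case (<STRONG>, <BR>),
    ' so split and search case-insensitively.
    chunks = Split(html, "<strong>", -1, vbTextCompare)
    If UBound(chunks) < 1 Then Exit Function ' no headings: returns Empty

    ' chunks(0) is whatever precedes the first heading, so start at 1.
    ReDim result(1 To UBound(chunks), 1 To 2)
    For i = 1 To UBound(chunks)
        closePos = InStr(1, chunks(i), "</strong>", vbTextCompare)
        If closePos > 0 Then
            n = n + 1
            result(n, 1) = Trim$(Left$(chunks(i), closePos - 1))
            body = Mid$(chunks(i), closePos + Len("</strong>"))

            ' Line breaks in the HTML source are just whitespace.
            re.Pattern = "[\r\n]+"
            body = re.Replace(body, " ")
            ' Each <br> or <br /> (plus surrounding spaces) becomes one line break.
            re.Pattern = "\s*<br\s*/?>\s*"
            body = re.Replace(body, vbLf)
            ' Drop any remaining tags.
            re.Pattern = "<[^>]+>"
            body = re.Replace(body, "")
            ' Decode a few common entities (not a complete decoder; &amp; goes last).
            body = Replace(body, "&nbsp;", " ")
            body = Replace(body, "&quot;", """")
            body = Replace(body, "&#39;", "'")
            body = Replace(body, "&lt;", "<")
            body = Replace(body, "&gt;", ">")
            body = Replace(body, "&amp;", "&")
            ' Strip leading/trailing whitespace, including line breaks.
            re.Pattern = "^\s+|\s+$"
            result(n, 2) = re.Replace(body, "")
        End If
    Next i
    ' Pieces without a closing </strong> are skipped, leaving blank rows at the end.
    ParseCountryBlocks = result
End Function

' Hypothetical usage: writes a header row plus one row per country to the
' active sheet, starting at A1.
Sub WriteCountryMeasures(ByVal html As String)
    Dim data As Variant
    data = ParseCountryBlocks(html)
    If IsEmpty(data) Then Exit Sub

    With ActiveSheet
        .Range("A1:B1").Value = Array("Country", "Measures")
        .Range("A2").Resize(UBound(data, 1), 2).Value = data
    End With
End Sub
```

In the Test sub above, once the page has loaded, a call such as WriteCountryMeasures HTMLDoc.body.innerHTML would fill columns A and B. Passing the whole body means the last country's text also picks up whatever follows it on the page, so passing the innerHTML of the specific element that contains the list is preferable if one can be identified. Line breaks are kept as vbLf characters, so turn on Wrap Text for column B to see them.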