Channel: Active questions tagged html - Stack Overflow

Web scraping span using rvest

I would like to extract text from the page https://www.sec.gov/ix?doc=/Archives/edgar/data/918160/000091816018000065/form10-k2017.htm . Under the "Opinion on the Financial Statements" heading, I need to extract only the one paragraph that contains the phrase 'accompanying consolidated'. If there is a match, it should return the whole paragraph, which starts with 'We have audited the.....' . I want to write this output to a text file. I have tried several approaches but cannot find the right code to get this text. Can somebody please help me with this problem?

I used the following code to extract the information, but I am getting an empty string.

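library(rvest)

sample_url <- "https://www.sec.gov/ix?doc=/Archives/edgar/data/918160/000091816018000065/form10-k2017.htm"

# Download and parse the page
cont <- read_html(sample_url)

# Text of every <p> node, with Windows line breaks removed
output <- gsub('\r\n', '', html_nodes(cont, 'p') %>% html_text())

# Keep only the paragraphs that contain the target phrase
text <- output[grepl("accompanying consolidated", output)]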
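
For reference, the filter-and-write step described above can be sketched on a small inline HTML string instead of the live page. This is only a minimal sketch under stated assumptions: the sample HTML, its paragraph text, and the output file name opinion.txt are hypothetical stand-ins, not content taken from the filing.

```r
library(rvest)

# Hypothetical stand-in for the filing's HTML; the real page is not fetched here.
sample_html <- paste0(
  "<html><body>",
  "<p>Opinion on the Financial Statements</p>",
  "<p>We have audited the accompanying consolidated balance sheets.</p>",
  "<p>Basis for Opinion</p>",
  "</body></html>"
)

# read_html() accepts a literal HTML string as well as a URL
page <- read_html(sample_html)

# Text of every <p> node, with any line breaks replaced by a space
paragraphs <- gsub("[\r\n]+", " ", html_text(html_nodes(page, "p")))

# Keep paragraphs containing the phrase (fixed = TRUE: literal match, not a regex)
matches <- paragraphs[grepl("accompanying consolidated", paragraphs, fixed = TRUE)]

# Write the matches to a text file; "opinion.txt" is an illustrative file name
out_file <- file.path(tempdir(), "opinion.txt")
writeLines(matches, out_file)
```

If the same steps return nothing on the live URL, one possibility is that the /ix?doc= viewer address loads the filing with JavaScript, so the HTML that read_html() receives may not contain the paragraphs at all. That is an assumption worth checking by inspecting length(html_nodes(cont, 'p')).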
