I would like to extract text from the web page https://www.sec.gov/ix?doc=/Archives/edgar/data/918160/000091816018000065/form10-k2017.htm. Under the "Opinion on the Financial Statements" heading, I need to extract only the one paragraph that contains the phrase 'accompanying consolidated'. If there is a match, it should return the full text of that paragraph, which starts with 'We have audited the.....'. I want to write this output to a text file. I have tried different options but have not been able to find code that gets this text. Can somebody please help me with this problem?
I used the following code to extract the information, but I am getting an empty result.
library(rvest)
sample_url <- "https://www.sec.gov/ix?doc=/Archives/edgar/data/918160/000091816018000065/form10-k2017.htm"
cont <- read_html(sample_url)
# text of every <p> node, with Windows line breaks removed
output <- gsub("\r\n", "", html_nodes(cont, "p") %>% html_text())
# keep only the paragraphs that mention the target phrase
text <- output[grepl("accompanying consolidated", output)]
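A likely cause of the empty result (an assumption worth checking in a browser): the `ix?doc=` address opens SEC's Inline XBRL viewer, which assembles the filing with JavaScript, so `read_html()` only receives the viewer's page shell. The filing itself is served as static HTML at the `/Archives/edgar/data/...` path embedded in that URL. The sketch below reads that page instead and sends a descriptive User-Agent, since SEC asks automated clients to identify themselves. It also looks at innermost `<div>` elements as well as `<p>`, because EDGAR filings are often laid out with `<div>`/`<font>` markup (an assumption about this particular filing), collapses whitespace, and keeps only the block that contains 'accompanying consolidated' and starts with 'We have audited'. The helper names `fetch_filing()` and `extract_opinion()` are hypothetical, and the demo runs on a small made-up HTML snippet so it works offline.

```r
library(rvest)

# Hypothetical helper: download an EDGAR document while identifying the client.
# Assumption: SEC expects a User-Agent like "Your Name your.email@example.com".
fetch_filing <- function(url, agent) {
  resp <- httr::GET(url, httr::user_agent(agent))
  httr::stop_for_status(resp)
  read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Hypothetical helper: return text blocks that contain `phrase` and start with `start`.
extract_opinion <- function(doc, phrase = "accompanying consolidated",
                            start = "We have audited") {
  # <p> elements plus innermost <div> elements (assumption about the markup)
  nodes <- html_nodes(doc, xpath = "//p | //div[not(.//div)]")
  txt <- html_text(nodes)
  txt <- gsub("\u00a0", " ", txt, fixed = TRUE)  # non-breaking spaces -> spaces
  txt <- trimws(gsub("\\s+", " ", txt))           # \r\n, tabs, runs of spaces
  txt <- unique(txt)                              # a <p> inside a leaf <div> appears twice
  txt[grepl(phrase, txt, fixed = TRUE) & startsWith(txt, start)]
}

# Offline demo on a made-up snippet shaped like an audit report
demo <- read_html(paste0(
  "<html><body>",
  "<div><font>Opinion on the Financial Statements</font></div>",
  "<div><font>We have audited the accompanying consolidated\r\nbalance sheets of the Company.</font></div>",
  "<p>The accompanying consolidated financial statements were prepared by management.</p>",
  "</body></html>"
))
opinion <- extract_opinion(demo)

# Write the matching paragraph to a text file and read it back
out_file <- tempfile(fileext = ".txt")
writeLines(opinion, out_file)
print(readLines(out_file))
```

Against the real filing, something like `doc <- fetch_filing("https://www.sec.gov/Archives/edgar/data/918160/000091816018000065/form10-k2017.htm", "Your Name your.email@example.com")` followed by `writeLines(extract_opinion(doc), "opinion.txt")` should write the paragraph to a text file. If nothing comes back, inspect the page source and adjust the XPath to match how that filing marks up its paragraphs.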