r - Scrape XML from an HTML page with getNodeSet

Hello, I'm using R to do some basic web scraping. I am comfortable reading XML files and querying them with XPath, but I am having difficulty parsing a full HTML page and extracting the XML embedded in it, which is outside my comfort zone. For example:

  library(XML)  # provides htmlParse, xmlParse and getNodeSet
  parsedhtml <- htmlParse("http://www.w3schools.com/XPath/xpath_examples.asp")

I am using htmlParse because xmlParse only works on .xml files. I know that by using getNodeSet I can isolate specific nodes within the parsed HTML. Therefore, I am trying to extract the embedded XML document under the "Example XML Document" section:

  getNodeSet(parsedhtml, "//div[@class='code notranslate']")

This gets me the data in the right node, but it is not in standard XML form and I am unable to use xmlParse on it. My question is: how do I use the result of getNodeSet to extract the XML?

Thanks a lot

Six
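
One possible approach, offered as a sketch rather than a definitive answer: a code listing shown on an HTML page is normally stored as escaped text (`&lt;`, `&gt;`), so the node returned by getNodeSet holds the XML as a string, not as child nodes. Pulling that string out with xmlValue and passing it to xmlParse with asText = TRUE should give a regular XML document. The inline HTML below is a hypothetical stand-in for the fetched page, and the book data in it is made up for illustration; the real page may contain extra markup or non-breaking spaces that need cleaning first.

```r
library(XML)

# Hypothetical stand-in for the downloaded page: the XML sample sits inside
# the div as escaped text, which is an assumption about the real page.
html <- paste0(
  "<html><body><div class='code notranslate'>",
  "&lt;bookstore&gt;&lt;book&gt;",
  "&lt;title lang=\"en\"&gt;Harry Potter&lt;/title&gt;",
  "&lt;price&gt;29.99&lt;/price&gt;",
  "&lt;/book&gt;&lt;/bookstore&gt;",
  "</div></body></html>"
)
parsedhtml <- htmlParse(html, asText = TRUE)

# Same XPath as in the question
nodes <- getNodeSet(parsedhtml, "//div[@class='code notranslate']")

# xmlValue() returns the node's text content with entities decoded;
# trimws() drops leading whitespace, which would break an XML declaration
xml_text <- trimws(xmlValue(nodes[[1]]))

# asText = TRUE makes xmlParse() treat the string as XML, not as a file name
embedded <- xmlParse(xml_text, asText = TRUE)

# The embedded document can now be queried with XPath as usual
titles <- xpathSApply(embedded, "//book/title", xmlValue)
print(titles)
```

If xmlParse still complains on the real page, printing `xml_text` first should show which stray characters need to be removed before parsing.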
