Hello, I'm using R to do some basic web scraping, where I parse XML files and query them with XPath. However, I am having difficulty parsing a full HTML page and extracting the XML I need from it. For example:
library(XML)
parsedhtml <- htmlParse("http://www.w3schools.com/XPath/xpath_examples.asp")
I am using htmlParse because xmlParse expects well-formed XML and fails on an ordinary HTML page. I know that with getNodeSet I can isolate specific nodes within the parsed HTML, so I am trying to extract the embedded XML document under the "Example XML Document" section:
getNodeSet(parsedhtml, "//div[@class='code notranslate']")
This gets me the right node, but its content is not standard XML, so I am unable to use xmlParse on it. My question is: how do I use the result of getNodeSet to extract the XML?
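Here is one possible approach, sketched under assumptions. Pages like this usually display the XML sample as escaped text (`&lt;book&gt;` and so on) rather than as real child elements, so the div holds text, not XML nodes. If that is what is happening here, xmlValue() on the node returns the decoded markup as a plain string, and xmlParse(..., asText = TRUE) can parse that string as its own document. The `page` string below is a hypothetical stand-in for the w3schools markup, because the live page's exact structure may differ. Using a local string also lets the example run without network access.

```r
library(XML)

# Hypothetical stand-in for the w3schools page: the XML sample is shown as
# escaped text (&lt; ... &gt;) inside a <div>, not as real child elements.
page <- '<html><body>
<div class="code notranslate">&lt;bookstore&gt;<br>
&lt;book category="cooking"&gt;<br>
&lt;title lang="en"&gt;Everyday Italian&lt;/title&gt;<br>
&lt;price&gt;30.00&lt;/price&gt;<br>
&lt;/book&gt;<br>
&lt;/bookstore&gt;</div>
</body></html>'

parsedhtml <- htmlParse(page, asText = TRUE)
nodes <- getNodeSet(parsedhtml, "//div[@class='code notranslate']")

# xmlValue() concatenates the node's text with entities decoded, which
# yields the original XML markup as a character string (the <br> tags,
# being elements rather than text, are dropped).
xml_text <- trimws(xmlValue(nodes[[1]]))

# Parse that string as a standalone XML document and query it with XPath.
doc <- xmlParse(xml_text, asText = TRUE)
titles <- xpathSApply(doc, "//title", xmlValue)
prices <- as.numeric(xpathSApply(doc, "//book/price", xmlValue))
```

On the real page, you would drop the `page` string and call xmlValue() on an element of the getNodeSet() result from `parsedhtml`. If the sample is split across several nodes, you may need to paste the pieces together before parsing.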
Thanks a lot