I would like to extract some book links from this table.
The table looks like this:
<table id="table_text">
  <tbody>
    <tr>
      <td>15/02/2014</td>
      <td><a href="/book_1.html">Book 1</a></td>
      <td>Author</td>
      <td><a href="/tag1">Tag 1</a> <a href="/tag2">Tag 2</a></td>
      <td>Style</td>
    </tr>
and the link I want to extract is:
/book_1.html
The selector I am using:
from scrapy.selector import Selector

def parse(self, response):
    hxs = Selector(response)
    link = hxs.xpath('//table[@id="table_text"]//tr//td[2]//a//@href')
but print link only shows an empty list: []
What is wrong with my XPath, and what should I do instead?
With the information you provided, your XPath works fine, although it can be simplified to:
//table[@id="table_text"]//tr/td[2]/a/@href
Your version, however, already returns the correct node.
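As a quick check of that claim, here is a minimal sketch that runs ElementTree-compatible versions of the question's expression and the simplified one against the sample row. It uses Python's standard `xml.etree.ElementTree` instead of Scrapy (an assumption made so it runs without Scrapy installed). ElementTree implements only a subset of XPath: paths must be relative (leading `.`) and cannot end in an attribute step, so both paths stop at the `<a>` element and the `href` is read with `.get()`. The names `SAMPLE`, `ORIGINAL`, `SIMPLIFIED` and `hrefs` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The row from the question, with tags lowercased and the table closed so it
# parses as well-formed markup (the question's snippet is truncated).
SAMPLE = """
<table id="table_text">
  <tbody>
    <tr>
      <td>15/02/2014</td>
      <td><a href="/book_1.html">Book 1</a></td>
      <td>Author</td>
      <td><a href="/tag1">Tag 1</a> <a href="/tag2">Tag 2</a></td>
      <td>Style</td>
    </tr>
  </tbody>
</table>
"""

# ElementTree's XPath subset: relative paths only and no "@href" step,
# so both expressions end at the <a> element.
ORIGINAL = ".//table[@id='table_text']//tr//td[2]//a"
SIMPLIFIED = ".//table[@id='table_text']//tr/td[2]/a"


def hrefs(html: str, path: str) -> list[str]:
    """Return the href attribute of every <a> element matched by `path`."""
    # Wrap the markup in a root element so ".//table" can find the table.
    root = ET.fromstring(f"<root>{html}</root>")
    return [a.get("href") for a in root.findall(path)]


print(hrefs(SAMPLE, ORIGINAL))    # ['/book_1.html']
print(hrefs(SAMPLE, SIMPLIFIED))  # ['/book_1.html']
```

Both calls return only the book link; `td[2]` excludes the tag links in the fourth cell. In Scrapy itself, the full expression including the trailing `/@href` can be passed to `response.xpath(...)` as written.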
When Scrapy behaves unexpectedly, always check the HTML it actually received. The HTML you inspected in your browser and the HTML Scrapy downloaded may differ, because Scrapy does not execute JavaScript (and some browsers also try to clean up the HTML).
So make sure the content of the response you get is what you need. If it isn't, you will need to find an alternative solution :)
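A minimal sketch of that check, assuming all you need to know is whether the table from the question is present in the downloaded HTML. It uses the standard library's `html.parser`, which, like Scrapy, sees only the raw markup and runs no JavaScript. The helper `table_in_raw_html` and the JavaScript-only page in the usage lines are hypothetical.

```python
from html.parser import HTMLParser


class _TableFinder(HTMLParser):
    """Records whether a <table> with the given id appears in the markup."""

    def __init__(self, table_id: str) -> None:
        super().__init__()
        self.table_id = table_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs.
        if tag == "table" and ("id", self.table_id) in attrs:
            self.found = True


def table_in_raw_html(html: str, table_id: str = "table_text") -> bool:
    """Return True if the raw HTML itself contains the table."""
    finder = _TableFinder(table_id)
    finder.feed(html)
    finder.close()
    return finder.found


static_page = '<table id="table_text"><tr><td>15/02/2014</td></tr></table>'
# Hypothetical page whose table would only be built later by JavaScript.
js_page = '<div id="app"></div><script src="/app.js"></script>'
print(table_in_raw_html(static_page))  # True
print(table_in_raw_html(js_page))      # False
```

Inside a spider you could call `table_in_raw_html(response.text)`. The `scrapy shell <url>` and `scrapy view <url>` commands are other ways to look at a page as Scrapy received it.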