python - Scrapy: XPath of a link in a table


I would like to extract some book links from this table.

The table looks like this:

  <table id="table_text">
  <tbody>
  <tr>
    <td>15/02/2014</td>
    <td><a href="/book_1.html">Book 1</a></td>
    <td>Author</td>
    <td><a href="/tag1">Tag 1</a><a href="/tag2">Tag 2</a></td>
    <td>Style</td>
  </tr>

and the extracted link should be:

  /book_1.html

The selector I used:

  def parse(self, response):
      hxs = Selector(response)
      link = hxs.xpath('//table[@id="table_text"]//tr//td[2]//a//@href')

but printing link shows an empty list: []

I would like to know what is wrong with the XPath I used.

With the information you provided, your XPath works fine. It can be simplified to

  //table[@id="table_text"]//tr/td[2]/a/@href

but your version returns the correct node.

When you see unexpected behavior with Scrapy, always check that the HTML you actually received is what you expected. The HTML you see in your browser and the HTML retrieved by Scrapy may differ, because Scrapy does not handle JavaScript (and some browsers try to clean up the HTML).

This is why you should check that the content of the response is what you need. If it is not, you need to find an alternative solution :)

