http headers - Python requests downloads HTML if file is not found


I am downloading a list of remote files. My code looks like the following:

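    import requests

    try:
        # stream=True defers the body download; verify=False disables TLS certificate checks
        r = requests.get(url, stream=True, verify=False)
        total_length = int(r.headers['content-length'])

        if total_length:
            with open(file_name, 'wb') as f:
                for chunk in r.iter_content(chunk_size=1024):
                    if chunk:  # skip empty keep-alive chunks
                        f.write(chunk)
                        f.flush()
    except (requests.RequestException, StandardError):  # StandardError exists only in Python 2
        pass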

My problem is that requests downloads plain HTML for files that do not exist (for example 404 pages, or other HTML pages of a similar nature). Is there any way to avoid this? Is there a header I can check, such as Content-Type?

Solution:

I accepted the answer and now call r.raise_for_status() as it suggests. I also added an extra check on the content type:

    if r.headers['content-type'].split('/')[0] == "text":
        pass  # pass / raise here

(MIME type list here)
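
The content-type check above can be made a little more defensive. The following is only a sketch: the helper name `is_text_content` is hypothetical (it is not part of requests), and it assumes you pass in the raw header value, for example `r.headers.get('content-type')`. It tolerates a missing header, mixed case, and parameters such as `; charset=utf-8`.

```python
def is_text_content(content_type):
    """Return True if a Content-Type header value names a text/* media type.

    Hypothetical helper for illustration; not part of the requests API.
    """
    if not content_type:
        # Header missing or empty: nothing to base a decision on.
        return False
    # Drop parameters such as "; charset=utf-8", then normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Compare only the top-level type ("text" in "text/html").
    return media_type.split("/", 1)[0] == "text"
```

Note that, like the one-line check, this also rejects legitimate text downloads (for example `text/plain` or `text/csv`), so it only fits when the expected files are binary.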

Use r.raise_for_status() to raise an exception for responses with 4xx and 5xx status codes, or test r.status_code explicitly.

r.raise_for_status() raises an HTTPError exception, which is a subclass of the RequestException that you already catch:

    try:
        r = requests.get(url, stream=True, verify=False)
        r.raise_for_status()  # raises if not a 2xx or 3xx response
        total_length = int(r.headers['content-length'])

        if total_length:
            pass  # etc.: write the file as before
    except (requests.RequestException, StandardError):
        pass

Checking r.status_code lets you narrow down what you consider a proper response code. Note that 3xx redirects are handled automatically, and you will not see other 3xx responses because requests does not send conditional requests in this case, so there is little need for an explicit test here. But if you do want one, it would look something like this:

    r = requests.get(url, stream=True, verify=False)
    r.raise_for_status()  # raises if not a 2xx or 3xx response
    total_length = int(r.headers['content-length'])

    if 200 <= r.status_code < 300 and total_length:
        pass  # etc.
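
Putting the pieces together, the status check, the content-type check from the question's solution, and the streaming write might be combined roughly as follows. This is a sketch under assumptions, not a definitive implementation: `save_response` is a hypothetical name, and it expects a response object like the one returned by `requests.get(url, stream=True)`, that is, anything with `raise_for_status()`, `status_code`, `headers`, and `iter_content()`.

```python
def save_response(response, file_name, chunk_size=1024):
    """Write a streamed response to file_name.

    Returns True if the file was written and False if the response looks
    like an HTML or other text page. Hypothetical helper for illustration.
    """
    # Raises for 4xx and 5xx responses (HTTPError when used with requests).
    response.raise_for_status()

    # Explicit range test, as in the answer; rarely needed after the call above.
    if not 200 <= response.status_code < 300:
        return False

    # Skip text/* bodies such as "soft" error pages served with a 200 status.
    content_type = response.headers.get("content-type", "")
    if content_type.split("/", 1)[0].strip().lower() == "text":
        return False

    with open(file_name, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                f.write(chunk)
    return True
```

A call could then look like `save_response(requests.get(url, stream=True), file_name)` inside the existing `try` block, so that the `except requests.RequestException` clause still handles failed requests.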
