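requests is a very practical HTTP client library for Python, commonly used when writing web crawlers and when testing server responses. It is fair to say that Requests fully meets the needs of today's web.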

The content below is taken from the official documentation:
http://docs.python-requests.org/en/master/

Installation is usually done with pip:
$ pip install requests
For other installation methods, see the official documentation.

 

HTTP – requests

 

import requests

 

GET requests

r = requests.get('http://httpbin.org/get')

 

Passing parameters

>>> payload = {'key1': 'value1', 'key2': 'value2', 'key3': None}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)
http://httpbin.org/get?key1=value1&key2=value2

Note that any dictionary key whose value is None will not be added to the URL's query string.

 

A parameter value can also be a list:

>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)
http://httpbin.org/get?key1=value1&key2=value2&key2=value3
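The query-string rules above can be checked without touching the network. The following is a minimal sketch, assuming that preparing a request with requests.Request(...).prepare() is an acceptable stand-in for actually sending it; build_url is a hypothetical helper name, not part of Requests.

```python
import requests


def build_url(base_url, params):
    """Return the URL Requests would send to, without making a request."""
    prepared = requests.Request("GET", base_url, params=params).prepare()
    return prepared.url


# A key whose value is None is dropped from the query string.
url_without_none = build_url(
    "http://httpbin.org/get", {"key1": "value1", "key3": None}
)

# A list value is expanded into one key=value pair per item.
url_with_list = build_url(
    "http://httpbin.org/get", {"key1": "value1", "key2": ["value2", "value3"]}
)

print(url_without_none)
print(url_with_list)
```

Because nothing is sent, this is a convenient way to inspect how a params dict will be encoded before pointing the code at a real server.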

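r.text returns the body decoded using the encoding declared in the response headers; you can change the decoding by setting r.encoding = 'gbk'

r.content returns the body as raw bytes

r.json() returns the body parsed as JSON, and may raise an exception

r.status_code returns the HTTP status code

r.raw returns the raw socket response; this requires passing stream=True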

 

>>> r = requests.get('https://api.github.com/events', stream=True)

>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>

>>> r.raw.read(10)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
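The difference between r.content, r.text, r.encoding and r.json() can be tried offline. This is a rough sketch that builds a Response object by hand, which is an assumption made purely for illustration; in normal use the object is returned by requests.get(...). The sample body is made up.

```python
import io
import requests

# Hand-built Response so the example runs offline (an assumption made for
# illustration); normally this object is returned by requests.get(...).
body = '{"city": "北京"}'
r = requests.Response()
r.status_code = 200
r.raw = io.BytesIO(body.encode("gbk"))

raw_bytes = r.content   # undecoded bytes, exactly as received
r.encoding = "gbk"      # choose the codec that r.text will use
text = r.text           # the bytes decoded with r.encoding

try:
    data = r.json()     # parses the body; raises ValueError on invalid JSON
except ValueError:
    data = None

print(r.status_code, len(raw_bytes), len(text))
```

Catching ValueError is a deliberately broad choice here: the exact exception class raised by r.json() has varied between Requests versions, but it has been a ValueError subclass.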

To save the result to a file, use r.iter_content():

with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)

 

Passing headers

>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get(url, headers=headers)

 

Passing cookies

>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')
>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
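To see what the headers and cookies arguments actually put on the wire, the request can be prepared without being sent. A minimal sketch, assuming that inspecting the prepared request is enough for your purposes:

```python
import requests

prepared = requests.Request(
    "GET",
    "http://httpbin.org/cookies",
    headers={"user-agent": "my-app/0.0.1"},
    cookies={"cookies_are": "working"},
).prepare()

# Header names are matched case-insensitively.
user_agent = prepared.headers["User-Agent"]

# The cookies dict is serialized into a single Cookie header.
cookie_header = prepared.headers["Cookie"]

print(user_agent)
print(cookie_header)
```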

 

 

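POST requests

Sending form data

r = requests.post('http://httpbin.org/post', data={'key': 'value'})

Typically, you want to send some form-encoded data, much like an HTML form. To do this, simply pass a dictionary to the data argument. Your dictionary of data will automatically be form-encoded when the request is made: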

 

 

>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}

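There are many times when you want to send data that is not form-encoded. If you pass in a string instead of a dict, that data will be posted directly.

>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, data=json.dumps(payload))

Alternatively, pass the dict through the json parameter and Requests will encode it for you:

>>> r = requests.post(url, json=payload)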
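The three ways of passing a POST body differ in both the encoded body and the Content-Type header. The sketch below prepares each variant without sending it, so the differences can be compared side by side; the endpoint URL is the placeholder used above, not a real API.

```python
import json
import requests

url = "https://api.github.com/some/endpoint"
payload = {"some": "data"}

# Prepare (but do not send) the three variants to compare what would be sent.
form = requests.Request("POST", url, data=payload).prepare()
raw = requests.Request("POST", url, data=json.dumps(payload)).prepare()
as_json = requests.Request("POST", url, json=payload).prepare()

print(form.headers.get("Content-Type"), form.body)
print(raw.headers.get("Content-Type"), raw.body)
print(as_json.headers.get("Content-Type"), as_json.body)
```

Note that a string passed through data gets no Content-Type header at all, whereas json=payload sets application/json. If the server inspects that header, the two calls are not interchangeable.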

 

 

Uploading files

>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}

>>> r = requests.post(url, files=files)

You can set the filename, content type, and headers explicitly:

files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

You can also send a string to be received as a file:

files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
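A files argument turns the request into a multipart/form-data upload. As a small sketch (again only preparing the request, not sending it), the generated header and body can be inspected like this:

```python
import requests

files = {"file": ("report.csv", "some,data,to,send\nanother,row,to,send\n")}
prepared = requests.Request(
    "POST", "http://httpbin.org/post", files=files
).prepare()

content_type = prepared.headers["Content-Type"]  # includes a random boundary
body = prepared.body                             # bytes of the multipart payload

print(content_type)
print(body.decode("utf-8"))
```

The boundary string is generated per request, so it will differ on every run.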

 

Response

r.status_code

r.headers

r.cookies

 

 

Redirects

By default Requests will perform location redirection for all verbs except HEAD.

 

>>> r = requests.get('http://httpbin.org/cookies/set?k2=v2&k1=v1')

>>> r.url
'http://httpbin.org/cookies'

>>> r.status_code
200

>>> r.history
[<Response [302]>]

 

If you're using HEAD, you can enable redirection as well:

r = requests.head('http://httpbin.org/cookies/set?k2=v2&k1=v1', allow_redirects=True)

 

You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter:

requests.get('http://github.com', timeout=0.001)

 

 

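Advanced features

From
<http://docs.python-requests.org/en/master/user/advanced/#advanced>

Session: a session persists cookies automatically. You can also set request parameters on it, and later requests will carry them automatically.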

 

s = requests.Session()

s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')

print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'

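A session can also provide default data for requests. Data passed at the method level is merged with the session-level data; when a key appears in both, the method-level value overrides the session-level one. To omit a session-level key for a single request, pass a dict at the method level with the same key and a value of None.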

 

s = requests.Session()
s.auth = ('user', 'pass')  # authentication credentials
s.headers.update({'x-test': 'true'})

# both 'x-test' and 'x-test2' are sent
s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})

Method-level data is used only once and is not saved in the session.

For example, these cookies apply to this request only:

r = s.get('http://httpbin.org/cookies', cookies={'from-my': 'browser'})
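The merging rules for session-level and method-level data can be observed offline with Session.prepare_request, which applies the session's defaults to a request without sending it. This is a minimal sketch of that idea; the variable names are illustrative only.

```python
import requests

s = requests.Session()
s.headers.update({"x-test": "true"})
url = "http://httpbin.org/headers"

# Method-level headers are merged with the session's defaults.
merged = s.prepare_request(
    requests.Request("GET", url, headers={"x-test2": "true"})
)

# Passing None for a key omits the session-level value for this request only.
omitted = s.prepare_request(
    requests.Request("GET", url, headers={"x-test": None})
)

# Method-level cookies are sent once and are not stored on the session.
with_cookie = s.prepare_request(
    requests.Request("GET", url, cookies={"from-my": "browser"})
)

print(merged.headers["x-test"], merged.headers["x-test2"])
print("x-test" in omitted.headers)
print(with_cookie.headers["Cookie"], len(s.cookies))
```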

 

A session can also be closed automatically by using it as a context manager:

with requests.Session() as s:
    s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')

 

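The response object contains not only all of the response information but also the request that produced it:

r = requests.get('http://en.wikipedia.org/wiki/Monty_Python')

r.headers  # headers the server sent back

r.request.headers  # headers we sent to the server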

 

 

SSL certificate verification

Requests can verify SSL certificates for HTTPS requests, just like a web browser. To check a host's SSL certificate, use the verify argument:

>>> requests.get('https://kennethreitz.com', verify=True)
requests.exceptions.SSLError: hostname 'kennethreitz.com' doesn't match either of '*.herokuapp.com', 'herokuapp.com'

SSL is not set up on that domain, so the request fails. GitHub, however, does have SSL set up:

>>> requests.get('https://github.com', verify=True)
<Response [200]>

For private certificates, you can also pass the path to a CA_BUNDLE file to verify. You can also set the REQUESTS_CA_BUNDLE environment variable.

>>> requests.get('https://github.com', verify='/path/to/certfile')

 

If you set verify to False, Requests will skip SSL certificate verification.

>>> requests.get('https://kennethreitz.com', verify=False)
<Response [200]>

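By default, verify is set to True. The verify option applies only to host certificates.

You can also specify a local certificate to use as a client-side certificate, either as a single file (containing the private key and the certificate) or as a tuple of both files' paths:

>>> requests.get('https://kennethreitz.com', cert=('/path/server.crt', '/path/key'))
<Response [200]>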
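How the effective verify value is chosen (explicit argument, environment variable, or the default of True) can be explored without making an HTTPS request. The sketch below leans on Session.merge_environment_settings, which is the method a session uses to combine per-call arguments with environment settings; resolved_verify is a hypothetical helper, and /path/to/certfile is the placeholder path from the text, which does not need to exist because nothing is sent.

```python
import os
import requests

s = requests.Session()
url = "https://github.com"


def resolved_verify(verify):
    """Return the verify value the session would settle on for this call."""
    settings = s.merge_environment_settings(url, {}, None, verify, None)
    return settings["verify"]


# Temporarily clear any CA bundle variables so the demo is predictable.
names = ("REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE")
saved = {name: os.environ.pop(name, None) for name in names}
try:
    default_verify = resolved_verify(None)   # nothing set: session default
    explicit_off = resolved_verify(False)    # verify=False is kept as-is
    os.environ["REQUESTS_CA_BUNDLE"] = "/path/to/certfile"
    from_env = resolved_verify(None)         # the environment variable is used
finally:
    os.environ.pop("REQUESTS_CA_BUNDLE", None)
    for name, value in saved.items():
        if value is not None:
            os.environ[name] = value

print(default_verify, explicit_off, from_env)
```

Treat this as a way to reason about precedence rather than as something to call in application code; passing verify directly to the request is the usual approach.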

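Body content workflow

By default, when you make a request, the body of the response is downloaded immediately. You can override this behaviour with the stream parameter and defer downloading the response body until you access the Response.content attribute:

tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
r = requests.get(tarball_url, stream=True)

At this point only the response headers have been downloaded and the connection remains open, which lets us make content retrieval conditional:

if int(r.headers['content-length']) < TOO_LONG:
    content = r.content
    ...

If you set stream to True, the connection is not released back to the pool unless you read all of the data or call Response.close.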

 

You can use contextlib.closing to close the connection automatically:

import requests
from contextlib import closing

tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
file = r'D:\Documents\WorkSpace\Python\Test\Python34Test\test.tar.gz'

with closing(requests.get(tarball_url, stream=True)) as r:
    with open(file, 'wb') as f:
        for data in r.iter_content(1024):
            f.write(data)

 

Keep-Alive

 

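From
<http://docs.python-requests.org/en/master/user/advanced/>

Any requests that you make within a session will automatically reuse the appropriate connection!

Note: a connection is released back to the pool for reuse only once all of the body data has been read, so be sure to either set stream to False or read the content property of the Response object.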

 

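Streaming uploads

Requests supports streaming uploads, which let you send large streams or files without reading them into memory. To stream an upload, simply provide a file-like object for your request body.

Open the file in binary mode so that Requests can generate the correct Content-Length:

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)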

 

Chunked transfer encoding

Requests also supports chunked transfer encoding for outgoing and incoming requests. To send a chunk-encoded request, simply provide a generator (or any iterator without a length) for your request body.

Note that the generator should yield bytes:

def gen():
    yield b'hi'
    yield b'there'

requests.post('http://some.url/chunked', data=gen())

For chunked encoded responses, it's best to iterate over the data using Response.iter_content(). In an ideal situation you'll have set stream=True on the request, in which case you can iterate chunk-by-chunk by calling iter_content with a chunk size parameter of None. If you want to set a maximum size of the chunk, you can set a chunk size parameter to any integer.
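Whether an upload is sent with a Content-Length or with chunked transfer encoding depends on whether Requests can determine the size of the body. The following sketch prepares both kinds of request without sending them; the URL is a placeholder and io.BytesIO stands in for an opened file.

```python
import io
import requests

url = "http://some.url/upload"


def gen():
    yield b"hi"
    yield b"there"


# A file-like object has a known size, so Content-Length is set.
streamed = requests.Request(
    "POST", url, data=io.BytesIO(b"0123456789")
).prepare()

# A generator has no known size, so the body is sent chunked.
chunked = requests.Request("POST", url, data=gen()).prepare()

print(streamed.headers.get("Content-Length"))
print(chunked.headers.get("Transfer-Encoding"))
```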

POST Multiple Multipart-Encoded Files

 

You can send multiple files in one request. For example, suppose you want to upload image files to an HTML form with a multiple file field named images:

<input type="file" name="images" multiple="true" required="true"/>

 

To do that, just set files to a list of tuples of (form_field_name, file_info):

>>> url = 'http://httpbin.org/post'
>>> multiple_files = [
...     ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
...     ('images', ('bar.png', open('bar.png', 'rb'), 'image/png'))]
>>> r = requests.post(url, files=multiple_files)
>>> r.text
{
  ...
  'files': {'images': ' ....'}
  'Content-Type': 'multipart/form-data; boundary=3131623adb2043caaeb5538cc7aa0b3a',
  ...
}

Custom Authentication

Requests allows you to specify your own authentication mechanism.

Any callable which is passed as the auth argument to a request method will have the opportunity to modify the request before it is dispatched.

Authentication implementations are subclasses of requests.auth.AuthBase, and are easy to define. Requests provides two common authentication scheme implementations in requests.auth: HTTPBasicAuth and HTTPDigestAuth.

Let's pretend that we have a web service that will only respond if the X-Pizza header is set to a password value. Unlikely, but just go with it.

from requests.auth import AuthBase

class PizzaAuth(AuthBase):
    """Attaches HTTP Pizza Authentication to the given Request object."""

    def __init__(self, username):
        # setup any auth-related data here
        self.username = username

    def __call__(self, r):
        # modify and return the request
        r.headers['X-Pizza'] = self.username
        return r

Then, we can make a request using our Pizza Auth:

>>> requests.get('http://pizzabin.org/admin', auth=PizzaAuth('kenneth'))
<Response [200]>
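The same pattern works for other header-based schemes. As a further sketch, here is a hypothetical TokenAuth class (not part of Requests) that attaches a bearer token; the token value and URL are made up, and the request is only prepared so the effect of the auth callable can be inspected without a server.

```python
import requests
from requests.auth import AuthBase


class TokenAuth(AuthBase):
    """Hypothetical auth class: attaches a bearer token to each request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # r is the request about to be sent; modify it and return it.
        r.headers["Authorization"] = "Bearer " + self.token
        return r


prepared = requests.Request(
    "GET", "http://pizzabin.org/admin", auth=TokenAuth("secret-token")
).prepare()

print(prepared.headers["Authorization"])
```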

 


 

Streaming requests

r = requests.get('http://httpbin.org/stream/20', stream=True)

for line in r.iter_lines():
    if line:  # skip keep-alive blank lines
        print(line)
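When each streamed line is a JSON document, the loop typically decodes lines one at a time. The sketch below shows that pattern on a hand-built Response, which is an assumption made so the example runs offline; in real use the response comes from requests.get(url, stream=True). The sample payload is made up.

```python
import io
import json
import requests

# Hand-built Response standing in for a streamed reply (an assumption for
# illustration); in real use it comes from requests.get(url, stream=True).
r = requests.Response()
r.status_code = 200
r.raw = io.BytesIO(b'{"id": 0}\n\n{"id": 1}\n')

events = []
for line in r.iter_lines():
    if line:  # skip blank keep-alive lines
        events.append(json.loads(line))

print(events)
```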

 

Proxies

If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:

import requests

proxies = {
  'http': 'http://10.10.1.10:3128',
  'https': 'http://10.10.1.10:1080',
}

requests.get('http://example.org', proxies=proxies)

To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax:

proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}

 

Timeouts

If you specify a single value for the timeout, like this:

r = requests.get('https://github.com', timeout=5)

The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:

r = requests.get('https://github.com', timeout=(3.05, 27))

If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.

r = requests.get('https://github.com', timeout=None)
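In practice a timeout surfaces as an exception, and the connect and read phases raise different ones. The following is a rough sketch that simulates a very slow server with a local listening socket that accepts connections but never replies; that stand-in, the 0.5 second read timeout, and the use of session.trust_env = False (so that proxy environment variables do not interfere with a loopback request) are all assumptions made for the demonstration.

```python
import socket
import requests

# A listening socket that never answers stands in for a very slow server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this test

try:
    # (connect timeout, read timeout) in seconds
    session.get(f"http://127.0.0.1:{port}/", timeout=(3.05, 0.5))
    outcome = "response received"
except requests.exceptions.ConnectTimeout:
    outcome = "connect timeout"
except requests.exceptions.ReadTimeout:
    outcome = "read timeout"
finally:
    listener.close()
    session.close()

print(outcome)
```

Both exceptions derive from requests.exceptions.Timeout, so code that does not care which phase failed can catch that single class instead.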

 
