2. The urllib Library in Detail

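What is urllib?

urllib is the HTTP request library that ships with Python. It consists of four modules:

urllib.request: the request module
urllib.error: the exception handling module
urllib.parse: the URL parsing module
urllib.robotparser: the robots.txt parsing module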
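
The first three modules are demonstrated later in this section; urllib.robotparser is not. A minimal sketch of how it decides whether a URL may be crawled, assuming a made-up robots.txt supplied as a list of lines (the rules and the example.com URLs are illustrative and not from the original):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would instead call
# set_url(...) and read() to download the site's actual file.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

allowed_public = parser.can_fetch("*", "http://example.com/index.html")
allowed_private = parser.can_fetch("*", "http://example.com/private/data.html")
print(allowed_public, allowed_private)
```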

Changes from Python 2

Python 2:

import urllib2
response = urllib2.urlopen("http://www.baidu.com")

Python 3:

import urllib.request
response = urllib.request.urlopen("http://www.baidu.com")

urlopen()

Function prototype: urllib.request.urlopen(url, data=None, [timeout, ]*, …)

For example:


import urllib.request

# Fetch the page and decode the response body as UTF-8 text.
response = urllib.request.urlopen("http://www.baidu.com")
print(response.read().decode("utf8"))
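
The timeout parameter in the prototype is given in seconds. The sketch below shows one way to handle it, assuming a throwaway local server (the hypothetical SlowHandler class) that answers too slowly. It uses opener.open() only to bypass any system proxy; urllib.request.urlopen(url, timeout=0.2) accepts the same argument.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that waits before answering."""

    def do_GET(self):
        time.sleep(1.0)  # longer than the client timeout below
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"late")
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

# Bypass any system proxy so the request reaches the local server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

try:
    opener.open(url, timeout=0.2)
    timed_out = False
except socket.timeout:
    # Raised directly when waiting for the response takes too long.
    timed_out = True
except urllib.error.URLError as e:
    # A timeout while connecting is wrapped in URLError instead.
    timed_out = isinstance(e.reason, socket.timeout)

print("timed out:", timed_out)
server.shutdown()
```

Catching both exception types covers a timeout while connecting as well as a timeout while waiting for the response.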


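import urllib.parse
import urllib.request

# Passing data makes urlopen() send a POST request, so target /post.
# httpbin.org is an HTTP testing site.
data = bytes(urllib.parse.urlencode({"word": "hello"}), encoding="utf8")
response = urllib.request.urlopen("http://httpbin.org/post", data=data)
print(response.read())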
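
How the data argument changes the HTTP method can be checked without touching the network. This is a minimal sketch that builds two Request objects and inspects them; nothing is sent, and the variable names are illustrative.

```python
from urllib import parse, request

# urlencode() returns a str; the request body must be bytes.
data = parse.urlencode({"word": "hello"}).encode("utf8")

get_req = request.Request("http://httpbin.org/get")
post_req = request.Request("http://httpbin.org/post", data=data)

print(get_req.get_method())   # no data: GET
print(post_req.get_method())  # data present: POST
print(post_req.data)
```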

Response

import urllib.request

response = urllib.request.urlopen("http://www.google.com")
print(type(response))  # <class 'http.client.HTTPResponse'>

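Status code and response headers

import urllib.request

response = urllib.request.urlopen("http://www.google.com")
print(response.status)               # HTTP status code, e.g. 200
print(response.getheaders())         # all headers as (name, value) pairs
print(response.getheader("Server"))  # value of a single header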
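
The same attributes can be tried offline. The sketch below assumes a throwaway local server (the hypothetical HelloHandler class) in place of a public site, and also shows the optional default argument of getheader() for headers the server did not send.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that returns a fixed plain-text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Bypass any system proxy so the request reaches the local server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open("http://127.0.0.1:%d/" % server.server_port) as response:
    status = response.status
    content_type = response.getheader("Content-Type")
    missing = response.getheader("X-Not-Sent", "n/a")  # default if absent
    body = response.read().decode("utf8")

server.shutdown()
print(status, content_type, missing, body)
```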

Request

from urllib import request, parse

# A Request object carries the URL, body, headers and HTTP method together.
url = "http://httpbin.org/post"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)",
    "Host": "httpbin.org",
}
form = {"name": "Germey"}
data = bytes(parse.urlencode(form), encoding="utf8")
req = request.Request(url=url, data=data, headers=headers, method="POST")
response = request.urlopen(req)
print(response.read().decode("utf8"))
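
Headers can also be attached after the Request is created, using add_header(). A small sketch that inspects the object without sending it; one detail worth knowing is that Request normalizes header names with str.capitalize(), so lookups need the "User-agent" spelling.

```python
from urllib import parse, request

req = request.Request(
    "http://httpbin.org/post",
    data=parse.urlencode({"name": "Germey"}).encode("utf8"),
    method="POST",
)
req.add_header("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)")

# Header names are stored capitalized: "User-Agent" becomes "User-agent".
print(req.get_method())
print(req.has_header("User-agent"))
print(req.get_header("User-agent"))
print(req.get_header("User-Agent"))  # None: the lookup is case-sensitive
```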

Handler

Proxy

import urllib.request

# Route HTTP and HTTPS traffic through a proxy listening on 127.0.0.1:9743.
proxy_handler = urllib.request.ProxyHandler({
    "http": "http://127.0.0.1:9743",
    "https": "https://127.0.0.1:9743",
})
opener = urllib.request.build_opener(proxy_handler)
response = opener.open("http://www.baidu.com")
print(response.read())

Cookies are used to maintain a user's session across requests.

import urllib.request
import http.cookiejar

cookie=http.cookiejar.CookieJar()
handler=urllib.request.HTTPCookieProcessor(cookie)
opener=urllib.request.build_opener(handler)
response=opener.open("http://www.baidu.com")
for item in cookie:
    print(item.name+"="+item.value)

Cookies can also be saved to a text file:

import urllib.request
import http.cookiejar

cookie=http.cookiejar.MozillaCookieJar(filename="cookie.txt")
handler=urllib.request.HTTPCookieProcessor(cookie)
opener=urllib.request.build_opener(handler)
response=opener.open("http://www.baidu.com")
cookie.save(ignore_discard=True,ignore_expires=True)

Reading cookies back from the file:

import urllib.request
import http.cookiejar

cookie=http.cookiejar.MozillaCookieJar(filename="cookie.txt")
cookie.load("cookie.txt",ignore_discard=True,ignore_expires=True)
handler=urllib.request.HTTPCookieProcessor(cookie)
opener=urllib.request.build_opener(handler)
response=opener.open("http://www.baidu.com")
print(response.read().decode("utf8"))
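
To see the session behaviour end to end, the sketch below assumes a throwaway local server (the hypothetical CookieHandler class) that sets one cookie and records the Cookie header of each incoming request. The second request made through the same opener should carry the cookie back.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_cookies = []  # Cookie request headers observed by the server


class CookieHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that sets one cookie on every response."""

    def do_GET(self):
        seen_cookies.append(self.headers.get("Cookie"))
        self.send_response(200)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), CookieHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # bypass any system proxy
    urllib.request.HTTPCookieProcessor(jar),
)
opener.open(url).close()  # first request: no cookie yet, server sets one
opener.open(url).close()  # second request: the jar sends the cookie back
server.shutdown()

stored = {cookie.name: cookie.value for cookie in jar}
print(stored, seen_cookies)
```

A cookie without an expiry date, like this one, is a session cookie, which is why saving and loading with MozillaCookieJar needs ignore_discard=True to keep it.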

Exception handling

from urllib import request, error

try:
    response = request.urlopen("http://www.bangiuegiuidududu.com")
except error.URLError as e:
    # reason describes why the request failed, e.g. a DNS lookup error
    print(e.reason)
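
urllib.error also defines HTTPError, a subclass of URLError raised when the server does answer but with an error status. Because of the subclass relationship it has to be caught first. A sketch under the assumption of a throwaway local server (the hypothetical NotFoundHandler class and describe() helper are illustrative):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import error, request


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with 404."""

    def do_GET(self):
        self.send_error(404)

    def log_message(self, *args):
        pass  # keep the example output quiet


def describe(opener, url):
    """Return a short string describing how the request ended."""
    try:
        with opener.open(url, timeout=5) as response:
            return "ok %d" % response.status
    except error.HTTPError as e:
        # The server answered, but with an error status.
        return "HTTPError %d" % e.code
    except error.URLError as e:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return "URLError %s" % type(e.reason).__name__


server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
opener = request.build_opener(request.ProxyHandler({}))  # bypass system proxy

http_result = describe(opener, "http://127.0.0.1:%d/missing" % server.server_port)
server.shutdown()
server.server_close()

# The port is closed now, so the connection itself should fail.
closed_result = describe(opener, "http://127.0.0.1:%d/" % server.server_port)
print(http_result, "|", closed_result)
```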

URL parsing

urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)

from urllib.parse import urlparse

result = urlparse("http://www.baidu.com/index.html;user?id=5#comment")
print(type(result), result)

Output: <class 'urllib.parse.ParseResult'> ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')
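
A few related operations, sketched with the same URL: reading fields by name or index, reassembling the URL with urlunparse(), resolving a relative link with urljoin(), and the effect of allow_fragments=False. The about.html link is an illustrative assumption.

```python
from urllib.parse import urljoin, urlparse, urlunparse

url = "http://www.baidu.com/index.html;user?id=5#comment"
result = urlparse(url)

# Fields are available by name or by position.
print(result.scheme, result[1], result.path)

# urlunparse() reverses the split.
rebuilt = urlunparse(result)
print(rebuilt == url)

# urljoin() resolves a relative link against a base URL.
joined = urljoin("http://www.baidu.com/index.html", "about.html")
print(joined)

# With allow_fragments=False, "#comment" stays attached to the query.
no_fragment = urlparse(url, allow_fragments=False)
print(no_fragment.query, repr(no_fragment.fragment))
```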

urlencode()

Converts a dictionary into URL query parameters.

from urllib.parse import urlencode

parameters = {"name": "germeny", "id": "887"}
base_url = "http://www.baidu.com?"
url = base_url + urlencode(parameters)
print(url)  # http://www.baidu.com?name=germeny&id=887
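
urlencode() also escapes characters that are not safe in a query string. A short sketch of that behaviour together with its inverse parse_qs() and the single-string helpers quote() and unquote(); the sample values are illustrative.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Spaces become "+" and the pairs are joined with "&".
query = urlencode({"name": "germey", "wd": "hello world"})
print(query)

# parse_qs() reverses urlencode(); each value comes back as a list.
parsed = parse_qs(query)
print(parsed)

# quote() percent-encodes a single string, e.g. non-ASCII text (UTF-8 bytes).
escaped = quote("中文")
print(escaped, unquote(escaped))
```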
