Scraping compressed data with Python

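While scraping my Bilibili watch history, I got the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte. The cause was the line data = response.read().decode("utf-8"). The bytes returned by response.read() were gzip-compressed, so they were not valid UTF-8 text. Every gzip stream begins with the magic bytes 0x1f 0x8b, which is why the decoder fails on byte 0x8b at position 1. Decompressing the data before decoding it solved the problem.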
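The failure can be reproduced offline, without touching Bilibili. The minimal sketch below gzip-compresses a made-up HTML string in memory, shows the first two bytes of the result, attempts a direct UTF-8 decode, and then decompresses with gzip.decompress before decoding. The sample string is hypothetical, and only the standard library is used.

```python
import gzip

# Hypothetical page body; any text works for this demonstration.
html = "<html><body>history</body></html>"
compressed = gzip.compress(html.encode("utf-8"))

# gzip output always starts with the magic bytes 0x1f 0x8b.
print(compressed[:2].hex())  # 1f8b

# Decoding the compressed bytes directly fails on the second magic byte.
try:
    compressed.decode("utf-8")
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)
print(error_message)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

# Decompress first, then decode: this recovers the original text.
text = gzip.decompress(compressed).decode("utf-8")
print(text == html)  # True
```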

import gzip
import urllib.request
from io import BytesIO

import user_agent_list  # local helper module (not in the standard library) that returns a random User-Agent

url = 'https://www.bilibili.com/account/history'

random_user_agent = user_agent_list.getheaders()
request = urllib.request.Request(url)
request.add_header("User-Agent", random_user_agent)
response = urllib.request.urlopen(request)
data = response.read()  # raw bytes, gzip-compressed by the server
buf = BytesIO(data)
gz = gzip.GzipFile(fileobj=buf)  # named gz to avoid shadowing the built-in zip()
data = gz.read().decode('utf-8')  # decompress first, then decode
with open("cookies.html", "w", encoding='utf-8') as f:
    f.write(data)
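The script above decompresses unconditionally, so it assumes the server always replies with gzip. If the server ever sends an uncompressed body, gzip.GzipFile raises gzip.BadGzipFile on Python 3.8+. A more defensive sketch, assuming the server reports compression in the standard Content-Encoding response header, checks that header first and handles gzip and deflate. The helper names decode_body and fetch_text are hypothetical and do not come from the original post.

```python
import gzip
import urllib.request
import zlib
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str], charset: str = "utf-8") -> str:
    """Decompress an HTTP body according to its Content-Encoding, then decode it."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        try:
            # HTTP "deflate" is specified as zlib-wrapped data.
            raw = zlib.decompress(raw)
        except zlib.error:
            # Some servers send raw deflate without the zlib header.
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)
    elif encoding != "identity":
        raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
    return raw.decode(charset)


def fetch_text(url: str, user_agent: str) -> str:
    """Fetch a URL and return its body as text, decompressing only when needed."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        content_encoding = response.headers.get("Content-Encoding")
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
    return decode_body(raw, content_encoding, charset)
```

With the same URL and User-Agent as the original script, the download step becomes data = fetch_text(url, random_user_agent). Keeping decode_body separate from the network call also makes the decompression logic easy to test offline.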