Scraping compressed data with a crawler


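While scraping my bilibili watch history, I ran into this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte. The cause turned out to be the line data = response.read().decode("utf-8"): the bytes returned by response.read() were gzip-compressed, so they could not be decoded as UTF-8 directly. The byte 0x8b at position 1 is the giveaway, since every gzip stream starts with the two magic bytes 0x1f 0x8b. Decompressing the data first and decoding it afterwards fixed the problem.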

import gzip
import urllib.request

import user_agent_list  # local helper module (not shown); getheaders() returns a User-Agent string

url = 'https://www.bilibili.com/account/history'

random_user_agent = user_agent_list.getheaders()
request = urllib.request.Request(url)
request.add_header("User-Agent", random_user_agent)
response = urllib.request.urlopen(request)
raw = response.read()  # raw bytes; gzip-compressed in this case
# Decompress only if the body starts with the gzip magic number (0x1f 0x8b).
if raw[:2] == b"\x1f\x8b":
    raw = gzip.decompress(raw)
data = raw.decode('utf-8')
with open("cookies.html", "w", encoding='utf-8') as f:
    f.write(data)
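
The error and the fix can also be reproduced offline, without touching the network. The following is only a minimal sketch: the helper name decode_body and the sample HTML string are made up for illustration and are not part of the original script. It compresses a string with gzip, shows that decoding the compressed bytes directly fails, and then decodes them after checking for the gzip magic bytes.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def decode_body(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: gunzip raw if it looks like gzip, then decode it."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(encoding)


# Sample data standing in for a server response (an assumption for the demo).
html = "<html>watch history</html>"
compressed = gzip.compress(html.encode("utf-8"))

# Decoding the compressed bytes directly fails, as in the original error.
try:
    compressed.decode("utf-8")
except UnicodeDecodeError as exc:
    print("direct decode failed at byte", exc.start, "-", exc.reason)

# Decompressing first works, and plain (uncompressed) bytes pass through unchanged.
print(decode_body(compressed) == html)
print(decode_body(html.encode("utf-8")) == html)
```

Checking the magic bytes keeps the helper safe to call on responses that are not compressed. Inspecting the response's Content-Encoding header is another common way to make the same decision.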

Author: Mug-9
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Mug-9 as the source when reposting!