ncurrent.futures模块,设置多线程的个数为20个(实际不一定能达到,视计算机而定)。实现的示例代码如下:
# -*- coding: utf-8 -*-
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED
import requests
from bs4 import BeautifulSoup
import time
class SyncCrawlSjqqMultiProcessing(object):
def parser(self,url):
req = requests.get(url)
soup = BeautifulSoup(req.text,"lxml")
name_list = soup.find(class_='app-list clearfix')('li')
names=[]
for name in name_list:
app_name = name.find('a',class_="name ofh").text
names.append(app_name)
return names
if __name__ == '__main__':
url = "https://sj.qq.com/myapp/category.htm?orgame=1&categoryId=114"
executor = ThreadPoolExecutor(max_workers=20)
syncCrawlSjqqMultiProcessing = SyncCrawlSjqqMultiProcessing()
t1 = time.time()
future_tasks=[executor.submit(print(syncCrawlSjqqMultiProcessing.parser(url)))]
wait(future_tasks, return_when=ALL_COMPLETED)
t2 = time.time()
print('一般方法,总共耗时:%s' % (t2 - t1))
运行结果如下:
D:\python\Python3\python.exe D:/project/python/zj_scrapy/zj_scrapy/SyncCrawlSjqqMultiProcessing.py
['宜人贷借款', '大智慧', '中国建设银行', '同花顺手机炒股股票软件', '随手记理财记账', '平安金管家', '翼支付', '第一理财', '平安普惠', '51信用卡管家', '借贷宝', '卡牛信用管家', '省呗', '平安口袋银行', '拍拍贷借款', '简理财', '中国工商银行', 'PPmoney出借', '360借条', '京东金融', '招商银行', '云闪付', '腾讯自选股(腾讯官方炒股软件)', '鑫格理财', '中国银行手机银行', '风车理财', '招商银行掌上生活', '360贷款导航', '农行掌上银行', '现金巴士', '趣花分期', '挖财记账', '闪银', '极速现金侠', '小花钱包', '闪电借款', '光速贷款', '借花花贷款', '捷信金融', '分期乐']
一般方法,总共耗时:0.3950002193450928
Process finished with exit code 0
比如单线程运行,多线程在爬虫时明显会要快很多。