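This post shares a way to download all of a Douyin user's short videos: a Python crawler that fetches them in bulk, without watermarks. I hope we can exchange ideas and learn from each other!
How to get the user's share link
1. First, pick any creator on Douyin, open the menu at the top right of their profile page, and copy the share link.
You get share text like this (the app's slogan followed by a short URL): 在抖音,记录美好生活! https://v.douyin.com/eSN7g1c/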
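Batch-downloading watermark-free videos
1. Get the user's Douyin nickname and create a folder with the same name to hold the videos.
2. Check every time window for published videos; skip a window when the count is 0, otherwise download.
3. Download and save; the number of files matches the number of videos on the user's profile page.
4. Check the result: indeed no watermark. Bingo!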
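Python program outline
1. Extract the short URL from the share text copied from the user's page.
2. Request that URL with redirects disabled to read headers['location'], then extract sec_uid from it.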
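3. Build the request URL for the user's video list, then download and save each video. Here is an example of the request parameters:

params = {
    'sec_uid': 'MS4wLjABAAAAbtSlJK_BfUcuqyy8ypNouqEH7outUXePTYEcAIpY9rk',  # identifies the user; stays the same for that user
    'count': '200',  # number of videos returned per request; a very large value is not recommended
    'min_cursor': '1612108800000',  # start of the time window, timestamp in milliseconds
    'max_cursor': '1619251716404',  # end of the time window, timestamp in milliseconds
    'aid': '1128',  # purpose unknown; optional
    '_signature': 'PtCNCgAAXljWCq93QOKsFT7QjR'  # signature value; copy one from any captured request and it keeps working
}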
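To make the role of these parameters concrete, here is a minimal sketch that assembles the full video-list URL with the standard library. `build_post_url` is a hypothetical helper name, not something from the original script, which simply passes the dictionary to requests via `params=`; the endpoint and values are the ones quoted in this post.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Video-list endpoint used later in this post.
AWEME_URL = 'https://www.iesdouyin.com/web/api/v2/aweme/post/'

def build_post_url(sec_uid, min_cursor, max_cursor, signature, count=200, aid=1128):
    """Hypothetical helper: return the video-list URL for one time window."""
    params = {
        'sec_uid': sec_uid,
        'count': count,
        'min_cursor': min_cursor,   # window start, milliseconds
        'max_cursor': max_cursor,   # window end, milliseconds
        'aid': aid,
        '_signature': signature,
    }
    return AWEME_URL + '?' + urlencode(params)

url = build_post_url(
    'MS4wLjABAAAAbtSlJK_BfUcuqyy8ypNouqEH7outUXePTYEcAIpY9rk',
    1612108800000,
    1619251716404,
    'PtCNCgAAXljWCq93QOKsFT7QjR',
)
print(url)
```

Printing the URL this way is mainly useful for comparing it against a request captured in the browser when the API returns an empty list.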
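Key code:
1. Extract the URL from the share text

import re

string = input('Paste the share link: ')
# scheme, "://", then everything up to the next whitespace
shroturl = re.findall(r'[a-z]+://\S+', string, re.I | re.M)[0]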
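The one-liner above raises IndexError when the pasted text contains no URL. As a hedged alternative, the sketch below wraps the same idea in a function that returns None instead; `extract_share_url` and `URL_PATTERN` are names made up for this illustration.

```python
import re

# http or https, "://", then a run of non-whitespace characters.
URL_PATTERN = re.compile(r'https?://\S+', re.I)

def extract_share_url(text):
    """Hypothetical helper: return the first URL in the share text, or None."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None

sample = '在抖音,记录美好生活! https://v.douyin.com/eSN7g1c/'
print(extract_share_url(sample))          # https://v.douyin.com/eSN7g1c/
print(extract_share_url('no link here'))  # None
```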
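2. Request that URL with redirects disabled to read the value of the location header, then extract sec_uid with a regular expression

# headers is the user-agent dictionary defined in the full script below
startpage = requests.get(url=shroturl, headers=headers, allow_redirects=False)
location = startpage.headers['location']
sec_uid = re.findall(r'(?<=sec_uid=)[A-Za-z0-9_-]+', location)[0]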
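Since the location value is an ordinary URL, the sec_uid can also be read from its query string instead of with a lookbehind regex. This is only a sketch: `extract_sec_uid` is a hypothetical helper, and the redirect URL below is invented for illustration, so the path and extra parameters of a real location may differ.

```python
from urllib.parse import urlsplit, parse_qs

def extract_sec_uid(location):
    """Hypothetical helper: read sec_uid from the redirect URL's query string."""
    query = parse_qs(urlsplit(location).query)
    values = query.get('sec_uid')
    return values[0] if values else None

# Invented redirect target; only the sec_uid parameter matters here.
location = ('https://www.iesdouyin.com/share/user/0'
            '?sec_uid=MS4wLjABAAAAbtSlJK_BfUcuqyy8ypNouqEH7outUXePTYEcAIpY9rk'
            '&other=1')
print(extract_sec_uid(location))
```

Returning None for a missing parameter makes it easy to stop early with a clear message when the short link did not redirect to a user page.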
3. Build the user-info request URL to get the user's nickname. Other fields are available too; only the nickname is used here

getname = requests.get(
    url='https://www.iesdouyin.com/web/api/v2/user/info/?sec_uid={}'.format(sec_uid),
    headers=headers,
).text
userinfo = json.loads(getname)
name = userinfo['user_info']['nickname']
4. Create a folder named after the nickname and change into it

import os

Path = name
if not os.path.exists(Path):
    os.mkdir(Path)
else:
    print('directory exist')
os.chdir(Path)  # later files are written relative to this folder
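Changing the working directory works, but it affects every relative path used afterwards. A possible alternative, sketched here under the assumption that the nickname is a valid folder name, is to create the folder with `os.makedirs(..., exist_ok=True)` and build full paths explicitly. `ensure_user_dir` is a hypothetical helper, and a temporary directory stands in for the real download location.

```python
import os
import tempfile

def ensure_user_dir(base_dir, nickname):
    """Hypothetical helper: create base_dir/nickname if missing and return its path."""
    path = os.path.join(base_dir, nickname)
    os.makedirs(path, exist_ok=True)  # no error when the folder already exists
    return path

base = tempfile.mkdtemp()                       # stand-in for the download root
folder = ensure_user_dir(base, 'some_nickname')
video_path = os.path.join(folder, 'example.mp4')
print(video_path)
```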
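5. Generate the timestamps for the time windows. In principle a single wide time span would do, but testing showed that when the span is too wide, fewer videos come back. After repeated tests I settled on one-month windows, with years from 2018 to 2021; there are basically no earlier videos anyway

import time

year = ('2018', '2019', '2020', '2021')
month = ('01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12')
timepool = [x + '-' + y + '-01 00:00:00' for x in year for y in month]
print(timepool)
k = len(timepool)
for i in range(k - 1):  # each window runs from one month start to the next
    print('begintime=' + timepool[i])
    print('endtime=' + timepool[i + 1])
    beginarray = time.strptime(timepool[i], "%Y-%m-%d %H:%M:%S")
    endarray = time.strptime(timepool[i + 1], "%Y-%m-%d %H:%M:%S")
    t1 = int(time.mktime(beginarray) * 1000)  # local time, in milliseconds
    t2 = int(time.mktime(endarray) * 1000)
    print(t1, t2)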
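The same monthly windows can be produced with `datetime` instead of formatting and re-parsing strings. The sketch below is an illustration, not the script's method, and differs in two labeled ways: it computes the boundaries in UTC rather than local time, and it appends the first day of the following year so that December of the last year gets a window as well. `month_windows` is a hypothetical name.

```python
from datetime import datetime, timezone

def month_windows(first_year, last_year):
    """Hypothetical helper: yield (start_ms, end_ms) for each calendar month, in UTC."""
    starts = [datetime(y, m, 1, tzinfo=timezone.utc)
              for y in range(first_year, last_year + 1)
              for m in range(1, 13)]
    # Extra boundary so the final December is covered too.
    starts.append(datetime(last_year + 1, 1, 1, tzinfo=timezone.utc))
    for begin, end in zip(starts, starts[1:]):
        yield int(begin.timestamp() * 1000), int(end.timestamp() * 1000)

windows = list(month_windows(2018, 2021))
print(len(windows))   # 48
print(windows[0])     # (1514764800000, 1517443200000)
```

Each tuple can be used directly as the `min_cursor` and `max_cursor` pair for one request.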
6. At this point every value in params is available, so request the video-list URL and parse the response as JSON

awemeurl = 'https://www.iesdouyin.com/web/api/v2/aweme/post/?'
awemehtml = requests.get(url=awemeurl, params=params, headers=headers).text
data = json.loads(awemehtml)
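Before downloading, it can help to pull the needed fields out of the parsed response defensively, so that an empty window or an entry without a play address does not raise KeyError. The sketch below assumes only the field names already used in this post (`aweme_list`, `desc`, `video`, `play_addr`, `url_list`); `iter_videos` is a hypothetical helper and the sample dictionary is hand-written, not a real response.

```python
def iter_videos(data):
    """Hypothetical helper: yield (title, url) pairs from a parsed video-list response."""
    for item in data.get('aweme_list') or []:
        video = item.get('video') or {}
        play_addr = video.get('play_addr') or {}
        urls = play_addr.get('url_list') or []
        if urls:  # skip entries that have no playable address
            yield item.get('desc', ''), urls[0]

# Hand-written stand-in; real responses carry many more fields.
sample = {
    'aweme_list': [
        {'desc': 'first', 'video': {'play_addr': {'url_list': ['https://example.com/1.mp4']}}},
        {'desc': 'no address', 'video': {}},
    ]
}
videos = list(iter_videos(sample))
print(videos)   # [('first', 'https://example.com/1.mp4')]
```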
7. Pull what we need straight from the JSON (video count, video title, watermark-free video URL), and then there is nothing left to wait for: download

awemenum = len(data['aweme_list'])
print(awemenum)
for i in range(awemenum):
    # drop characters that are not allowed in file names
    videotitle = data['aweme_list'][i]['desc'].replace("?", "").replace('"', "").replace(":", "")
    videourl = data['aweme_list'][i]['video']['play_addr']['url_list'][0]
    start = time.time()
    print('{} ===>downloading'.format(videotitle))
    with open(videotitle + '.mp4', 'wb') as v:
        try:
            v.write(requests.get(url=videourl, headers=headers).content)
            end = time.time()
            cost = end - start
            print('{} ===>downloaded ===>cost {}s'.format(videotitle, cost))
        except Exception as e:
            print('download error', e)
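Video descriptions are sometimes empty or repeated, and with the naming above a later video would then overwrite an earlier file. One possible way to guard against that is sketched here; `unique_filename` is a hypothetical helper, and the set of removed characters is an assumption based on what common file systems reject.

```python
import re

def unique_filename(title, used, ext='.mp4'):
    """Hypothetical helper: sanitize a title and make it unique within `used`."""
    # Remove characters that common file systems do not allow in names.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n]', '', title).strip() or 'untitled'
    name = cleaned + ext
    n = 1
    while name in used:          # append _2, _3, ... until the name is free
        n += 1
        name = '{}_{}{}'.format(cleaned, n, ext)
    used.add(name)
    return name

used = set()
names = [unique_filename(t, used) for t in ['dance?', 'dance', '', 'a:b']]
print(names)   # ['dance.mp4', 'dance_2.mp4', 'untitled.mp4', 'ab.mp4']
```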
Full Python code

import requests
import json
import os
import time
import re

"""
1. Extract the short URL from the share text copied from the user's page
2. Request the short URL, read location from the 302 response, extract sec_uid
3. Build the video-list request URL
params = {
    'sec_uid' : 'MS4wLjABAAAAbtSlJK_BfUcuqyy8ypNouqEH7outUXePTYEcAIpY9rk',
    'count' : '200',
    'min_cursor' : '1612108800000',
    'max_cursor' : '1619251716404',
    'aid' : '1128',
    '_signature' : 'PtCNCgAAXljWCq93QOKsFT7QjR'
}
"""

headers = {
    "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Mobile Safari/537.36"
}

# string = '在抖音,记录美好生活! https://v.douyin.com/ekkTsYw/'
string = input('Paste the share link: ')
shroturl = re.findall(r'[a-z]+://\S+', string, re.I | re.M)[0]
print(shroturl)

# Do not follow the redirect; the target URL in the location header carries sec_uid.
startpage = requests.get(url=shroturl, headers=headers, allow_redirects=False)
location = startpage.headers['location']
sec_uid = re.findall(r'(?<=sec_uid=)[A-Za-z0-9_-]+', location)[0]

getname = requests.get(
    url='https://www.iesdouyin.com/web/api/v2/user/info/?sec_uid={}'.format(sec_uid),
    headers=headers,
).text
userinfo = json.loads(getname)
name = userinfo['user_info']['nickname']
print(name)

# Create a folder named after the nickname and work inside it.
Path = name
if not os.path.exists(Path):
    os.mkdir(Path)
else:
    print('directory exist')
os.chdir(Path)

# Monthly time windows from 2018 to 2021.
year = ('2018', '2019', '2020', '2021')
month = ('01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12')
timepool = [x + '-' + y + '-01 00:00:00' for x in year for y in month]
print(timepool)
k = len(timepool)
for i in range(k - 1):
    print('begintime=' + timepool[i])
    print('endtime=' + timepool[i + 1])
    beginarray = time.strptime(timepool[i], "%Y-%m-%d %H:%M:%S")
    endarray = time.strptime(timepool[i + 1], "%Y-%m-%d %H:%M:%S")
    t1 = int(time.mktime(beginarray) * 1000)
    t2 = int(time.mktime(endarray) * 1000)
    print(t1, t2)
    params = {
        'sec_uid': sec_uid,
        'count': 200,
        'min_cursor': t1,
        'max_cursor': t2,
        'aid': 1128,
        '_signature': 'PtCNCgAAXljWCq93QOKsFT7QjR'
    }
    awemeurl = 'https://www.iesdouyin.com/web/api/v2/aweme/post/?'
    awemehtml = requests.get(url=awemeurl, params=params, headers=headers).text
    data = json.loads(awemehtml)
    # print(data)
    # print(type(data))
    awemenum = len(data['aweme_list'])  # 0 means nothing was published in this window
    print(awemenum)
    for j in range(awemenum):
        # drop characters that are not allowed in file names
        videotitle = data['aweme_list'][j]['desc'].replace("?", "").replace('"', "").replace(":", "")
        videourl = data['aweme_list'][j]['video']['play_addr']['url_list'][0]
        start = time.time()
        print('{} ===>downloading'.format(videotitle))
        with open(videotitle + '.mp4', 'wb') as v:
            try:
                v.write(requests.get(url=videourl, headers=headers).content)
                end = time.time()
                cost = end - start
                print('{} ===>downloaded ===>cost {}s'.format(videotitle, cost))
            except Exception as e:
                print('download error', e)