python – Scraping using Inspect Element

I am trying to get some information from Instagram by scraping. I tried this code on Twitter and it worked fine, but on Instagram it shows no result. Both snippets are given here.

Twitter code:

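    from bs4 import BeautifulSoup
    from urllib2 import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen

    theurl = "https://twitter.com/realmadrid"
    thepage = urlopen(theurl)
    soup = BeautifulSoup(thepage, "html.parser")
    # Print the div that holds the profile header card
    print(soup.find('div', {"class": "ProfileHeaderCard"}))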

Result: the full element is returned.

Instagram code:

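    from bs4 import BeautifulSoup
    from urllib2 import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen

    theurl = "https://www.instagram.com/barackobama/"
    thepage = urlopen(theurl)
    soup = BeautifulSoup(thepage, "html.parser")
    # Same lookup as the Twitter code, with an Instagram class name
    print(soup.find('div', {"class": "_BUGdy"}))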

Result: None

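Solution

If you look at the page source, you will see that the content is loaded dynamically, so there is no div._BUGdy in what your request returns. Depending on what you want, you may be able to extract it from the JSON in the script tag: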

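    import json
    import re
    from pprint import pprint as pp

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("https://www.instagram.com/barackobama/")
    soup = BeautifulSoup(r.content, "html.parser")
    # Find the inline script that assigns window._sharedData
    js = soup.find("script", string=re.compile("window._sharedData")).string
    # Slice out the JSON object between the first "{" and the last "}"
    _json = json.loads(js[js.find("{"):js.rfind("}") + 1])
    pp(_json)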

That will give you everything you see in the <script type="text/javascript">window._sharedData = ..... in the returned source.
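To make the extraction step concrete without touching the network, here is a minimal sketch that applies the same idea to a small inline HTML string. The sample markup, its JSON keys, and the `extract_shared_data` helper are all hypothetical and made up for illustration; a real Instagram response may be laid out quite differently. It uses only the standard library (a regular expression instead of BeautifulSoup) so it can be run as is.

```python
import json
import re

# Hypothetical stand-in for a fetched page; a real response may differ.
SAMPLE_HTML = """
<html><body>
<div id="react-root"></div>
<script type="text/javascript">window._sharedData = {"entry_data": {"ProfilePage": [{"user": {"username": "example", "followed_by": {"count": 42}}}]}};</script>
</body></html>
"""

# Captures the body of every <script> element in the page.
SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def extract_shared_data(html):
    """Return the JSON object assigned to window._sharedData, or None."""
    for body in SCRIPT_RE.findall(html):
        if "window._sharedData" not in body:
            continue
        # Slice from the first "{" to the last "}" to drop the assignment
        # prefix and the trailing semicolon.
        start, end = body.find("{"), body.rfind("}")
        if start == -1 or end == -1:
            return None
        return json.loads(body[start:end + 1])
    return None


data = extract_shared_data(SAMPLE_HTML)
user = data["entry_data"]["ProfilePage"][0]["user"]
print(user["username"], user["followed_by"]["count"])
```

Note that the static markup in the sample contains only an empty root div plus the script, which mirrors why a search for a rendered class name comes back as None while the JSON is still available.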

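If you want the followers, you will need to use something like selenium. The site is almost entirely dynamically loaded content, and to get the followers you have to click a link that is only visible when you are logged in. This will get you closer to what you want: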

    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    login = "https://www.instagram.com"
    dr = webdriver.Chrome()
    dr.get(login)
    # Open the login form and sign in (replace the placeholder credentials)
    dr.find_element(By.XPATH, "//a[@class='_k6cv7']").click()
    dr.find_element(By.XPATH, "//input[@name='username']").send_keys("youruname")
    dr.find_element(By.XPATH, "//input[@name='password']").send_keys("yourpass")
    dr.find_element(By.CSS_SELECTOR, "button._aj7mu._taytv._ki5uo._o0442").click()
    time.sleep(5)
    dr.get("https://www.instagram.com/barackobama")
    # The followers link is only present when logged in
    dr.find_element(By.CSS_SELECTOR, 'a[href="/barackobama/followers/"]').click()
    time.sleep(3)
    # Print the text of each list item inside the followers popup
    popup = dr.find_element(By.CSS_SELECTOR, "div._n3cp9._qjr85")
    for li in popup.find_elements(By.XPATH, ".//ul/li"):
        print(li.text)

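That pulls some text from the li tags that appear in the popup after you click the link. You can extract whatever you need from the unordered list.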
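As a rough illustration of pulling items out of an unordered list, the sketch below collects the text of each li element from a markup string using the standard library's `html.parser`. The sample popup markup and the `ListItemText` class are hypothetical; the real popup will have different class names and nesting, and with selenium you would normally read `li.text` directly as in the code above.

```python
from html.parser import HTMLParser

# Hypothetical popup markup; real class names and nesting will differ.
SAMPLE_POPUP = """
<div class="popup">
  <ul>
    <li><a href="/alice/">alice</a> <span>Alice A.</span></li>
    <li><a href="/bob/">bob</a> <span>Bob B.</span></li>
  </ul>
</div>
"""


class ListItemText(HTMLParser):
    """Collect the whitespace-normalized text of every top-level <li>."""

    def __init__(self):
        super().__init__()
        self.items = []    # finished <li> texts
        self._depth = 0    # greater than 0 while inside an <li>
        self._parts = []   # text fragments of the current <li>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "li" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the fragments and collapse runs of whitespace.
                self.items.append(" ".join(" ".join(self._parts).split()))
                self._parts = []

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


parser = ListItemText()
parser.feed(SAMPLE_POPUP)
for item in parser.items:
    print(item)
```

From here each item can be split or filtered further, for example keeping only the first word if that is where the username sits in your markup.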

Original article: https://54852.com/langs/1197638.html