Web Page Data Scraping: A Crawler
Experiment Environment
- OS: Windows 11
- Python 3.6
- Chrome (my version is 98.0.4758.102)
- The Python tool used is Selenium.
- Download the chromedriver that matches your Chrome version from https://chromedriver.chromium.org/home. My Chrome version is 98, so the chromedriver version is also 98. Put the extracted chromedriver.exe into the project directory.
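If the major versions of Chrome and chromedriver do not match, Selenium typically fails to create a session, so it is worth confirming the driver version first. Below is a minimal sketch (my addition, assuming chromedriver.exe sits in the project directory as described above) that prints the driver's version using Python 3.6-compatible subprocess calls:

import subprocess
# Ask chromedriver for its version; the major version (98 here) should match Chrome's.
result = subprocess.run(["./chromedriver.exe", "--version"],
                        stdout=subprocess.PIPE, universal_newlines=True)
print(result.stdout)  # e.g. "ChromeDriver 98.0.4758.102 (...)"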
Reference: https://selenium-python-zh.readthedocs.io/en/latest/getting-started.html
Test Example
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')  # run Chrome without opening a window
driver = webdriver.Chrome(executable_path="./chromedriver.exe", options=options)  # options= instead of the deprecated chrome_options=
driver.get("https://www.baidu.com/")
print(driver.title)  # print the page title
driver.close()
Workflow:
First, a Chrome browser is launched, with the chromedriver path given as "./chromedriver.exe". The script then opens the Baidu homepage and prints its title.
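One detail worth noting: if an exception is raised before driver.close() runs, the browser and chromedriver processes are left behind. A common pattern (a sketch of my own, not part of the original example) is to wrap the work in try/finally and call driver.quit(), which shuts everything down:

from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')  # run without a visible window
driver = webdriver.Chrome(executable_path="./chromedriver.exe", options=options)
try:
    driver.get("https://www.baidu.com/")
    print(driver.title)
finally:
    # quit() closes all windows and ends the chromedriver process,
    # even if an exception was raised above
    driver.quit()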
Getting the Weather via Baidu
To get an element's XPath in Chrome, right-click the element on the page and choose "Inspect"; then, in the developer tools, right-click the highlighted element and choose "Copy" ==> "Copy XPath".
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
options = webdriver.ChromeOptions()
# options.add_argument('--headless')  # uncomment this line to run without a visible browser window
driver = webdriver.Chrome(executable_path="./chromedriver.exe", options=options)  # options= instead of the deprecated chrome_options=
driver.get("https://www.baidu.com/")
print(driver.title)
elem = driver.find_element(By.ID, 'kw')  # locate the search box by its id
elem.clear()
elem.send_keys("今天天气")  # search query, "today's weather" (kept in Chinese for Baidu)
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source  # make sure the search did not come back empty
# Wait until the results page has loaded, i.e. its title contains "天气"
WebDriverWait(driver, 10).until(expected_conditions.title_contains('天气'))
# Find the temperature element via the XPath copied from DevTools
temp = driver.find_element(By.XPATH, '//*[@id="1"]/div[1]/div[1]/a[1]/div[1]/div[2]/span[1]')
print("Current temperature:", temp.text)
driver.close()
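An XPath copied from DevTools is tied to the exact layout of the result page and can stop working whenever Baidu changes its markup. The sketch below (my variant, reusing the same imports and the same copied XPath, which remains an assumption about the current page structure) waits for the temperature element itself instead of only the page title, which tends to be a bit more robust:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# XPath copied from DevTools; it may break if Baidu changes its result layout
TEMP_XPATH = '//*[@id="1"]/div[1]/div[1]/a[1]/div[1]/div[2]/span[1]'

driver = webdriver.Chrome(executable_path="./chromedriver.exe")
try:
    driver.get("https://www.baidu.com/")
    box = driver.find_element(By.ID, 'kw')
    box.send_keys("今天天气", Keys.RETURN)  # search for "today's weather"
    # wait for the temperature element itself rather than only the page title
    temp = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, TEMP_XPATH)))
    print("Current temperature:", temp.text)
finally:
    driver.quit()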