python-beautifulsoup를 이용한 웹 스크래핑 1

2022-02-18

설치목록

lxml
beautifulsoup4
requests

cmd에 한줄씩 작성하여 설치

pip install lxml
pip install beautifulsoup4
pip install requests

주의할점

작성할 파일명을 bs4.py로 하지 않는다. 오류가 발생한다.

스크래핑 기본 코드

import requests, re #문자열을 다루기 위해 re도 미리 import(선택사항)
from bs4 import BeautifulSoup

def Scraping():
    url = "url주소입력"
    headers = {"user agent 입력"} # https://www.whatismybrowser.com/detect/what-is-my-user-agent 에서 확인 가능
    res = requests.get(url, headers=headers) #res에 html문서 저장
    res.raise_for_status() # 응답코드200(정상)과 다른 값이 나오면 HTTPError 발생

    soup = BeautifulSoup(res.text, "lxml") # BeautifulSoup를 이용해 res.text를 xml형태로 읽음

    texts = soup.find_all("a", attrs={"class":"subject-link"}) 
	# soup에 있는 a태그의 클래스 이름 "subject-link" 안에 있는 텍스트 내용을 texts에 저장
    return texts
	# 텍스트 내용을 반환(리스트 형식)

Share on

Twitter Facebook LinkedIn

천궁

python-beautifulsoup를 이용한 웹 스크래핑 1

Share on

You may also enjoy

python-discord.py를 이용한 디스코드 봇 만들기 1.기본구조

python-beautifulsoup를 이용한 웹 스크래핑 2

백준 1013번 - Contact

NUMBER GIRL - 透明少女