Is scrapy free and open source?

Yes. scrapy is open source, available under the BSD-3-Clause license.

Is scrapy actively maintained?

Yes. scrapy is actively maintained: the latest commit landed within the last month, and its health score is 100/100.

What are some alternatives to scrapy?

Alternatives to scrapy include BeautifulSoup, Playwright, Selenium, Colly and Apify Crawlee.

scrapy

Python, BSD-3-Clause, ★ 62,316

Health 100/100

Baseline: 50
Commit recency: +25
Popularity: +20
Release cadence: +10
Open-issue load: +5

Scrapy is a mature Python framework for building web crawlers and scrapers, offering an asynchronous engine, built-in support for following links, extracting structured data via selectors, and exporting results through configurable pipelines.

asyncio data-extraction python spiders twisted web-crawler web-scraping xpath-css-selectors


BSD-3-Clause: Permissive. Free to use in commercial and proprietary software, with attribution.

Production readiness

4/5

Actively maintained: Commits in the last 6 months
No known vulnerabilities: Not yet scanned
Clear, usable license: BSD-3-Clause (permissive)
Proven adoption: Widely used
Has documentation: Documentation indexed

Install

pip install scrapy

Maintenance: Actively maintained

Our analysis

Scrapy is a complete, batteries-included framework for writing web spiders that crawl sites, parse HTML/XML, and emit structured data. Its core idea is an event-driven asynchronous engine (historically built on Twisted) wrapped in a declarative spider/pipeline/middleware architecture.

When to use scrapy

Reach for Scrapy when you need to crawl many pages at scale, follow links across a site, handle throttling/retries/concurrency, and pipe results into files or databases. It shines for recurring, large-scale extraction jobs where its middlewares (cookies, proxies, robots.txt, autothrottle) and item pipelines save substantial boilerplate.
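As a rough illustration of the throttling, retry, robots.txt, and export features mentioned above, a project's `settings.py` might switch them on like this. The setting names are ones Scrapy documents; the values and the output file name `items.jsonl` are illustrative assumptions, not recommendations.

```python
# settings.py (sketch): values are illustrative assumptions, not tuned defaults.

ROBOTSTXT_OBEY = True                # respect robots.txt via the built-in middleware
CONCURRENT_REQUESTS = 16             # global cap on in-flight requests
CONCURRENT_REQUESTS_PER_DOMAIN = 8   # per-domain cap
DOWNLOAD_DELAY = 0.5                 # base delay in seconds between requests to a site

AUTOTHROTTLE_ENABLED = True          # adapt the delay to observed server latency
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0

RETRY_ENABLED = True
RETRY_TIMES = 3                      # retries per failed request, beyond the first attempt

COOKIES_ENABLED = True               # keep cookies between requests

# Feed export: write scraped items to a JSON Lines file (hypothetical file name).
FEEDS = {
    "items.jsonl": {"format": "jsonlines", "encoding": "utf8"},
}
```

Keeping this behaviour in settings rather than in spider code is what lets the same spider run politely against one site and more aggressively against another.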

When not to

For a one-off scrape of a single page, plain requests + BeautifulSoup is simpler. For heavily JavaScript-rendered sites, a browser-automation tool like Playwright or Selenium (or Scrapy plus a headless-browser plugin) fits better, since Scrapy does not execute JS out of the box.

Strengths

Highly scalable async crawling with built-in concurrency, throttling, and retry handling
Rich ecosystem: middlewares, extensions, item pipelines, feed exports, and selectors (XPath/CSS) in one package
Battle-tested with extensive documentation and a large community
Clear separation of concerns via spiders, items, and pipelines that keeps large projects maintainable
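To make the spider/item/pipeline split concrete, here is a minimal sketch of an item pipeline. The class name `PriceNormalizerPipeline` and the `price` field are hypothetical; the `process_item(self, item, spider)` method is the hook Scrapy has long documented for pipelines. It assumes prices arrive as well-formed strings such as "$1,299.00".

```python
class PriceNormalizerPipeline:
    """Hypothetical item pipeline: converts a raw price string to a float."""

    def process_item(self, item, spider):
        raw = str(item.get("price", "")).strip()
        # Remove a leading currency symbol and thousands separators, e.g. "$1,299.00".
        cleaned = raw.lstrip("$€£").replace(",", "")
        # Empty or missing prices become None; malformed text would raise ValueError.
        item["price"] = float(cleaned) if cleaned else None
        return item


# Pipelines are plain classes, so they can be exercised without running a crawl.
pipeline = PriceNormalizerPipeline()
result = pipeline.process_item({"name": "Widget", "price": "$1,299.00"}, spider=None)
print(result)
```

In a real project the class would be enabled through the `ITEM_PIPELINES` setting, for example `{"myproject.pipelines.PriceNormalizerPipeline": 300}`, where `myproject` is a placeholder module path. Because the cleaning logic lives outside the spider, it can be tested and reused independently of any particular site.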

Trade-offs

No native JavaScript rendering; dynamic sites require extra tooling like Splash or scrapy-playwright
Twisted-based architecture has a learning curve and can feel heavy for small tasks
Project structure and conventions add overhead compared to a quick script
Async model can clash with code expecting standard asyncio/blocking patterns
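For the JavaScript-rendering and asyncio trade-offs above, one common arrangement is to run Scrapy on Twisted's asyncio reactor and hand downloads to the scrapy-playwright plugin. The fragment below is a sketch that assumes the plugin is installed; the handler paths follow the plugin's README rather than anything stated on this page.

```python
# settings.py (sketch), assuming the scrapy-playwright plugin is installed.

# Run Scrapy on Twisted's asyncio-based reactor so asyncio code can share the loop.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Route HTTP(S) downloads through the plugin's Playwright-backed handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
```

With this in place, individual requests opt in to browser rendering by passing `meta={"playwright": True}`, so static pages can still take the faster plain-HTTP path.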

Maturity

Very mature and actively maintained, with 60k+ GitHub stars, a stable release cadence, a broad plugin ecosystem, and commercial backing from Zyte. It is widely used in production for serious scraping workloads.

Documentation from scrapy, shown under BSD-3-Clause with attribution. Source: https://scrapy.org

