  • Published on October 30, 2024

    27.6% of the Top 10 Million Sites are Dead

    internet-decay, domain-analysis, inactive-websites, top-domains, web-crawler, kubernetes
    An analysis of the top 10 million websites reveals that over a quarter are inactive, highlighting the web's shifting landscape. Using a high...
  • Published on October 14, 2023

    Web Crawling at Scale: Navigating Billions of URLs with Efficiency

    Kubernetes, Web-Crawler, Golang, Nodejs, Distributed-System
    Dive into the world of distributed web crawling with Golang, Docker, and Redis. Learn the logic behind efficient code, use Bloom filters for...
  • Published on October 13, 2023

    The Architecture of a Web Crawler: Building a Google-Inspired Distributed Web Crawler. Part 1

    Kubernetes, Web-Crawler, Golang, Nodejs, Distributed-System
    Unlock the potential of the web with a Google-inspired distributed web crawler. Explore scalable solutions using Kubernetes, Golang, Python,...
  • Published on June 11, 2023

    A Step-by-Step Guide to Building a Scalable Distributed Crawler for Scraping Millions of Top TikTok Profiles

    Kubernetes, Web-Crawler, Golang, Nodejs, Distributed-System
    Embark on a comprehensive journey to construct a powerful TikTok scraper using Golang, Docker, and Kubernetes. Gain insights into website an...
  • Published on February 28, 2017

    How to build a scalable crawler to crawl a million pages with a single machine in just 2 hours

    Docker, DevOps, Web-Crawler, Distributed-Crawler
    Learn to build a scalable Python web crawler using Docker, Celery, and RabbitMQ. No multiprocessing knowledge needed. Effortlessly scale wit...
© 2024 Built with 💖 by Tony Wang • With TypeScript, Next.js, Tailwind • Inspired by Leerob