Vlad Fedorkov

Performance consulting for MySQL and Sphinx

PHP Crawler


PHP-Crawler is a very simple crawl-and-search script with full-text search support for small websites. It is based on PHP and MySQL. No shell access is required: crawling can be started from a browser. Created back in 2006, it remains one of the most popular PHP crawler scripts in the world.

Features

  • Full-text indexing
  • Crawl depth is limited by a depth setting
  • Safe spidering: the maximum page size can be limited
  • Follows “href=” links on web pages, in HTML or JavaScript
  • MySQL-based
  • Simple installation
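The features above describe a crawler that follows href= links breadth-first, stops at a configured depth, and refuses oversized pages. The sketch below illustrates that idea in Python; it is not PHP-Crawler's actual PHP code. The function names (crawl, extract_links), the fetch callback, the default limits, skipping (rather than truncating) oversized pages, and staying on the starting host are all assumptions made for illustration.

```python
import re
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

# Matches href="..." or href='...' anywhere in the raw text, so links assigned
# inside inline JavaScript (e.g. location.href = "/b") are found as well as
# ordinary HTML anchors.
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_links(base_url, text):
    """Return absolute, fragment-free http(s) URLs for every href= in text."""
    links = []
    for raw in HREF_RE.findall(text):
        absolute, _fragment = urldefrag(urljoin(base_url, raw))
        # Drop mailto:, javascript: and other non-web schemes.
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def crawl(start_url, fetch, max_depth=2, max_page_size=100_000):
    """Breadth-first crawl limited by link depth and page size (hypothetical).

    fetch(url) is an assumed callable that returns the page body as a str,
    or None if the page cannot be retrieved. Returns {url: body} for every
    page accepted for indexing.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        body = fetch(url)
        # Safe spidering: skip missing pages and pages over the size limit.
        if body is None or len(body.encode("utf-8")) > max_page_size:
            continue
        pages[url] = body
        if depth >= max_depth:
            continue  # index this page, but do not follow its links
        for link in extract_links(url, body):
            # Assumption: stay on the starting host, as a site crawler would.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages


if __name__ == "__main__":
    # A tiny in-memory "website" stands in for real HTTP fetching.
    site = {
        "http://example.com/": '<a href="/a">A</a><script>location.href = "/b";</script>',
        "http://example.com/a": '<a href="http://other.org/">out</a><a href="/deep#top">deep</a>',
        "http://example.com/b": "x" * 200,  # larger than max_page_size below
        "http://example.com/deep": '<a href="/deeper">deeper</a>',
    }
    pages = crawl("http://example.com/", site.get, max_depth=2, max_page_size=100)
    print(sorted(pages))
    # ['http://example.com/', 'http://example.com/a', 'http://example.com/deep']
```

A regular expression over the raw page text is used here instead of an HTML parser so that href= assignments inside JavaScript are picked up too, at the cost of occasional false positives.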

Requirements

  • PHP 4.3.10+
  • MySQL 3.23.56+

Distribution
The latest version is available on SourceForge under the terms of the BSD Licence.

Download php-crawler now.

  • sunel says:

    Hi, your PHP crawler was very useful for our small project, but I need help: it only works on my localhost, and I need to make it work on the entire web. I look forward to your help. Thank you in advance.

    December 15, 2011 at 6:35 pm
    • vlad says:

      You may want to set $CRAWL_ENTRY_POINT_URL in the config file to point outside your localhost (for 0.7.7-alpha), but please note that PHP-Crawler is not designed to crawl the entire web :)

      December 16, 2011 at 10:00 am
  • Vikram says:

    Your crawler is superb, man! I want to know which algorithm you used for crawling, and which algorithm is used for searching. And how can I use it to crawl multiple sites at a time? :) Thanks in advance.

    February 9, 2012 at 11:35 am
  • Edward says:

    A small enhancement to the crawler.sql script on SourceForge:

    create table phpcrawler_links () ENGINE = MYISAM;

    Otherwise, the FULLTEXT index will fail.

    April 16, 2012 at 7:48 am
    • vlad says:

      Edward, good catch, thank you!

      October 16, 2012 at 11:33 am


  • mekix says:

    I really like your crawler! Thanks for creating it, greetings from Peru.

    November 8, 2012 at 9:11 pm
  • NikoS says:

    Hello, my question is: can PHP-Crawler index PDF or DOC files?
    Thanks

    November 21, 2012 at 5:59 am
  • Roylee says:

    How do I index a website? The path you gave to start the crawl redirects to search.php / the home page. A quick reply would be appreciated, thanks!

    April 27, 2013 at 8:23 am
