Vlad Fedorkov

Performance consulting for MySQL and Sphinx

PHP Crawler


PHP-Crawler is a very simple crawl/search script with full-text support for small websites. It is based on PHP and MySQL and needs no shell access: crawling can be started from a browser. Created back in 2006, it remains one of the most popular PHP crawler scripts in the world.

Features

  • Full text indexing
  • Crawling is limited by depth setting
  • Safe spidering: the maximum page size can be limited
  • Follows “href=” links found on a web page, in HTML or JavaScript
  • MySQL based
  • Simple installation
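
To illustrate how the depth limit, the page-size cap, and "href=" link following fit together, here is a minimal sketch in Python. It is not PHP-Crawler's own code: the names crawl, extract_links, MAX_DEPTH and MAX_PAGE_BYTES are hypothetical, and the fetch function is passed in so the example runs without network access.

```python
import re
from collections import deque
from urllib.parse import urljoin, urldefrag

# Hypothetical settings, chosen only for this illustration.
MAX_DEPTH = 2
MAX_PAGE_BYTES = 100_000

# Matches href="..." or href='...' anywhere in the text,
# so links embedded in inline JavaScript strings are found too.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs found in href= attributes."""
    links = []
    for raw in HREF_RE.findall(html):
        url, _fragment = urldefrag(urljoin(base_url, raw))
        if url not in links:
            links.append(url)
    return links


def crawl(entry_url, fetch, max_depth=MAX_DEPTH, max_bytes=MAX_PAGE_BYTES):
    """Breadth-first crawl; fetch(url) returns page text or None."""
    seen = {entry_url}
    queue = deque([(entry_url, 0)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        body = fetch(url)
        # Skip missing pages and pages above the size cap.
        if body is None or len(body.encode("utf-8")) > max_bytes:
            continue
        pages[url] = body
        # Pages at the depth limit are stored but their links are not followed.
        if depth == max_depth:
            continue
        for link in extract_links(url, body):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Calling crawl("http://example.test/", some_dict.get) with a dictionary that maps URLs to page text is enough to try it out. A real crawler would also restrict links to the target host and respect robots.txt, which this sketch leaves out.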

Requirements

  • PHP 4.3.10+
  • MySQL 3.23.56+

Distribution
The latest version is available on SourceForge under the terms of the BSD License.

Download php-crawler now.

  • sunel says:

    Hi, your PHP crawler was very useful for our small project, but I need help: it only works within my localhost, and I need to make it work on the entire web. I look forward to your help. Thank you in advance.

    December 15, 2011 at 6:35 pm
    • vlad says:

      You may want to set $CRAWL_ENTRY_POINT_URL in the config file to a URL outside your localhost (for 0.7.7-alpha), but please note that PHP-Crawler is not designed to crawl the entire web :)

      December 16, 2011 at 10:00 am
  • Vikram says:

    Your crawler is superb, man. I want to know the algorithm you have used to crawl, the algorithm used to search, and how to use it to crawl multiple sites at a time :) Thanks in advance.

    February 9, 2012 at 11:35 am
  • Buttonator says:

    Hi!

    Some dirs/files are missing from the package (tpl/elt/head.php; tpl/top/table.php; tpl/bot/html.php). As I saw in the config, they must be created at the right path, but what should their content be?

    February 25, 2012 at 11:55 pm
  • Johnny Wunder says:

    I really like phpCrawler. It gives me exactly what I want in terms of a lightweight crawler I can point at whatever web site I want to analyze, but I seem to be misinterpreting the use of the $CRAWL_PAGE_EXPIRE_DAYS parameter. On line 39, within function markOldURLsToCrawl of my version of _crawler.php, it checks whether the crawl time has expired and the page needs to be recrawled, but then, regardless of the result, it deletes words on line 40. That causes the search to stop working for follow-on searches until the site is recrawled. That doesn’t seem right to me. Do I have a good version, and am I interpreting it right?
    Johnny

    April 1, 2012 at 6:02 pm
    • Jobeth says:

      I had no idea how to approach this before; now I’m locked and loaded.

      August 20, 2014 at 3:15 am
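
The behaviour Johnny expects from $CRAWL_PAGE_EXPIRE_DAYS can be sketched as follows. This is a Python illustration under assumptions, not the code in _crawler.php: the in-memory pages and words dictionaries and the seven-day value are hypothetical stand-ins for the script's MySQL tables and configuration. The point is only that indexed words are dropped for expired pages and kept for everything else.

```python
from datetime import datetime, timedelta

# Parameter name taken from the comment above; the value is an assumption.
CRAWL_PAGE_EXPIRE_DAYS = 7


def mark_old_urls_to_crawl(pages, words, now, expire_days=CRAWL_PAGE_EXPIRE_DAYS):
    """Flag expired pages for recrawling and drop only their indexed words.

    pages: url -> {"crawled_at": datetime, "needs_crawl": bool}
    words: url -> set of indexed words
    Returns the list of URLs that were marked.
    """
    cutoff = now - timedelta(days=expire_days)
    expired = []
    for url, page in pages.items():
        if page["crawled_at"] < cutoff:
            page["needs_crawl"] = True
            # Words are removed inside the expiry check, never unconditionally.
            words.pop(url, None)
            expired.append(url)
    return expired
```

With this shape, a search run between crawls still finds every page whose crawl time has not expired.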
  • Edward says:

    A small enhancement to the crawler.sql script on SourceForge:

    create table phpcrawler_links () ENGINE = MYISAM;

    Otherwise, creating the fulltext index will fail.

    April 16, 2012 at 7:48 am
    • vlad says:

      Edward, good catch, thank you!

      October 16, 2012 at 11:33 am

  • mekix says:

    I like your crawler a lot! Thanks for creating it. Greetings from Peru.

    November 8, 2012 at 9:11 pm
  • NikoS says:

    Hello, my question is: can php-crawler index PDF or DOC files?
    Thanks

    November 21, 2012 at 5:59 am
  • Roylee says:

    How do I index a website? The path you gave to start the crawl redirects to search.php / the home page. A quick reply is appreciated, thanks!

    April 27, 2013 at 8:23 am
  • uche umeevuruo says:

    Please, the crawler does not crawl my site. Please how do I rectify this issue?

    September 21, 2013 at 7:58 pm
  • Marvin Hand says:

    1. I love it, and thanks.
    2. You should go over the code again.

    November 10, 2013 at 10:21 pm
  • Dharav Samani says:

    Where is the content of crawled web pages stored?
    Can the crawler extract only the user comments from an entire webpage?
    Which other parameters can we change, such as CRAWL_DEPTH, $CRAWL_PAGE_EXPIRE_DAYS, etc.?

    January 13, 2014 at 8:32 am
