Title: Apache log file parser class
Author: kaklz
Posted On: 2004-12-22
Category: Home > PHP Classes


Description: You can use this class to build your own statistics system based on Apache log files. The code is reasonably fast: in my tests it parsed 3000 to 4000 lines per second.




Logparser class

What is the Logparser class?

This class was written for parsing Apache log files. All it does is parse the log file, count the number of lines parsed and add up the amount of traffic transferred. It has no statistics features, since almost every programmer needs their own functions, calculations and general approach for building statistics.

Where can you use this class?

For example, you can build a statistics system on MySQL or any other database. Instead of counting the statistics live, parse the Apache log file once a day and add all the records to the database.
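
As a rough sketch of that once-a-day import, the code below stores parsed rows through PDO. The function name storeRequests(), the table name hits and its columns are assumptions made for this illustration and are not part of the class. The sample row is hand-written in the shape of the array the parser returns (see the source code further down), and an in-memory SQLite database stands in for MySQL so the snippet runs on its own.

```php
<?php
// Hypothetical helper: stores parsed rows in a table named "hits".
// Table name and column layout are assumptions, not part of the class.
function storeRequests(PDO $db, array $requests)
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS hits (
            ip TEXT, hit_date TEXT, hit_time TEXT,
            http TEXT, code INTEGER, size INTEGER
        )'
    );
    $stmt = $db->prepare(
        'INSERT INTO hits (ip, hit_date, hit_time, http, code, size)
         VALUES (:ip, :hit_date, :hit_time, :http, :code, :size)'
    );

    // one transaction for the whole batch keeps the import fast
    $db->beginTransaction();
    foreach ($requests as $r) {
        $stmt->execute(array(
            ':ip'       => $r['ip'],
            ':hit_date' => $r['date'],
            ':hit_time' => $r['time'],
            ':http'     => $r['http'],
            ':code'     => (int)$r['code'],
            ':size'     => (int)$r['size'],
        ));
    }
    $db->commit();
    return count($requests);
}

// usage: in-memory SQLite database and one hand-written sample row
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stored = storeRequests($db, array(
    array('ip' => '192.0.2.10', 'date' => '2004-12-22', 'time' => '10:15:30',
          'http' => 'GET /index.php HTTP/1.1', 'code' => '200', 'size' => '5120'),
));
$total = (int)$db->query('SELECT SUM(size) FROM hits')->fetchColumn();
```

For MySQL you would only change the connection string passed to PDO; the prepared statement stays the same.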

What about incremental log file parsing?

This class does not limit you in any way: it only knows how to parse the records in a log file. How you read the file is up to you. Here is an example of parsing a log file incrementally. All you need to do is store the parsed offset in, say, a text file, and when you resume parsing, skip the bytes that come before that offset:


<?php
    // function for checking parse time
    function microtime_float(){
        list($usec, $sec) = explode(" ", microtime());
        return ((float)$usec + (float)$sec);
    }

    $time_start = microtime_float();

    // remove the time limit, as we don't know how long it will take to parse the file
    set_time_limit(0);
    require_once('logparser.php');
    $logParser = new ApacheLogParser();
    
    // log file
    $fp = fopen('/var/www/html/ltv/logs/access.log', 'r');
    if ($fp === false){
        die('Cannot open the log file!');
    }

    // if offset is saved, read it from file, if not, offset is 0
    if (file_exists('offset.txt')){
        $offset = (int)file_get_contents('offset.txt');
    }else{
        $offset = 0;
    }

    // incremental log parsing
    fseek($fp, $offset);
    // compare with false explicitly, so a line containing only "0" does not end the loop
    while (($data = fgets($fp, 4096)) !== false){
        $logInfo = $logParser -> parse($data);
        
        // perform the needed actions with $logInfo array
    }
    
    
    // save the current offset to offset.txt
    $pos = ftell($fp);
    $fp2 = fopen('offset.txt', 'w');
    fwrite($fp2, (string)$pos);
    fclose($fp2);
    fclose($fp);
    
    $time_end = microtime_float();
    $time = $time_end - $time_start;

    // output some info
    echo "<p>Parsed {$logParser->rowsParsed} rows in $time seconds.</p>";
    echo '<p>Total ' . $logParser -> bytesTransfered . ' bytes have been transferred!</p>';
?>
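
The "perform the needed actions" step in the loop is where your own statistics go. As one possible illustration, the sketch below tallies hits and bytes per HTTP status code. The function name tallyByStatus() is hypothetical, and the sample rows are hand-written, using only the 'code' and 'size' keys of the array that parse() returns.

```php
<?php
// Hypothetical helper: summarises parsed rows per HTTP status code.
function tallyByStatus(array $rows)
{
    $stats = array();
    foreach ($rows as $row) {
        // skip lines the parser could not match
        if (empty($row)) {
            continue;
        }
        $code = $row['code'];
        if (!isset($stats[$code])) {
            $stats[$code] = array('hits' => 0, 'bytes' => 0);
        }
        $stats[$code]['hits']++;
        $stats[$code]['bytes'] += (int)$row['size'];
    }
    // order the result by status code
    ksort($stats);
    return $stats;
}

// usage with hand-written sample rows
$rows = array(
    array('code' => '200', 'size' => '5120'),
    array('code' => '404', 'size' => '312'),
    array('code' => '200', 'size' => '1024'),
);
$stats = tallyByStatus($rows);
```

In the incremental script you would collect each $logInfo into an array (or update the tally directly inside the loop) and store the result once parsing has finished.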

P.S. A little advertisement for a function in the php-editors.com contest: you can use the Bytes Scale function to format the number of bytes transferred.
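
If you do not have that function at hand, a byte formatter can be sketched in a few lines. Note that formatBytes() below is a hypothetical stand-in written for this example, not the Bytes Scale function from the contest; it assumes 1024-based units.

```php
<?php
// Hypothetical stand-in for a byte formatting function (1024-based units).
function formatBytes($bytes, $decimals = 2)
{
    $units = array('B', 'KB', 'MB', 'GB', 'TB');
    $value = (float)$bytes;
    $i = 0;
    // divide by 1024 until the value fits the current unit
    while ($value >= 1024 && $i < count($units) - 1) {
        $value /= 1024;
        $i++;
    }
    return number_format($value, $decimals) . ' ' . $units[$i];
}

echo formatBytes(1536); // prints "1.50 KB"
```

In the script above you could then print formatBytes($logParser -> bytesTransfered) instead of the raw byte count.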

What about speed?

In my tests, the speed was about 3000 to 4000 log lines per second. I know this is not as fast as Webalizer or AWStats, but if you parse the log files regularly, it is enough for a medium-sized web site.

Source code


<?php

    // class for parsing Apache log files
    class ApacheLogParser{
        // number of rows parsed
        var $rowsParsed = 0;
        // number of bytes transferred
        var $bytesTransfered = 0;
        
        // main function for parsing the lines of log file
        function parse($data){
            // a single pattern, split into concatenated strings for readability
            $pattern = "/(([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\ ([\-])\ ([^\ ]+)\ "
                     . "\[([0-9]+\/[a-zA-Z]+\/[0-9]+:[0-9]+:[0-9]+:[0-9]+\ [\+\-0-9]+)\]\ "
                     . "\"(GET\ [^\"]+|POST\ [^\"]+)\"\ ([0-9]+)\ ([0-9]+)\ "
                     . "\"([^\"]+)\"\ \"([^\"]+)\")/";
            $matches = array();
            // stays empty if the line does not match the pattern
            $request = array();
            if (preg_match($pattern, $data, $matches)){
                $request['ip'] = $matches[2];
                $request['username'] = $matches[4];
                $request['time'] = $this -> parseTime($matches[5]);
                $request['date'] = $this -> parseDate($matches[5]);
                $request['http'] = $matches[6];
                $request['code'] = $matches[7];
                $request['size'] = $matches[8];
                $request['referer'] = $matches[9];
                $request['useragent'] = $matches[10];
                // add up traffic only for lines that matched
                $this -> bytesTransfered += (int)$request['size'];
            }
            $this -> rowsParsed++;
            return $request;
        }
        
        // function to parse the date into Y-m-d format; you can also edit this to return a Unix timestamp
        function parseDate($date){
            if (empty($date)){
                trigger_error('Date empty!', E_USER_WARNING);
            }
            list($d, $M, $y, $h, $m, $s, $z) = sscanf($date, "%2d/%3s/%4d:%2d:%2d:%2d %5s");
            return date('Y-m-d', strtotime("$d $M $y $h:$m:$s $z"));
        }
        
        // function to parse the time into H:i:s format
        function parseTime($date){
            list($d, $M, $y, $h, $m, $s, $z) = sscanf($date, "%2d/%3s/%4d:%2d:%2d:%2d %5s");
            return date('H:i:s', strtotime("$d $M $y $h:$m:$s $z"));
        }
    }
?>
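
The comment on parseDate() mentions that the date can also be turned into a Unix timestamp. One way this might be done, assuming PHP 5.3 or later, is with DateTime::createFromFormat(). The function apacheTimeToTimestamp() below is a hypothetical helper written for this example and is not part of the class.

```php
<?php
// Hypothetical helper: converts an Apache log time such as
// "22/Dec/2004:10:15:30 +0200" into a Unix timestamp.
function apacheTimeToTimestamp($date)
{
    // d/M/Y:H:i:s matches the date and time, O matches the +0200 style offset
    $dt = DateTime::createFromFormat('d/M/Y:H:i:s O', $date);
    // createFromFormat() returns false when the string does not fit the format
    return ($dt === false) ? false : $dt->getTimestamp();
}

// usage: the value captured by the fifth group of the pattern
$ts = apacheTimeToTimestamp('22/Dec/2004:10:15:30 +0200');
echo gmdate('Y-m-d H:i:s', $ts); // prints "2004-12-22 08:15:30" (UTC)
```

Because the time zone offset is part of the parsed string, the resulting timestamp does not depend on the time zone configured on the server.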

What's in download.zip?

The source of this class and a sample for incremental log parsing. In fact, all of the source code is shown on this page.

About author

Ingus Rukis, http://www.wisecms.com, ingus.rukis@gmail.com. I have been working as a PHP programmer for a while (some 3 years, I guess), and in my spare time I enjoy writing simple tutorials for beginners. I mostly write in Latvian, my native language, so please excuse any spelling or grammar mistakes you notice.

P.S. If you found this piece of code useful, please feel free to leave a comment or vote.






© Copyright 2003-2008 www.php-editors.com. The ultimate PHP Editor and PHP IDE site.