Title: Apache log file parser class Author: kaklz Posted On: 2004-12-22 Category: Home > PHP Classes
Description: You can use this class to build your own statistics system based on Apache log files. The code is fast: in my tests it parsed 3000 to 4000 lines per second.
Logparser class
What is the Logparser class?
This class was written for parsing Apache log files. All it does is parse the
log file and count the number of lines parsed and the amount of traffic transferred. No
statistics features are included, because almost every programmer needs their own functions, calculations
and general approach to building statistics.
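As a rough illustration of the kind of statistics you could build on top of the parser, here is a minimal sketch that sums hits and bytes per HTTP status code. It assumes each record is an array with 'code' and 'size' keys, like the arrays returned by parse(). The function name summarize_by_status and the sample records are made up for this example and are not part of the class.

```php
<?php
// Hypothetical helper (not part of the class): sums hits and bytes per HTTP status code.
function summarize_by_status(array $records)
{
    $summary = array();
    foreach ($records as $record) {
        $code = $record['code'];
        if (!isset($summary[$code])) {
            $summary[$code] = array('hits' => 0, 'bytes' => 0);
        }
        $summary[$code]['hits']++;
        $summary[$code]['bytes'] += (int)$record['size'];
    }
    return $summary;
}

// Sample records shaped like the arrays returned by parse(); the values are made up.
$records = array(
    array('code' => '200', 'size' => '1500'),
    array('code' => '404', 'size' => '300'),
    array('code' => '200', 'size' => '2500'),
);
$summary = summarize_by_status($records);
echo $summary['200']['hits'] . ' hits, ' . $summary['200']['bytes'] . " bytes with status 200\n";
```

In a real script you would collect the records inside the parsing loop, or update the summary one record at a time to keep memory usage low.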
Where can you use this class?
For example, you can build a statistics system on MySQL or any other database. Instead of
counting the statistics live, parse the Apache log file once a day and add all the records to the database.
What about incremental log file parsing?
This class does not limit you in any way: it only knows how to parse the records in a log file. How you
read the file is up to you. If you are interested, here is an example of how to
parse log files incrementally. All you need to do is store the parsed offset in, say, a text file,
and when you resume parsing, skip the bytes that come before that offset:
<?php
// function for checking parse time
function microtime_float(){
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}
$time_start = microtime_float();
// remove the time limit, as we don't know how long it will take to parse the file
set_time_limit(0);
require_once('logparser.php');
$logParser = new ApacheLogParser();
// log file
$fp = fopen('/var/www/html/ltv/logs/access.log', 'r');
if ($fp === false){
    die('Cannot open the log file!');
}
// if an offset has been saved, read it from the file; otherwise start at 0
if (file_exists('offset.txt')){
    $offset = (int)file_get_contents('offset.txt');
}else{
    $offset = 0;
}
// incremental log parsing: skip the bytes that were parsed on a previous run
fseek($fp, $offset);
// compare with false explicitly, so a final line containing only "0" does not end the loop early
while (($data = fgets($fp, 4096)) !== false){
    $logInfo = $logParser -> parse($data);
    // perform the needed actions with the $logInfo array
}
// get the current offset and save it to offset.txt
$pos = ftell($fp);
$fp2 = fopen('offset.txt', 'w');
fwrite($fp2, (string)$pos);
fclose($fp2);
fclose($fp);
$time_end = microtime_float();
$time = $time_end - $time_start;
// output some info
echo "<p>Parsed {$logParser -> rowsParsed} rows in $time seconds.</p>";
echo '<p>Total ' . $logParser -> bytesTransfered . ' bytes have been transferred!</p>';
?>
P.S. A little advertisement for a function in the php-editors.com contest: you can use the Bytes Scale function for formatting the number of bytes transferred.
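If you do not have that function at hand, a byte formatter can be sketched in a few lines. The helper below is a hypothetical stand-in written for this page, not the Bytes Scale function itself. It assumes binary units (1 KB = 1024 bytes) and two decimals by default.

```php
<?php
// Hypothetical helper; NOT the Bytes Scale function mentioned above.
// Divides by 1024 until the value fits the largest suitable unit.
function format_bytes($bytes, $decimals = 2)
{
    $units = array('B', 'KB', 'MB', 'GB', 'TB');
    $value = (float)$bytes;
    $i = 0;
    while ($value >= 1024 && $i < count($units) - 1) {
        $value /= 1024;
        $i++;
    }
    return number_format($value, $decimals) . ' ' . $units[$i];
}

echo format_bytes(1536) . "\n"; // prints "1.50 KB"
```

You could pass $logParser -> bytesTransfered to such a helper before printing the total.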
What about speed?
In my tests, the speed was about 3000 to 4000 log lines per second. I know this
is not as fast as Webalizer or AWStats, but if you parse the log files regularly, it is enough for a
medium-sized web site.
Source code
<?php
// class for parsing apache log files
class ApacheLogParser{
    // number of rows parsed (every line passed to parse() is counted)
    var $rowsParsed = 0;
    // number of bytes transferred
    var $bytesTransfered = 0;
    // main function for parsing the lines of log file
    function parse($data){
        // the pattern is split over several string pieces only for readability
        $pattern = "/(([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\ ([\-])\ ([^\ ]+)\ "
            . "\[([0-9]+\/[a-zA-Z]+\/[0-9]+:[0-9]+:[0-9]+:[0-9]+\ [\+\-0-9]+)\]\ "
            . "\"(GET\ [^\"]+|POST\ [^\"]+)\"\ ([0-9]+)\ ([0-9]+)\ "
            . "\"([^\"]+)\"\ \"([^\"]+)\")/";
        // stays empty if the line does not match the pattern
        $request = array();
        $matches = array();
        if (preg_match($pattern, $data, $matches)){
            $request['ip'] = $matches[2];
            $request['username'] = $matches[4];
            $request['time'] = $this -> parseTime($matches[5]);
            $request['date'] = $this -> parseDate($matches[5]);
            $request['http'] = $matches[6];
            $request['code'] = $matches[7];
            $request['size'] = $matches[8];
            $request['referer'] = $matches[9];
            $request['useragent'] = $matches[10];
            // count the traffic only for lines that were actually matched
            $this -> bytesTransfered += (int)$request['size'];
        }
        $this -> rowsParsed++;
        return $request;
    }
    // function to parse date into Y-m-d format, you can also edit this to parse the date into unix timestamp
    function parseDate($date){
        if (empty($date)){
            trigger_error('Date empty!', E_USER_WARNING);
        }
        list($d, $M, $y, $h, $m, $s, $z) = sscanf($date, "%2d/%3s/%4d:%2d:%2d:%2d %5s");
        return date('Y-m-d', strtotime("$d $M $y $h:$m:$s $z"));
    }
    // function to parse time in H:i:s format
    function parseTime($date){
        list($d, $M, $y, $h, $m, $s, $z) = sscanf($date, "%2d/%3s/%4d:%2d:%2d:%2d %5s");
        return date('H:i:s', strtotime("$d $M $y $h:$m:$s $z"));
    }
}
?>
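The comment on parseDate() mentions that the date can also be turned into a Unix timestamp. One possible way to do that is sketched below, using PHP's DateTime::createFromFormat (available since PHP 5.3, so newer than this class). The function name apache_time_to_timestamp is hypothetical and the function is written standalone instead of as a class method; treat it as a sketch under those assumptions.

```php
<?php
// Hypothetical standalone variant of parseDate() that returns a Unix timestamp.
function apache_time_to_timestamp($date)
{
    // 'd/M/Y:H:i:s O' matches strings such as "22/Dec/2004:10:15:32 +0200".
    $parsed = DateTime::createFromFormat('d/M/Y:H:i:s O', $date);
    if ($parsed === false) {
        return false; // the string did not match the expected format
    }
    return $parsed->getTimestamp();
}

$ts = apache_time_to_timestamp('22/Dec/2004:10:15:32 +0200');
echo gmdate('Y-m-d H:i:s', $ts) . "\n"; // the same moment expressed in UTC
```

A timestamp is convenient when you store records in a database, because it does not depend on the time zone of the server that parses the log.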
What's in download.zip?
The source of this class and a sample for incremental log parsing. You can actually see all of the source code on this page.
About author
Ingus Rukis, http://www.wisecms.com, ingus.rukis@gmail.com. I've been working as a PHP programmer for a while (some 3 years, I guess), and in my spare time I enjoy writing simple and easy tutorials for beginners. I mostly write in Latvian, which is my native language, so if you notice any spelling or grammar mistakes, please excuse me.
P.S. If you found this piece of code useful, please feel free to leave comments or vote.