Thursday, September 6, 2007

SolutionBase: Watch Web site activity with Webalizer

Takeaway: Do you know who is visiting your Web site, and when? A good Web admin needs to know these statistics. Webalizer is a reliable application that can help you analyze your HTTP servers' traffic, keeping you on top of your sites and how they are being used. In this article, Jack Wallen will take a closer look at Webalizer and how to use it.


You probably take for granted that your Web site is always up and that people are actually visiting it. But are they? If they are, do you actually know where your visitors are coming from, what their referrer was, or what browser they were using? Do you know what the top pages of your site are? How about your top entry and exit pages?


These are the kinds of statistics a good Web admin needs to know. But before you start combing through log files, consider installing Webalizer. Started as a simple Perl script, Webalizer has grown into something far more useful: a very fast, reliable application that reads your server log files and presents their contents in a user-friendly format. That helps you analyze your HTTP servers' traffic and keeps you on top of your sites and how they are being used. In this article, I'll show you exactly what Webalizer is and how to use it.


Installing Webalizer


Webalizer can be installed in many different ways. I am working in a Fedora 7 environment, so the best way for me to install it is via yum. Of course, there are dependencies to be met: Webalizer depends on the gd graphics library, so you will need to install gd first. If you are running Fedora (or any distribution that relies on yum), this can be done with the command yum install gd. Once that is complete, finish the installation by running the command yum install webalizer.


If you are not using a yum-based distribution, or you'd prefer to install from source, the process isn't nearly as simple, and you will still have to get gd installed first. Grab a copy of the gd source, unpack the archive (using the tar xvzf gd-2.0.35.tar.gz command), move into the gd directory, and run the usual set of commands to compile and install the source (run the last one as root):


./configure
make
make install


With gd installed, you're ready to install Webalizer. First, download a copy of the Webalizer source and unpack the archive using the tar xvzf webalizer-2.01-10-src.tgz command. Then move into the source directory that tar creates and run the same three compiling commands you used for gd.
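If you build from source often, the unpack-and-compile steps can be wrapped in a small shell function, sketched below. The function name build_webalizer is hypothetical, and so is the assumption that the archive unpacks into a directory called webalizer-2.01-10; run tar tzf on your archive to confirm the real name. The same function works for gd if you pass its tarball and directory instead.

```shell
#!/bin/sh
# Hypothetical helper: unpack a source tarball, then run the usual
# configure / make / make install sequence inside the unpacked directory.
# Usage: build_webalizer TARBALL SRCDIR
build_webalizer() {
    tarball=$1
    srcdir=$2   # assumption: the directory name the tarball unpacks into

    tar xvzf "$tarball" || return 1

    # Build in a subshell so the caller's working directory is unchanged.
    # "make install" normally needs root privileges.
    (cd "$srcdir" && ./configure && make && make install)
}

# Example, using the file name from the text:
#   build_webalizer webalizer-2.01-10-src.tgz webalizer-2.01-10
```

The subshell keeps a failed build from leaving your shell sitting inside the source tree.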


Up and running ... almost


Now that Webalizer is installed, you're probably assuming you can point your browser to http://server_address/webalizer/ to see what you have. If you do, the only thing you'll see is:


Not Found

The requested URL /webalizer/ was not found on this server.


What went wrong?


After installing the application, it took me a while to locate where the Webalizer folder had been installed. I have no idea why the RPM put Webalizer where it did, but there, nestled in /var/lib, sat my webalizer folder. After making a backup of the /var/lib/webalizer directory (using the tar cfz webalizer.tgz /var/lib/webalizer command), I decided to move the /var/lib/webalizer directory to /var/www/html.
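If you want to script the backup and move, here is a minimal sketch. The helper name relocate_webalizer is hypothetical; with the article's paths you would run it as root, as shown in the final comment.

```shell
#!/bin/sh
# Hypothetical helper: archive a directory, then move it under a new parent.
# Usage: relocate_webalizer SRC DEST_PARENT BACKUP_TGZ
relocate_webalizer() {
    src=$1
    dest=$2
    backup=$3

    # Same backup as in the text. GNU tar strips the leading "/" from member
    # names, so "tar xzf BACKUP_TGZ -C /" would restore the original path.
    tar cfz "$backup" "$src" || return 1

    # Assumes DEST_PARENT/<basename of SRC> does not exist yet; otherwise mv
    # would nest the directory one level deeper.
    mv "$src" "$dest/" || return 1
}

# Example, as root, with the article's paths:
#   relocate_webalizer /var/lib/webalizer /var/www/html webalizer.tgz
```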


With the directory in its proper place, I ran the command to start Webalizer (which is simply webalizer) as root. In response, I received this error:


Using logfile /var/log/httpd/access_log (clf)

Error: Can't change directory to /var/lib/webalizer


Before panicking, I looked for a configuration file and found webalizer.conf inside /etc, ready to be edited. Naturally, before moving on to any further configuration, I needed to see Webalizer up and running properly. Inside the /etc/webalizer.conf file is this line:


OutputDir /var/lib/webalizer


Because I moved the Webalizer directory, the program can no longer find the directory it sends its output to. That's easy to fix: open the webalizer.conf file in your favorite text editor and change that line to:


OutputDir /var/www/html/webalizer


(where /var/www/html is your Web server's document root) and re-run the command. This time, you should see something like this scroll by:
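If you'd rather not open an editor, the same change can be made non-interactively. The sketch below uses sed and keeps a backup copy of the file. The helper name set_output_dir is hypothetical, and it assumes the active OutputDir line starts at the beginning of the line.

```shell
#!/bin/sh
# Hypothetical helper: point the OutputDir directive of a webalizer.conf-style
# file at a new directory, keeping the previous version as CONF.bak.
# Usage: set_output_dir CONF NEW_DIR
set_output_dir() {
    conf=$1
    dir=$2

    cp "$conf" "$conf.bak" || return 1

    # Replace only active lines that begin with "OutputDir"; commented-out
    # "#OutputDir" lines are left untouched. Writing through the redirect keeps
    # the original file's ownership and permissions.
    sed "s|^OutputDir[[:space:]].*|OutputDir $dir|" "$conf.bak" > "$conf"
}

# Example, as root:
#   set_output_dir /etc/webalizer.conf /var/www/html/webalizer
```

Working from a copy instead of using sed -i keeps the command portable between GNU and BSD sed.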


Webalizer V2.01-10 (Linux 2.6.21-1.3228.fc7) English

Using logfile /var/log/httpd/access_log (clf)

DNS Lookup (10): 1 addresses in 5.25 seconds

Using DNS cache file dns_cache.db

Creating output in /var/www/html/webalizer

Hostname for reports is 'localhost.localdomain'

Reading history file... webalizer.hist

Generating report for June 2007

Generating summary report

Saving history information...

1087 records in 0.09 seconds
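Before switching to the browser, you can confirm that the run actually produced a report. The check below is a sketch. It assumes the default index.html page name, and it assumes the history file lives in the output directory, as the relative webalizer.hist path in the run above suggests. The name check_report is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether DIR contains the files a finished run
# should have left behind (assumed: index.html and webalizer.hist).
# Usage: check_report [DIR]
check_report() {
    dir=${1:-/var/www/html/webalizer}

    for f in index.html webalizer.hist; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    echo "report found in $dir"
}
```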


If you now point your browser to http://server_address/webalizer/, you should see a screen similar to Figure A.



Figure A



The Webalizer opening screen gives you a yearly summary in a simple-to-read graph.


When you select a month in the lower table, you are taken to that month's statistical breakdown, which is incredibly detailed:



  • Per Month : Total Hits, Total Files, Total Pages, Total Visits, Total Kbytes, Total Unique Sites, Total Unique URLs, Total Unique Referrers, Total Unique User Agents

  • Avg/Max : Hits per Hour, Hits per Day, Files per Day, Pages per Day, Visits per Day, KBytes per Day

  • Hits by Response Code

  • Daily Usage : Shown in Figure B

  • Daily Statistics : Hits, Files, Pages, Visits, Sites, Kbytes

  • Hourly Usage : Shown in Figure C

  • Hourly Statistics : Avg/Total Hits, Files, Pages, Kbytes

  • Top URLs

  • Top URLs By Kbytes

  • Top Entry Pages: Shown in Figure D

  • Top Exit Pages

  • Top Sites

  • Top Sites by Total Kbytes

  • Top Referrers

  • Top User Agents

  • Usage By Country

  • Top Countries



Figure B



This shot shows, at a glance, which days are generating the highest traffic.



Figure C



This shot illustrates how much detail the Webalizer system gives you.



Figure D



This shot gives you an idea of how Webalizer can help you analyze the pages through which your traffic primarily enters and leaves your site.


Now that you have Webalizer up and running, let's take a look at some of the configuration options available.


Configuring Webalizer


One of the first things to do is set Webalizer up to run at a regular interval. The best solution is to create a cron job that runs Webalizer daily. To do this, create a new file named webalizer.cron with the following contents:


#!/bin/sh
/usr/bin/webalizer


and place it in /etc/cron.daily. Now make the file executable with the command chmod +x /etc/cron.daily/webalizer.cron. You can test your new cron job by running /etc/cron.daily/webalizer.cron directly; you should get the same output you did when you ran the webalizer command on its own.
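Here is a scripted version of those steps. It differs from the script above in one deliberate way: it passes -q, Webalizer's command-line switch for suppressing informational messages, so cron only mails you warnings and errors. The helper name is hypothetical. Also note that on Debian-style systems run-parts skips file names containing a dot, so there you would name the script webalizer rather than webalizer.cron.

```shell
#!/bin/sh
# Hypothetical helper: write the daily cron script into DIR (normally
# /etc/cron.daily) and make it executable.
# Usage: install_webalizer_cron [DIR]
install_webalizer_cron() {
    dir=${1:-/etc/cron.daily}
    script="$dir/webalizer.cron"

    # The quoted 'EOF' keeps the shell from expanding anything in the body.
    cat > "$script" <<'EOF'
#!/bin/sh
# Run Webalizer once a day; -q suppresses informational messages.
/usr/bin/webalizer -q
EOF
    chmod +x "$script"
}

# Example, as root:
#   install_webalizer_cron && /etc/cron.daily/webalizer.cron
```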


You can customize Webalizer by making changes to its configuration file. Remember, the configuration file is /etc/webalizer.conf. Some of the configuration options you will want to deal with include:



  • LogType : This option defines the type of log file used. The types allowed are: clf (default), ftp (xferlog files produced by wu-ftpd), or squid (native squid logs).

  • OutputDir : As described above, this is where the Webalizer will place its output.

  • HistoryName : This allows you to define the name of the history file produced. This file keeps data for up to twelve months, and by default it is called webalizer.hist.

  • Incremental : If you run a larger site, you will want to enable this. Incremental processing lets Webalizer work through multiple partial log files (for example, when logs are rotated more than once a month) instead of one huge file. The default is no.

  • IncrementalName : This option only matters if you enable Incremental. It names the file that stores the data Webalizer needs to pick up where the previous run left off. The default name is webalizer.current.

  • ReportTitle : This is the text displayed as the title of the report.

  • HostName : This defines the hostname used on the report, including in the report's clickable entries. If you change this, make sure it is correct. If it is not set, Webalizer uses the server's own hostname (localhost.localdomain in the run shown earlier), which will only work if you are viewing the report on the server running Webalizer.

  • HTMLExtension : This allows you to define the file extension to use when creating the HTML pages. The default is .html.

  • PageType : This defines which URLs Webalizer counts as pages. The defaults are htm* and cgi.

  • UseHTTPS : Use this if Webalizer is deployed on a secure server, so that links in the report use https:// instead of http://.

  • DNSCache : Here is where you specify your DNS cache file. This file is used for reverse DNS lookups. The default is dns_cache.db .

  • DNSChildren : This is where you can define how many child processes may be used when performing DNS lookups. Standard values are between 5 and 20 with 10 being the default.

  • HTMLPre : This allows you to define any HTML code to insert at the very beginning of the file. The default is a DOCTYPE line.

  • HTMLHead : This allows you to define any HTML code to insert between the <HEAD> and </HEAD> tags.

  • HTMLBody : This allows you to define the HTML code inserted starting with the <BODY> tag. If you use it, you must supply your own <BODY> tag.

  • HTMLPost : This allows you to define any HTML code to insert immediately before the first <HR> tag of the page.

  • HTMLTail : This allows you to define any HTML code at the bottom of each HTML document.

  • HTMLEnd : This allows you to define any HTML code to add at the very bottom of each HTML document.

  • Quiet : This option suppresses informational messages (warnings and errors are still shown). If you are running Webalizer from a cron job, it is best to use this option.

  • ReallyQuiet : This option will suppress all messages, including warnings.

  • TimeMe : This option will force Webalizer to show the timing information at the end of processing.

  • GMTTime : All reports will be shown in GMT (UTC) time.

  • Debug : Prints additional information within error messages.

  • FoldSeqErr : If set to yes, Webalizer will fold out-of-sequence log records back into the analysis instead of discarding them.

  • VisitTimeout : This sets how long a site can go without a request before its next request counts as a new visit. The default is 1800 seconds (30 minutes).

  • IgnoreHist : This option really shouldn't be used. If used, it will cause Webalizer to ignore the history file.

  • CountryGraph : This allows you to enable or disable the Country Graph. Default is yes (enabled).

  • DailyGraph/DailyStats : These allow you to enable or disable the Daily Graph and Daily Stats. Defaults are yes (enabled).

  • HourlyGraph/HourlyStats : These allow you to enable or disable the Hourly Graph and Hourly Stats. Defaults are yes (enabled).

  • GraphLegend : This allows you to enable the color-coded legends for all graphs. Default is yes.

  • GraphLines : This allows you to draw index lines behind the graphs to make them easier to read. The value is the number of lines to draw (0 disables them). The default is 2.

  • Top Options : These options set the number of entries for each table. You can define these to fit your needs. The options are: TopSites, TopKSites, TopURLs, TopKURLs, TopReferrers, TopAgents, TopCountries, TopEntry, TopExit, TopSearch, and TopUsers.

  • All Options : These keywords enable the display of all URLs, Sites, Referrers, User Agents, Search Strings, and Usernames. When enabled, each gets its own HTML page. A page is only created when there are more items than will fit in the corresponding Top table, and it lists only items that would normally be visible (hidden items are left out). The options are: AllSites, AllURLs, AllReferrers, AllAgents, AllSearchStr, and AllUsers.

  • IndexAlias : This feature strips an index file name such as index.html from addresses. In other words, /directory/index.html is counted as /directory/.

  • Ignore* : These keywords (such as IgnoreSite and IgnoreURL) cause Webalizer to ignore matching log records entirely.

  • Hide* : This keyword will prevent items from being displayed in the Top tables but will be included in the main totals.

  • Group* : This keyword groups similar objects together.

  • Include* : This keyword allows you to include log records based on hostname, URL, user agent, referrer, or username.

  • SearchEngine : This allows you to define search engines and the query-string variable that holds the search terms, so Webalizer can report which searches led visitors to your site. An example: SearchEngine google.com q=

  • Dump* : These keywords allow Sites, URLs, Referrers, User Agents, Usernames, and Search Strings to be dumped into a tab-delimited text file that can be used in database applications.
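To see how several of these keywords fit together, here is a sketch that writes a small test configuration. The values (www.example.com and the particular option choices) are placeholders chosen for illustration, and write_sample_conf is a hypothetical helper. Point Webalizer at the result with its -c option rather than overwriting /etc/webalizer.conf.

```shell
#!/bin/sh
# Hypothetical helper: write a small webalizer.conf-style file that uses
# several of the options described above. All values are placeholders.
# Usage: write_sample_conf FILE
write_sample_conf() {
    cat > "$1" <<'EOF'
LogFile         /var/log/httpd/access_log
LogType         clf
OutputDir       /var/www/html/webalizer
HistoryName     webalizer.hist
Incremental     yes
IncrementalName webalizer.current
ReportTitle     Usage statistics for
HostName        www.example.com
Quiet           yes
VisitTimeout    1800
TopSites        30
TopURLs         30
AllURLs         yes
HideReferrer    Direct Request
IgnoreURL       /robots.txt
SearchEngine    google.com q=
EOF
}

# Example:
#   write_sample_conf /tmp/test.conf && webalizer -c /tmp/test.conf
```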


Final thoughts


I have used Webalizer with many sites. The information it displays is easy to read and will help you analyze your Web sites. If you're looking for one of the best log analyzers and your budget points you toward open source, Webalizer is the perfect tool for your needs.
