How do you know how many people have visited your site? How do you know which pages they view? And what is meant by unique visitors, hits, page views and raw log access? This article explains these terms so that you will have a better grasp of what is happening on your site. We will start by defining the terms used to judge site statistics and then look at how they are used. The first term defined is raw log access, because it is the foundation for the rest of these terms and for all site statistics.
 |
Raw log access refers to your site's raw access log: a file that contains a record of every request visitors make to your site. When a person visits your site, their web browser sends a request for the web page to the server, and the server responds by sending the page back to the browser so it can be displayed. In reality, though, the browser does not simply ask the server to "send the web page". A web page is actually made up of many separate pieces, and each piece is requested separately. For example, if you have four pictures on your web page, the browser will send five different requests to the server: one for the web page itself and one for each picture.
 |
This matters because each of these requests is logged separately in the raw access log. You may know these requests by their common name: "hits". When someone says their web site has had one million hits, it means there have been one million requests for web pages AND for the pieces that make up those pages. The number of hits used to be a common measure of how popular a site was, but it is not a good measure of how busy a site is. Suppose you have one web page with 100 pictures. Each visitor who views it logs 101 hits in the access log (one for the page and one for each picture), so five visitors log 505 hits. That sounds like a lot, but only five people viewed the page. Now compare this with a page that has text and just one picture. Each person viewing it logs only two hits (one for the page and one for the picture), so if 100 people view it, the hit count is only 200. On the face of it the first page appears to have more traffic, but in reality it does not. This is why the number of hits a site receives doesn't mean much: it depends entirely on how the site is built. So what is a better method for comparing two sites?
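The hit arithmetic above can be reproduced in a few lines. This is a minimal sketch assuming that every view loads the page plus every picture on it exactly once (no browser caching); the function name `hits_for_page` is hypothetical.

```python
# Illustrative arithmetic only: ASSUMES each view requests the page once
# plus each picture once (no caching, no other files such as stylesheets).

def hits_for_page(visitors: int, pictures: int) -> int:
    """Hits logged when `visitors` each load a page with `pictures` images once."""
    requests_per_view = 1 + pictures  # one request for the page, one per picture
    return visitors * requests_per_view


first_page = hits_for_page(visitors=5, pictures=100)   # page with 100 pictures
second_page = hits_for_page(visitors=100, pictures=1)  # page with one picture

print(f"first page:  {first_page} hits from 5 visitors")    # 505 hits
print(f"second page: {second_page} hits from 100 visitors")  # 200 hits
```

The page with fewer visitors produces more than twice as many hits, which is exactly why hits say more about page design than about traffic.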
 |
The next most commonly used term is "page views". Page views compare apples with apples, so you can evaluate web pages against each other regardless of how each page is constructed. A page view count says that there were x requests for a web page in a certain period of time. It is produced by going through the raw access log, counting only the requests for the web pages themselves and ignoring everything else. In other words, ignore how many hits there were; just count how many requests there were for web pages. This is a good indicator of how many times a particular web page was viewed, or of how many pages were viewed on a site overall. On the face of it this is a good indicator, but it has its problems too. Let us return to our example with the two web pages. The first page had 505 hits and the second had 200. Counting page views instead gives 5 for the first page and 100 for the second, a much better indicator. The problem is what happens when one person views a page, leaves, then comes back and views the same page again. Or what happens when one person refreshes the page in the browser? Each time this happens, another page request is logged. So you could have ten people who each view your site a couple of times a month, or one person who looks at your site once a day, and the two page view totals could be about the same. Which site has more people visiting? You can't tell from page views, because they include repeat requests that go unaccounted for when you read the results. So as a measure of overall traffic page views are good, but for a detailed analysis of how people are coming to your site they are ineffective.
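Counting page views amounts to filtering the logged requests down to the pages themselves. The sketch below assumes, purely for illustration, that a "page" is any path ending in `.htm`, `.html` or `/`; real sites may use other extensions or query strings, and the request list is hypothetical.

```python
# ASSUMPTION: a "page" is any path ending in .htm, .html or "/" (a directory
# index); everything else (pictures, etc.) counts as a hit but not a page view.
PAGE_SUFFIXES = (".htm", ".html", "/")


def is_page(path: str) -> bool:
    """True if the requested path looks like a web page rather than a picture."""
    return path.lower().endswith(PAGE_SUFFIXES)


def count_page_views(requested_paths: list[str]) -> int:
    """Count only the requests for pages, ignoring every other hit."""
    return sum(1 for path in requested_paths if is_page(path))


# Hypothetical requests: one visitor loads /index.htm (a page with two
# pictures) and then refreshes it, so the page and pictures are requested again.
requests = [
    "/index.htm", "/images/logo.gif", "/images/photo.jpg",
    "/index.htm", "/images/logo.gif", "/images/photo.jpg",
]

print("hits:", len(requests))                     # hits: 6
print("page views:", count_page_views(requests))  # page views: 2
```

One person produced two page views simply by refreshing. Scaled up, the same blind spot hides who is visiting: ten people who each view a page twice a month produce 20 page views, while one person who views it every day of a 30-day month produces 30.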
 |
We have gone over two methods of evaluating web site traffic so far, but there is one more method in current use: counting independent, or unique, visitors. When a request is logged in the raw access log, it might look like this:
 |
    198.162.0.1 - - [09/Jul/2003:15:30:19 -0400] "GET /index.htm HTTP/1.1" 200 - "http://www.Navicosoft.com/index.htm"
 |
The first part of the line is the IP address of the computer that requested the web page. The date and time of the request appear in the square brackets. The quoted request, "GET /index.htm HTTP/1.1", shows that the visitor requested the index.htm file, and the URL at the end of the line records the referring page, that is, the page the visitor came from. We can also see that the status code "200" (a successful request; the web page was sent) was logged. (If the web page had not been found, you would see the error code 404, which means the page was not found. Another common code is 304, which means the page has not changed since the copy the visitor's browser already has, so the browser can display its saved copy.) The important point is that whenever the computer at IP address 198.162.0.1 requested a web page from the site, its IP address was logged. If you went through the entire raw access log and counted the different IP addresses, you would get the number of unique IP addresses, which gives a rough estimate of the number of individual people who visited your site. So if we went back to our two example pages and counted the different people who visited, we might find that the first page actually had more unique visitors than the second, for instance if the second page's 100 page views came from only two or three people who kept returning and refreshing it.
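Counting unique IP addresses means splitting each log line into its fields and collecting the distinct addresses. This is a minimal sketch: the regular expression assumes the field layout of the example line above (IP, identity, user, time in brackets, quoted request, status, size, optional quoted referrer), which other servers or settings may not use. The two extra log lines are hypothetical.

```python
import re

# ASSUMPTION: lines follow the layout of the example line in the article.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '                                   # client IP, identity, user
    r'\[(?P<time>[^\]]+)\] '                                  # [date and time zone]
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '   # "GET /index.htm HTTP/1.1"
    r'(?P<status>\d{3}) (?P<size>\S+)'                        # status code and size
    r'(?: "(?P<referrer>[^"]*)")?'                            # optional referring page
)


def parse_line(line):
    """Split one log line into named fields, or return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    return entry


def count_unique_ips(lines):
    """Rough estimate of unique visitors: the number of distinct IP addresses."""
    ips = set()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            ips.add(entry["ip"])
    return len(ips)


SAMPLE_LOG = [
    # The example line from the article.
    '198.162.0.1 - - [09/Jul/2003:15:30:19 -0400] "GET /index.htm HTTP/1.1" 200 - "http://www.Navicosoft.com/index.htm"',
    # Hypothetical lines added for illustration.
    '198.162.0.1 - - [09/Jul/2003:15:30:20 -0400] "GET /images/logo.gif HTTP/1.1" 200 - "http://www.Navicosoft.com/index.htm"',
    '198.162.0.7 - - [09/Jul/2003:16:02:11 -0400] "GET /index.htm HTTP/1.1" 304 -',
]

first = parse_line(SAMPLE_LOG[0])
print(first["ip"], first["time"], first["path"], first["status"])
print("unique IPs:", count_unique_ips(SAMPLE_LOG))  # unique IPs: 2
```

Three hits from two addresses count as two unique visitors; whether those addresses really belong to two people is a separate question.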
 |
But this is not a perfect method either. Simply counting IP addresses might give you a lower (or higher) estimate of the number of people visiting your site than the real number. This is because visitors who use dial-up modems are assigned a temporary IP address each time they dial in to the internet. So if one person visits your site and is assigned the IP 198.162.0.1 on one day and 198.162.0.5 the next day, the log will show two different unique visitors when in reality there was only one. Also, one person might dial in, use an IP address to visit your site and then log off, and another person might then dial in, be assigned the same IP address and visit you too. This shows up as one unique IP, because the addresses are the same, even though two different people visited. Despite these flaws, this method still gives you a general view of how many people visited.
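The two dial-up problems can be made concrete by attaching to each visit the person who actually made it, which is exactly the information a raw access log does not record. In this sketch the people and the IP 198.162.0.9 are hypothetical.

```python
# Hypothetical visits as (person, assigned IP). A real log has only the IP.
same_person_two_ips = [("visitor A", "198.162.0.1"),  # day 1
                       ("visitor A", "198.162.0.5")]  # day 2, new dial-up IP
two_people_one_ip = [("visitor B", "198.162.0.9"),    # dials in, then logs off
                     ("visitor C", "198.162.0.9")]    # later given the same IP


def people_vs_unique_ips(visits):
    """Return (real number of people, number of distinct IP addresses)."""
    people = {person for person, _ in visits}
    ips = {ip for _, ip in visits}
    return len(people), len(ips)


for label, visits in [("same person, two IPs", same_person_two_ips),
                      ("two people, one IP", two_people_one_ip)]:
    people, ips = people_vs_unique_ips(visits)
    print(f"{label}: {people} real visitor(s), {ips} unique IP(s)")
```

The first case overcounts and the second undercounts, so across a whole log the two errors partly offset each other, which is why the unique IP count is still a usable general estimate.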
 |
For example, a quick snapshot of the statistics for Navicosoft.com showed the following:
 |
There were 2,606 unique visitors who visited a total of 3,557 times, requested 11,236 pages and recorded 62,905 hits.
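Dividing these figures into one another gives a few averages that are easier to interpret than the raw totals. This sketch uses only the numbers quoted above; keep in mind that the visitor count carries the IP address caveats described earlier, so the ratios are approximate.

```python
# Figures from the Navicosoft.com snapshot above.
unique_visitors = 2_606
visits = 3_557
page_views = 11_236
hits = 62_905

visits_per_visitor = visits / unique_visitors  # how often a visitor came back
pages_per_visit = page_views / visits          # pages viewed per visit
hits_per_page = hits / page_views              # requests logged per page view

print(f"visits per unique visitor: {visits_per_visitor:.2f}")  # 1.36
print(f"page views per visit:      {pages_per_visit:.2f}")     # 3.16
print(f"hits per page view:        {hits_per_page:.2f}")       # 5.60
```

The last ratio shows the site design at work: on average each page view brought roughly four or five extra requests (pictures and other files) with it.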
 |
Which number is better? It really depends on what you want to know. We can see that several thousand different IP addresses were logged (give or take some duplication or underestimating). We can also see that they requested a little over 11,000 web pages (were some of those browser refreshes?). And because of our site design, there were a little over 62,000 individual requests in the raw access log. So how do we know which pages these people visited? And where did they come from? How long did they stay on the site? And more importantly, do I have to count tens of thousands of log entries to determine all of this? The answer to all of these questions lies in using a graphical site statistics program. With the click of a button, all of this information is within your reach. In our next article we will discuss three of the most popular programs that read your raw access log and provide you with this information instantly. But before you can understand the information those programs report, you have to understand the terminology they display, and that is what we have covered here.
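To give a feel for what such programs do with the log, here is a minimal sketch that answers three of the questions above: which pages were viewed, which referring pages visitors came from, and roughly how long each IP address stayed. It assumes the entries have already been split into fields and filtered to page requests; the entries and the page name `/hosting.htm` are hypothetical. "Time on site" here is simply the gap between an address's first and last request, a crude approximation that inherits the IP address problems described earlier.

```python
from collections import Counter
from datetime import datetime

# Hypothetical page requests, already split into (ip, time, path, referrer).
# "-" means the browser sent no referring page.
ENTRIES = [
    ("198.162.0.1", "09/Jul/2003:15:30:19 -0400", "/index.htm",
     "http://www.Navicosoft.com/index.htm"),
    ("198.162.0.1", "09/Jul/2003:15:34:02 -0400", "/hosting.htm",
     "http://www.Navicosoft.com/index.htm"),
    ("198.162.0.7", "09/Jul/2003:16:02:11 -0400", "/index.htm", "-"),
]

# Matches the log's time format, e.g. 09/Jul/2003:15:30:19 -0400.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def summarize(entries):
    """Return page view counts per page, referrer counts, and a rough
    time on site per IP (last request minus first request)."""
    pages = Counter(path for _, _, path, _ in entries)
    referrers = Counter(ref for _, _, _, ref in entries if ref != "-")
    first_last = {}
    for ip, stamp, _, _ in entries:
        t = datetime.strptime(stamp, TIME_FORMAT)
        first, last = first_last.get(ip, (t, t))
        first_last[ip] = (min(first, t), max(last, t))
    stay = {ip: last - first for ip, (first, last) in first_last.items()}
    return pages, referrers, stay


pages, referrers, stay = summarize(ENTRIES)
print("most viewed pages:", pages.most_common())
print("top referrers:", referrers.most_common())
for ip, duration in stay.items():
    print(f"{ip} stayed about {duration}")
```

A single request from an address shows a stay of zero, because the log cannot tell how long someone read the last page they loaded.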
 |
None of these methods is perfect, but without knowing their flaws (and their strengths) you cannot effectively tell what is happening on your site. Each one fills in one part of the picture, and all of them should be taken with a grain of salt: combined, they show how your site is doing in general, not exactly how it is doing.