Protecting Content And Handling Robots

Site Simulator

Our simulator will consist of a configuration file, a header, a navigation bar, and a content page, plus a validation page that serves as a trap for unwanted robots. We will create these five files in one directory, which will look like this:

sim/config.php
sim/index.php
sim/header.php
sim/navigation.php
sim/validate.php

On a real site, config.php and header.php would be placed outside the web root, and the site structure would look something like this:

/config.php
/header.php
/navigation.php
/validate.php
/public_html/index.php

where public_html is the web root in RedHat Linux.
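
As a rough illustration of how a script inside the web root might reach files kept one level above it, here is a minimal sketch. The helper name site_file() and the path /home/example are hypothetical; the sketch assumes the layout shown above, where public_html sits directly under the directory that holds config.php.

```php
<?php
// Hypothetical helper: build the path of a file stored one level above
// the web root (the directory that contains public_html).
function site_file($webRoot, $name)
{
    // basename() drops any directory part, so the result always stays
    // directly inside the parent directory of the web root
    return dirname($webRoot) . '/' . basename($name);
}

// In /public_html/index.php, __DIR__ is the web root itself, so one could write:
// include site_file(__DIR__, 'config.php');

// /home/example is an assumed path, used only for this demonstration
echo site_file('/home/example/public_html', 'config.php'), "\n";
// prints /home/example/config.php
```

In the simulator itself all five files share one directory, so a plain include("config.php") is enough.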

To stay focused on the subject, we will use a simplified configuration and leave config.php and validate.php empty for now. Our HTML output starts in header.php, which every page will include.

header.php

Listing 1 header.php
<html>
<head><title>Site simulator</title>
 
<style type="text/css">
<!--
    TD    {
        FONT-SIZE: 10pt; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif
        }
--></style>
 
</head>
 
<body leftmargin="50" topmargin="50" marginwidth="50" marginheight="50">

The actual simulation work is done in navigation.php, which contains a small script that generates links to the next and previous pages and builds the navigation bar.

navigation.php

Listing 2 navigation.php
<table border="0" cellspacing="1" cellpadding="3" width="100%">
    <tr>
        <td width="50%"><a href="index.php">Home</a>
        </td>
        <td width="50%">
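<?php
//read the page number from the query string; 0 means the home page
$page=isset($_GET["page"])?(int)$_GET["page"]:0;
$i=$page+1;
if($page>1){
    echo "<a href=\"index.php?page=".($page-1)."\">&lt;&lt;back</a>&nbsp;|";
    }
echo "<b><font color=\"#FF0000\">$page</font></b>|&nbsp;";
echo "<a href=\"index.php?page=$i\">next&gt;&gt;</a>";
?>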
        </td>
    </tr>
</table>
<hr noshade size="1">
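
The link logic in Listing 2 can also be written as a small function that takes the page number as an argument, which makes it easy to try out on its own. This is only a sketch: nav_links() is a hypothetical name, not part of the original listing, and it treats any negative or non-numeric value as the home page.

```php
<?php
// Hypothetical helper: return the navigation links for a given page number.
function nav_links($page)
{
    // anything negative or non-numeric falls back to page 0 (the home page)
    $page = max(0, (int) $page);
    $html = '';
    if ($page > 1) {
        $html .= '<a href="index.php?page=' . ($page - 1) . '">&lt;&lt;back</a>&nbsp;|';
    }
    $html .= '<b><font color="#FF0000">' . $page . '</font></b>|&nbsp;';
    $html .= '<a href="index.php?page=' . ($page + 1) . '">next&gt;&gt;</a>';
    return $html;
}

// Same usage as in navigation.php: take the page number from the query string
echo nav_links(isset($_GET['page']) ? $_GET['page'] : 0), "\n";
```

As in the listing, the back link appears only from page 2 onward; the Home link covers the step back from page 1.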

For now, index.php only includes config.php, header.php, and navigation.php, followed by the closing HTML tags.

index.php

Listing 3 index.php
<?php
/****** index.php ********/
include("config.php");
include("header.php");
include("navigation.php");
?>
</body>
</html>

At this point, with config.php and validate.php still empty and header.php, navigation.php, and index.php containing the code above, all five files in the sim/ directory, we can open /sim/index.php and check that the navigation works. You will see two links: Home and next. You can click next endlessly, and the page number increments each time.

We want to restrict how many pages can be browsed, so we have to count hits. For this and for later work we will need session variables and some environment variables. Open config.php and add this code, which is explained in the comments:

Listing 4 config.php
<?php
//start session
 
session_start();
 
//keep the hit counter in the session
//session_register() and register_globals were removed in PHP 5.4,
//so we use the super global $_SESSION and bind $COUNTER to it by reference

//our counter will count different values so we should set array
//we will specify array variables when we need them
//so at this point we will just tell that counter is array

if(!isset($_SESSION["COUNTER"])){
    $_SESSION["COUNTER"]=array();
    }
$COUNTER=&$_SESSION["COUNTER"];

//after entering the code on validation page count is unset
//the form fields are read from $_POST
if(isset($_POST["submit"],$_POST["validcode"],$COUNTER["code"])){
    if($_POST["validcode"]==$COUNTER["code"]){
        unset($COUNTER["count"]);
        }
}
 
//limit specifies how many pages can be viewed before validation is called
//in our simulator, we will allow to view 20 pages
//then the visitor will have to enter validation code (or login/register)
 
$limit=20;
 
//here we will set array keys and assign values
//$count will be used to count hits until limit is reached
//$sessioncount will count total hits during session
 
$count=(isset($COUNTER["count"])?$COUNTER["count"]:0)+1;
$sessioncount=(isset($COUNTER["sessioncount"])?$COUNTER["sessioncount"]:0)+1;
 
$COUNTER["count"]=$count;
$COUNTER["sessioncount"]=$sessioncount;
 
//if count exceeds limit, we set the URL where to return after validation
//and validation code (unless you choose redirect to login/registration page)
//we will generate validation code from seconds and minutes
//using date() function
//you can use your own way to generate the validation code
//we use $validate variable to make something (not) happen
//depending on whether we are on validation page or elsewhere
 
if($count>$limit){
    $COUNTER["code"]=date("si");

    //if we are not on validation page redirect to validation page
    if(empty($validate)){
        $COUNTER["back"]=$_SERVER["REQUEST_URI"];
        header("Location: validate.php");
        exit; //stop here so the rest of the page is not sent
        }
    }
 
?>
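
To see the counting rule from Listing 4 in isolation, the same arithmetic can be run on a plain array instead of the session. This is a minimal sketch: count_hit() is a hypothetical helper, not part of config.php, and it leaves out the session and the redirect.

```php
<?php
// Hypothetical helper: record one hit and report whether the limit is exceeded.
// $counter plays the role of the session array, $limit is the number of pages allowed.
function count_hit(array $counter, $limit)
{
    $counter['count'] = (isset($counter['count']) ? $counter['count'] : 0) + 1;
    $counter['sessioncount'] = (isset($counter['sessioncount']) ? $counter['sessioncount'] : 0) + 1;
    return array($counter, $counter['count'] > $limit);
}

// Simulate 21 page views with a limit of 20
$counter = array();
for ($i = 0; $i < 21; $i++) {
    list($counter, $needsValidation) = count_hit($counter, 20);
}
var_dump($needsValidation);   // bool(true): the 21st hit exceeds the limit

// A correct code unsets only "count"; the session total keeps growing
unset($counter['count']);
list($counter, $needsValidation) = count_hit($counter, 20);
echo $counter['count'], ' ', $counter['sessioncount'], "\n";   // 1 22
```

The first 20 hits pass and the 21st triggers validation, which matches the test described in the text: the first page load plus 20 clicks on next.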

Now let’s check how it works. Click the next link 20 times; you should be redirected to our still-empty validate.php.

Open validate.php and paste the following code:

Listing 5 validate.php
<?php
$validate=1;
include("config.php");
include("header.php");
?>
<table border="0" cellspacing="1" cellpadding="3" width="100%">
    <tr>
      <td align="center">
 
      <h3>Enter the code to continue browsing</h3>
      </td>
    </tr>
    <tr>
      <td align="center">
        <form name="form1" method="post" action="<?=htmlspecialchars($COUNTER["back"])?>">
          <?php echo "Code: <b><font color=\"#FF0000\">".$COUNTER["code"]."</font></b>"?>
          <input type="text" name="validcode">
          <input type="submit" name="submit" value="OK">
        </form>
      </td>
    </tr>
  </table>
 
<hr noshade size="1">
</body>
</html>

Reload the page, or close all browser windows and try again. After clicking next or back more than 20 times, the visitor is redirected and asked to enter the validation code.
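
The comments in config.php note that you can generate the validation code your own way instead of using date("si"). One possible variant is sketched below, assuming PHP 7 or later for random_int(). Both make_code() and code_matches() are hypothetical helpers and are not part of the original listings.

```php
<?php
// Hypothetical helper: a random numeric code of $length digits.
// random_int() is available from PHP 7.
function make_code($length = 4)
{
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= random_int(0, 9);
    }
    return $code;
}

// Hypothetical helper: compare the submitted value with the stored code
// as strings, ignoring spaces the visitor may have typed around it.
function code_matches($stored, $submitted)
{
    return is_string($submitted) && hash_equals((string) $stored, trim($submitted));
}

$code = make_code();
var_dump(code_matches($code, " $code "));     // bool(true): surrounding spaces are ignored
var_dump(code_matches($code, $code . '0'));   // bool(false): one digit too many
```

In config.php this would mean assigning make_code() to $COUNTER["code"] and calling code_matches() where the submitted value is compared with the stored code.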

It is time to fill index.php with “content”. A real site’s content would be retrieved from a database, but for our purposes we will use a short instruction on how to use the simulator, plus some environment and session variables for debugging.

Now replace your index.php code with this:

Listing 6 index.php
<?php
include("config.php");
include("header.php");
include("navigation.php");
?>
<table border="0" cellspacing="1" cellpadding="3">
  <tr>
    <td><h2>
  <?php if($page>0){?>
  Page <b><font color="#FF0000">
  <?=$page?>
  </font></b>
  <?php }elseif($page==0){?>
  <b><font color="#FF0000">Homepage </font></b> of the multi-million-page site
  simulator
  <?php }?>
</h2></td>
  </tr>
  <tr>
    <td>
      <p>Click the next or back link to browse. When the count limit of <b>
        <?=$limit?>
        </b> pages is reached, you will be redirected to the validation page and asked
        to enter the validation code. On a real site you can direct a user to a login/registration
        page after the limit is reached. To set how many pages a user can browse before
        validation, change the <b>$limit</b> value in the <b>config.php</b> file.
      </p>
      <p>Some environment variable values, the count limit, and the <b>$COUNTER</b> values are shown
         in the table below. We also call the <b>print_r()</b> function
         at the bottom of the page to see the counter array values.</p>
    </td>
  </tr>
</table>
<hr noshade size="1">
<table border="0" cellspacing="1" cellpadding="3" align="center" bgcolor="#FFFFFF">
  <tr bgcolor="#EEEEEE">
    <td>Your session id is:</td>
    <td>
      <?=session_id()?>
    </td>
  </tr>
  <tr bgcolor="#EEEEEE">
    <td>Your user agent is:</td>
    <td>
      <?=htmlspecialchars($_SERVER["HTTP_USER_AGENT"])?>
    </td>
  </tr>
  <tr bgcolor="#EEEEEE">
    <td>Your ip address is:</td>
    <td>
      <?=$_SERVER["REMOTE_ADDR"]?>
    </td>
  </tr>
  <tr bgcolor="#EEEEEE">
    <td>This page URL is:</td>
    <td>
      <?=htmlspecialchars($_SERVER["HTTP_HOST"].$_SERVER["REQUEST_URI"])?>
    </td>
  </tr>
  <tr bgcolor="#EEEEEE">
    <td>Count limit: </td>
    <td><b>
      <?=$limit?>
      </b></td>
  </tr>
  <tr bgcolor="#EEEEEE">
    <td>Count value:</td>
    <td><font color="#FF0000"><b>
      <?php echo $COUNTER["count"];?>
      </b></font> </td>
  </tr>
  <tr bgcolor="#EEEEEE">
    <td>Session count:</td>
    <td>
      <?=$sessioncount?>
    </td>
  </tr>
</table>
<hr noshade size="1"><?php print_r($COUNTER)?>
</body>
</html>
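
Values such as the user agent and the request URI come from the visitor, so it is safer to escape them before printing them into the debug table. A minimal sketch of one way to build such a row follows; debug_row() is a hypothetical helper, not part of Listing 6.

```php
<?php
// Hypothetical helper: one row of the debug table, with label and value
// escaped so that markup sent by the client is shown as text.
function debug_row($label, $value)
{
    return '<tr bgcolor="#EEEEEE"><td>' . htmlspecialchars($label) . '</td><td>'
         . htmlspecialchars((string) $value) . '</td></tr>';
}

// A user agent string that tries to inject a script tag
echo debug_row('Your user agent is:', '<script>alert(1)</script>'), "\n";
// The value cell contains &lt;script&gt;alert(1)&lt;/script&gt;
```

The same helper could be called once per table row with session_id(), the $_SERVER values, and the counter values.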

See the working sample at http://www.alxg.net/sim/.

Download all of the above code as a zip package from http://www.alxg.net/sim/sim.zip.
