Method for studying keyword density in top ranking pages


Assume you are doing search engine optimization (SEO) and would like to get an idea of the keyword density on competing websites. This article provides a step-by-step method for determining keyword density in the top ranking pages of a given website.


The method

The procedure described below measures how often certain keywords occur in the pages of a given website that rank best for those keywords.

As an example, let's study the density of the keywords 'wireless' and 'performance' on the website 'http://www.codealias.info'.

The procedure applied to our example consists of three phases:

  • 1. Obtain the list of the 10 top ranking pages on 'codealias.info' for the keywords 'wireless' and 'performance'.
  • 2. Download these 10 pages.
  • 3. Calculate the occurrence/density of each keyword in these pages.

The procedure described below can be used for any keyword and any website. All that is needed is a Linux/Unix environment that provides Perl and the sed, egrep and wget utilities.
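Because the procedure is meant to work for any keyword and any website, the search request can be parameterized. The following is a minimal sketch in POSIX shell: `build_search_url` is a hypothetical helper name, and the query format (the `n`, `p` and `vs` parameters) is simply copied from the Perl script shown in the next section, on the assumption that the search engine still accepts it.

```shell
# Hypothetical helper: build the search URL for a site and one or more keywords.
# The query format is taken from the Perl script in this article; whether the
# search engine still honours these parameters is an assumption.
build_search_url() {
    site="$1"
    shift
    # Join the remaining arguments (the keywords) with '+'.
    keywords=$(printf '%s+' "$@")
    keywords=${keywords%+}
    printf 'http://search.yahoo.com/search?n=10&p=%s&vs=%s\n' "$keywords" "$site"
}

build_search_url www.codealias.info wireless performance
```

The first argument is the site and every following argument is one keyword, so the same helper covers any number of keywords.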


Obtain the list of top ranking pages

For this purpose, we will use a Perl script that performs a search on the given keywords in the studied website.

The script uses the LWP::Simple module. If you don't have it, install it from the cpan shell as follows:

$ cpan
cpan> install LWP::Simple

This is the Perl script that we will use:

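use strict;
use warnings;
use LWP::Simple;

# Site to study and keywords to search for (joined with '+').
my $site     = "www.codealias.info";
my $keywords = "wireless+performance";

# Ask for the first 10 results restricted to $site.
my $req = "http://search.yahoo.com/search?n=10&p=$keywords&vs=$site";

# Show the request on stderr so it does not mix with the downloaded HTML.
print STDERR "$req\n";

# Fetch the result page and print it to stdout.
getprint($req);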

Place the script above in a file (e.g. get.pl), then run the following command:

perl ./get.pl | sed 's/href="/\n/g' | sed 's/"/\n/g' | egrep "^http://www\.codealias\.info/\w+" > pagelist.txt

The pagelist.txt file will then contain the following:

http://www.codealias.info/topics/performance
http://www.codealias.info/technotes/impact_of_wireless_handoff_delays_on_voip_qos
http://www.codealias.info/technotes/performance_evaluation_of_wireless_security_systems_part_1
http://www.codealias.info/technotes/802.11_handoff_performance_--_bibliography
http://www.codealias.info/technotes/performance_evaluation_of_wireless_security_systems_part_2_-_the_802.11_handoff_process
http://www.codealias.info/topics/security
http://www.codealias.info/technotes/performance_of_eap_and_radius_authentication_in_roaming_scenarios
http://www.codealias.info/technotes/performance_evaluation_of_wireless_security_systems_part_3_-_factors_affecting_handoff_performance
http://www.codealias.info/technotes/network_communication_properties_and_qos_of_voip

These are the top ranking web pages for the keywords 'wireless' and 'performance' on the website codealias.info (the run above returned nine pages).
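To make the extraction step easier to follow, here is a small sketch of the same idea applied to an inline HTML sample instead of a live search result. `extract_links` is a hypothetical helper, not part of the original procedure; it assumes a grep that supports the `-o` option (GNU and BSD grep do) and it additionally drops duplicate URLs while keeping the ranking order.

```shell
# Hypothetical helper: read HTML on stdin and print each distinct href target
# that starts with the given prefix, in order of first appearance.
extract_links() {
    prefix="$1"
    grep -oE 'href="[^"]*"' |
        sed -e 's/^href="//' -e 's/"$//' |
        awk -v p="$prefix" 'index($0, p) == 1 && !seen[$0]++'
}

# Inline sample standing in for the search result page.
printf '%s\n' '<a href="http://www.codealias.info/topics/performance">Performance</a> <a href="http://example.com/other">Other</a>' |
    extract_links "http://www.codealias.info/"
```

Matching on a literal prefix avoids having to escape the dots of the host name in a regular expression.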


Download the top ranking pages

Now that we have the list of pages, we can download them using wget as follows:

wget -i pagelist.txt
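The counting step that follows looks for each page under the last path component of its URL, so it can help to check that every page was actually saved under that name. The sketch below assumes exactly that naming; `missing_pages` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: print the URLs from a page list that have no matching
# file in the current directory. Assumes each page was saved under the last
# path component of its URL.
missing_pages() {
    while IFS= read -r url
    do
        # Skip empty lines in the list.
        [ -n "$url" ] || continue
        [ -f "$(basename "$url")" ] || printf '%s\n' "$url"
    done < "$1"
}

# Report missing downloads if the page list exists in this directory.
if [ -f pagelist.txt ]; then
    missing_pages pagelist.txt
fi
```

An empty output means every listed page has a local file to count keywords in.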

Calculate the occurrence/density of each keyword in the top ranking pages

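This is a straightforward procedure. You just need to run the following script, which counts, for each page, the lines that contain the keyword (grep -c counts matching lines, so a line that contains the keyword twice is counted once):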

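# For each keyword, count the matching lines in every downloaded page.
for key in performance wireless
do
    echo "Stats for $key :"
    total=0

    while IFS= read -r url
    do
        printf '\t%s: ' "$url"
        # wget saved each page under the last component of its URL.
        count=$(grep -c -- "$key" "$(basename "$url")")
        echo "$count"
        total=$((total + count))
    done < pagelist.txt

    printf '\t===> Total : %s\n\n' "$total"
done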

The result is as follows:

Stats for performance :
	http://www.codealias.info/topics/performance: 37
	http://www.codealias.info/technotes/impact_of_wireless_handoff_delays_on_voip_qos: 6
	http://www.codealias.info/technotes/performance_evaluation_of_wireless_security_systems_part_1: 19
	http://www.codealias.info/technotes/802.11_handoff_performance_--_bibliography: 16
	http://www.codealias.info/technotes/performance_evaluation_of_wireless_security_systems_part_2_-_the_802.11_handoff_process: 15
	http://www.codealias.info/topics/security: 18
	http://www.codealias.info/technotes/performance_of_eap_and_radius_authentication_in_roaming_scenarios: 18
	http://www.codealias.info/technotes/performance_evaluation_of_wireless_security_systems_part_3_-_factors_affecting_handoff_performance: 15
	http://www.codealias.info/technotes/network_communication_properties_and_qos_of_voip: 3
	===> Total : 147

Stats for wireless :
	http://www.codealias.info/topics/performance: 25
	http://www.codealias.info/technotes/impact_of_wireless_handoff_delays_on_voip_qos: 16
	http://www.codealias.info/technotes/performance_evaluation_of_wireless_security_systems_part_1: 31
	http://www.codealias.info/technotes/802.11_handoff_performance_--_bibliography: 5
	http://www.codealias.info/technotes/performance_evaluation_of_wireless_security_systems_part_2_-_the_802.11_handoff_process: 21
	http://www.codealias.info/topics/security: 49
	http://www.codealias.info/technotes/performance_of_eap_and_radius_authentication_in_roaming_scenarios: 7
	http://www.codealias.info/technotes/performance_evaluation_of_wireless_security_systems_part_3_-_factors_affecting_handoff_performance: 24
	http://www.codealias.info/technotes/network_communication_properties_and_qos_of_voip: 7
	===> Total : 185

So the conclusion here is that the keyword 'performance' matches 147 lines in the top ranking pages of codealias.info for 'wireless performance'. The page with the highest count (37) is 'http://www.codealias.info/topics/performance'.

The keyword 'wireless' matches 185 lines in the same pages. The page with the highest count (49) is 'http://www.codealias.info/topics/security'.
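The figures above are raw counts per page. To express them as a density, the number of keyword occurrences can be divided by the total number of words on the page. The following is a rough sketch under stated assumptions: `keyword_density` is a hypothetical helper, tags are stripped with a crude sed expression that only handles tags contained on a single line, and a "word" is any run of letters and digits.

```shell
# Hypothetical helper: print "<hits> <total words> <density in %>" for one
# keyword in one downloaded page. The tag stripping is deliberately crude.
keyword_density() {
    key="$1"
    file="$2"
    sed 's/<[^>]*>/ /g' "$file" |
        tr -cs '[:alnum:]' '\n' |
        tr '[:upper:]' '[:lower:]' |
        awk -v k="$key" '
            BEGIN { k = tolower(k) }
            NF { total++; if ($0 == k) hits++ }
            END {
                if (total == 0) { print "0 0 0.00"; exit }
                printf "%d %d %.2f\n", hits, total, 100 * hits / total
            }'
}

# Example: density of "performance" in the downloaded page of the same name.
if [ -f performance ]; then
    keyword_density performance performance
fi
```

Unlike the line counts above, this counts every whole-word occurrence, so its numbers are not directly comparable with the totals reported by the script.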


