Page Ranking

To assign a relevance to each word in a web page, a statistical analysis was made of the words used in several languages.

This was performed with a modified web cruncher. The result was 7 files with the word frequencies for English, Arabic, German, Spanish, Japanese, Portuguese and Russian.

For instance, these are some frequencies of English, Russian and Spanish words taken from their respective files:

 

English             Russian             Spanish
the,0.04422989      и,0.06485217        de,0.03738276
and,0.02470127      в,0.04606491        la,0.03010256
of,0.02441218       для,0.02559162      di,0.02946818
to,0.02143293       с,0.01698079        un,0.01471143
in,0.01644974       не,0.0139098        en,0.01377498
for,0.00971379      2014,0.01138074     il,0.013488
is,0.00851255       что,0.00969471      el,0.01344269
on,0.00713028       від,0.00788824      in,0.01215884
with,0.00563574     00,0.00758716       che,0.01169061
that,0.00561329     дня,0.00698501      del,0.01159998
you,0.00518247      ru,0.00698501       per,0.01158488
by,0.00499583       от,0.00680436       con,0.01079946
it,0.0047306        из,0.00680436       una,0.00924374
are,0.0046464       В,0.0062022         è,0.00699322
this,0.00463517     то,0.0062022        que,0.00654009
at,0.0046043        все,0.00602156      le,0.00623801
as,0.00456922       к,0.00560005        los,0.00567916
your,0.00379037     это,0.00560005      se,0.00552812
or,0.00373705       о,0.00553983        si,0.00551301
more,0.00373564     или,0.00511832      da,0.0054526
from,0.00366267     14,0.00487746       al,0.00498437

Each of these numbers is the global frequency of a word. Afterwards, a web cruncher calculates the local frequencies of the words in a particular web page.
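
As an illustration only, the local frequency of a word could be computed as its share of all the words found in the page. The sketch below assumes this definition and a very simple tokenizer. The function name local_frequencies and the sample text are hypothetical, and this is not necessarily how Trokam's web cruncher works.

```python
import re
from collections import Counter


def local_frequencies(text):
    """Return, for each word, its share of all words in the page text."""
    # Assumed tokenizer: every run of letters, digits or underscores is a word.
    # Case is kept as found, since the sample lists above show both
    # lowercase and uppercase entries.
    words = re.findall(r"\w+", text)
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


# Hypothetical page content, used only to show the calculation.
page = "the cat and the hat"
freqs = local_frequencies(page)
print(freqs["the"])  # 2 of 5 words -> 0.4
```

The values are fractions between 0 and 1, the same scale as the global frequencies listed above.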

The relevance of a word in a particular web page is the ratio of its local frequency to its global frequency.

 

                      local frequency
   word relevance =  -----------------
                      global frequency

 

If a word appears more frequently within a web page than it does across all web pages, its ratio in that web page will be larger than 1.
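
A minimal sketch of this ratio follows. The three global frequencies are taken from the English list above. The local frequencies and the function name word_relevance are invented for the example. Skipping words that have no global frequency is also an assumption, since the text does not say how such words are treated.

```python
def word_relevance(local, global_freq):
    """Return local frequency / global frequency for each word in the page."""
    relevance = {}
    for word, local_value in local.items():
        global_value = global_freq.get(word)
        # Assumption: words with no (or zero) global frequency are skipped.
        if not global_value:
            continue
        relevance[word] = local_value / global_value
    return relevance


# Global frequencies from the English list above.
global_freq = {"the": 0.04422989, "and": 0.02470127, "more": 0.00373564}

# Hypothetical local frequencies for one web page.
local = {"the": 0.04, "and": 0.02, "more": 0.02}

scores = word_relevance(local, global_freq)

# Print the words from most to least relevant.
for word, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word}: {score:.2f}")
```

In this made-up page, "more" appears far more often than it does globally, so its ratio is well above 1, while "the" and "and" stay just below 1.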

With the audit button you can see the relevance of the words in a web page according to this simple algorithm, which is the one used by Trokam to rank pages.