Google
Google sprang out of a Stanford research project to find authoritative
link sources on the web. In January of 1996 Larry Page and Sergey Brin began
working on BackRub (what a horrible name, eh?). After they tried shopping the
Google search technology around to no avail, they decided to set up their own
search company. Within a few years of forming the company they won distribution
partnerships with AOL and Yahoo! that helped build their brand as the industry
leader in search. Traditionally, search had been viewed as a loss leader.
Despite the dotcom fever of the day, they had little interest in building a
company of their own around the technology they had developed. Among those they
called on was friend and Yahoo! founder David Filo. Filo agreed that their
technology was solid, but encouraged Larry and Sergey to grow the service
themselves by starting a search engine company. "When it's fully developed and
scalable," he told them, "let's talk again." Others were less interested in
Google, as it was now known. One portal CEO told them, "As long as we're 80
percent as good as our competitors, that's good enough. Our users don't really
care about search."
Google did not have a profitable business model until the third iteration
of its popular AdWords advertising program in February of 2002, and it was
worth over 100 billion dollars by the end of 2005.
On Page Content
If a phrase is obviously targeted (i.e. the exact same phrase appears in most
of the following locations: most of your inbound links, internal links, the
start of your page title, the beginning of your first page header, etc.) then
Google may filter the document out of the search results for that phrase.
Other search engines may have similar algorithms, but if they do, those
algorithms are not as sophisticated or as aggressively deployed as Google's.
Google is scanning millions of books, which should help them create
algorithms that are pretty good at differentiating real text patterns from
spammy, manipulative text (although I have seen many garbage-content cloaked
pages ranking well in Google, especially for 3 and 4 word search queries).
You need to write naturally and make your copy look more like a news article
than a heavily search engine optimized page if you want to rank well in
Google. Sometimes using fewer occurrences of the phrase you want to rank for
is better than using more.
You also want to sprinkle modifiers and semantically related text into the
pages you want to rank well in Google.
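As a rough illustration of the "obviously targeted" idea above, the sketch below counts how many trusted on-page and link locations repeat the exact target phrase. The locations checked, the helper name, and the threshold are my own assumptions for the example, not anything Google has published.

```python
# Hypothetical over-optimization check: what fraction of key locations
# (title, first header, inbound/internal anchor text) repeat the exact phrase?
def exact_phrase_concentration(phrase, title, first_header, anchors):
    phrase = phrase.lower()
    locations = [title.lower(), first_header.lower()] + [a.lower() for a in anchors]
    hits = sum(1 for text in locations if phrase in text)
    return hits / len(locations) if locations else 0.0

score = exact_phrase_concentration(
    "cheap blue widgets",
    title="Cheap Blue Widgets | Cheap Blue Widgets Store",
    first_header="Cheap Blue Widgets",
    anchors=["cheap blue widgets", "buy cheap blue widgets", "cheap blue widgets sale"],
)
# An arbitrary illustrative threshold: a page where nearly every location
# repeats the same phrase looks "obviously targeted".
if score > 0.8:
    print(f"{score:.0%} of checked locations repeat the exact phrase -- likely over-optimized")
```

Mixing in modifiers and related phrasing, as suggested above, is what keeps a score like this low on naturally written pages.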
Some of Google's content filters may look at pages on a page by page basis,
while others may look across a site or a section of a site to see how similar
different pages on the same site are. If many pages are exceptionally similar
to other content on your own site or to content on other sites, Google may be
less willing to crawl those pages and may throw them into its supplemental
index. Pages in the supplemental index rarely rank well, since they are
generally trusted far less than pages in the regular search index.
Duplicate content detection is not based on some magical percentage of
similar content on a page, but on a variety of factors. Both Bill Slawski and
Todd Malicoat offer great posts about duplicate content detection.
This shingles PDF
explains some duplicate content detection techniques.
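As a rough sketch of the shingling idea from that paper, the example below breaks two documents into overlapping word n-grams (shingles) and compares them with Jaccard similarity; the shingle size and the near-duplicate threshold here are arbitrary choices for illustration, not Google's actual settings.

```python
# Minimal w-shingling sketch: represent each document as a set of
# overlapping word n-grams and measure their Jaccard similarity.
def shingles(text, w=4):
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy cat near the river bank"

similarity = jaccard(shingles(doc_a), shingles(doc_b))
print(f"shingle similarity: {similarity:.2f}")
# A very high overlap (say above ~0.9 in this toy setup) would flag the
# pair as near-duplicates.
```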
I wrote a blog post about natural SEO copywriting which expounds on writing
unique, natural content that will rank well in Google.
Crawling
While Google is more efficient at crawling than competing engines, it appears
as though with Google's BigDaddy update they are looking at both inbound and
outbound link quality to help set crawl priority, crawl depth, and whether or
not a site even gets crawled at all. To quote Matt Cutts:
The sites that fit “no pages in Bigdaddy” criteria were sites where our
algorithms had very low trust in the inlinks or the outlinks of that site.
Examples that might cause that include excessive reciprocal links, linking to
spammy neighborhoods on the web, or link buying/selling.
In the past crawl depth was generally a function of PageRank (PageRank is a
measure of link equity, and the more of it you had the better you would get
indexed), but adding in this crawl penalty for having an excessive portion of
your inbound or outbound links pointing into low quality parts of the web
creates an added cost which makes dealing in spammy, low quality links far
less appealing for those who want to rank in Google.
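Since PageRank comes up here and again in the link reputation section, here is a bare-bones power-iteration sketch of the original PageRank idea (damping factor 0.85, uniform teleport). Google's production scoring goes far beyond this, so treat it only as a model of "link equity".

```python
# Toy PageRank via power iteration over a tiny link graph.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```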
Query Processing
While I mentioned above that Yahoo! seemed to have a bit of a bias toward
commercial search results, it is also worth noting that Google's organic
search results are heavily biased toward informational websites and web pages.
Google is much better than Yahoo! or MSN at determining the true intent of a
query and trying to match that instead of doing direct text matching. Common
words like "how to" may be significantly deweighted compared to other terms in
the search query that provide better discrimination value.
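That deweighting of common words is roughly what inverse document frequency (IDF) captures in classic retrieval models. The small sketch below computes generic smoothed IDF weights over a toy collection; it is not Google's query processing, just an illustration of why "how" and "to" carry less weight than rarer terms.

```python
# Toy inverse document frequency: terms that appear in most documents
# (like "how" or "to") get low weights, rarer terms get high weights.
import math

documents = [
    "how to bake sourdough bread",
    "how to change a car tire",
    "how to train for a marathon",
    "sourdough starter troubleshooting guide",
]

def idf(term, docs):
    containing = sum(1 for d in docs if term in d.split())
    return math.log(len(docs) / (1 + containing)) + 1  # smoothed IDF

for term in ["how", "to", "sourdough", "marathon"]:
    print(f"{term}: {idf(term, documents):.2f}")
```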
Google and some of the other major search engines may try to answer many
common questions related to the concept being searched for. For example, in a
given set of search results you may see any of the following:
- a relevant .gov and/or .edu document
- a recent news article about the topic
- a page from a well known directory such as DMOZ or the Yahoo! Directory
- a page from Wikipedia
- an archived page from an authority site about the topic
- the authoritative document about the history of the field and recent changes
- a smaller, hyper-focused authority site on the topic
- a PDF report on the topic
- a relevant Amazon, eBay, or shopping comparison page on the topic
- one of the most well branded and well known niche retailers catering to that market
- product manufacturer or wholesaler sites
- a blog post / review from a popular community or blog site about a slightly broader field
Some of the top results may answer specific relevant queries or be hard to
beat, while others might be easy to compete with. You just have to think about
how and why each result was chosen to be in the top 10 to learn which ones you
will be competing against and which ones may fall away over time.
Link Reputation
PageRank is a weighted measure of link popularity, but Google's search
algorithms have moved far beyond just looking at PageRank.
As mentioned above, gaining an excessive number of low quality links may hurt
your ability to get indexed in Google, so stay away from known spammy link
exchange hubs and other sources of junk links. I still sometimes get a few junk
links, but I try to offset any junky links by getting a greater number of good
links.
If your site ranks well, some garbage automated links will end up pointing at
you whether you like it or not. Don't worry about those links; just worry
about trying to get a few real, high quality editorial links.
Google is much better at determining the difference between real editorial
citations and low quality, spammy, bought, or artificial links.
When determining link reputation, Google (and other engines) may look at:
- link age
- rate of link acquisition
- anchor text diversity
- deep link ratio
- link source quality (based on who links to them and who else they link at)
- whether links are editorial citations in real content (or whether they sit on spammy pages or near other obviously non-editorial links)
- whether anybody actually clicks on the link
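None of the weights below come from Google; this is just a hypothetical sketch of how a few of the factors listed above could be folded into a single link score, to make the "variety of signals" idea concrete.

```python
# Hypothetical link scoring: combine a few of the signals above using
# made-up weights. Real engines use far more data and learned weights.
def link_score(link):
    score = 0.0
    score += min(link["age_days"] / 365.0, 3.0) * 1.0     # older links earn more trust
    score += link["source_quality"] * 2.0                 # 0..1 estimate of the linking page
    score += 1.5 if link["editorial"] else -1.0           # editorial citation vs. dropped link
    score += min(link["clicks_per_month"] / 50.0, 1.0)    # does anyone actually click it?
    score -= 2.0 if link["spammy_neighborhood"] else 0.0  # surrounded by junk links
    return score

example = {
    "age_days": 900,
    "source_quality": 0.7,
    "editorial": True,
    "clicks_per_month": 20,
    "spammy_neighborhood": False,
}
print(f"link score: {link_score(example):.2f}")
```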
It is generally believed that .edu and .gov links are trusted highly in
Google because they are generally harder to influence than the average .com
link, but keep in mind that there are some junky .edu links too (I have seen
stuff like .edu casino link exchange directories). While the TrustRank
research paper had some names from Yahoo! on it, I think it is worth reading
the TrustRank research paper (PDF) and the link spam mass estimation paper
(PDF), or at least my condensed versions of them here and here, to understand
how Google is looking at links.
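The core idea in the TrustRank paper is to start from a small, hand-reviewed seed set of trusted pages and propagate trust along outlinks with decay. The sketch below illustrates that propagation on a toy graph; it is not Google's implementation, and the pages and decay factor are made up for the example.

```python
# Toy TrustRank-style propagation: trust starts at hand-picked seed pages
# and flows along outlinks with a decay factor, so pages far from any
# trusted seed (or only linked from junk) end up with little trust.
def trustrank(links, seeds, decay=0.85, iterations=30):
    pages = list(links)
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        new_trust = {p: (1.0 - decay) * seed_score[p] for p in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                new_trust[target] += decay * trust[page] / len(outlinks)
        trust = new_trust
    return trust

links = {
    "edu_library": ["niche_blog", "retailer"],
    "niche_blog": ["retailer"],
    "retailer": [],
    "spam_hub": ["spam_site"],
    "spam_site": ["spam_hub"],
}
scores = trustrank(links, seeds={"edu_library"})
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Notice that the two spam pages only link to each other, so no trust ever reaches them, which is exactly the intuition behind seeding trust at hard-to-influence sources.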
When getting links for Google it is best to look in virgin lands that have
not been combed over heavily by other SEOs. Either get real editorial
citations or get citations from quality sites that have not yet been abused by
others. Google may strip the ability to pass link authority (even from quality
sites) if those sites are known, obvious link sellers or other types of link
manipulators. Make sure you mix up your anchor text and get some links with
semantically related text.
Google likely collects usage data via Google search, Google Analytics, Google
AdWords, Google AdSense, Google News, Google Accounts, Google Notebook, Google
Calendar, Google Talk, Google's feed reader, Google search history
annotations, and Gmail. They also created a Firefox bookmark sync tool and an
anti-phishing tool which is built into Firefox, and they have a relationship
with Opera (another web browser company). Most likely they can lay some of
this data over the top of the link graph as a corroborating source on the
legitimacy of the linkage data. Other search engines may also look at usage
data.
Page vs Site
Sites need to earn a certain amount of trust before they can rank for
competitive search queries in Google. If you put up a new page on a new site
and expect it to rank right away for competitive terms, you are probably going
to be disappointed.
If you put that exact same content on an old, trusted domain and link to it
from another page on that domain, it can leverage the domain trust to rank
quickly and bypass the concept many people call the Google Sandbox.
Many people have been exploiting this algorithmic hole by throwing up spammy
subdomains on free hosting sites or other authoritative sites that allow users
to sign up for a cheap or free publishing account. This is polluting Google's
SERPs pretty badly, so they are going to have to make some major changes on
this front fairly soon.
Site Age
Google filed a patent about information retrieval based on historical data
which states many of the things they may look at when determining how much to
trust a site. Many of the things I mentioned in the link section above are
relevant to site age related trust (i.e. to be well trusted due to site age
you need to have at least some link trust score and some age score).
I have seen some old sites with exclusively low quality links rank well in
Google based primarily on their site age, but if a site is old and has
powerful links, that can go a long way toward helping you rank just about any
page you write (so long as you write it fairly naturally).
Older, trusted sites may also be given a pass on many things that would cause
newer, less trusted sites to be demoted or de-indexed.
The Google Sandbox is a concept many SEOs mention frequently. The idea of the
'box is that new sites that should be relevant struggle to rank for some
queries they would be expected to rank for. While some people have debunked
the existence of the sandbox as garbage, Google's Matt Cutts said in an
interview that they did not intentionally create the sandbox effect, but that
it was created as a side effect of their algorithms:
"I think a lot of what's perceived as the sandbox is artefacts where, in our
indexing, some data may take longer to be computed than other data."
You can listen to the full Matt Cutts audio interviews here
and here.
Paid Search
Google AdWords factors max bid price and clickthrough rate into its ad
ranking algorithm. In addition, they automate reviewing landing page quality
and use that as another factor in their ad relevancy algorithm, to reduce the
amount of arbitrage and other noise in the AdWords program.
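To make the bid-times-clickthrough idea concrete, the toy ranking below scores each ad by max bid multiplied by expected clickthrough rate and a rough landing page quality factor. Google's exact formula and weights are not public, so treat this purely as an illustration with made-up numbers.

```python
# Toy ad ranking: a higher bid can be outranked by a more relevant ad
# (better CTR and landing page quality) that earns more per impression.
ads = [
    {"advertiser": "A", "max_bid": 2.00, "ctr": 0.010, "landing_quality": 0.9},
    {"advertiser": "B", "max_bid": 0.80, "ctr": 0.045, "landing_quality": 1.0},
    {"advertiser": "C", "max_bid": 1.50, "ctr": 0.020, "landing_quality": 0.4},
]

def ad_rank(ad):
    return ad["max_bid"] * ad["ctr"] * ad["landing_quality"]

for ad in sorted(ads, key=ad_rank, reverse=True):
    print(f"{ad['advertiser']}: rank score {ad_rank(ad):.4f}")
```

In this toy example advertiser B wins the top spot despite the lowest bid, which is the point of folding relevancy signals into the ad auction.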
The Google AdSense program is an extension of Google AdWords which offers a
vast ad network across many content websites that distribute contextually
relevant Google ads. These ads are sold on a cost per click or flat rate CPM
basis.
Editorial
Google is known to be far more aggressive with their filters and algorithms
than the other search engines are. They are known to throw the baby out with
the bath water quite often. They flat out despise relevancy manipulation, and
have shown they are willing to trade away some short term relevancy if it
guides people toward making higher quality content.
In the short term, if your site is filtered out of the results during an
update, it may be worth looking into common footprints of sites that were hurt
in that update, but it is probably not worth changing your site structure and
content format over one update if you are creating true value-add content
aimed at your customer base. Sometimes Google goes too far with its filters
and then adjusts them back.
Google published their official webmaster guidelines and their thoughts on
SEO. Matt Cutts is also known to publish SEO tips on his personal blog. Keep
in mind that Matt's job as Google's search quality leader may bias his
perspective a bit.
A site by the name of Search Bistro uncovered a couple of internal Google
documents which have been used to teach remote quality raters what to look for
when evaluating search quality since at least 2003:
- Google Spam Recognition Guide for Raters (doc) - discusses the types of sites Google considers spam, generally sites which do not add any direct value to the search or commerce experience
- General Guidelines on Random-Query Evaluation (PDF) - shows how sites can be classified based on their value, from vital to useful to relevant to not relevant to off topic to offensive
These raters may be used to:
- help train the search algorithms, or
- flag low quality sites for internal review, or
- manually review suspected spam sites
If Google bans or penalizes your site due to an automated filter and it is
your first infraction, the site usually returns to the index within about 60
days of you fixing the problem. If Google manually bans your site you have to
clean up your site and plead your case to get reincluded. To do so, their
webmaster guidelines state that you have to click a reinclusion request link
from within the Google Sitemaps program.
Google Sitemaps gives you a bit of useful information from Google about which
keywords your site is ranking for and which keywords people are clicking on
your listing for.
Social Aspects
Google allows people to write notes about different websites they visit using
Google Notebook. Google also allows you to mark and share your favorite feeds
and posts, and lets you flavorize search boxes on your site to be biased
toward the topics your website covers.
Google is not as entrenched in the social aspects of search as Yahoo! is, but
Google seems to throw out many more small tests hoping that one will perhaps
stick. They are trying to make software more collaborative and to get people
to share things like spreadsheets and calendars, while also integrating chat
into email. If they can create a framework where things mesh well, they may be
able to gain further market share by offering free productivity tools.