HIGH TECH TUESDAY
Web Communities
Edited by B. Virtual
As an increasing percentage of human knowledge and communication goes
online, the potential for analyzing interests and relationships
within science and society is great. Researchers have discovered that, despite its
decentralized, unorganized, and heterogeneous nature, the web
self-organizes such that communities of highly related pages can be
identified based solely on the link structure of the web.
"This discovery is significant because there is no central authority
or process governing the formation and structure of links on the web,"
said Dr. Gary Flake of NEC Research Institute, the study's lead
author. Individual links on the web are created by millions of
different individuals, operating independently, and having different
backgrounds, knowledge, goals, and cultures.
While previous studies
have covered properties of the web graph such as the diameter (Nature,
401, p. 130) and the link distribution (Science, 286, p. 509), this
discovery is the first to bind the link structure and text content of
the web. An article detailing the discovery by Dr. Flake and
co-authors Dr. Steve Lawrence, Dr. C. Lee Giles, and Dr. Frans
Coetzee will appear in IEEE Computer, Volume 35, Number 3, which will
be available on March 6, 2002. IEEE Computer is the flagship journal
of the IEEE Computer Society, the world's oldest and largest
professional society in computing.
The researchers define a web community as a collection of web
pages that have more links within the community than outside of the
community. This definition can be generalized to identify communities
with varying levels of cohesiveness. These communities are
self-organized in that the entire web graph determines membership.
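The definition above can be made concrete with a small check. The following is a minimal sketch, not the researchers' code: it assumes that links are treated as undirected, that the "more links within than outside" test is applied to each member page separately, and that the toy graph and the function names link_counts and is_community are invented purely for illustration.

```python
def link_counts(links, members, page):
    """Return (inside, outside): how many distinct neighbors of `page`
    are members of the candidate community and how many are not.
    Assumption: links are treated as undirected."""
    neighbors = set()
    for src, dst in links:
        if src == page and dst != page:
            neighbors.add(dst)
        elif dst == page and src != page:
            neighbors.add(src)
    inside = len(neighbors & members)
    outside = len(neighbors - members)
    return inside, outside


def is_community(links, members):
    """True if every member page has strictly more links to pages inside
    the candidate set than to pages outside it."""
    members = set(members)
    for page in members:
        inside, outside = link_counts(links, members, page)
        if inside <= outside:
            return False
    return True


# Toy graph (hypothetical): two triangles joined by the single link c-d.
links = [("a", "b"), ("b", "c"), ("c", "a"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "d")]

print(is_community(links, {"a", "b", "c"}))  # True
print(is_community(links, {"a", "b", "d"}))  # False
```

In the toy graph, page c has two links inside the set {a, b, c} and one link outside it, so the set qualifies. The set {a, b, d} fails because page a has as many links outside the set as inside it.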
The researchers show how the problem of identifying these communities
can be efficiently solved by recasting it into a maximum flow
framework, and present examples for the identification of communities
centered around well-known scientists (Francis Crick, Stephen Hawking,
and Ronald Rivest).
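To illustrate how a maximum flow computation can separate a community from the rest of a graph, here is a minimal textbook sketch. It is not the researchers' procedure, whose details are not given in this article. It assumes a single seed page as the source, a single page known to lie outside the community as the sink, and a capacity of 1 per link in each direction. The function name min_cut_community and the toy graph are hypothetical. The flow is computed with the standard Edmonds-Karp method (shortest augmenting paths found by breadth-first search), and the community is taken to be the set of pages still reachable from the seed once no augmenting path remains.

```python
from collections import defaultdict, deque


def min_cut_community(links, source, sink):
    """Return (max_flow, pages on the source side of a minimum cut).
    Assumption: each undirected link has capacity 1 in both directions."""
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in links:
        cap[u][v] += 1
        cap[v][u] += 1

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    flow = 0
    while True:
        parents = bfs_parents()
        if sink not in parents:
            # No augmenting path is left: the reachable pages form
            # the source side of a minimum cut.
            return flow, set(parents)
        # Find the bottleneck capacity along the path back to the source.
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            u = parents[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        # Push flow along the path and update residual capacities.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy graph (hypothetical): two triangles joined by the single link c-d.
links = [("a", "b"), ("b", "c"), ("c", "a"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "d")]

flow, community = min_cut_community(links, source="a", sink="f")
print(flow)               # 1
print(sorted(community))  # ['a', 'b', 'c']
```

Here the only link between the two triangles is c-d, so the maximum flow from a to f is 1 and cutting that single link isolates the pages a, b, and c around the seed.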
Analysis of the content of the communities shows
that the member pages are highly relevant to the initial seed pages
and topically related in nontrivial ways. For example, in the Crick
community the scientists found references to Rosalind Franklin and
other early pioneers in genetics.
Practical applications of the discovery include the creation of
improved search engines, the automatic creation of web directories,
and content filtering. More broadly, the discovery opens up the
possibility of objective and rigorous analysis of the entire web.