The growth of the WWW has also been reflected in a growing literature that attempts to analyse the users, contents, and structure of ``the Web''. One of the more interesting aspects of this literature has been the use of the WWW itself to conduct surveys of its use. These studies include the work of the Georgia Tech Graphics, Visualization and Usability Center [Pitkow & Recker1994, Pitkow & Recker1995b, Pitkow & Recker1995a, Berghel1996], as well as smaller surveys with uncertain sampling methods [Rissa & Jarvinen Oy1995], and commercial surveys claiming rigorous methodology and analysis [CommerceNet Consortium1995]. Some studies have also focussed on individual WWW use through the analysis of browser logs [Catledge & Pitkow1995, Pitkow & Bharat1994], and on the characteristics and features of both users and their browsing tools [Berghel1996]. These surveys provide interesting statistics on the use and users of the internet, and in particular the WWW.
The demographics of internet users reported in these surveys indicate that they are predominantly male, fairly young (both the mean and the median age were 35 years in [Pitkow & Recker1995a]), and tend to be university students, technical professionals or researchers. The more general internet survey conducted by CommerceNet and Nielsen [CommerceNet Consortium1995] showed that, among frequent users of the internet, WWW use accounted for 75% of their usage. These users used it to ``browse and explore'' in 74% of the cases, to ``search for other information'' in 61% and to ``search for information on products/services'' in 50% of the cases [CommerceNet Consortium1995].
The explosive growth of the WWW has also been reflected in the sort of analyses done on it. In 1994, researchers at Carnegie Mellon used the web crawler developed for Lycos to collect information on a number of characteristics of WWW sites and their contents [Mauldin & Leavitt1994]. In that study Mauldin and Leavitt were able to maintain statistics on the content of titles and headings within WWW documents, frequency information on keywords, file size, and numbers and types of Uniform Resource Locators (URLs), as well as information on which other WWW sites were ``linked-to'' by each document. This last information permitted CMU researchers to construct a complete global map of the WWW as reflected in their database [Lycos, Inc.1996]. It is now common for WWW indexing services to provide lists of ``most-linked-to'' sites as part of their indexing. The commercial version of Lycos keeps a list of the 250 most frequently referenced WWW sites accessible to users [Lycos, Inc.1996].
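The idea behind a ``most-linked-to'' list can be illustrated with a minimal sketch. This is not the Lycos implementation, whose details are not described here; it assumes a hypothetical input, `pages`, that maps each page URL to its HTML text, and it ranks sites by the number of distinct other sites that link to them. The names `LinkExtractor` and `most_linked_to` are illustrative only.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def most_linked_to(pages, top_n=250):
    """Rank sites by the number of distinct other sites linking to them.

    `pages` maps each page URL to its HTML text (an assumed input format).
    Returns a list of (host, inbound_site_count) pairs, highest count first.
    """
    linking_sites = {}  # target host -> set of source hosts
    for url, html in pages.items():
        source = urlparse(url).netloc.lower()
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links against the page's own URL.
            target = urlparse(urljoin(url, href)).netloc.lower()
            # Skip links without a host (e.g. mailto:) and links within a site.
            if target and target != source:
                linking_sites.setdefault(target, set()).add(source)
    counts = Counter({host: len(srcs) for host, srcs in linking_sites.items()})
    return counts.most_common(top_n)
```

Counting distinct linking sites, rather than raw link occurrences, is a design choice made for this sketch so that one site with many pages does not dominate the ranking; a service could equally count every link.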
More recently, the Inktomi Project at UC Berkeley has conducted an analysis of over 2.6 million WWW pages and provided descriptive statistics on the characteristic features of these pages [Aoki et al.1996]. This analysis required the development of special software for WWW page parsing and analysis that could cope with the massive volume of data to be analysed (over 30 gigabytes of HTML pages).