WWW Development and Activities at the University of Mannheim

Akitoshi Yoshida, Heinz Kredel, Hans-Werner Meuer
Computing Center, University of Mannheim, D-68131 Mannheim, GERMANY
{yoshida,kredel,meuer}@rz.uni-mannheim.de

Abstract

This paper describes World Wide Web (WWW) activities and experiences at the University of Mannheim in the past few years. The development of WWW and related services is summarized and several projects at the Computing Center (RUM) are described.

1. Introduction

The growth of the WWW [1] in recent years has been tremendous. When it was first introduced to the public by CERN in 1992 as a tool to organize and distribute scientific documents, most people were not aware of its full potential. The WWW was primarily used at academic and research institutions to present information to the outside world as well as to their own members. After NCSA introduced Mosaic, the first widely used browser with a graphical user interface, the number of information providers and of people accessing them started to increase rapidly. As many commercial WWW sites appeared on the Internet and the cost of communication hardware and associated services dropped significantly, the WWW became a central part of modern society. The types of services provided on the WWW have changed dramatically as more and more people have started to use it. The WWW provides a unique environment in which people can participate in two ways: providing information and gathering information. There is an increasing number of commercial services, selling a wide variety of items from pizza to real estate, as well as non-commercial services by government and public institutions, offering information to the public. Information provided on the WWW was traditionally static, but nowadays many documents are generated dynamically by various techniques. Many existing information systems such as databases have been integrated into the WWW, and a growing number of new applications using Java [2] are also appearing on the market.

The expansion of the WWW followed a similar path at the University of Mannheim, although the types of services provided might differ. The University of Mannheim is located in Mannheim, a city in southwestern Germany. The university offers undergraduate and graduate courses in business administration, economics, social science, literature, mathematics, and information science. There are approximately 12,000 students enrolled, 600 of them in the graduate programs. There are approximately 1,200 staff employees including professors, assistants, scientists, and technicians. The university is directly connected to the BelWü network, which is a high-speed network in the state of Baden-Württemberg. The BelWü network currently has a capacity of 34 Mbit/s and is connected to the WiN backbone, a German academic network with a capacity of 2 Mbit/s. The WiN backbone is connected to other European backbones and to the U.S.

The Computing Center (RUM) offers various information services to the university. These services include network construction and administration, Internet services such as e-mail, News, and WWW, and educational services such as courses and seminars. There are 30 staff members and 20 student assistants at RUM. There are 7 groups, each specializing in its own area such as LAN administration, Unix, Mail and News, PCs, and WWW. The group responsible for planning and administration of WWW activities has 3 staff employees and 7 student assistants. The group also offers WWW programming courses and seminars for students. In addition, it administers several servers for collaborating external non-profit organizations.

The following sections describe these activities and experiences in detail: Section 2 summarizes the historical development; Section 3 presents the current situation; Section 4 describes some of the projects at RUM; finally, Section 5 concludes this paper and gives future perspectives.

2. Brief History

The first Web server in Mannheim was installed at RUM at the beginning of 1994. The university's official home page was created in cooperation with university officials. In addition, several pages introducing RUM were created to inform external and internal users about the services and information provided by RUM. In the same year, several departments followed RUM and started their own Web servers. For students, a server dedicated to students' home pages was installed at RUM. Every student was able to apply for a computer account for e-mail and WWW. Soon after the server went into operation, a dozen students set up their home pages, and this was the beginning of one of the largest Web servers in Mannheim. By the end of 1994, there were 4 Web servers in the university. At the end of 1995, RUM, in cooperation with university officials, decided to create a CD-ROM containing all the Web pages of the university. The primary purpose of this CD-ROM was to promote the newly established department "Technische Informatik (Computer Engineering)" to prospective students about to graduate from high school. When this plan was announced, there were 25 Web servers in the university with a total data volume of about 300 MB. In the following few weeks, many people updated their pages and added new ones so that they could be included on the CD-ROM. By the time the pages were finally gathered, there were about 540 MB of data. A master CD-ROM containing all these pages was created at RUM and 11,000 copies were pressed by a local company. This CD-ROM project is described later in this paper.

In 1995, there was a small CERN cache server at RUM. As WWW traffic increased, a larger and more efficient Harvest cache server was installed in 1996 to replace this CERN server. The size of this cache was initially about 1 GB, but at the end of that year it was increased to 4 GB to accommodate the growing WWW traffic.

The network traffic to and from the university has increased dramatically in the last few years, as shown in Figure 1. The lower part of each bar represents traffic within the BelWü network, and the upper part represents all other traffic. The increase in traffic was largely caused by the widespread use of the WWW at the university. This widespread use is attributed to the increase in both the number of computers attached to the university's local network and the amount and types of services offered by commercial and non-commercial information providers worldwide.

Fig. 1: Growth of Internet Traffic at the University of Mannheim

Although the population size of the university did not grow in the past few years, the number of people who own a user account at RUM has increased significantly, as shown in Figure 2. Also increased was the number of computers with IP addresses that are attached to the university's local network.

Fig. 2: Increase in the number of user accounts at the Computing Center and that of IP addresses in the University

3. Current Services

3.1. Web Services

RUM currently administers the university's main Web server and several additional Web servers that include the students' server. On the main Web server, the university maintains its official information for its students, employees, and visitors. In addition to the official university information, RUM maintains Web pages introducing its various services and activities. There are also several departments and university organizations without their own servers maintaining their Web pages on this server.

There are currently about 51 WWW servers operated in the university. Figure 3 shows the distribution of software used by these servers. Apache [3] is most frequently used because of its high performance and availability on different platforms. The second most used software is the NCSA server. The Netscape server [4] and the Microsoft server [5] are bundled with Windows and are therefore often used on that platform. The CERN server, which is relatively slow and inefficient compared to other servers, was one of the most often used servers a year and a half ago. It is still used at sites whose administrators are not concerned about performance or are reluctant to change. Two Roxen [6] servers are running at RUM, one of them as the main university server. MOWS [7,8] is a Java-based server developed at RUM; it is described later in this paper.

Fig. 3: Distribution of Web-server software programs used in the Computing Center (RUM) and the University

The main Web server (http://www.uni-mannheim.de/) runs on a Sun SparcStation 20 and receives on average about 50,000 requests daily, delivering 300 MB of data to its clients, as shown in Figure 4-(i). A typical access pattern for a weekday is shown in Figure 4-(ii). These figures show that the average hourly request-rate is around 2,500 requests/hour, while the peak rate almost reaches 4,500 requests/hour. It is interesting to note that at the beginning of 1996, there were only about 3,000 requests per day. This means that the traffic to this server has increased by more than 15 times.
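The arithmetic behind these figures can be checked with a few lines of Python. The sketch below uses only the numbers quoted above; the conversion 1 MB = 1024 KB and all function and constant names are assumptions made for illustration.

```python
# Figures quoted in the text for www.uni-mannheim.de (May 1997).
REQUESTS_PER_DAY_1997 = 50_000
REQUESTS_PER_DAY_1996 = 3_000      # beginning of 1996
VOLUME_MB_PER_DAY = 300
PEAK_REQUESTS_PER_HOUR = 4_500


def growth_factor(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


def mean_kb_per_request(volume_mb: float, requests: int) -> float:
    """Average response size in KB, taking 1 MB = 1024 KB (an assumption)."""
    return volume_mb * 1024 / requests


growth = growth_factor(REQUESTS_PER_DAY_1997, REQUESTS_PER_DAY_1996)
mean_kb = mean_kb_per_request(VOLUME_MB_PER_DAY, REQUESTS_PER_DAY_1997)
peak_per_second = PEAK_REQUESTS_PER_HOUR / 3600

print(f"growth since early 1996: {growth:.1f}x")
print(f"mean response size: {mean_kb:.1f} KB")
print(f"peak rate: {peak_per_second:.2f} requests/s")
```

Under these assumptions the growth factor is roughly 16.7, consistent with "more than 15 times", and the average response is about 6 KB.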

Fig. 4-(i): Daily request-rates for the Web-server www.uni-mannheim.de [5/97]

Fig. 4-(ii): Hourly request-rates for the Web-server www.uni-mannheim.de [5/97]

The distribution of client domains accessing this server, as shown in Figure 4-(iii), reveals that this server is mainly used by university members such as students, professors, and staff members. About 38% of the requests come from within the university. The domain with the largest number of requests is 'de', which accounts for 53% of all requests. Other domains with a large number of requests are 'com', 'net', 'uk', and 'edu'. Interestingly, the server is visible to German and U.S. institutions, but hardly visible to other European domains except for 'uk'.
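A client domain distribution of this kind can be derived from the resolved host names in a server log. The following Python sketch is not the tool used at RUM; it assumes one host name per request, and the sample host names are hypothetical.

```python
from collections import Counter


def top_level_domain(host: str) -> str:
    """Return the last label of a host name, or 'unresolved' for raw IP addresses."""
    host = host.strip().rstrip(".").lower()
    labels = host.split(".")
    if all(label.isdigit() for label in labels):
        return "unresolved"
    return labels[-1]


def domain_shares(hosts):
    """Map each top-level domain to its percentage share of the requests."""
    counts = Counter(top_level_domain(h) for h in hosts)
    total = sum(counts.values())
    return {tld: 100.0 * n / total for tld, n in counts.most_common()}


# Hypothetical client host names, one per request (not taken from the real log).
sample_hosts = [
    "pc12.rz.uni-mannheim.de",
    "dialin3.example.de",
    "proxy.example.com",
    "192.0.2.17",
]
shares = domain_shares(sample_hosts)
print(shares)
```

Requests from within the university would be counted separately by matching the suffix `uni-mannheim.de` before falling back to the top-level domain.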

Fig. 4-(iii): Client domain distribution for the Web-server www.uni-mannheim.de [5/97]

As mentioned earlier, there are currently about 51 official Web servers in the university. This number has doubled since the previous year. The number of pages and their data volume have increased much more dramatically, as summarized in Table 1. In this table, the Web servers are divided into four groups: RUM for servers operated at RUM; Dept. Group A for those operated by the business, economics, and social science departments; Dept. Group B for those operated by the information science and mathematics departments; and Other Inst. for other university institutions. The numbers are given for January 1996 and May 1997.

                           RUM     Dept. Group A     Dept. Group B       Other Inst.             Total
                 1/96     5/97     1/96     5/97     1/96     5/97     1/96     5/97     1/96     5/97
Servers             8       20        5       13        6       12        6        6       25       51
Files           7,669  116,799    9,477   17,001    2,182   18,036      750   21,509   20,078  173,345
Volume [MB]     159.2  5,699.3    260.6    350.9    113.6    633.2      5.9     52.2    539.3  6,735.6
Table 1: Number of Web servers and total data volume in the University
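The growth factors implied by Table 1 can be computed directly. The Python sketch below reads each table entry as a (January 1996, May 1997) pair; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Table 1 values as (January 1996, May 1997) pairs per server group.
TABLE_1 = {
    "RUM":           {"servers": (8, 20), "files": (7_669, 116_799), "volume_mb": (159.2, 5_699.3)},
    "Dept. Group A": {"servers": (5, 13), "files": (9_477, 17_001),  "volume_mb": (260.6, 350.9)},
    "Dept. Group B": {"servers": (6, 12), "files": (2_182, 18_036),  "volume_mb": (113.6, 633.2)},
    "Other Inst.":   {"servers": (6, 6),  "files": (750, 21_509),    "volume_mb": (5.9, 52.2)},
}


def totals(metric: str):
    """Sum one metric over all groups for both dates."""
    before = sum(group[metric][0] for group in TABLE_1.values())
    after = sum(group[metric][1] for group in TABLE_1.values())
    return before, after


def growth(metric: str, group: str) -> float:
    """Ratio of the May 1997 value to the January 1996 value."""
    before, after = TABLE_1[group][metric]
    return after / before


for metric in ("servers", "files", "volume_mb"):
    before, after = totals(metric)
    print(f"{metric}: {before:g} -> {after:g} ({after / before:.1f}x)")
```

The per-group sums reproduce the Total column, which is a useful consistency check on the table itself.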

The most apparent difference between these two dates is the dramatic increase in both the number of files and the data volume. There was also a change in the types of files made available. The following subsection summarizes the MIME type distributions of the files on these Web servers.

3.2. Web File Types

Fig. 5-(i): Mime-type distribution of files on all the Web servers in the University [1/96]

Fig. 5-(ii): Mime-type distribution of files on all the Web servers in the University [5/97]

Figure 5-(i) shows the MIME type distribution across all the Web servers in 1996. By file count, half of the files were HTML and one third were GIF images. By volume, one third was occupied by video sequences in various formats, and still images occupied one quarter. Additionally, there were several sound, PostScript, and compressed files occupying volume. In 1997, there was a significant growth in both the number of files and the data volume, as shown in Figure 5-(ii). The file types made accessible also became more varied. The proportion of JPEG images relative to GIF images increased slightly. The distribution of file types by volume shows various new file types which did not appear in the previous year. The increase in gzip and tar files came from the FTP server that was made accessible through a Web server. There were several proprietary document formats such as Word, PowerPoint, and CorelDraw, which were often used for writing papers and presentations.
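Distributions by count and by volume like those above can be tallied from a file listing. The sketch below is a minimal illustration using Python's standard `mimetypes` module to guess types from file extensions; how the original statistics were actually gathered is not described here, and the sample files are hypothetical.

```python
import mimetypes
from collections import defaultdict


def mime_distribution(files):
    """Tally file count and byte volume per MIME type.

    `files` is an iterable of (path, size_in_bytes) pairs.
    """
    counts = defaultdict(int)
    volume = defaultdict(int)
    for path, size in files:
        mime, _encoding = mimetypes.guess_type(path)
        mime = mime or "unknown"       # extension not recognized
        counts[mime] += 1
        volume[mime] += size
    return dict(counts), dict(volume)


# Hypothetical listing of (file name, size in bytes).
sample_files = [
    ("index.html", 2_000),
    ("logo.gif", 5_000),
    ("photo.gif", 3_000),
    ("README", 500),
]
counts, volume = mime_distribution(sample_files)
print(counts)
print(volume)
```

Keeping count and volume separate matters because, as the figures show, a type can dominate one measure and be negligible in the other.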

In the following, a more detailed description of MIME file type distributions is presented.

3.2.1. Detailed Description

Main Server

Fig. 6-(i): Mime-type distribution of files on the Web server www.uni-mannheim.de at RUM [1/96]

Fig. 6-(ii): Mime-type distribution of files on the Web server www.uni-mannheim.de [5/97]

Figures 6-(i) and (ii) show the distributions on the main Web server. In 1996, almost three quarters of the files were HTML and the rest were image files, mostly GIF. The image files occupied half of the data volume. Other than HTML, GIF, and JPEG, there were only a few sound and compressed files. In 1997, the proportion of image files increased due to the MATEO project. The number of files doubled, while the data volume increased by 10 times.

Student Server

Fig. 7-(i): Mime-type distribution of files on the students' Web server at RUM [1/96]

Fig. 7-(ii): Mime-type distribution of files on the students' Web server at RUM [5/97]

Figure 7-(i) depicts the distribution on the students' server in 1996. The ratio of image files to HTML files by count was higher on this server than on the main Web server, but the ratio by volume was slightly lower. Presumably, many students used small inline images to make their home pages attractive. Files of other types were hardly used. Figure 7-(ii), depicting the distribution in 1997, clearly indicates that Web fever has hit the university. The number of files increased by 20 times and the data volume by 10 times.

Other RUM Servers

Fig. 8-(i): Mime-type distribution of files on the other Web servers at the Computing Center [1/96]

Fig. 8-(ii): Mime-type distribution of files on the other Web servers at RUM [5/97]

In Figures 8-(i) and (ii), the remaining Web servers at RUM are treated together. In 1996, PostScript files occupied almost three quarters of the volume. At that time, many technical documents and manuals were archived as PostScript on one of the servers, which accounts for this large share. In 1997, the previously dominant PostScript files were replaced by several proprietary document formats. There was also a significant increase in both the number of files and the data volume.

Fig. 8-(iii): Mime-type distribution of files on the FTP Web server at RUM [5/97]

A new entry to the RUM Web server family in 1997 was the main FTP server. Due to its extremely large size and skewed file type distribution, it is shown separately in Figure 8-(iii). A large number of programs for various platforms were made available through a Web server. This Web server provides a user interface that supports downloading multiple files as a single archive file.

Dept. Group A Servers

Fig. 9-(i): Mime-type distribution of files on the Web servers for the business, economics, social science departments [1/96]

Fig. 9-(ii): Mime-type distribution of files on the Web servers for the business, economics, social science departments [5/97]

Figure 9-(i) shows the distribution on the servers operated by the business, economics, and social science departments in 1996. There were many HTML files with inline GIF images. There were also several video sequences and compressed files. The increase among the servers at the business, economics, and social science departments in 1997 was modest, as shown in Figure 9-(ii). However, an increase in the use of various document formats was clearly seen among these servers.

Dept. Group B Servers

Fig. 10-(i): Mime-type distribution of files on the Web servers for the information science, engineering departments [1/96]

Fig. 10-(ii): Mime-type distribution of files on the Web servers for the information science, engineering departments [5/97]

Figure 10-(i) shows the distribution on the Web servers of the information science and mathematics departments in 1996. Several video sequences and PostScript files dominated in volume. The video sequences were medical visualization films created by ray tracing on MRI (magnetic resonance imaging) data. The PostScript files were technical papers published by members of the departments. In 1997, PostScript files remained dominant in volume, as shown in Figure 10-(ii). There were some QuickTime video files in addition to MPEG video files. The number of files and the data volume increased by between 6 and 8 times.

Other Inst. Servers

Fig. 11-(i): Mime-type distribution of files on the Web servers for other institutions [1/96]

Fig. 11-(ii): Mime-type distribution of files on the Web servers for other institutions [5/97]

On the servers operated by the other institutions, the files were almost exclusively HTML pages with inline GIF images, as shown in Figures 11-(i) and (ii). This group had the simplest distribution pattern in 1997. There was a significant increase in the number of very small HTML files, and there were a few QuickTime movie files. The small HTML files are assumed to have been generated from a database.

3.3. Cache Services

Another service provided by RUM is caching of WWW data. RUM operates a Harvest (Squid) [9] cache server that members of the university can use for their HTTP, FTP, and Gopher access. This cache server can cooperate with other Harvest cache servers in Baden-Württemberg and the rest of Germany to provide faster access to remote resources. Currently, the cache server has 4 GB of storage and cooperates with four other cache servers in Germany, which are located in Stuttgart, Frankfurt, Bonn, and Mainz.

As shown in Figure 12-(i), the cache server receives 200,000 requests a day on average, delivering 1,500 MB of data to its clients and cooperating cache servers. The traffic to the cache server becomes quite high during the peak hour around noon, reaching 5 requests per second, as shown in Figure 12-(ii). During this time, other cache servers in Europe are also fully loaded. The limited network resources to the U.S. must be utilized effectively by these servers to provide maximal throughput to their clients. The cache statistics suggest that a good time to surf the Internet is between 3 and 5 o'clock, and a bad time is around noon. Extremely low access rates early in the morning indicate that there are not many Computer Science students in Mannheim.

Fig. 12-(i): Daily request-rates for the Cache-server www-cache.uni-mannheim.de [5/97]

Fig. 12-(ii): Hourly request-rates for the Cache-server www-cache.uni-mannheim.de [5/97]

The most popular domain among the users of the cache server is 'com', as shown in Figure 12-(iii). Hosts providing Internet search, such as Lycos and AltaVista, are in this domain and are very frequently accessed. Major browser companies such as Netscape and Microsoft are also accessed frequently, partly because links to their pages are included on many pages. There are also quite a few adult entertainment sites that attract university members. The second most frequently accessed domain is 'de'. Surprisingly, other domains such as 'edu', 'net', and 'org' are seldom accessed.

Fig. 12-(iii): Host domain distribution of the Cache-server www-cache.uni-mannheim.de [5/97]

As shown in Figure 12-(iv), the average hit-rate on the local cache storage for requests from clients (primary TCP) is about 30%. For requests from the cooperating cache servers, the hit-rate (primary ICP) is about 5%. Upon local misses, requests are forwarded to the cooperating cache servers. The hit-rates from these servers (secondary) are about 10%. When the network connections between the cache servers are good, the effective hit-rate observed by local users is the local hit-rate plus the local miss-rate multiplied by the sum of the fetch-rates, which is calculated as 48%.
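The effective hit-rate described above can be written as a small function. The individual fetch-rates of the cooperating servers are not listed in the text, so this Python sketch does not assume them; instead it back-computes the summed fetch-rate implied by the two quoted figures (30% local, 48% effective). The function names are illustrative.

```python
def effective_hit_rate(local_hit: float, fetch_rates) -> float:
    """Local hit-rate plus the share of local misses served by cooperating caches."""
    return local_hit + (1.0 - local_hit) * sum(fetch_rates)


def implied_fetch_sum(local_hit: float, effective: float) -> float:
    """Summed fetch-rate that yields `effective` for a given local hit-rate."""
    return (effective - local_hit) / (1.0 - local_hit)


LOCAL_HIT = 0.30       # primary TCP hit-rate quoted in the text
EFFECTIVE = 0.48       # effective hit-rate quoted in the text

needed = implied_fetch_sum(LOCAL_HIT, EFFECTIVE)
print(f"summed fetch-rate implied by the quoted figures: {needed:.1%}")
```

With these inputs the implied sum is 0.18 / 0.70, or roughly 26% of local misses being answered by the cooperating caches.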

Fig. 12-(iv): Hit-rates of the Cache-server www-cache.uni-mannheim.de [5/97]

3.4. Search Services

RUM also provides a university-wide Web indexing and search service using the Harvest software. This search engine can cooperate with other search engines in Heidelberg and Hohenheim to enable searching across these three universities. It is planned to expand this service to other universities in the BelWü network. At RUM, approximately 17,000 text files are currently gathered and indexed by 8 gatherers on a PC-based Linux cluster. Search queries are processed by the broker program through a Web interface. The broker receives 90 requests per day on average. Most queries come from within the university or from other institutions in Germany, as shown in Figure 13. There are also some requests from the domains 'com' and 'net', followed by 'at', 'ch', and 'uk'.

Fig. 13: Client domain distribution of the Search-server search.uni-mannheim.de [5/97]

4. Projects

Several projects have been carried out at RUM in the last few years. In this section, some of them are briefly described.

CONSULT-Web [10] added a Web interface using CGI [11] to the fulltext retrieval system CONSULT-Info, which is a product from SNI. CONSULT-Info uses a Unix server and Windows clients and has been used by SNI and its customers for storing documents of different kinds such as PostScript, Word, CorelDraw, and PowerPoint. It supports both fulltext search and meta search in these documents, and the matched documents can be retrieved as ASCII text or in the original format. CONSULT-Web provided a Web interface to this system so that anyone with a Web browser can use CONSULT-Info services. The flexibility and efficiency of this implementation were limited by CGI and HTTP. To overcome this problem, a Java version of CONSULT-Web utilizing the full potential of network programming and abstract windows interface is currently under development.
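To illustrate how a CGI gateway of this kind works in general, the following Python sketch answers a search request passed in the `QUERY_STRING` environment variable. It is not the CONSULT-Web implementation: the interface of CONSULT-Info is not described in this paper, so `search_documents`, the `DOCUMENTS` table, and the query parameter name `q` are hypothetical stand-ins.

```python
import html
import os
import sys
from urllib.parse import parse_qs

# Hypothetical stand-in for the retrieval backend; the real CONSULT-Info
# interface is not described in this paper.
DOCUMENTS = {
    "cache.ps": "hierarchical internet object cache",
    "mows.doc": "distributed web and cache server in java",
}


def search_documents(query: str):
    """Return names of documents whose text contains every query word."""
    words = query.lower().split()
    if not words:
        return []
    return sorted(
        name for name, text in DOCUMENTS.items()
        if all(word in text.split() for word in words)
    )


def handle_request(environ) -> str:
    """Build the CGI response (header block, blank line, HTML body)."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    hits = search_documents(query)
    items = "".join(f"<li>{html.escape(name)}</li>" for name in hits)
    body = (
        f"<html><body><h1>Results for {html.escape(query)}</h1>"
        f"<ul>{items}</ul></body></html>"
    )
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # A Web server running this as a CGI program passes the query string
    # in the QUERY_STRING environment variable and reads the reply on stdout.
    sys.stdout.write(handle_request(os.environ))
```

Because a classic CGI program is started anew for each request and must return a complete page, any state or richer interaction has to be rebuilt every time, which is one way to read the limitation mentioned above.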

Following CONSULT-Web, there were several other projects using CGI to provide a Web interface to external information systems. Project-FRZ [12] provided a Web interface to a data-warehouse system used by distributors and suppliers of produce. The database resides on a BS2000 mainframe computer from SNI. Users can access this database from their browser via a Web server that communicates with the database. Similar CGI interfaces were developed for the database storing TOP500 [13] supercomputing data, and for the network tool NATHAN [14].

Another project was CD-RUM [15], which produced a CD-ROM containing all the university's Web pages. The purpose of this CD-ROM was to give people with no Internet access all the information on the university's Web pages. In particular, the university was interested in introducing its new department 'Technische Informatik' to high school students planning to go to university. A total of 11,000 CD-ROMs were produced and distributed to people around the country. To produce the master CD-ROM, a program was developed at RUM that followed hyperlinks and gathered pages, translating the links in the downloaded pages so that they could later be accessed under various operating systems. Fortunately, there were less than 600 MB of data at that time, so the CD-ROM was able to hold all the pages that were gathered. This no longer holds: as mentioned earlier, the total volume of the Web services now exceeds 6 GB. An intelligent mechanism is needed to trim the data volume to about 600 MB if another CD-ROM is to be produced.
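The general idea of following hyperlinks and translating them into relative file paths can be sketched as follows. This is not the program developed at RUM, whose details are not given here; it is a simplified Python illustration that rewrites only double-quoted `href` attributes, stays on the starting host, and takes a caller-supplied `fetch` function. The host names in the example are hypothetical.

```python
import posixpath
import re
from collections import deque
from urllib.parse import urldefrag, urljoin, urlsplit

HREF = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def local_path(url: str) -> str:
    """Map a URL to a relative file path, e.g. http://host/a/ -> host/a/index.html."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return parts.netloc + path


def mirror(start_url: str, fetch):
    """Follow links breadth-first from start_url; return {local path: rewritten HTML}.

    `fetch(url)` returns the page's HTML, or None if the page is unavailable.
    """
    host = urlsplit(start_url).netloc
    queue, seen, output = deque([start_url]), {start_url}, {}
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue                           # links to it will dangle
        here = posixpath.dirname(local_path(url))

        def rewrite(match):
            target, fragment = urldefrag(urljoin(url, match.group(1)))
            if urlsplit(target).netloc != host:
                return match.group(0)          # leave external links untouched
            if target not in seen:
                seen.add(target)
                queue.append(target)
            relative = posixpath.relpath(local_path(target), here)
            if fragment:
                relative += "#" + fragment
            return f'href="{relative}"'

        output[local_path(url)] = HREF.sub(rewrite, page)
    return output


# Hypothetical two-page site held in memory instead of fetched over HTTP.
site = {
    "http://www.example.edu/": '<a href="dept/info.html">Info</a> '
                               '<a href="http://other.example.com/">Ext</a>',
    "http://www.example.edu/dept/info.html": '<a href="/">Home</a>',
}
pages = mirror("http://www.example.edu/", site.get)
```

Relative paths with forward slashes are used because they resolve the same way when the pages are opened from a CD-ROM under different operating systems.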

MATEO [16] stands for Mannheimer TExt Online. The latest technical documents on various topics as well as valuable historical prints stored at the university library are made available online. The collections currently offered include theses and technical papers as well as scanned images of several rare prints from the 15th century. MATEO makes these documents accessible to anyone with a browser. Some rare prints that are otherwise viewable only by highly qualified scholars are thus made available to the public. A CD-ROM containing some of these works has also been produced and sold to the public.

Fig. 14: An example from collection "edito theodoro-palatina"

MOWS [8] is a distributed Web and cache server written in Java and it is built from modules that can be loaded locally or remotely. These modules implement various features of Web and cache servers and enable MOWS to run as a cluster of distributed Web and cache servers. In addition to its distributed nature, MOWS can integrate external services using its own external interface. Java programs conforming to this interface can be loaded locally or remotely and executed at the server. The resulting system will potentially provide effective Web access by both utilizing commonly available computing resources and offering distributed server functionality. MOWS is freely available at the MOWS homepage http://mows.rz.uni-mannheim.de/mows/

Finally, RUM also supports external organizations in presenting their information on the WWW. The FOCUS and SAVE servers provide information on FOCUS (Forum for Computer and Communication Users of Siemens) and SAVE, which is a German chapter of FOCUS.

5. Conclusion and Future Perspectives

The number of user accounts at the university is expected to stabilize soon, as it approaches the population size of the university. The number of computers in the local network will continue to grow, as PCs and workstations are continually purchased and used for various applications at every department and institute of the university. Various portable computing devices, such as notebook computers and personal digital assistants, will also be attached to the network. At the same time, new media-rich applications are appearing on the Internet that require ever more bandwidth. As a result, the Internet traffic to and from the university is expected to keep increasing for a while.

RUM must keep providing two types of services to the university community and the world. One is to provide and organize the Web infrastructure of the university that enables departments to build and maintain their Web services effectively. The other is to provide the related information infrastructure that enables university members to access resources around the world efficiently. Both tasks will be difficult, as the amount of information provided grows rapidly along with the growing demand for accessing and being accessed. For information providers, it becomes increasingly difficult to keep the information up to date while avoiding dangling links to non-existent resources. In addition to these technical issues, more effort should go into designing visually attractive pages. Graphic designers and user interface designers must create attractive pages with effective navigational structures so that users can easily find what they are looking for. Since the university's Web services are growing rapidly, many problems that occur on the Internet will also occur within the university's local network. Searching and organizing information becomes increasingly important within the university. As the number of people trying to access remote resources from the university grows, an efficient cache structure coordinated with other German institutions may be required. Some of these activities are under way.

University computing centers were traditionally engaged in specialized fields such as scientific computing, visualization, and database management, to assist their university's departments in conducting their research. As the Internet has become a standard tool for everyone, there is a significantly higher demand for supporting the university community in presenting and accessing information effectively over the Internet. RUM is devoted to providing such services to its community as well as to conducting research in the field for the benefit of the entire Internet community.

Acknowledgement

The authors thank Ralf-Peter Winkens for providing the network statistics.

References

  1. T. Berners-Lee and R. Cailliau, "WorldWideWeb: Proposal for a HyperText Project", CERN European Laboratory for Particle Physics, Geneva CH, November 1990, http://www.w3.org/hypertext/WWW/Proposal.html.
  2. JavaSoft Home Page, http://www.javasoft.com/
  3. Apache HTTP Server Project, http://www.apache.org/
  4. Netscape Communications Corporation, Netscape Server, http://home.netscape.com/comprod/server_central/
  5. Microsoft Corporation, Internet Information Server, http://www.microsoft.com/iis/.
  6. Welcome to Roxen, Informations Vävarna AB, http://www.roxen.com/
  7. A. Yoshida, "MOWS: Distributed Web and Cache Server in Java," Proc. 6th Int. WWW Conf., Santa Clara, CA, 1997 http://proceedings.www6conf.org/HyperNews/get/PAPER151.html
  8. MOWS Homepage, University of Mannheim, http://mows.rz.uni-mannheim.de/mows/
  9. A. Chankhuthod, P. B. Danzig, C. Neerdaels, M. F. Schwartz and K. J. Worrell, "A Hierarchical Internet Object Cache," University of Southern California, http://excalibur.usc.edu/cache-html/cache.html
  10. CONSULT-Web, University of Mannheim, http://rumix.uni-mannheim.de/CONSULT-Web/
  11. The Common Gateway Interface, University of Illinois, Urbana-Champaign, http://hoohoo.ncsa.uiuc.edu/cgi/overview.html
  12. WWW Data_Warehouse auf SESAM-Basis, University of Mannheim, http://parallel.rz.uni-mannheim.de/frz/dw.html
  13. TOP500, University of Mannheim, http://parallel.rz.uni-mannheim.de/top500.html
  14. NATHAN, University of Mannheim, http://memorum.rz.uni-mannheim.de/NATHAN/
  15. CD-RUM, University of Mannheim, http://suparum.rz.uni-mannheim.de:8080/start.htm
  16. MATEO, University of Mannheim, http://www.uni-mannheim.de/mateo/