Index Annotations Database Information Theory

Adam's Database Project

The Situation

The students of LSC 508 read a diverse collection of articles and write many annotations over the course of the semester. The results of their labors, an immense quantity of work, is currently stored in separate files, distributed amongst many directories and on a handful of web servers. If there were some way to integrate all of this information into one centralized repository, it could be analyzed and searched. Such availability would prove enormously beneficial not only to current students who wish to compare their work with that of their peers, but to researchers who seek out articles to improve their understanding of specific topics.

Technical Requirements

The LSC 508 database is only as useful as it is accessible. If researchers cannot use the database, they will see little purpose in its construction. Similarly, if students are unable to enter their work, they will have little use for the system. Given these facts, it is clear that a primary goal of this database project is to maximize accessibility. For this reason, databases designed for the desktop were quickly rejected.

A web-based system would make it easy for people around the world to use the database. This would allow a large number of students and researchers the opportunity to use the database from distant work areas at the same time.

A major area of concern in the software selection process was price. Educational institutions and libraries often find themselves with budgets barely able to cover existing needs. For this reason, the database system should run on very modest hardware and be available at no cost.

Ultimately, MySQL was chosen. Not only is it free, but it is able to run on very old computers. My 350 megahertz Pentium II computer with 96 megabytes of RAM was able to handle the demands of the completed product. Although the official MySQL website does not mention any hard system requirements, my experience has been that any Pentium computer with 64 megabytes of RAM is sufficient for a moderately popular database. MySQL runs on many platforms, so a similarly powerful Apple computer or UNIX workstation would also be able to run the same system.

MySQL only provides a data storage and searching system. Additional software is required to create a graphical user interface. PHP was chosen for this task because it was relatively simple, free and can run on even the most ancient computer systems. It is available for Microsoft Windows, UNIX and Apple.

To serve the database over the internet, a web server is also required. Any modern server should work with my database, though it has only been tested with the free web server known as Apache. It should be noted that to connect to the internet, the computer running the database should have either a modem or an ethernet card.

Security

All of the data being collected and entered is intended to be shared with the widest possible audience. Because of this, my database allows any visitor to search through the stored annotations. Security restraints are only apparent when one tries to add, delete or edit information. Any registered user can add a record, but only the person who created a record can edit it.

Because a login and password are required to change fields in the database, and that any login information will only allow the malicious alteration of a single user's records, the system is fairly secure against unauthorized access. MySQL allows registered users to make backups of all tables with a tool called mysqldump, so the system administrator always has the ability to roll the database back to a prior state, should the need arise.

Table Layout

The system will need to store two tables of information:

  1. Username and password information for each user
  2. Annotations
The layout of these tables, their disk space requirements and rationales for field selection are available here.

Storage Space

There are currently twenty-five students per LSC 508 class. There are approximately three LSC 508 classes per year. Over five years, there will be a total of roughly 375 students logging in and storing information.

As each login record has a maximum size of 20 bytes, the database can be expected to store 375*20=7,500 bytes of login information in five years.

If each student were to enter fifteen annotations, there would be 375 * 15 = 5,625 annotations to store. The absolute largest size per annotation record is 65,860 bytes. Given that the majority of fields only use as many bytes as are necessary to store user data, actual record size will likely be far lower.

65,860 bytes * 5,625 annotations = 370,462,500 bytes for the maximum file size. Given that the smallest hard drives available a decade ago could store roughly ten times that information, and that realistically, the data required would be far less under normal operation, the size would appear to be not overly burdensome on the hosting server.

Final Thoughts

PHP and MySQL are a very powerful tools for creating databases for use on the world wide web. The learning curve is extremely high, but what they lack in ease of use, they make up for in raw power. Content management systems, calendars and even games have been written with PHP and MySQL.

The finished annotation database can be found at http://thatgeek.com/lsc508.




Images used with permission, © 1999-2004 www.barrysclipart.com.
All other content © 2004 Adam Juda