Emily Brown

Database Report


Each semester, the students of LSC 508 have been annotating articles on library and information science topics and posting them on their web pages. Each time the class is offered, students must search for articles on the same topics from a variety of different sources, few of which are dedicated to library and information science resources. The professor who teaches LSC 508 has decided that each time the class is offered, the students' annotations should be stored in a database so that future students can search only relevant articles as they seek to better understand the course material.

The original form for these data is a simple APA style annotation in an HTML document. APA style requires these fields:

Last Name, First Initial. (Year of publication). Title of Article. Title of Publication, Volume Number (Issue Number), Page Number. Note about retrieval.

However, to allow students to search for articles using a variety of starting points, the total number of fields for each record in the electronic database is 23. See table describing fields.

Before calculating the size of the database, it should be noted that while TEXT is the most appropriate variable type for the annotations, it allows much more space than necessary. In fact, it is impossible for a 50-word annotation to exceed 1,000 characters. Thus the max size of each record, assuming that annotators follow instructions and keep annotations to 50 words or less, is 2,361 bytes. The max size of database after 5 years is 1,653,700 bytes (or 1.6537 gigabytes).

The software used to manage this database is MySQL database server 4.1 and MySQL Query Browser. MySQL is an open-source DBMS. It's fast and free, and its continued development doesn't depend on a large corporation whose goal is to make money, but on developers who are committed to creative problem solving. Exactly one year ago this month, Helwett Packard became the first major vendor to support MySQL, and while the DBMS has difficulty handling terabytescale databases, it can easily support gigabytescale databases like the one proposed here.

During the installation process, MySQL allows database administrators to indicate whether the computer they're using will be a dedicated server machine, server machine, or developer machine. If the administrator selects "dedicated server machine," then MySQL will be configured to use all available system space. Regardless of the type of machine, MySQL recommends a minimum of 200 megabytes of space on the hard drive to unpack, install, and create a database.

The computer on which the "articles" DBMS is installed is a Mobile Intel(R) Pentium(R) 4 CPU 2.80GHz Processor with 512 MB RAM, running Microsoft Windows XP Home Edition operating system version 5.1.2600. At the time when the DBMS was installed, there was 25.6 GB of free space (out of 34.46 GB total) on the computer. That is more than enough room for the database, even if it grows to its maximum size, and even if the DBMS is configured to use only some of the available system space.

MySQL also provides certain security options. The "articles" database will be password protected, and once the database goes online, the administrator may download another program, MySQL Administrator, which will allow her to give people varying levels of access. However, the database is currently searchable and editable through the query browser, which means that the only may to make it secure is to have a qualified person conduct the search for each user.

It should be noted that in many ways, the MySQL database server is perfectly designed for the "articles" database ÷ but the "articles" database may not be perfectly designed for MySQL. "Articles" is not a normalized database. All the information is in a single table, rather than broken up into related tables. This makes it easy to create and query "articles," but it means the database is filled with redundancies and certain functions of the Query Browser don't work.

The most notable example of this is that the Query Browser does not allow the administrator to double click on data in the table and edit it. Of course, the administrator can still type SQL statements into the query area of the browser or in the command line client (a DOS window for SQL programming). The administrator may also upload records from a text file with tabs between each field. Thus the best way to add records might be to set up an Excel file (figure 2) with column headings that match those in the main table of the database, then copy and paste the relevant cells into a text file (tabs are automatically inserted between columns) and upload the text file. Since the Query browser allows users to "bookmark" frequently used SQL statements, the "upload text" function could be entered once and saved for future use.

It is not possible to print from the query browser, but it is possible to export data to an HTML, XML, or Excel file (figure 3). Exporting to Excel is particularly helpful because of the variety of formatting options Excel provides (figure 4). The only disadvantage to this two-step printing process is that information about the search - such as what tables were searched and what parameters were used - is lost. The best way to compensate for this is to name the Excel file so that it includes at least some of this information.

There is one other way to print information, and that is to copy text from the command line client where searches are conducted and paste it into a text file. This is also a two-step process, the only advantage of which is the ability to display records vertically. The complete contents of the "articles" database have been printed this way (appendix 1). But for the following examples, a screenshot of the Query browser was taken to show searches in their original context.

The most obvious way to search the database is to search for information in a specific field, such as authors or titles. In addition to searching for a single name, users may search for a list of all authors, titles, or publications included in the database (figure 5). However, users are more likely to want to look up certain topics. The "articles" database provides two ways of searching for topics: users can search the two "topics" fields of each record using the exact wording on the syllabus, or they may conduct a simple keyword search that scans the titles and annotations of each record. (See figure 6 for a comparison of these two searches.)

Since users are going to take the results from the "articles" database and search for the actual articles somewhere else, they may want to display results in a meaningful order, such as alphabetically by author or source, or numerically by date (figure 7). Further, users may wish to narrow their searches by looking for only recent articles (figure 8) or by eliminating articles that are too short or too long for their purposes (figure 9). MySQL gives users all of these options, and it gives the administrator the tools to expand the database's capabilities in the future. MySQL has proved itself a viable competitor to Access and Oracle, and learning to use it will certainly have rewards beyond this particular project.

>>top