COM372: Notes, Chapter 8

Designing Your Database
pages 205-216

Syllabus | Grading | Reading Notes | Dr. Logan

Concepts

Components of a Database: A relational database is made of of a series of tables, each table consisting of data arranged in rows and columns. The individual tables look like spreadsheet tables; each has a name. The rows of a table represents a unique record. The columns, or fields, contain consistent types of information about the individuals in each record. One field must be unique; no two records can have the same value in that field, a way of guaranteeing that no two records are the same. This is the key, or primary key, field. Usually, this is an artificially derived unique number added into a special key field.

Databases usually contain several tables. Tables may be linked if they share a common field, usually the primary key of another table. By breaking information into linked tables, the total amount of information can be significantly reduced. For example, a table of customers might contain mailing and billing information. A second table might contain products. A third table might contain orders, with references to the unique identifiers for customers and other unique identifiers for products. If a customer had several orders, it would not be necessary to store mailing information for each order; associating many products in many orders to a single customer would make use of only one record of the customer's information, with a single link into each order in the orders table.

Schemas and Relationships: A list of all tables, fields, and primary keys constitutes a description of the database, or its schema. Links between tables are made with foreign keys, establishing a tie between records within the two tables. These relations can be described as one-to-one, one-to-many, or many-to-one. I might group information about one person into two tables, one on address and contact information and a second with medical information, linking the two in a one-to-one relation using the person's social security number. I might then relate that person to several orders. The orders table might consist of many products for each order number.

Database Design: There are formal theories for properly sorting data into tables, a process known as normalization. The goal is to reduce the replication of data and to assure unique records. This is as much a matter of common sense and "atomistic" thinking, focusing on the task of breaking information into logical groups and small units. Small units means that there is no nesting of information in any cell of the table. I was recently working on a table of member companies in an industry association; the table had a field (column) for "owner" which often included two or more people. If I wanted to search for what companies a given name was owner of, I would have had to parce the owner field to find the name, instead of looking for fields that matched the name. Instead, there should have been a separate table of owners with a field linking them to the table of companies.

The logic of database design is to make unique tables that group information that really belongs together while separating information that doesn't. The informal approach of the book suggests these simple rules:

We will look at some supplementary material that will improve your understanding of normalization, and even at a few cases where it is preferable to de-normalize (combine tables). Database design is more critical for larger databases (dozens or hundreds of interwoven tables), especially when the primary tool for interacting with the database in Structured Query Language (SQL). However, web applications tend to focus on relatively simpler databases, and the additional programming power of languages such as PHP often takes the pressure off the SQL coding.

Database Physical Architecture: Usually, the HTML web server and the PHP parser reside in the same machine. Sometimes, the database will also be in that same machine, as is the case in our class. For security purposes, however, the database may be in another folder, separate from the web files, or it may even be (as is rather common) on a different physical machine entirely. The process is not that complicated. PHP has connection functions that access the database. PHP's programming then sorts, branches, etc., using the data from the database, returning to the web server a stream of HTML code, which then gets served back to the client.

The Microsoft .NET framework in part is intended to deal with two issues relative to web databases, and you should watch for these. The first is the issue of remaining connected to the database while transactions are being conducted. A more contemporary approach is to minimize this by connecting, interacting with the database, and disconnecting, freeing the database server by having the database software (here, PHP) operate on a copy of the database (a record set). A second issue is architecture to facilitate connections to multiple databases, something considerably beyond the simple concept of figure 8.9.