MCS-274 Lab 2: Database design (Spring 2013)

Due April 16, 2013


In this lab, you and an assigned partner will continue working with the election data from lab 1. This time, your focus will be on developing a better design for the database than simply mirroring the file format in which the Secretary of State's office (SOS) distributed the data. The main portion of the lab uses the E-R design approach, rather than normalization. However, one of the extra-credit options does draw the connection to normalization.

We already discovered one quirk in the SOS data, which is that two different ID numbers are used for some of the parties. The scope of work for the main assignment does not include any resolution to such oddities. Instead, you are to represent the data we are given. In the specific case of party ID numbers, this means that each party has two attributes, the ID number and the alternate ID number, which for some parties happen to be equal. An extra-credit option addresses this topic.

Another feature of the SOS data you will discover, which is not of the same nature, is that two different forms of candidate ID number are used. For example, ALLEN QUIST is listed in the Candidates table as candidate 1040301, whereas in the US_House_Results table, he is listed as candidate 301. The missing link between these two is that he was running for office 104. That is, the long form of the candidate ID number is formed as the sum of 10000 times the office number plus the short form of the candidate ID number. Unlike the two party IDs, which show no obvious pattern and likely reflect an error, this situation with candidate IDs is quite systematic. You can store whichever version you consider appropriate and reconstruct the other using some arithmetic. If you probe deeper, you will discover that there is also a relationship between candidate IDs and party IDs; you can decide whether to take any advantage of that relationship.

A warning about dropping tables

In this lab, you will be creating tables by selecting data from other tables. If you do this erroneously, you may wind up with a table that is way too large. At that point, you will presumably drop the erroneous table and try again. The problem is, when you drop a table, Oracle ordinarily just moves it into your recycle bin so that you can get it back. If the table is huge, this means you will quickly exhaust your quota of space. One solution is to specifically purge your recycle bin. The other solution is to specify the purge option at the time you drop the table. That will cause it to be completely eliminated rather than being moved to the recycle bin. If you drop the table in a point-and-click fashion within SQL Developer, you can select the purge option by checking a checkbox. If you drop the table using the SQL command DROP TABLE, you would specify the PURGE option at the end of the command as in this example:


Minimum tasks

I strongly urge you to check your work with me at each stage, rather than proceeding ahead from what may be an ill-chosen starting point. However, your goal at each stage is to have a reasonable starting point for further design, not necessarily a perfect one. You may wind up discovering something in a later stage that causes you to go back and make minor revisions to an earlier stage. When you are all done, you will submit your work from all tasks, either in hardcopy form or by email.

  1. Identify the entity sets underlying the SOS tables. In many cases, each SOS table corresponds to a single entity set, but be careful -- there is at least one exception. For this task, just prepare a textual list of the entity sets.

  2. Identify the relationships amongst the entity sets. For each, give the name of the relationship and the names of the participating entity sets. If the meaning of the relationship or the role played by any of the entity sets is not clear, include a brief explanatory note.

    You should strive to provide only a minimal, non-redundant collection of relationships that suffices to capture all the information. Do not create a dense web of redundant relationships. To illustrate this point, consider a college course database. A relationship exists between students and courses, and another exists between professors and courses. There would be no reason to include a third relationship linking students and professors unless it served some unrelated purpose, such as the advisor relationship.

    There are acceptable database designs that do not include any subclasses, and I expect that most of you will choose such a design, for simplicity. However, if you do choose to include subclasses, then your list of relationships should explicitly indicate the "isa" relationships.

  3. For each entity set or relationship, identify its attributes. Each entity set should have at least one attribute, but relationships can have as few as zero attributes. In fact, in some of the reasonable designs, none of the relationships have attributes. Be careful when you decide which entity set or relationship each attribute is associated with. Some of the attributes in the SOS tables are not placed where they logically belong in an E-R diagram.

    Try to avoid redundant attributes. However, don't go so far as to eliminate the storage of recalculatable information, such as totals, counts, and percentages. (As an exception discussed earlier, you should only store one form of the candidate ID.) We already saw that the "total votes" is defined in an inobvious way for constitutional amendments. Some of the other numbers may be wrong in the SOS data (or at least inexplicable). For example, it may be that percentages are not properly rounded. As a computing professional, your responsibility would be to alert the competent authority of any anomalies you notice, but pending their correction, still arrange to store the data as given.

    For this task, you should prepare a textual list in the form of a two-level outline, with entity sets and relationships at the top level and attributes at the second level. That is, you should have a line for each entity set, each relationship, and each attribute, with each attribute name indented in under the corresponding entity set name or relationship name.

  4. Draw an E-R diagram, including such details as keys, referential integrity and degree constraints, and weak entity sets.

  5. Create a new collection of tables based on your E-R diagram, as explained in the textbook's Section 4.5. (If your design includes subclasses, you will also need to consult Section 4.6.)

    Although you could create the tables using the ordinary form of CREATE TABLE and then use INSERT to import the data from the SOS tables, Oracle provides a more convenient alternative, which combines table creation with data import. This more convenient alternative frees you from having to specify all the column names and types that are the same as in the source table.

    As an example, suppose you wanted to make your own identical copy of the sos.precincts table. You could use the following SQL command in Oracle:

    CREATE TABLE precincts AS
    (SELECT * FROM sos.precincts);

    In this example, Oracle copies the column names, types, and any NOT NULL constraints in order to create the table; then it copies all the rows of data as well. The SELECT subquery can be replaced with any other subquery, providing much more control. Here are some specific options that may be useful to you:

    1. If you only want to copy some of the columns, you can list explicit column names in place of *. In this case, you will likely want to include DISTINCT so as to avoid duplicated rows.

    2. If you want to rename some column, you can link the old and new names with the AS keyword.

    3. If you only want to copy some of the rows, you can use a WHERE clause.

    4. If you want to insert data that is a constant, or is computed using an expression, rather than coming from a source table, you can include the constant or other expression in the SELECT clause. In this case, you will surely want to use AS to provide a column name, and you may also need to provide an explicit data type if Oracle cannot infer a suitable type. In particular, if you use the constant NULL, you will want to specify a type. For example, to have a column of type CHAR(1) that is NULL, you could specify CAST(NULL AS CHAR(1)) followed by AS and a name for the column.

    5. If you want to include data from a column of the source table but convert it to a different type, you can use a CAST expression. For example, to have a column containing the district from the source table but converted to a VARCHAR(3), you could specify CAST(district AS VARCHAR(3)) followed by AS and a name for the column.

    Most of the data in your tables will come from the SOS tables. However, depending on your design, you may want some extra rows for such entities as county 88 (used for multi-county offices) and the special write-in, yes, and no candidates. If so, you can explicitly INSERT those extra rows. If you take this approach, you might consider including some feature in your design that makes it possible to distinguish the genuine counties and candidates from these special ones.

    You should submit the SQL commands you used to create the tables and insert data into them.

  6. Add a primary key to each table, using ALTER TABLE as shown at the bottom of page 326 of your textbook. You should submit these SQL commands.

Extra credit opportunities

You may independently choose any number of these, in any order.

  1. Determine whether each SOS table and each of your new tables is in BCNF.

  2. Verify that the SOS tables could be reconstructed from your tables (i.e., that your decomposition is lossless) by making new tables equivalent to the SOS ones, constructed from yours. Once again, you will find it convenient to use CREATE TABLE ... AS ....

    Then use MINUS queries to verify that the reconstructed tables contain no tuples not in the SOS tables, and vice versa. Recall that Oracle's MINUS is the same as what is called EXCEPT in the SQL standard and your textbook.

    You need not verify reconstruction of all of the SOS tables; the more you verify, the more extra credit you will receive.

  3. Add as many constraints to your tables as you reasonably can. You should already have added primary keys. Also, most of your columns will already have NOT NULL constraints imported from the SOS tables. The most important kind of constraint to add would be foreign keys. The syntax for adding these with ALTER TABLE is analogous to what you used for primary keys. Some referential integrity constraints may be difficult or impossible to incorporate into your design as foreign keys, due to the quirks of the SOS data. For these, you should state the constraint you would have liked to add. There might also be a few opportunities to add UNIQUE or CHECK constraints. If you find a reason to add any further NOT NULL constraint, the syntax is ALTER TABLE tableName MODIFY (columnName NOT NULL);

  4. Identify any broader ways in which the data ought to be consistent, which go beyond what you know how to enforce with SQL constraints. Prepare a list of consistency conditions.

  5. Prepare a list of data oddities that you would like to call to the SOS's attention. In some cases, if they were to resolve the oddity, you might be able to simplify the database design or enforce an additional constraint, which the current data doesn't obey. In other cases, the resolution might not have any impact on your design. Still, professional ethics demand that you point out any potential problem you notice.