MCS-274 Lab 1: Basic SQL (Spring 2015)

Due March 10, 2015

Background

In this lab, you will use the MySQL database to try out basic SQL queries on tables I have downloaded from the Minnesota Secretary of State's election results from the November 4, 2014, general election. (In case that web site changes, I've also got local copies of the files I downloaded. The data tables came from candidates.txt, counties.txt, governor_results.txt, municipalities.txt, parties.txt, precincts.txt, state_house_results.txt, us_house_results.txt, and us_senate_results.txt. The format is described in file_format.txt.) I closely modeled my database tables on the files.

All the work for this lab can be most easily done on one of the MCS lab Macs running OS X, since we have the MySQL Workbench client software installed on those machines. However, you could also download and install this software on your own machine and still connect to our server, thebe.virtual.gac.edu. I don't promise to provide support for the technical difficulties you may encounter on your own machine. Also, if you want to connect to our server from off campus, you need to use the MySQL Workbench's feature of connecting over SSH.

You may do this lab work either with a partner or alone.

I'll demonstrate running MySQL Workbench in class. You should create a connection, specifying your username, the initial database password I give you, and the hostname thebe.virtual.gac.edu.

Because the password I issue you isn't very secure, you should probably start by changing your password, which you can do with a MySQL command like the following:

set password = password('myNewSecretPassword');

To find the tables you will be using in MySQL Workbench, in the left-hand pane under Schemas double click sos and under that Tables. The tables are named candidates, counties, governor_results, municipalities, parties, precincts, state_house_results, us_house_results, and us_senate_results. If you double clicked the sos schema, that schema should be set as the default for your SQL queries so that you can directly refer to these table names. Alternatively, you can always refer to them with names like sos.candidates, which tell MySQL to look for the table in the sos schema. (When we discuss granting of privileges, you will see how I granted you all permission to select data from these tables but not modify them.) I am attaching the CREATE TABLE statements for your reference, as well as the LOAD DATA statements I used to load the data into the tables. As long as you are using our server, you won't need to execute these statements yourself.

Expectations

For this lab, you need not write a true lab report. Instead, just turn in your SQL queries and the results they produced. For each one, give my English statement of the problem, your SQL query, and the resulting table, in that order.

You can check if you get the same results as I do. However, be warned that it is possible to get the right answers even with incorrect queries..

If in any case you have reason to doubt that your query is correct but haven't been able to track down what is wrong or fix it, you should indicate what your concern is.

Be sure to answer each question using a single SQL query that directly produces the desired result. It is not acceptable to issue one query, manually extract some information from the result, and use that to type in a second query, which produces the desired result.

The desired result should also be immediately apparent; what follows are two examples of how you could fail to meet this expectation. Example 1: if I ask you how many of something exists, your output should be a single row showing the number, not a bunch of rows, where the number of rows is the answer. Example 2: if I ask you what is the largest value of something, your output should just be that largest value, not an ordered list from which the largest value can be read off.

Your queries should not produce results with duplicate rows.

Try not to make your queries unnecessarily complex. In particular, don't mention a table if it isn't needed to find the result.

Specific problems

  1. Find the names of candidates who ran for an office ending in '19A'.

  2. Provide an alphabetical list of the parties' full names.

  3. Find the candidates who ran for the office of 'U.S. Senator'. For each, list the candidate's and party's full names. Hint: join tables using the party abbreviation, not the party ID. Later you will see why.

  4. Find a list of candidates' names for those candidates for US office who received a majority of the votes in at least one precinct in 'Nicollet' county.

  5. In which precinct (or precincts) did 'CLARK JOHNSON' receive his largest state House vote count? Give your answer as a precinct name (or names, if there is a tie).

  6. In which counties did 'CLARK JOHNSON' have state House election results? Give your result as an alphabetic list of county names.

  7. In how many precincts did 'TIM WALZ' have US House results?

  8. In how many precincts did 'TIM WALZ' win a majority of the US House votes?

  9. Find any rows of the candidates table that contain a party ID number not present in the table of parties. For each, show the candidate's name and party ID number.

  10. Find any candidates with a party ID that does not match the one shown in the parties table for the candidate's party abbreviation. For each, show the candidate's name, candidate ID, party ID number, party abbreviation, and the party ID number that is listed in the parties table as corresponding to the abbreviation.

  11. Sometimes it happens that two distinct municipalities (identified by different MCD codes) happen to have the same name. List the cases where this occurs, for each providing the municipality name, the two MCD codes, and the two county names. List each pair only once, not also in reversed order. The list should be in alphabetical order by municipality name.

  12. How many counties are there for which the number of precincts shown in the counties table doesn't match how many are in the precincts table?

  13. Produce a table of state-wide votes for the US Senate race, showing in each row the candidate's name and the corresponding number of votes. The table should be in descending order of votes.

  14. What percentage of the votes for Governor in each county were for the DFL candidate? List the county names with their percentages (which should be labeled as such). Give the list in ascending order of percentage. Hint: The percentage column in the governor_results table is not helpful for this.

  15. Although each MCD code in the municipalities table identifies a unique municipality, the same code can appear in more than one row if a municipality spans multiple counties. Find all the cases where this appears, listing for each the MCD code, municipality name and number of counties. The list should be organized primarily in decreasing order of number of counties and for any given number of counties in alphabetical order by municipality name.