Home‎ > ‎CIS 54‎ > ‎

Lab 6 - Restaurant Schema

In this lab you'll use real data to begin form a database by devsing a schema for the restaurant data. 

Introduction

Often data we would like to keep organized comes from poorly organized sources. This is the case for data we scrape from the web and wish to perform queries on or keep a history of. In this lab you'll inherit a data set in a non-normalized table and begin to make sense of it. The restaurant inspection data is fairly poor, for reasons you'll come to see. You task is to make a schema that fits it. You will have to make comprimises where the data isn't perfect. In this lab I expect you to document your assumptions. 

Practice Selects

Create an SQL query to answer the following questions. When you have a query that works, turn in the query, not the answer to the question. 
  1. How many cities are listed in the data? 
  2. How many restaurants are in each city? 
  3. How many restaurants in each city have a vermin (or VI) code? 
  4. Is it possible to have two of the same code violation? (i.e. have EH two or more times as a major or minor)

Develop a Restaurant Inspection Schema

The data you're working with comes from here:

The web page generates a list of all restaurant facilities in Santa Cruz county, when they were inspected, the coded results of their inspection. You can read the meanings of the codes here: 


The data as presented on the page is a bit weird because it's what California law requires. The table that holds the scraped data is defined this way: 

mysql> describe inspection ;
+--------------+--------------+------+-----+---------+-------+
| Field        | Type         | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| name         | varchar(128) | YES  |     | NULL    |       |
| address      | varchar(128) | YES  |     | NULL    |       |
| date         | date         | YES  |     | NULL    |       |
| crit_major   | varchar(64)  | YES  |     | NULL    |       |
| crit_minor   | varchar(64)  | YES  |     | NULL    |       |
| general      | int(11)      | YES  |     | NULL    |       |
| verified     | varchar(12)  | YES  |     | NULL    |       |
| past_major   | int(11)      | YES  |     | NULL    |       |
| past_minor   | int(11)      | YES  |     | NULL    |       |
| past_general | int(11)      | YES  |     | NULL    |       |
+--------------+--------------+------+-----+---------+-------+
10 rows in set (0.00 sec)

Candidate Keys

The data regards real world things. Entities. Think about what entities are in the data. Read the web page linked above for some background on what the data means. Think about what an inspection looks like and what the follow up entails. When you've decided what entities are represented in the data look for candidate keys. Remember, a candidate key is one or more attributes that identifies the entities you've selected. Your design document doesn't need to state what entities you consider a part of your schema, your table design essentially does that. But, you should be clear before you start looking for functional dependencies. 

Functional Dependencies 

What are the FDs in this data? Start with only the attributes you were given. If those have problems consider whether or not to use a surrogate key. If you do use a surrogate key, state that you have done so you will also need to tell me how you intend to match your surrogate keys when the webpage updates. For each functional dependency that you consider in your design you must do the following: 
  1. State the FD 
  2. Write a query that determines whether the FD holds in the data that's given
Beware that some FDs that you think should hold may not. That's because there's flaws in the way the data is presented. If you present an FD that you think should hold but doesn't you should describe what you think the flaw in the data is and justify why you will consider the FD true. Watch out for multivalued dependencies!

Normalization

Based on the FDs you derived in the last part, create a schema that is normalized. You don't need to create the tables yet. Your schema should be in the form: 

MyTable ( KeyAttr, ForeignKeyAttr, Data1, Data2, Data3 )

Your tables should be normalized into 4NF. Be sure to double check your assumptions! There are at least a few right answers, be prepared to justify yours.

Turn In

  1. The SELECT queries from the first part.
  2. A write-up describing your schema
    1. The FDs you find in the data
    2. The queries you used to check your FDs
    3. Justification where the FDs might have problems. 
    4. The definition of your tables

Grading

  • 5 points for your queries
  • 15 points for your schema design 
Comments