Keys in Real Data: Cabrillo Courses

In this lab we will examine some trade-offs when addressing problems with data. This lab builds on the schema that we imported in the Using Real Data: Cabrillo Courses lab.

Step 1: Keys in the Program Data

This query shows the problem with the program data:

select * from ProgramFile
	where `Program Control Number` = '';

Some records don’t have a control number. What can we do about it? Let’s delete the rows and re-create the program table with a key.

create table program ( 
    ControlNumber char(5) primary key,
    Title varchar(64),
    TOPCode char(6), 
    AwardType char(1), 
    CreditType char(1), 
    ApprovedDate date, 
    Status varchar(16), 
    InactiveDate date

# Truncates some dates that also have time of day. Ignore. 
insert into program (ControlNumber, Title, TOPCode, AwardType, CreditType, ApprovedDate, Status, InactiveDate)
	select `Program Control Number`, `Title`, `TOP Code`, `Program Award`, 
		`Credit Type`, IF (`Approved Date` = '', NULL, `Approved Date`), 
        TRIM(`Proposal Status`), IF (`Inactive Date` = '', NULL, `Inactive Date`)
	from ProgramFile
    where `Program Control Number` != '';

Note the where clause in the insert/select statement removes rows with empty control numbers.

Step 2: Keys in the Intersection Table

Now let’s fix the program_course table to use our new key. This too will encounter problems with empty control numbers. Run this query to check for bad keys:

select `Program Control Number`, `Course Control Number`, count(*) 
    from ProgramCourseFile 
    group by `Program Control Number`, `Course Control Number`
    having count(*) > 1;

There are courses that map to empty programs. Again, we’re going to discard this data because it’s hard to figure out where these courses belong.

create table program_course (
    ProgramControlNumber char(5), 
    CourseControlNumber char(12),
    constraint primary key (ProgramControlNumber, CourseControlNumber),
    constraint prog_course_fk_1 foreign key (ProgramControlNumber) 
        references program (ControlNumber),
    constraint prog_course_fk_2 foreign key (CourseControlNumber) 
        references course (ControlNumber)

insert into program_course (ProgramControlNumber, CourseControlNumber)
    select `Program Control Number`, `Course Control Number` 
    from ProgramCourseFile
    where `Program Control Number` != '';    

Turn In

Export your schema into a file called cabrillo_datamart_keys.sql and submit it on Canvas.