Keys in Real Data: Cabrillo Courses¶
In this lab we will examine some trade-offs when addressing problems with data. This lab builds on the schema that we imported in the Using Real Data: Cabrillo Courses lab.
Step 1: Keys in the Program Data¶
This query shows the problem with the program data:
select * from ProgramFile where `Program Control Number` = '';
Some records don’t have a control number. What can we do about it? Let’s delete the rows and re-create the program table with a key.
create table program ( ControlNumber char(5) primary key, Title varchar(64), TOPCode char(6), AwardType char(1), CreditType char(1), ApprovedDate date, Status varchar(16), InactiveDate date ); # Truncates some dates that also have time of day. Ignore. insert into program (ControlNumber, Title, TOPCode, AwardType, CreditType, ApprovedDate, Status, InactiveDate) select `Program Control Number`, `Title`, `TOP Code`, `Program Award`, `Credit Type`, IF (`Approved Date` = '', NULL, `Approved Date`), TRIM(`Proposal Status`), IF (`Inactive Date` = '', NULL, `Inactive Date`) from ProgramFile where `Program Control Number` != '';
where clause in the insert/select statement removes rows with empty control numbers.
Step 2: Keys in the Intersection Table¶
Now let’s fix the
program_course table to use our new key. This too will encounter problems with empty control numbers. Run this query to check for bad keys:
select `Program Control Number`, `Course Control Number`, count(*) from ProgramCourseFile group by `Program Control Number`, `Course Control Number` having count(*) > 1;
There are courses that map to empty programs. Again, we’re going to discard this data because it’s hard to figure out where these courses belong.
create table program_course ( ProgramControlNumber char(5), CourseControlNumber char(12), constraint primary key (ProgramControlNumber, CourseControlNumber), constraint prog_course_fk_1 foreign key (ProgramControlNumber) references program (ControlNumber), constraint prog_course_fk_2 foreign key (CourseControlNumber) references course (ControlNumber) ); insert into program_course (ProgramControlNumber, CourseControlNumber) select `Program Control Number`, `Course Control Number` from ProgramCourseFile where `Program Control Number` != '';
Export your schema into a file called
cabrillo_datamart_keys.sql and submit it on Canvas.