A Data Set
We will use an artificially generated data set, fakeRentPop.csv, as a running example throughout the book. The data set contains information about the rent expenses of an artificial population of students. Each variable corresponds to a column of the data set:
rent: the student's monthly rent expense. This variable is continuous.
study: the student's level of study, one of "Undergrad", "Grad", "Med Residents", or "Non-degree". This variable is categorical.
college: the student's college affiliation, one of "Agriculture & Bioresources", "Arts and Science", "Education", "Nursing", "Kinesiology", "Engineering", "Public Health", "Pharmacy & Nutrition", "Medicine", "Veterinary Medicine", "Business", "Law", "Public Policy", "Environment & Sustainability", "Dentistry", or "Others". This variable is categorical.
origin: the student's origin, one of "Saskatchewan", "Out of Province", or "International". This variable is categorical.
distance: the distance from the rented place to school, one of "less than 15 minutes", "15-30 minutes", "30-45 minutes", or "more than 40 minutes". This variable is categorical.
area: the area of the rented place in \(m^2\). This variable is continuous.
washroom: whether the rented place has a private washroom; 1 indicates a private washroom and 0 otherwise. This variable is binary.
neighborhood: the neighborhood code of the rented place. This variable is categorical.
course: the course that the student attends. This variable is categorical.
To read the data set and save it in the R working environment as an object called rentPop, use the following code:
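A minimal sketch using base R's read.csv() is given here. It assumes that fakeRentPop.csv has a header row and sits in the current working directory; the book's own code may read the file differently (for example, from a full path or a URL).

```r
# Read fakeRentPop.csv into a data frame called rentPop.
# Assumes the file is in the current working directory;
# otherwise, replace the file name with the full path to the file.
rentPop <- read.csv("fakeRentPop.csv")

# Print each column with its type as a quick check against
# the variable descriptions.
str(rentPop)
```

The output of str() lists one line per column, which makes it easy to spot, for example, a categorical variable such as neighborhood that was read in as a number.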
This rentPop object will be used throughout the book.
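Because most of these variables are categorical, it can be convenient to store them as R factors. The sketch below uses two hypothetical helpers, to_factor() and prepare_rent_pop(), which are not part of the book's code. It assumes that the column names match the variable names listed above and that the category labels in the file match the text exactly; if a column contains a label outside the listed set, to_factor() stops with an error instead of silently turning that value into NA.

```r
# Hypothetical helper: convert x to a factor with the given levels,
# stopping if x contains a (non-missing) value outside those levels.
to_factor <- function(x, levels) {
  unknown <- setdiff(x[!is.na(x)], levels)
  if (length(unknown) > 0) {
    stop("Unexpected categories: ", paste(unknown, collapse = ", "))
  }
  factor(x, levels = levels)
}

# Hypothetical helper: store each variable with the type described in the text.
prepare_rent_pop <- function(df) {
  # Categorical variables with a fixed set of labels, in the order listed.
  df$study <- to_factor(df$study,
                        c("Undergrad", "Grad", "Med Residents", "Non-degree"))
  df$origin <- to_factor(df$origin,
                         c("Saskatchewan", "Out of Province", "International"))
  df$distance <- to_factor(df$distance,
                           c("less than 15 minutes", "15-30 minutes",
                             "30-45 minutes", "more than 40 minutes"))
  # Remaining categorical variables: levels taken from the data (sorted).
  df$college <- factor(df$college)
  df$neighborhood <- factor(df$neighborhood)
  df$course <- factor(df$course)
  # Continuous variables as numeric, washroom as a 0/1 integer indicator.
  df$rent <- as.numeric(df$rent)
  df$area <- as.numeric(df$area)
  df$washroom <- as.integer(df$washroom)
  df
}
```

After loading the data, the helper would be applied as rentPop <- prepare_rent_pop(rentPop). Fixing the level order also fixes the first (reference) level that R uses for each factor in regression models; distance is kept as an unordered factor so that R's default treatment contrasts apply rather than the polynomial contrasts used for ordered factors.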