Project Description

Our project compares housing in California and Texas. We chose this topic because our group wanted to see how specific aspects of housing compare across two populous states, noting both the differences and the similarities. Likely users include people curious about how housing attributes differ between U.S. states, as well as individuals with an interest in real estate. They can study the visualizations the project provides and draw their own insights into how the two states differ in real estate. The aspects compared include household values and incomes, the race and ethnicity of homeowners, the educational background of homeowners, etc.

Data Sets Description

All of the datasets for our project come from the United States Census Bureau data page. On this page, users can search for topics of interest about the U.S. population. Our group searched for housing datasets and found CSV files covering several subtopics, including the financial, racial, and physical characteristics of housing within a given state. We were able to find 2022 data for both states in every subtopic we wanted to analyze. Because all of these files are in CSV form, our data cleaning consisted of reading them with the CSV reader from the CSV library in order to create DataFrames with our data.

Sub Topics

...
Analyze the economic aspects of housing in Texas and California. Such aspects include household value, income, and ratio of income to value.

Economics Page

...
Analyze the demographics of the household owners within Texas and California. Such demographics include race, education background, and age.

Demographics Page

...
Analyze the household aspects of the houses in Texas and California. Such aspects include the number of vehicles and rooms per house, along with the year ranges in which these houses were built.

Household Aspects Page

Data to Visualization Process

The data-to-visualization process for our project was extensive. It began with cleaning up the CSV files because of the whitespace they contained. In addition, the values in all the CSV files were strings, so we needed to write a program to convert those strings into floats for our visualizations to present the data correctly. Each CSV file had its own DataFrame so that no confusion would arise from attempting to combine all the data into one huge DataFrame. From each of these DataFrames, we constructed smaller DataFrames for the data to be visualized. The most challenging aspect of the project was cleaning our data. With the columns formatted in a unique way and our values stored as strings, we had to write a lot of code just to clean that part of the data up. One of the most rewarding aspects of the project was getting our data to visualize. After all the code that went into cleaning our data and constructing DataFrames from it, it was a relief to see a visualization pop up.
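
The cleaning step described above (stripping whitespace and turning string values into floats) might look roughly like the sketch below. This is not our actual project code: the helper names `clean_cell` and `load_clean_rows` are hypothetical, the sample rows are made up purely for illustration and are not Census figures, and the handling of commas, dollar signs, and percent signs is an assumption about how numeric cells could be formatted.

```python
import csv
import io


def clean_cell(text):
    """Strip whitespace and return a float when the cell looks numeric.

    Assumption: numeric cells may carry thousands separators, dollar
    signs, or percent signs. Anything that still fails to parse is
    returned as the stripped string.
    """
    stripped = text.strip()
    candidate = stripped.replace(",", "").replace("$", "").replace("%", "")
    try:
        return float(candidate)
    except ValueError:
        return stripped


def load_clean_rows(csv_file):
    """Read a CSV file object and return one cleaned dict per data row."""
    reader = csv.reader(csv_file)
    # The first row is treated as the header; strip stray whitespace from names.
    header = [name.strip() for name in next(reader)]
    return [
        {name: clean_cell(cell) for name, cell in zip(header, row)}
        for row in reader
    ]


# Made-up sample data, only to show the whitespace and string problems.
sample = io.StringIO(
    "Label,  California ,Texas\n"
    'Made-up metric A," 1,234.5 ","  987 "\n'
    "Made-up metric B, 12.5% ,N/A\n"
)

rows = load_clean_rows(sample)
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)` to obtain one DataFrame per CSV file, which is where the smaller DataFrames for each visualization would then be built from.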