Saturday, September 15, 2012

OLS: It's nearly like OCD in application

OLS (Ordinary Least Squares) doesn't look like much on the surface, but it is a great regression method for modeling the relationship between a dependent variable and one or more explanatory variables. Carrying it out really takes someone nearly OCD: Over the top, not Confused but nearly Deranged. Starting with 29 US Census (2010) variables, we removed them one at a time to improve the R-squared values. Figuring out the process was part of the battle, because there is no clear-cut method. Python can automate the repeated runs, but you still have to supply the input and judgment at each step for the inductive reasoning to happen.
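The repeated runs can be automated. Below is a minimal sketch in plain Python (no GIS software), assuming the census variables are already loaded as lists of numbers keyed by hypothetical column names. It swaps in a simpler rule than the hand procedure described in this post: at each step it drops whichever variable's removal gives the highest adjusted R-squared, and it stops when no removal helps. It tracks adjusted R-squared, 1 - (1 - R²)(n - 1)/(n - k - 1) for n observations and k explanatory variables, because dropping a variable can never raise the plain (multiple) R-squared of an OLS fit. The data in the usage lines are synthetic, not census values.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        if abs(p) < 1e-12:
            raise ValueError("singular system (perfectly collinear variables?)")
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def ols_r2(y, cols):
    """Fit y on the given explanatory columns plus an intercept.

    Returns (multiple R-squared, adjusted R-squared). Uses the normal
    equations, which is fine for a sketch; QR is numerically safer.
    Assumes n > k + 1 so the adjusted value is defined.
    """
    n, k = len(y), len(cols)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = k + 1
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj


def backward_eliminate(y, variables):
    """Drop one variable at a time for as long as doing so raises adjusted R-squared."""
    current = dict(variables)
    best_adj = ols_r2(y, list(current.values()))[1]
    history = [(None, best_adj)]  # (variable dropped, adjusted R-squared after the drop)
    while len(current) > 1:
        trials = {}
        for name in current:
            rest = [v for nm, v in current.items() if nm != name]
            trials[name] = ols_r2(y, rest)[1]
        drop, adj = max(trials.items(), key=lambda kv: kv[1])
        if adj <= best_adj:
            break  # no single removal improves the model any further
        del current[drop]
        best_adj = adj
        history.append((drop, adj))
    return list(current), history


# Synthetic example: y depends on x1 and x2; noise1..noise4 are unrelated to y.
random.seed(1)
n = 40
data = {name: [random.uniform(0, 10) for _ in range(n)]
        for name in ["x1", "x2", "noise1", "noise2", "noise3", "noise4"]}
y = [3 * a - 2 * b + random.gauss(0, 0.5) for a, b in zip(data["x1"], data["x2"])]
kept, history = backward_eliminate(y, data)
print("kept:", kept)
for dropped, adj in history:
    print(dropped, round(adj, 4))
```

The automated loop only replaces the bookkeeping; deciding whether a dropped variable still matters for the question being asked remains a judgment call.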

Setting up an evaluation table from the start would be a good idea. I just kept a table by hand: I eliminated first the variables whose coefficients were closest to zero, then looked at the probability (p-value), and weighed the VIF (variance inflation factor) in deciding whether to remove each one. I'm still struggling to get the results to copy and paste into Excel, so instead I pasted print-screen captures into a Word document and printed them out for visual comparison. This method was a little bulky, but it got the job done.
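A small script can stand in for the hand table and also sidestep the copy/paste problem, since Excel opens CSV files directly. The sketch below is illustrative only: the variable names and numbers in RUN are hypothetical placeholders, not output from these runs, and the sort key is one possible reading of the rule above (coefficient closest to zero first, then higher p-value, then higher VIF). It also shows the usual VIF definition, 1 / (1 - Rj²), where Rj² is the R-squared from regressing variable j on the other explanatory variables.

```python
import csv
import io

# Hypothetical rows for one OLS run: (variable, coefficient, p-value, VIF).
# Placeholder names and values only; they are not results from the A13 runs.
RUN = [
    ("VAR_A", 0.002, 0.91, 12.3),
    ("VAR_B", 1.85, 0.02, 2.1),
    ("VAR_C", -0.01, 0.47, 8.7),
    ("VAR_D", 0.002, 0.35, 3.0),
]


def vif(r2_j):
    """Variance inflation factor, given the R-squared of regressing
    variable j on all the other explanatory variables."""
    return 1.0 / (1.0 - r2_j)


def removal_order(rows):
    """Candidates for removal, first candidate first: coefficient closest
    to zero, then larger p-value, then larger VIF as tie-breakers."""
    return sorted(rows, key=lambda r: (abs(r[1]), -r[2], -r[3]))


def to_csv(rows):
    """Render the evaluation table as CSV text that Excel can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["variable", "coefficient", "p_value", "vif"])
    writer.writerows(rows)
    return buf.getvalue()


ranked = removal_order(RUN)
print("remove next:", ranked[0][0])
print(to_csv(ranked))
```

To get a file rather than text, write the string out with open("run_01.csv", "w", newline="") (the file name is hypothetical) and open it in Excel. Keep in mind that a coefficient's size depends on the units of its variable, so the p-value is the scale-free part of this check.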

Here's the table for the revised run of A13, which is the final result after eliminating outlier census tracts.


Because I was so frustrated trying to understand the general concept, which made me late on the assignment, I made only 21 OLS runs, far fewer than others did. The original OLS run with all 29 variables reported Multiple R-squared -53.61 / Adjusted R-squared -119.60. The final run on the revised census layer, without the most offensive outliers, gave Multiple R-squared 0.65 / Adjusted R-squared 0.47 (reflected in the table above). I'm fairly pleased. A map of the run is shown below. Additional factors, such as the presence of roads, cities, schools, and colleges, were not part of this assignment, although in a real case review they could well be major influences on the numbers.
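For reference, the two statistics quoted above are usually defined as follows, with n observations (census tracts), k explanatory variables, fitted values ŷi and mean ȳ. For an OLS fit with an intercept, the multiple R-squared lies between 0 and 1, while the adjusted R-squared applies a penalty that grows with k, so it sits below R-squared and can go negative when k is large relative to n. That penalty is why the final run's adjusted value (0.47) is lower than its multiple value (0.65). The last line gives one simple definition of a standardized residual, a common way to spot outlier tracts: a tract might be flagged when |zi| exceeds some cutoff c, such as 2.5. That cutoff is an assumption for illustration, not necessarily the threshold used in this assignment.

```latex
\begin{aligned}
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
\bar{R}^2 &= 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1} \\
z_i &= \frac{y_i - \hat{y}_i}{\hat{\sigma}}, \qquad
\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k - 1}}
\end{aligned}
```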

