Jonathan Falcon

Americans' Intent to Install Solar Panels

My marketing analytics team's project that aimed to understand Americans' intent to install solar panels. We cleaned data, utilized linear and logistic regression, created segments based on attitudinal variables using k-means clustering, and presented our findings to the class.

Background

The Dataset

The dataset for this project is the UT Energy Poll. Now discontinued, the biannual survey run by the Energy Institute at the University of Texas at Austin started in 2011. The dataset contained a sample of 921 respondents designed to be representative of U.S. households.

The Research Question

With that dataset, we asked, “Can Americans’ intention to install solar panels be predicted using attitudes and demographics?”

About the Data

Before we dove into the dataset, we tasked ourselves with first understanding who the data is about. How old are they? How do they identify? How much do they make?

Gender

Here, we can see that roughly 43% of the respondents in the sample identified as female and 57% as male.

Slide 6, showing the breakdown of responsdents' gender.

Age

Here, we can see that most respondents were in the 65–74 age range. Following this is the 18–24 age range.

Slide 7, showing the breakdown of responsdents' age.

Income

Here, we can see that most respondents had an annual income in the $50K–$75K range.

Slide 8, showing the breakdown of responsdents' income.

Example Survey Questions

The following categories of questions pertaining to attitudes were the basis on which we created segments with k-means clustering.

Concern

Please rate how concerned you are about the cost of electricity.

Satisfaction

Please indicate how satisfied you are with the job renewable energy companies are doing to address the energy issues that are most important to you.

Likelihood

Please indicate how likely it is that you will install solar panels at your home within the next 5 years.

Analysis

Variables in Use

Variable NameVariable Description
Q160_8Intent to install solar panels from 1 to 7
intent_solarPanelRecoded Q160_8 where all values equal to or above 6 as 1s and below 6 as 0s
ageCategorical age from 1 to 8
genderBinary gender with 0 being male and 1 being female
incomeCategorical income from 1 to 8
age*incomeInteraction between age and income
satisfied_gov_addressEnergyIssuesMean response of Q110_1, Q110_2, and Q110_3
satisfied_ngo_addressEnergyIssuesMean response of Q110_22, Q110_23, Q110_24, Q110_25, and Q110_26

Linear Regression Model

We started with the basics: a linear regression model. Here, we use Q160_8 as the dependent variable. This model provided a poor fit, only explaining about 9.7% of the variance in intent to install solar panels. Notably, age and income were significant predictors though and in alignment with natural assumptions that those who are older or make more money would be more likely display intent to install solar panels.

Slide 13, detailing the results of the linear regression model.

Finding the results of this model to be poor, we moved on to a logistic regression model.

Logistic Regression Model

Before a logistic regression would be possible, we’d need intent to install solar panels to be a binary variable, i.e., a one or zero—in this case, a yes or no. How did we do this?

Top Two Box Score (T2B)

Similar to how Net Promoter Score (NPS) determines promoters and detractors, a top two box score takes the top two responses (six and seven) and summarizes them as displaying intent to install solar panels. Responses with five and below are then categorized as not displaying intent to install solar panels. The picture below illustrates our process of recoding.

Illustration of recoding of intent to install solar panels using top two box score

Results

Now using our binary variable intent_solarPanel (the transformed Q160_8), we came up with the following model. Additionally, we wanted to capture any interaction between age and income, thus the use of interact_ageIncome.

Slide 16, detailing the results of the logistic regression model.

Based on the results, we noticed:

  • Gender was still not a significant predictor.
  • Age became an insignificant predictor.
  • Income is positively associated with intent to solar panels.
  • There is a negative interaction term between age and income.

Regarding the negative interaction term (interact_ageIncome), this suggests a diminishing effect for income, implying that for two equal increases in income, the impact on the intent to install solar panels will, on average, be higher for a younger respondent.

But we weren’t satisfied just yet, so we moved on to k-means clustering.

K-Means Clustering

Incorporating Attitudinal Variables

Before we could cluster, we needed to incorporate the attitudinal variables to ensure that we can make a meaningful comparison between the regression model/models before and after clustering.

The two variables we added were the satisfaction with how environmental NGOs and also the government are addressing energy issues—(satisfied_ngo_addressEnergyIssues) and (satisfied_gov_addressEnergyIssues), respectively.

Slide 19, detailing the results of incorporating attitudinal variables.

We added these variables to the model and found:

  • Both income and the interaction between age and income remained significant.
  • Both income and the interaction between age and income’s magnitude remained nearly the same.
  • That satisfaction with how the government is addressing energy issues is not significant.
  • That satisfaction with how environmental NGOs are addressing energy issues is significant and is positively associated with intention to install solar panels.

Clustering

We then used k-means clustering to segment the respondents based on the attitudinal variables.

Slide 19, detailing the results of the k-means clustering.

We found that the optimal number of clusters was three. The clusters were as follows:

  1. Cluster 1: Relatively low satisfaction with both government and environmental NGOs addressing energy issues.
  2. Cluster 2: Relatively high satisfaction with both government and environmental NGOs addressing energy issues.
  3. Cluster 3: Relatively high satisfaction with environmental NGOs addressing energy issues and relatively low satisfaction with government addressing energy issues.

Results

After deciding on three clusters, we then ran a logistic regression model for each cluster.

Slide 21, detailing the results of the each cluster's respective regression model.

We found that there were few statistically significant differences between the clusters, and overall, clustering provided little in the way of meaningful results.

Conclusion

Answering the Research Question

We found that some demographics (e.g., income) may predict Americans’ intent to install solar panels better than attitudes, though attitudes are not perfect predictors of intent. We also found that clustering based on attitudes did not provide meaningful results.

Thus, the best model to predict Americans’ intent to install solar panels was the logistic regression model that incorporated attitudinal variables.

Using the Model

Slide 23, detailing the suggestions for using the model.

Based on our findings, we suggested that if the cost of acquiring a customer is low, then this model may be worth using and carries little risk. However, if the cost is high, then the model may not be worth using, as it carries a high risk.