Data Wrangling with R


Overview

Red Wine Dataset - This tidy data set contains 1,599 red wines, each described by 11 variables on its chemical properties. At least 3 wine experts rated the quality of each wine on a scale from 0 (very bad) to 10 (very excellent). I use R to assess which chemical properties influence the quality of red wines.

Insights:

  1. Observation - 95% of the wines have a quality rating between 5 and 7, so most of the observations and findings apply to this range. There are too few observations for wines with other quality ratings.
  2. Insight - Based on the correlation coefficients and the line charts (median attribute values vs. quality), citric acid, alcohol and sulphates seem to have the strongest positive effect on quality, whereas volatile acidity seems to have the strongest negative effect.
  3. Struggle - Only the output variable, quality, had a limited set of values and could be converted into a categorical variable. Since all other variables are continuous, that limited the kinds of graphs and charts I could create.
  4. Surprising - It was surprising to observe that median sulphates and median sulfur dioxide relate to quality differently. Quality increases consistently as median sulphates increase, whereas it first increases and then decreases as sulfur dioxide increases.
  5. Future work - Having more data points for wines rated 1-4 and wines rated above 7 might reveal other trends.
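
The steps behind these insights can be sketched in a few lines of base R. This is a minimal illustration, not the original analysis code: the `wines` data frame below is a toy stand-in with invented values, and the column names (`quality`, `alcohol`, `volatile.acidity`) are assumptions about how the real data set is labeled. It shows the share of wines rated 5 to 7, the conversion of quality into an ordered categorical variable, the correlation of each property with quality, and the median of each property per quality rating.

```r
# Toy stand-in for the wine data: values are invented for illustration only.
# Column names are assumed, not taken from the original analysis.
wines <- data.frame(
  quality          = c(4, 5, 5, 6, 6, 6, 7, 8),
  alcohol          = c(9.0, 9.4, 9.8, 10.3, 10.5, 10.9, 11.8, 12.6),
  volatile.acidity = c(0.85, 0.70, 0.66, 0.55, 0.52, 0.48, 0.36, 0.30)
)

# Share of wines with a quality rating from 5 to 7.
share_mid <- mean(wines$quality >= 5 & wines$quality <= 7)

# Quality takes few distinct values, so it can be treated as an ordered factor.
quality_f <- factor(wines$quality, ordered = TRUE)

# Pearson correlation of each chemical property with quality.
predictors <- setdiff(names(wines), "quality")
cors <- sapply(wines[predictors], function(x) cor(x, wines$quality))

# Median of each chemical property within each quality rating
# (the values behind a "median attribute vs quality" line chart).
medians <- aggregate(. ~ quality, data = wines, FUN = median)

print(share_mid)
print(levels(quality_f))
print(sort(cors, decreasing = TRUE))
print(medians)
```

With the real data, `wines` would instead be read from the data set file, and the same calls would apply to all 11 chemical properties.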
