In preparation for writing your report to senior management next week, conduct the following descriptive statistics analyses with Microsoft Excel. Answer the questions below in your Microsoft Excel sheet or in a separate Microsoft Word document:
- Insert a new column in the database that corresponds to “Annual Sales.” Annual Sales is the result of multiplying a restaurant’s “SqFt.” by “Sales/SqFt.”
- Calculate the mean, standard deviation, skew, 5-number summary, and interquartile range (IQR) for each of the variables.
- Create a box-plot for the “Annual Sales” variable. Does it look symmetric? Would you prefer the IQR instead of the standard deviation to describe this variable’s dispersion? Why?
- Create a histogram for the “Sales/SqFt” variable. Is the distribution symmetric? If not, what is the skew? Are there any outliers? If so, which one(s)? What is the “SqFt” area of the outlier(s)? Is the outlier(s) smaller or larger than the average restaurant in the database? What can you conclude from this observation?
- What measure of central tendency is more appropriate to describe “Sales/SqFt”? Why?
Answer: (Below are the responses to the above questions)
STATISTICAL DATA ANALYSIS
This statistical report uses the Pastas R Us, Inc. database. The descriptive analysis was carried out in a spreadsheet, which is provided as a separate attachment for reference.
Descriptive Statistics: Calculation of the mean, standard deviation, skew, 5-number summary, and interquartile range (IQR) for each of the variables
The interquartile range, skew, mean, and standard deviation were calculated in SPSS. The table below is a screenshot of the Excel output, which shows all the calculated values.
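For readers who want to cross-check the spreadsheet figures, the same statistics can be sketched in Python using only the standard library. The rows below are hypothetical placeholders, not the Pastas R Us data, and the `describe` helper and its key names are assumptions made for illustration. The skew formula is the adjusted Fisher-Pearson coefficient used by Excel's SKEW, and the "inclusive" quartile method is intended to match Excel's QUARTILE.INC.

```python
import statistics

# Hypothetical (SqFt, Sales/SqFt) rows; NOT the Pastas R Us data.
restaurants = [
    (2000, 300.0),
    (2500, 350.0),
    (3000, 400.0),
    (2200, 450.0),
    (1000, 900.0),
]


def describe(values):
    """Return mean, sample SD, skew, 5-number summary, and IQR."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1), like Excel's STDEV.S
    # Adjusted Fisher-Pearson skewness, the formula behind Excel's SKEW.
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    # Inclusive quartiles, intended to match Excel's QUARTILE.INC.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "sd": sd,
        "skew": skew,
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


# Annual Sales = SqFt * Sales/SqFt, as defined in the assignment.
annual_sales = [sqft * per_sqft for sqft, per_sqft in restaurants]
sales_per_sqft = [per_sqft for _, per_sqft in restaurants]

summary = describe(sales_per_sqft)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Calling `describe(annual_sales)` gives the same set of statistics for the new column.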
Create a box-plot for the “Annual Sales” variable. Does it look symmetric? Is the IQR or the standard deviation preferable for describing this variable’s dispersion, and why?
Box-plot for the Annual Sales
The image above shows the box-plot for the Annual Sales variable. The data is right-skewed, so the distribution is not symmetrical. Since the data has no outliers, the interquartile range would not be applicable. Therefore, the standard deviation would provide the essential information about the variable’s dispersion.
In my opinion, the standard deviation is the most applicable measure because the box plot shows no outliers. The standard deviation ensures that all the values within a dataset are considered and contribute to the result.
Create a histogram for the “Sales/SqFt” variable. Was the distribution symmetric? If not, what is the skew for the dataset? Were there any outliers? If so, which ones? What is the “SqFt” area of the outliers? Is the outlier(s) larger or smaller than the average restaurant in the database? What can you conclude from this observation?
What measure of central tendency is more appropriate to describe “Sales/SqFt”? Why?
Using the mean to make predictions will not be stable since there are no significant outliers. Therefore, for this scenario, the median carries more weight. The histogram of the “Sales/SqFt” variable is not symmetric.
The median = 396.02
Because the mean is higher than the median, the “Sales/SqFt” variable is right-skewed.
The SqFt area of the outliers is 1092 (364 * 3).
From the database, we observe that the average is lower than the outliers.
Since the median is rarely affected by outliers, it is considered the most effective measure of central tendency here. Although the mean is also a measure of central tendency, it is normally affected by outliers (Mishra et al., 2019).
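The outlier questions can also be worked through in code. The sketch below is only an illustration: the rows are hypothetical, not the Pastas R Us data, and the 1.5 × IQR fence is a common box-plot convention that the assignment itself does not prescribe. It flags outliers in “Sales/SqFt”, looks up their “SqFt” area, compares that area with the average restaurant, and compares the mean with the median.

```python
import statistics

# Hypothetical (SqFt, Sales/SqFt) rows; NOT the Pastas R Us data.
restaurants = [
    (2400, 300.0),
    (2600, 350.0),
    (2800, 400.0),
    (2500, 450.0),
    (2700, 500.0),
    (1000, 950.0),
]

sales = [per_sqft for _, per_sqft in restaurants]

# Inclusive quartiles, intended to match Excel's QUARTILE.INC.
q1, median, q3 = statistics.quantiles(sales, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: values beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [
    (sqft, per_sqft)
    for sqft, per_sqft in restaurants
    if per_sqft < lower_fence or per_sqft > upper_fence
]

mean_sqft = statistics.mean([sqft for sqft, _ in restaurants])
mean_sales = statistics.mean(sales)

for sqft, per_sqft in outliers:
    size = "smaller" if sqft < mean_sqft else "larger"
    print(f"Outlier: {per_sqft} Sales/SqFt at {sqft} SqFt ({size} than average)")

# A mean above the median is consistent with a right-skewed distribution.
print(f"mean = {mean_sales:.2f}, median = {median:.2f}")
```

In this made-up sample the single high value pulls the mean well above the median while the median stays near the bulk of the data, which is the behaviour that makes the median the more robust summary.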
Mishra, P., Pandey, C. M., Singh, U., Gupta, A., Sahu, C., & Keshri, A. (2019). Descriptive statistics and normality tests for statistical data. Annals of Cardiac Anaesthesia, 22(1), 67. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6350423/
Professor’s comments: (Please see professor’s comments below and fix the problem)
Nice job on this assignment. Your box plot and histogram are perfect as are your values for annual sales and the descriptive statistics. Please read the following feedback so you can revise this assignment and resubmit it for full points.
I did take off for the following items:
(1) Even though the box plot does not show any outliers, keep in mind that the data is skewed (asymmetrical). Skewed data occurs with or without outliers. Therefore, the preferred measure of dispersion would be the IQR, because the IQR tells us exactly where the data falls within the four quartiles of the distribution. We only use the standard deviation with symmetrical distributions (normal distributions). Let me know if you have questions about that.
(2) The histogram does have an outlier. It’s the far right column labeled as 948.56 sq ft. If you revise this assignment for full points, make sure you include the answers to the questions about the outlier (from the instructions) in your resubmission.
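The point in item (1) about skewed data can be illustrated with a small Python sketch. The numbers are hypothetical annual sales figures, not the Pastas R Us data, and the `spread` helper is an assumed name used only for this example. The two samples differ only in how far the largest value sits from the rest.

```python
import statistics


def spread(values):
    """Return (sample standard deviation, IQR) using inclusive quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.stdev(values), q3 - q1


# Hypothetical annual sales in thousands of dollars; NOT the real data.
symmetric = [600, 700, 800, 900, 1000]
right_skewed = [600, 700, 800, 900, 2000]  # same values, longer right tail

sd_sym, iqr_sym = spread(symmetric)
sd_skew, iqr_skew = spread(right_skewed)

print(f"symmetric:    SD = {sd_sym:.1f}, IQR = {iqr_sym:.1f}")
print(f"right-skewed: SD = {sd_skew:.1f}, IQR = {iqr_skew:.1f}")
```

In this sample the IQR is identical for both lists because the middle half of the data has not moved, while the standard deviation more than triples. That sensitivity to the tail is why the IQR is usually preferred for describing the dispersion of a skewed variable.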