# Week 14

## Topics

We will cover the first half of chapter 11 this week.

• Section 11-1 : Graphing Data
• Section 11-2 : Measures of Central Tendency
• Section 11-3 : Measures of Dispersion

Normally, we would have HW 11 this week and Exam 4 next week. However, the university frowns on exams during dead week, so we will have the exam this week and the HW next week. This means that this week's material will not be on Exam 4, but it will be on the final exam.

## Overview of the Sections

Section 11-1 deals with graphing data. You will learn how to read and produce vertical bar charts (column charts), horizontal bar charts, and pie charts. I assume that most of you already know how to read them. You also need to learn how to produce these charts in Excel, but that is very easy. We already did graphing in Excel in chapter 2. Here you do the same thing, except that you select a different kind of chart from the menu that Excel offers. This will be covered in class next week.

You will also learn how to divide large sets of numbers into classes. For example, you could take 100 test scores, decide where the grade cutoffs are, and divide the range of scores into 5 intervals corresponding to A, B, C, D, and F. Counting the number of scores in each class produces a frequency table. When you plot a frequency table as a vertical bar chart, the result is called a histogram. Excel does have tools for producing frequency tables, but we won't use them in this class. You can skip the frequency polygons in section 11-1, but do look at the cumulative frequency tables.
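The grouping and counting described above can be sketched in a few lines of Python. This is only an illustration of the idea (in this class you will do it by hand or in Excel); the scores and the grade cutoffs below are made-up example values, and the helper name `frequency_table` is hypothetical.

```python
from itertools import accumulate

# Hypothetical grade classes: (label, lowest score, highest score); scores are whole numbers.
CLASSES = [("F", 0, 59), ("D", 60, 69), ("C", 70, 79), ("B", 80, 89), ("A", 90, 100)]

# Hypothetical test scores, for illustration only.
scores = [95, 88, 72, 64, 81, 59, 77, 90, 85, 70]


def frequency_table(data, classes):
    """Count how many data points fall into each class (both bounds inclusive)."""
    return [sum(1 for x in data if low <= x <= high) for _, low, high in classes]


counts = frequency_table(scores, CLASSES)
cumulative = list(accumulate(counts))  # running totals: the cumulative frequency column

for (label, low, high), f, cf in zip(CLASSES, counts, cumulative):
    print(f"{label}  {low:3d}-{high:3d}  frequency {f}  cumulative {cf}")
```

The last entry of the cumulative column always equals the total number of data points, which is a handy check on a frequency table done by hand.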

The other two sections we cover will be very important for any future statistics courses you may take. When a statistician looks at a frequency table or histogram, he or she usually asks two basic questions:

• Where is the approximate center of the distribution?
• How spread out is the distribution? Is there a narrow peak, or is the graph wide and flat?

Section 11-2 discusses three possible answers to the first question. The book calls them measures of central tendency. They are:

• Mean: The Excel function to compute the mean is called "average", which pretty much explains what it is. If you replace the class intervals in a relative frequency table by their midpoints, you get a random variable (in the sense of section 8-5). The mean in this case is the same as the expected value from section 8-5.
• Median: The median is the midpoint of the data: half the data points are below the median, half are above. A generalization of the median is a percentile (that is not in the book, but it doesn't hurt to know that). The 90th percentile, for example, is the number that is larger than 90% of the data, and smaller than 10% of the data. "Above the 90th percentile" means "in the top 10%". The median is the 50th percentile. The Excel function to compute the median is called "median".
• Mode: The mode is the data point that occurs most frequently. There could be several modes, or none (if no data point is repeated). The mode is not used very much; it is mainly included for completeness. The Excel function to compute the mode is called "mode".
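For ungrouped data, the three measures above can be computed directly. The following Python sketch mirrors what Excel's "average" and "median" do. The `modes` helper is a hypothetical illustration that returns every mode, and an empty list when no value repeats, following the convention described above; Excel's "mode" reports only one value even when there are several. The data is made up.

```python
from collections import Counter


def mean(data):
    """Arithmetic mean: sum of the data divided by the number of data points."""
    return sum(data) / len(data)


def median(data):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes(data):
    """All most frequent values; an empty list if no value occurs more than once."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, c in counts.items() if c == top)


data = [2, 3, 3, 5, 7, 10]  # hypothetical example data
print(mean(data), median(data), modes(data))  # 5.0 4.0 [3]
```

Note that the median must be taken from the sorted data; forgetting to sort first is a common mistake when working by hand.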

For both the mean and the median there are two different formulas: one for individual data points (ungrouped data), and one for grouped data. They are different, and you have to learn both. The Excel functions can only handle ungrouped data; as far as I know, there are no built-in functions for grouped data.
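Since Excel has no built-in functions for grouped data, here is a Python sketch of the grouped computations. It assumes the usual formulas: the grouped mean weights each class midpoint by the class frequency, and the grouped median is found by linear interpolation inside the class that contains the middle of the data. Your book may write these formulas differently, so compare against its worked examples before relying on this. The class table and function names are hypothetical. The `expected_value` line checks the earlier remark that the mean of grouped data equals the expected value of the midpoint random variable.

```python
# Hypothetical grouped data: (lower class boundary, upper class boundary, frequency).
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 8), (30, 40, 5)]


def grouped_mean(classes):
    """Mean of grouped data: sum of (midpoint * frequency) divided by n."""
    n = sum(f for _, _, f in classes)
    return sum((low + high) / 2 * f for low, high, f in classes) / n


def grouped_median(classes):
    """Median of grouped data by linear interpolation within the median class."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no data")
    half = n / 2
    below = 0  # number of data points in the classes before the current one
    for low, high, f in classes:
        if below + f >= half:
            # Move into this class the fraction of its width needed to reach n/2 points.
            return low + (half - below) / f * (high - low)
        below += f


n = sum(f for _, _, f in classes)
# Relative frequencies f/n turn the midpoints into a random variable;
# its expected value equals the grouped mean.
expected_value = sum((low + high) / 2 * f / n for low, high, f in classes)

print(grouped_mean(classes), grouped_median(classes), expected_value)  # 23.0 23.75 23.0
```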

You should check whether your calculator has built-in functions for computing the mean and median. That will save you a lot of keystrokes and cut down on the chance of making a mistake. Also check whether your calculator can handle grouped data; mine can't.

Section 11-3 discusses two possible answers to the second question. The book calls them measures of dispersion. They are:

• Range: The difference between the largest and the smallest data point. I don't think Excel has a function for that, but you can build your own from "max" and "min".
• Standard Deviation: Look in the book for the explanation. Pay attention to the difference between the sample standard deviation and the population standard deviation. In Excel, they are called "stdev" and "stdevp". The variance is the square of the standard deviation, and it again comes in two flavors.
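A short Python sketch of the ungrouped formulas, with made-up data. The hypothetical helper `data_range` is built from `max` and `min` as suggested above; the `sample` flag switches between dividing by n - 1 (sample, the same convention as Excel's "stdev") and dividing by n (population, like "stdevp").

```python
import math


def data_range(data):
    """Range: largest data point minus smallest data point."""
    return max(data) - min(data)


def variance(data, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(data)
    m = sum(data) / n
    squared_deviations = sum((x - m) ** 2 for x in data)
    return squared_deviations / (n - 1) if sample else squared_deviations / n


def std_dev(data, sample=True):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(data, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data, mean 5
print(data_range(data))             # 7
print(std_dev(data, sample=False))  # population standard deviation: 2.0
print(round(std_dev(data), 4))      # sample standard deviation: 2.1381
```

The sample version is always a bit larger than the population version, because it divides the same sum of squared deviations by a smaller number.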

As with mean and median, there are different formulas for ungrouped and grouped data. Again, Excel (and most likely your calculator) can only handle the ungrouped data.
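For grouped data, the usual approach (assumed here; check it against the book's formula) replaces each class by its midpoint and weights the squared deviations by the class frequencies. A minimal Python sketch, using a made-up class table and a hypothetical function name:

```python
import math

# Hypothetical grouped data: (lower class boundary, upper class boundary, frequency).
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 8), (30, 40, 5)]


def grouped_variance(classes, sample=True):
    """Frequency-weighted squared deviations of the class midpoints from the
    grouped mean, divided by n - 1 (sample) or n (population)."""
    n = sum(f for _, _, f in classes)
    mids = [((low + high) / 2, f) for low, high, f in classes]
    mean = sum(x * f for x, f in mids) / n
    squared_deviations = sum((x - mean) ** 2 * f for x, f in mids)
    return squared_deviations / (n - 1) if sample else squared_deviations / n


def grouped_std_dev(classes, sample=True):
    """Standard deviation of grouped data: square root of the grouped variance."""
    return math.sqrt(grouped_variance(classes, sample))


print(grouped_variance(classes, sample=False))  # population variance: 86.0
print(round(grouped_std_dev(classes), 2))       # sample standard deviation: 9.51
```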

## Exam 4

The exam covers the topics from chapters 7 and 8. The questions can all be done by hand (with a calculator), but you are welcome to use Excel during the exam if you want. A calculator is needed for the exam, and one that can do combinations and permutations, or at least factorials, will be helpful.

Specifically, the following topics will be on the exam:

• Basic Counting: addition principle and multiplication principle; completing Venn diagrams
• Permutations and Combinations: how to calculate them, and doing word problems that lead to permutations and combinations
• Sample spaces, events: setting up sample spaces, computing probabilities by adding up probabilities of simple events
• Union of events: This is basically the same as the addition principle from basic counting, which becomes the formula "P(A ∪ B) = P(A) + P(B) - P(A ∩ B)" in section 8-2.
• Intersection of events: In section 8-2, this is done from scratch, by actually computing the set intersection and adding up the elementary probabilities in it. In section 8-3, it is done by applying the multiplication principle from basic counting, which becomes the product rule "P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B)".
• Complements of events: It is often useful in computing probabilities to find the probability of the opposite event first and then subtract it from 1.
• Bayes' Formula: Instead of memorizing the formula, it may be easier to draw a tree or Venn diagram and read off the probabilities from that. You are using Bayes' Formula implicitly when you do that.
• Be able to identify, compare, and contrast the concepts of mutually exclusive events, complementary events, and independent events
• Definition of a random variable and probability distribution
• Compute the expected value of a random variable
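Several of the exam topics above can be checked with exact fractions in Python, which may be useful when you practice (on the exam itself you will use a calculator). This is a self-study sketch: the die example and the medical-test numbers in the Bayes part are invented for illustration, and the helper names are hypothetical.

```python
import math
from fractions import Fraction


def perm(n, r):
    """P(n, r) = n! / (n - r)!: ordered arrangements of r objects out of n."""
    return math.factorial(n) // math.factorial(n - r)


def comb(n, r):
    """C(n, r) = n! / (r! (n - r)!): unordered selections of r objects out of n."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


# Sample space for one roll of a fair die; every simple event has probability 1/6.
die = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event: add up the probabilities of its simple events."""
    return sum((Fraction(1, 6) for _ in event & die), Fraction(0))


A = {2, 4, 6}  # "roll is even"
B = {4, 5, 6}  # "roll is greater than 3"
# Union of events, via the addition rule P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
union_by_formula = prob(A) + prob(B) - prob(A & B)

# Complement: P(at least one six in 4 rolls) = 1 - P(no six in 4 rolls).
at_least_one_six = 1 - Fraction(5, 6) ** 4

# Bayes via a tree (hypothetical numbers): multiply along each branch,
# then add the branches that end in "test positive".
p_d = Fraction(1, 100)               # 1% have the condition
p_pos_given_d = Fraction(9, 10)      # test detects it 90% of the time
p_pos_given_not_d = Fraction(1, 20)  # 5% false positives
p_pos = p_d * p_pos_given_d + (1 - p_d) * p_pos_given_not_d
p_d_given_pos = p_d * p_pos_given_d / p_pos

# Expected value of one roll: sum of (value * probability).
expected_roll = sum(x * prob({x}) for x in die)

print(perm(10, 3), comb(10, 3))         # 720 120
print(prob(A | B), union_by_formula)    # 2/3 2/3
print(at_least_one_six)                 # 671/1296
print(p_d_given_pos)                    # 2/13
print(expected_roll)                    # 7/2
```

The Bayes part shows why the tree is easier than the formula: a positive result raises the chance of having the condition from 1/100 only to 2/13, because most positives come from the much larger healthy branch.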

You can skip the subsections related to odds and to empirical frequency.

## Assignments

• Take Practice Exam 4
• Review the Exam Policies if necessary, and take Exam 4. The exam has a time limit of one hour.

If you want to work ahead, you can also start on HW 11.

Last Updated: Wednesday, August 5, 2015