We will cover the
first half of chapter 11 this week.
- Section 11-1: Graphing Data
- Section 11-2: Measures of Central Tendency
- Section 11-3: Measures of Dispersion
Normally, we would have HW 11 this week, and Exam 4 next week. However, the university frowns on exams during dead week. So, we will have the exam this week, and HW next week. This means that the material from this week will not be on the exam, but it will be on the final exam.
Overview of the Sections
Section 11-1 deals with graphing data. You will learn how to read and produce vertical
bar charts (column charts), horizontal bar charts,
and pie charts. I assume that most of you already know how
to read them. You need to learn how to produce these charts in Excel, but that
is very easy. We have already done graphing in Excel in chapter 2. You do the
same thing here; you just select a different kind of graph from the menu that
Excel offers. This will be covered in class next week.
You will also learn how to divide large sets of numbers into classes.
For example, you could take 100 test scores, decide where the grade cutoffs
are, and divide the range of scores into 5 intervals corresponding to A, B,
C, D, F. Counting the number of scores in each class produces a frequency
table. When you plot the frequency table as a vertical bar chart,
that is called a histogram. Excel actually has tools for producing
frequency tables, but we won't use them in this class. You can skip the frequency polygons in section 8-2, but look at the cumulative
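As an illustration of dividing scores into classes, here is a minimal Python sketch. Python is not part of this course, and the scores, the cutoffs, and the names `letter_grade` and `frequency_table` are all made up for this example.

```python
# Hypothetical test scores; not data from the course.
scores = [95, 88, 72, 65, 91, 54, 78, 83, 69, 100, 47, 76]

# Assumed grade cutoffs: each class is (label, lowest score in the class).
cutoffs = [("A", 90), ("B", 80), ("C", 70), ("D", 60), ("F", 0)]

def letter_grade(score):
    """Return the label of the first class whose lower bound the score meets."""
    for label, lower in cutoffs:
        if score >= lower:
            return label
    raise ValueError("score is below every cutoff")

def frequency_table(data):
    """Count how many data points fall into each class."""
    table = {label: 0 for label, _ in cutoffs}
    for score in data:
        table[letter_grade(score)] += 1
    return table

table = frequency_table(scores)
for label, count in table.items():
    # One '#' per score gives a rough text version of a histogram.
    print(f"{label}: {count:2d} {'#' * count}")
```

The counts in `table` are the frequency table; drawing one bar per class with those heights gives the histogram.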
The other two sections we cover will be very important for any future statistics
courses you may be taking. When a statistician looks at a frequency table or
histogram, he or she usually asks two basic questions:
- Where is the approximate center of the distribution?
- How spread out is the distribution? Is there a narrow peak, or is the graph
wide and flat?
Section 11-2 discusses three possible answers to the first question. The book calls them measures
of central tendency. They are
- Mean: The Excel function to compute the mean is called "average",
which pretty much explains what it is. If you replace the class intervals
in a relative frequency table by their midpoints, you get a random variable
(in the sense of section 8-5). The mean in this case is the same as the expected value from 8-5.
- Median: The median is the midpoint of the data: half the data points
are below the median, half are above. A generalization of the median is a percentile (that
is not in the book, but it doesn't hurt to know that). The 90th percentile,
for example, is the number that is larger than 90% of the data, and smaller than 10% of the data. "Above the
90th percentile" means "in the top 10%". The median is the 50th percentile.
The Excel function to compute the median is called "median".
- Mode: The mode is the data point that occurs the most frequently.
There could be several modes, or none (if no data point is repeated). The mode
is not used very much; it is mainly included for completeness. The Excel
function to compute the mode is called "mode".
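For ungrouped data, the three measures above can be checked with a few lines of Python. This is only a sketch with made-up numbers, using Python's standard `statistics` module instead of Excel. Percentile conventions differ between tools, so the last line only checks that the 50th percentile agrees with the median for this data set.

```python
import statistics

# Hypothetical ungrouped data, already sorted for readability.
data = [2, 3, 3, 5, 7, 8, 14]

mean = statistics.mean(data)        # sum of the data divided by the count
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # list of all most frequent values

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)

# Note: if no value is repeated, multimode returns every value,
# whereas the convention in the text is to say there is no mode.

# The median is the 50th percentile: splitting the data into 2 equal
# parts gives a single cut point, which should match the median.
fiftieth_percentile = statistics.quantiles(data, n=2)[0]
print("50th percentile:", fiftieth_percentile)
```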
For mean and median there are two different formulas each: one for individual
data points (ungrouped data), and one for grouped data. They are different,
and you have to learn them both. The Excel functions can only handle ungrouped
data. There are no built-in functions for grouped data, as far as I know.
You should check if your calculator has some built-in functions for computing
mean and median. That will save you a lot of key punching, and cut down on
the chance of making a mistake. Check whether your calculator can handle grouped
data or not; mine can't.
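If neither Excel nor your calculator handles grouped data, the grouped mean is still short to compute. The sketch below uses a made-up frequency table and assumes the usual rule of replacing each class by its midpoint; check the book for the exact form of the formula it uses. The same number comes out as an expected value if you use relative frequencies as probabilities.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(50, 60, 2), (60, 70, 5), (70, 80, 8), (80, 90, 4), (90, 100, 1)]

def midpoint(lower, upper):
    """Midpoint of a class interval."""
    return (lower + upper) / 2

# Total number of data points.
n = sum(freq for _, _, freq in classes)

# Grouped mean: sum of (midpoint * frequency), divided by n.
grouped_mean = sum(midpoint(lo, hi) * freq for lo, hi, freq in classes) / n

# Same computation as an expected value: midpoint * relative frequency.
expected_value = sum(midpoint(lo, hi) * (freq / n) for lo, hi, freq in classes)

print("n:", n)
print("grouped mean:", grouped_mean)
print("expected value:", expected_value)
```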
Section 11-3 discusses two possible answers to the second question. The book calls them measures
of dispersion. They are
- Range: The interval from the smallest to the largest data point.
I don't think Excel has a function for that, but you can build your own from "max" and "min".
- Standard Deviation: Look at the book for the explanation. Make sure you pay attention to the difference between sample standard deviation and population standard deviation.
In Excel, they are called "stdev" and "stdevp". The variance is the square
of the standard deviation, and again comes in two flavors.
As with mean and median, there are different formulas for ungrouped and grouped data. Again,
Excel (and most likely your calculator) can only handle the ungrouped data.
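The difference between the sample and population versions is easy to see on a small example. The following Python sketch uses made-up ungrouped data and the standard `statistics` module; `stdev` and `pstdev` there play the roles of Excel's "stdev" and "stdevp".

```python
import statistics

# Hypothetical ungrouped data.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: the interval from the smallest to the largest data point.
smallest, largest = min(data), max(data)
width = largest - smallest

# Sample versions divide by n - 1; population versions divide by n.
sample_sd = statistics.stdev(data)
population_sd = statistics.pstdev(data)
sample_var = statistics.variance(data)
population_var = statistics.pvariance(data)

print("range:", smallest, "to", largest, "(width", width, ")")
print("sample standard deviation:", sample_sd)
print("population standard deviation:", population_sd)
print("sample variance:", sample_var)
print("population variance:", population_var)
```

The sample standard deviation is always a little larger than the population one for the same data, because it divides by a smaller number.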
The exam covers the topics from chapters 7 and 8. The questions can all be done by hand or with a calculator, but you are welcome
to use Excel during the exam if you want. If you have a calculator that can
do combinations and permutations, or at least the factorial, that will be helpful.
A calculator is needed for the exam.
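If your calculator only has a factorial key, permutations and combinations can be built from it. Here is a small Python sketch of that idea; the function names are made up for this example, and Python itself is not needed for the exam.

```python
import math

def permutations(n, r):
    """Ordered selections of r items out of n: n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)

def combinations(n, r):
    """Unordered selections of r items out of n: n! / (r! * (n - r)!)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Choosing 2 out of 5: order matters for permutations, not for combinations.
print("P(5, 2) =", permutations(5, 2))
print("C(5, 2) =", combinations(5, 2))

# Number of 5-card hands from a 52-card deck.
print("C(52, 5) =", combinations(52, 5))
```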
Specifically, the following topics will be on the exam:
- Basic Counting: addition principle and multiplication principle;
completing Venn diagrams
- Permutations and Combinations: how to calculate them, and doing
word problems that lead to permutations and combinations
- Sample spaces, events: setting up sample spaces, computing probabilities
by adding up probabilities of simple events
- Union of events: This is basically the same as the addition principle
from basic counting, which becomes the formula "P(A ∪ B) = P(A) + P(B) - P(A ∩ B)" in section 8-2.
- Intersection of events: In section 8-2, this is done from scratch, by actually computing the set intersection and
adding up the elementary probabilities in it. In section 8-3, it is done by applying the multiplication principle from basic counting,
which becomes the product rule "P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B)" in section 8-3.
- Complements of events: It is often useful in computing probabilities
to look for the probability of the opposite event first
- Bayes' Formula: Instead of memorizing the formula, it may be easier
to draw a tree or Venn diagram and read off the probabilities from that.
You are using Bayes' Formula implicitly when you do that
- Be able to identify, compare and contrast the concepts of: mutually
exclusive events, complementary events, independent events
- Definition of a random variable and probability distribution
- Compute the expected value of a random variable
You can skip the subsections related to odds and to empirical frequency.
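To review the probability rules in the topic list above, it can help to check them on a tiny example. The Python sketch below uses a fair die and a made-up two-machine tree; all events and numbers are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
from fractions import Fraction

# Sample space: one roll of a fair die, all outcomes equally likely.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event as (outcomes in event) / (all outcomes)."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3

# Union rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_union = prob(A) + prob(B) - prob(A & B)

# Conditional probability, then the product rule: P(A ∩ B) = P(A) P(B|A)
p_B_given_A = prob(A & B) / prob(A)
p_intersection = prob(A) * p_B_given_A

# Complement: P(not A) = 1 - P(A)
p_not_A = 1 - prob(A)

# Expected value of the random variable "number rolled".
expected_roll = sum(x * Fraction(1, 6) for x in sample_space)

# Bayes via a tree (hypothetical numbers): machine M1 makes 3/5 of all
# items with defect rate 1/50; machine M2 makes 2/5 with defect rate 1/20.
branches = {
    "M1": (Fraction(3, 5), Fraction(1, 50)),
    "M2": (Fraction(2, 5), Fraction(1, 20)),
}
# Multiply along each path of the tree to get P(machine and defective).
paths = {m: p_machine * p_defect for m, (p_machine, p_defect) in branches.items()}
p_defective = sum(paths.values())
# Reading off the tree: P(M2 | defective) = P(M2 and defective) / P(defective)
p_M2_given_defective = paths["M2"] / p_defective

print("P(A ∪ B) =", p_union)
print("P(A ∩ B) =", p_intersection)
print("P(not A) =", p_not_A)
print("E(roll) =", expected_roll)
print("P(M2 | defective) =", p_M2_given_defective)
```

The last step is Bayes' Formula in disguise: one path of the tree divided by the sum of all paths that end in the observed outcome.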
- Take Practice exam 4
- Review the Exam Policies if necessary, and take Exam 4. The exam has a time limit of one hour.
If you want to work ahead, you can also start on HW 11.
Wednesday, August 5, 2015