Instructor:
Michael Minnotte
Office: Lund 201-C
Phone: 797-2844
E-mail: minnotte@math.usu.edu
Office Hours: TR 9:30 - 10:20 or by appointment
Texts:
Tufte, Edward R., (1983), The Visual Display of
Quantitative Information, Cheshire, Connecticut, Graphics Press.
Cleveland, William S., (1993), Visualizing Data,
Summit, New Jersey, Hobart Press.
S-plus scripts for figures
Data in S-plus format
Statistical graphics and data visualization are critical
elements of modern data analysis and presentation. From initial
exploration of a data set to the final presentation of results to
the end user, graphics play a vital role in shaping our understanding
of our data. Through proper use of graphics, we can make critical
discoveries, and communicate them clearly. Conversely, poor use
or misuse of graphics can seriously mislead (by accident or design).
In this course, we will start with presentation graphics,
including discussion of both tools and principles which lead to
clear communication and those which serve only to confuse or mislead.
We will spend most of the semester in exploratory graphics and data
analysis, including data mining. This will be broken down largely
by the dimension of the applicable data. One- and two-dimensional
datasets require and allow far different
methods than those of more than three dimensions. Categorical and
regression data call for their own specialized methods.
Even more than most aspects of statistics, graphics and visualization
involve art as well as science. In most cases, there are many reasonable
approaches. But an understanding of the options available and the
underlying principles will lead to successful analysis and presentation.
Prerequisites: I will not enforce any specific prerequisites.
A course or two in traditional statistical analysis would be very
helpful, as would prior computing experience, but neither is required.
Previous experience with S-plus or R (statistical computing packages) will
give a head start, but again is not necessary.
Other Sources:
Beyond the required texts, material will be drawn from a number
of sources. Some additional useful references are:
Cleveland, William S., (1994), The Elements of Graphing Data,
Summit, NJ, Hobart Press.
Cleveland, William S. and McGill, Marylyn E., eds., (1988), Dynamic
Graphics for Statistics, Belmont, CA, Wadsworth & Brooks/Cole.
Cook, R. Dennis and Weisberg, Sanford (1994), An Introduction to
Regression Graphics, New York, Wiley.
Deboeck, Guido and Kohonen, Teuvo, eds., (1998), Visual Explorations
in Finance with Self-Organizing Maps, New York, Springer.
du Toit, S.H.C., Steyn, A.G.W., and Stumpf, R.H., (1986), Graphical Exploratory Data Analysis, New York, Springer-Verlag.
Harris, Robert L., (1999), Information Graphics, New York, Oxford
University Press.
Henry, Gary T., (1995), Graphing Data: Techniques for Display and
Analysis, Thousand Oaks, CA, SAGE Publications.
Tufte, Edward R., (1990), Envisioning Information, Cheshire, CT,
Graphics Press.
Tufte, Edward R., (1997), Visual Explanations: Images and Quantities,
Evidence and Narrative, Cheshire, CT, Graphics Press.
Tukey, John W., (1977), Exploratory Data Analysis, Reading, MA,
Addison-Wesley.
Wainer, Howard, (1997), Visual Revelations: Graphical Tales of Fate
and Deception from Napoleon Bonaparte to Ross Perot, New York,
Springer-Verlag.
Wallgren, A., Wallgren, B., Persson, R., Jorner, U., and Haaland, J., (1996),
Graphing Statistics and Data: Creating Better Charts, Newbury Park,
CA, SAGE Publications.
Westphal, Christopher and Blaxton, Teresa, (1998), Data Mining Solutions:
Methods and Tools for Solving Real-World Problems, New York, Wiley.
Wilkinson, Leland, (1999), The Grammar of Graphics, New York, Springer.
We will primarily be using R, a free statistical package distributed under the GNU General Public License that implements the S language on which S-plus is based. We will also be using GGobi for high-dimensional analysis. Both are available on the web for free download and can be used on Unix or Windows systems.
I will also provide a number of functions designed to be used in R. These will be posted here. You will need to download these functions, then pull them into R using the "source" command, after which you will be able to use these functions in your own analyses.
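As a rough illustration of that workflow, the sketch below writes a small stand-in file to a temporary directory and then loads it with source(). The file name course_functions.R and the function mean_diff are hypothetical placeholders, not the actual course files; in practice you would point source() at wherever you saved the downloaded file.

```r
# Hypothetical stand-in for a downloaded course file; the real file and
# function names are not assumed here.
path <- file.path(tempdir(), "course_functions.R")
writeLines(c(
  "# Mean and difference of the sorted values of two equal-length samples",
  "mean_diff <- function(x, y) {",
  "  qx <- sort(x)",
  "  qy <- sort(y)",
  "  data.frame(mean = (qx + qy) / 2, diff = qy - qx)",
  "}"
), path)

# source() evaluates the file, defining its functions in your workspace
source(path)

# The sourced function can now be called like any other R function
md <- mean_diff(c(1, 3, 2), c(2, 5, 3))
print(md)
```

If the file sits in R's current working directory, source("course_functions.R") is enough; otherwise give the full path.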
R and GGobi Sites
The Comprehensive R Archive Network
Windows R Setup Executable Download
R Frequently Asked Questions (FAQ) List
R for Beginners (58 page pdf file)
An Introduction to R (100 page pdf file)
Data Analysis and Graphics Using R -- An Introduction (112 page pdf file)
Download site for GGobi
R Graphics Functions
Univariate empirical cdf and kernel density estimate R functions
Mode tree R functions
Mode forest R functions
SiZer R functions
Multiple kde's broken down by a factor R function
Pairwise quantile-quantile plot R function
Tukey mean-difference plot R function
One-way model fitting and residual-fit spread plot R function
Univariate symmetry plot R function
Bivariate kde R function
Local polynomial regression R functions
3-D scatterplot and spin R functions
Trivariate local polynomial regression R functions
Andrews curve R function
Parallel coordinate plot R function
Data image R function
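For orientation before downloading anything, the first item in the list above has rough counterparts in base R. The sketch below uses the built-in ecdf() and density() functions on simulated data; it is not the course-provided code, whose interface may differ.

```r
# Simulated data, used only for illustration
set.seed(1)
x <- rnorm(200)

Fn  <- ecdf(x)     # empirical cumulative distribution function
kde <- density(x)  # kernel density estimate (Gaussian kernel, default bandwidth)

# Draw the two displays side by side, then restore the plotting settings
op <- par(mfrow = c(1, 2))
plot(Fn, main = "Empirical cdf")
plot(kde, main = "Kernel density estimate")
rug(x)             # mark the individual observations along the axis
par(op)
```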
There will be a variety of assignments throughout the semester.
Each assignment will be scored out of a stated point value
(typically 20-100 points). Your final grade will be determined
by the sum of your points on all assignments.
Some assignments will include combinations of analysis of existing
graphics, creation of your own, computer work (mostly in S-plus, but some
other packages/languages as well), and short oral presentations. The
value of each assignment should be roughly proportional to its importance
and the amount of work involved. Assignments will be handed out in
class and posted here.
Assignment 1
Assignment 2
Assignment 3 - data3.1, data3.2, data3.3
Assignment 4 - eggs.dat
Assignment 5 - corn.dat, geyser.dat, orthodont.dat
Assignment 6
Assignment 7
Assignment 8
chondrite data
stamp thickness data
singer height data
melanoma rate data
ethanol data
high-dimensional data
Graphical Maps Links (J. Symanzik)
If a student has a disability that will likely require some accommodation by
the instructor, the student must contact the instructor and document the
disability through the Disability Resource Center, preferably during the
first week of the course. Any requests for special considerations
relating to attendance, pedagogy, taking of examinations, etc. must be
discussed with and approved by the instructor. In cooperation with the
Disability Resource Center, course materials can be provided in alternative
formats - large print, audio, diskette or Braille.
Disclaimer: The instructor reserves the right to alter anything about this course, pretty much on whim (but he probably won't).