Stat 6560 - Statistical Graphics and Visualization

3 Credits
TR 3:00-4:15
FAV 264


Instructor:
Michael Minnotte
Office: Lund 201-C
Phone: 797-2844
E-mail: minnotte@math.usu.edu
Office Hours: TR 9:30 - 10:20 or by appointment

Texts:
Tufte, Edward R., (1983), The Visual Display of Quantitative Information, Cheshire, CT, Graphics Press.
Cleveland, William S., (1993), Visualizing Data, Summit, NJ, Hobart Press.
S-plus scripts for figures
Data in S-plus format


Announcements

No class on November 14 or 21 (office hours for help with projects will be held instead), or on November 26. Student presentations (project 7) will be on December 3 and 5.


Statistical graphics and data visualization are critical elements of modern data analysis and presentation. From initial exploration of a data set to the final presentation of results to the end user, graphics play a vital role in shaping our understanding of our data. Through proper use of graphics, we can make critical discoveries, and communicate them clearly. Conversely, poor use or misuse of graphics can seriously mislead (by accident or design).

In this course, we will start with presentation graphics, discussing both the tools and principles that lead to clear communication and those that serve only to confuse or mislead. We will spend most of the semester on exploratory graphics and data analysis, including data mining. This material is organized largely by the dimension of the data. One- and two-dimensional datasets require, and allow, very different methods from higher-dimensional data. Categorical and regression data call for their own specialized methods.

Even more than most aspects of statistics, graphics and visualization involve art as well as science. In most cases there are many reasonable approaches, but an understanding of the available options and the underlying principles will lead to successful analysis and presentation.


Prerequisites: I will not enforce any specific prerequisites. A course or two in traditional statistical analysis would be very helpful, as would prior computing experience, but neither is required. Previous experience with S-plus or R (statistical computing packages) will give a head start, but again is not necessary.

Other Sources: Beyond the required texts, material will be drawn from a number of sources. Some additional useful references are:
Cleveland, William S., (1994), The Elements of Graphing Data, Summit, NJ, Hobart Press.
Cleveland, William S. and McGill, Marylyn E., eds., (1988), Dynamic Graphics for Statistics, Belmont, CA, Wadsworth & Brooks/Cole.
Cook, R. Dennis and Weisberg, Sanford, (1994), An Introduction to Regression Graphics, New York, Wiley.
Deboeck, Guido and Kohonen, Teuvo, eds., (1998), Visual Explorations in Finance with Self-Organizing Maps, New York, Springer.
du Toit, S.H.C., Steyn, A.G.W., and Stumpf, R.H., (1986), Graphical Exploratory Data Analysis, New York, Springer-Verlag.
Harris, Robert L., (1999), Information Graphics, New York, Oxford University Press.
Henry, Gary T., (1995), Graphing Data: Techniques for Display and Analysis, Thousand Oaks, CA, SAGE Publications.
Tufte, Edward R., (1990), Envisioning Information, Cheshire, CT, Graphics Press.
Tufte, Edward R., (1997), Visual Explanations: Images and Quantities, Evidence and Narrative, Cheshire, CT, Graphics Press.
Tukey, John W., (1977), Exploratory Data Analysis, Reading, MA, Addison-Wesley.
Wainer, Howard, (1997), Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot, New York, Springer-Verlag.
Wallgren, A., Wallgren, B., Persson, R., Jorner, U., and Haaland, J., (1996), Graphing Statistics and Data: Creating Better Charts, Newbury Park, CA, SAGE Publications.
Westphal, Christopher and Blaxton, Teresa, (1998), Data Mining Solutions: Methods and Tools for Solving Real-World Problems, New York, Wiley.
Wilkinson, Leland, (1999), The Grammar of Graphics, New York, Springer.


Software

We will primarily be using R, a free, GNU-licensed statistical package that is largely compatible with S-plus. We will also be using GGobi for high-dimensional analysis. Both are available on the web as free downloads and can be used on Unix or Windows systems.

I will also provide a number of functions designed to be used in R. These will be posted here. You will need to download these functions, then pull them into R using the "source" command, after which you will be able to use these functions in your own analyses.

R and GGobi Sites
The Comprehensive R Archive Network
Windows R Setup Executable Download
R Frequently Asked Questions (FAQ) List
R for Beginners (58-page PDF file)
An Introduction to R (100-page PDF file)
Data Analysis and Graphics Using R -- An Introduction (112-page PDF file)
Download site for GGobi

R Graphics Functions

Univariate empirical cdf and kernel density estimate R functions
Mode tree R functions
Mode forest R functions
SiZer R functions
Multiple KDEs broken down by a factor R function
Pairwise quantile-quantile plot R function
Tukey mean-difference plot R function
One-way model fitting and residual-fit spread plot R function
Univariate symmetry plot R function
Bivariate KDE R function
Local polynomial regression R functions
3-D scatterplot and spin R functions
Trivariate local polynomial regression R functions
Andrews curve R function
Parallel coordinate plot R function
Data image R function


Assignments

There will be a variety of assignments throughout the semester. Each assignment will be scored out of a stated number of points (typically 20-100), and your final grade will be determined by the sum of your points on all assignments. Assignments will include various combinations of analysis of existing graphics, creation of your own graphics, computer work (mostly in R or S-plus, but with some other packages and languages as well), and short oral presentations. The value of each assignment should be roughly proportional to its importance and the amount of work involved. Assignments will be handed out in class and posted here.

Assignment 1
Assignment 2
Assignment 3 - data3.1, data3.2, data3.3
Assignment 4 - eggs.dat
Assignment 5 - corn.dat, geyser.dat, orthodont.dat
Assignment 6
Assignment 7
Assignment 8


Data

chondrite data
stamp thickness data
singer height data
melanoma rate data
ethanol data
high-dimensional data


Other Class-Related Links

Graphical Maps Links (J. Symanzik)


If a student has a disability that will likely require some accommodation by the instructor, the student must contact the instructor and document the disability through the Disability Resource Center, preferably during the first week of the course. Any requests for special considerations relating to attendance, pedagogy, taking of examinations, etc. must be discussed with and approved by the instructor. In cooperation with the Disability Resource Center, course materials can be provided in alternative formats: large print, audio, diskette, or Braille.


Disclaimer: The instructor reserves the right to alter anything about this course, pretty much on whim (but he probably won't).


Last updated: November 12, 2002