Instructor:
Michael Minnotte
Office: Lund 201-C
Phone: 797-2844
E-mail: Mike.Minnotte@usu.edu
Office Hours: TR 9:30 - 10:20, W 10:30 - 11:20, or by appointment
Text:
Tufte, Edward R., (1983), The Visual Display of
Quantitative Information, Cheshire, Connecticut, Graphics Press.
Cleveland, William S., (1993), Visualizing Data,
Summit, New Jersey, Hobart Press.
S-plus scripts for figures
Data in S-plus format
I will post announcements here. Please check regularly, especially if you must miss class for any reason.
Statistical graphics and data visualization are critical
elements of modern data analysis and presentation. From initial
exploration of a data set to the final presentation of results to
the end user, graphics play a vital role in shaping our understanding
of our data. Through proper use of graphics, we can make critical
discoveries, and communicate them clearly. Conversely, poor use
or misuse of graphics can seriously mislead (by accident or design).
In this course, we will start with presentation graphics,
including discussion of both tools and principles which lead to
clear communication and those which serve only to confuse or mislead.
We will spend most of the semester in exploratory graphics and data
analysis, including data mining. This will be broken down largely
by the dimension of the applicable
data. One- and two-dimensional datasets require and allow far different
methods than those of more than three dimensions. Categorical and
regression data call for their own specialized methods.
Even more than most aspects of statistics, graphics and visualization
involve art as well as science. In most cases, there are many reasonable
approaches. But an understanding of the options available and the
underlying principles will lead to successful analysis and presentation.
Prerequisites: I will not enforce any specific prerequisites.
A course or two in traditional statistical analysis would be very
helpful, as would prior computing experience, but neither is required.
Previous experience with S-plus or R (statistical computing packages) will
give a head start, but again is not necessary.
Other Sources:
Beyond the required texts, material will be drawn from a number
of sources. Some additional useful references are:
Cleveland, William S., (1994), The Elements of Graphing Data,
Summit, NJ, Hobart Press.
Cleveland, William S. and McGill, Marylyn E., eds., (1988), Dynamic
Graphics for Statistics, Belmont, CA, Wadsworth & Brooks/Cole.
Cook, R. Dennis and Weisberg, Sanford, (1994), An Introduction to
Regression Graphics, New York, Wiley.
Deboeck, Guido and Kohonen, Teuvo, eds., (1998), Visual Explorations
in Finance with Self-Organizing Maps, New York, Springer.
du Toit, S.H.C., Steyn, A.G.W., and Stumpf, R.H., (1986), Graphical Exploratory Data Analysis, New York, Springer-Verlag.
Harris, Robert L., (1999), Information Graphics, New York, Oxford
University Press.
Henry, Gary T., (1995), Graphing Data: Techniques for Display and
Analysis, Thousand Oaks, CA, SAGE Publications.
Murrell, Paul, (2005), R Graphics, London, Chapman & Hall.
Playfair, William, (2005), The Commercial and Political Atlas and
Statistical Breviary, New York, Cambridge University Press. (First
published in 1786-1801!)
Robbins, Naomi B., (2005), Creating More Effective Graphs, Hoboken, NJ, Wiley.
Tufte, Edward R., (1990), Envisioning Information, Cheshire, CT,
Graphics Press.
Tufte, Edward R., (1997), Visual Explanations: Images and Quantities,
Evidence and Narrative, Cheshire, CT, Graphics Press.
Tukey, John W., (1977), Exploratory Data Analysis, Reading, MA,
Addison-Wesley.
Unwin, A., Theus, M., and Hofmann, H., (2006), Graphics of Large Datasets:
Visualizing a Million, New York, Springer.
Wainer, Howard, (1997), Visual Revelations: Graphical Tales of Fate
and Deception from Napoleon Bonaparte to Ross Perot, New York,
Springer-Verlag.
Wallgren, A., Wallgren, B., Persson, R., Jorner, U., and Haaland, J., (1996),
Graphing Statistics and Data: Creating Better Charts, Newbury Park,
CA, SAGE Publications.
Westphal, Christopher and Blaxton, Teresa, (1998), Data Mining Solutions:
Methods and Tools for Solving Real-World Problems, New York, Wiley.
Wilkinson, Leland, (1999), The Grammar of Graphics, New York, Springer.
We will primarily use R, a free statistical package released under the GNU General Public License that implements the same S language as S-plus. We will also use GGobi for high-dimensional analysis. Both are available on the web for free download and run on Unix or Windows systems.
I will also provide a number of functions designed to be used in R. These will be posted here. You will need to download these functions, then pull them into R using the "source" command, after which you will be able to use these functions in your own analyses.
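As a rough illustration of that workflow, the sketch below writes a small stand-in file, sources it, and calls the function it defines. The file name demo_functions.R and the function data_spread are hypothetical placeholders, not any of the posted course functions; with a real download you would pass the saved file's path to source() instead.

```r
# Hypothetical stand-in for a downloaded course file (the real file names differ).
fn_file <- file.path(tempdir(), "demo_functions.R")
writeLines(c(
  "# Toy function: width of the range of a numeric vector",
  "data_spread <- function(x) {",
  "  diff(range(x, na.rm = TRUE))",
  "}"
), fn_file)

# Pull the definitions in the file into the current R session
source(fn_file)

# The sourced function can now be used like any other R function
x <- c(2.1, 3.5, 1.8, 4.2)
spread <- data_spread(x)
print(spread)
```

Functions loaded this way last only for the current session (unless the workspace is saved), so the source() call usually needs to be repeated each time R is started.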
Graphical Statistics Software Sites
The Comprehensive R Archive Network
Windows R Setup Executable Download - click on rwXXXX.exe, where XXXX gives the version number
R Frequently Asked Questions (FAQ) List
R for Beginners (76 page pdf file)
An Introduction to R (100 page pdf file)
Data Analysis and Graphics Using R -- An Introduction (112 page pdf file)
Download site for GGobi
Download site for Mondrian
Note: If you are using R on campus, you will probably need to run the following commands on the R console at the beginning of your session, before trying to install additional packages, to get through the firewall:
Sys.setenv(http_proxy = "http://proxy.usu.edu:80/")
Sys.getenv("http_proxy")
The first command sets the proxy and the second confirms the setting. (Older versions of R used Sys.putenv in place of Sys.setenv.)
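A start-of-session sequence might look like the sketch below. It assumes a current version of R, where the environment variable is set with Sys.setenv, and it uses rgl purely as an example package name; substitute whichever package you actually need.

```r
# Assumption: the campus proxy address given in the note is still valid.
Sys.setenv(http_proxy = "http://proxy.usu.edu:80/")

# Confirm the setting took effect for this session
proxy <- Sys.getenv("http_proxy")
print(proxy)

# Illustrative only: rgl is an example package name.
# Guarded so the download is attempted only at an interactive console.
if (interactive()) {
  install.packages("rgl")
}
```

The proxy setting applies only to the running session, which is why it has to be repeated each time R is restarted.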
R Graphics Functions
Demonstration kernel density estimate R function
Mode tree R functions
Mode forest R functions
SiZer R functions
Multiple kde's broken down by a factor R function
Bivariate histogram function
RGL Perspective Plot R function
Local polynomial regression R functions
Trivariate kernel density estimation and plotting R functions
Trivariate local polynomial regression R functions
Andrews curve R function
Data image R function
In-Class R Example Commands
Categorical Data Visualization - With Output (Word)
Univariate Data Visualization - With Output (Word)
Scale Space Visualization and Univariate Data Visualization with a Categorical Covariate - With Output (Word)
Univariate Data Transformation - With Output (Word)
Bivariate Data - Scatterplots - With Output (Word)
Bivariate Histograms - With Output (Word)
Bivariate Kernel Density Estimation and Nonparametric Regression - With Output (Word)
Maps (including Choropleth and Micro-) in R - With Output (Word)
Trivariate Data - Scatterplots - With Output (Word)
Trivariate Data - Density Estimation and Nonparametric Regression - With Output (Word)
Computations Related to High-Dimensional Geometry - With Output (Word)
Static Hypervariate Plots - With Output (Word)
Dimension Reduction (Principal Components, Projection Pursuit) - With Output (Word)
There will be a variety of assignments throughout the semester.
Each assignment will be scored out of a stated point
value (typically 20-100 points). Your
final grade will be determined by the sum of your points in all assignments.
Assignments will include combinations of analysis of existing
graphics, creation of your own, computer work (mostly in R, but some
other packages/languages as well), and short oral presentations. The
value of each assignment should be roughly proportional to its importance
and the amount of work involved. Assignments will be handed out in
class and posted here.
Assignment 1
Assignment 2
Assignment 3
Assignment 4 - Uninsured Data
chondrite data
stamp thickness data
singer height data
melanoma rate data
ethanol data
Mount St. Helens earthquake data
high-dimensional data
PRIM7 particle physics data
Income Change Choropleth Map (Detroit Free Press, Aug. 30, 2006)
Web links for history lecture.
If a student has a disability that will likely require some accommodation by
the instructor, the student must contact the instructor and document the
disability through the Disability Resource Center, preferably during the
first week of the course. Any requests for special considerations
relating to attendance, pedagogy, taking of examinations, etc. must be
discussed with and approved by the instructor. In cooperation with the
Disability Resource Center, course materials can be provided in alternative
formats - large print, audio, diskette or Braille.
The last day to add this class is September 18. Attending this class beyond that date without being officially registered will not be approved by the Dean's office.
Disclaimer: The instructor reserves the right to alter anything about this course, pretty much on whim (but he probably won't).