Nonparametrics for Sensory Science:
A More Informative Approach

by J. C. W. Rayner, D. J. Best, P. B. Brockhoff and G. D. Rayner.

Introduction

This website is an additional resource for users of the book Nonparametrics for Sensory Science: A More Informative Approach, recently released by Blackwell Publishing.

Specifically, this website will contain updated errata for the book as well as the latest versions of software written by the authors to implement their techniques. This software is illustrated by application to selected examples from the book.

This website, as well as the software and information contained within or referred to by it, is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. This website may be changed or updated at any time.

To run software used to perform statistical analyses from the book, scroll down to the third heading Examples below where detailed instructions are provided.

Please address any comments, errors or suggestions about this site to pbb@imm.dtu.dk


R: A Powerful and Free Statistical Package

We have chosen to use R as the language/environment for developing our statistical software. It is straightforward to download and install R for a wide variety of computer systems. Alternatively, a remote instance of R called Rweb can be used that does not require R to be installed on your computer.

R is a very powerful system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. The R-Project homepage contains all the information you will need to download, set up, and use R on most PCs.

If you prefer, instead of setting up R on your computer you can go to the Rweb site and type or paste your R commands there. If you are trying to run one of the examples below, DON'T FORGET TO INCLUDE THE PROGRAM DEFINITION CODE AHEAD OF THE EXAMPLE CODE each time you run an analysis. Each time you press the "submit" button on the Rweb page, the computer hosting that page opens R, runs your input, and then closes R. No input or results are therefore remembered from one submission to the next.

R is very similar to the S language/environment. Many statisticians will have heard of the value-added version of S sold by Insightful Corporation as S-PLUS (see the Insightful S-PLUS page for further information). Most programs in S-PLUS can be ported to R with only cosmetic changes, if any. The R-Project homepage contains very detailed information about the differences between R and S-PLUS.


Examples

The following is a selection of examples from the book Nonparametrics for Sensory Science: A More Informative Approach where the results have been generated using programs developed in the R language/environment. Each example has two links: one to the R code used (generally entering the data and applying programs to it), and one to the results that should be generated when this code is pasted into an R dialog window (along with the program definitions, of course).

Each of these examples uses programs/scripts that R needs to know about before it can perform the analysis. Make sure the PROGRAM DEFINITIONS are pasted into your R session dialog window before attempting to run the examples below.
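For readers working in a local R session, the required ordering can be sketched as follows. This is only an illustration: the temporary file and the function demo_analysis are hypothetical stand-ins, not the authors' programs. The point is simply that the definitions must be loaded before any example code calls them.

```r
# Hypothetical stand-in for a saved copy of the PROGRAM DEFINITIONS.
# In practice this would be the definitions text obtained from this site.
defs_file <- tempfile(fileext = ".R")
writeLines("demo_analysis <- function(x) sum(rank(x))", defs_file)

# Step 1: load the definitions first.
source(defs_file)

# Step 2: only then run example code that relies on them.
scores <- c(3, 1, 2)              # made-up data
result <- demo_analysis(scores)
print(result)
```

On Rweb the same rule applies within a single submission: paste the definitions, then the example code, and submit both together.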

  • Instructions for Beginners

  • Chapter 1

  • Chapter 2

  • Chapter 3

  • Chapter 4

  • Chapter 5

  • Chapter 6

  • Chapter 7


    Comments and Corrections

    Note that for a few of the examples above the data x is transformed to x*10 or x+1 before analysis (e.g. examples 3.6, 4.6.2, 6.2 and 6.3). This is because the rank table program requires whole numbers (integers greater than zero) as input.

    Similarly, for example 4.2.1 the text explains that the ranks are inverted: the lowest rank of 1 is given to the highest score and the highest rank of 4 is given to the lowest score. In addition, the input data x is fractional. To produce rank-equivalent whole number data, the original data x is transformed to 10*(20-x) prior to analysis.
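The effect of these transformations on ranks can be checked with a few lines of base R. The scores below are made up for illustration and are not data from the book; the use of round() is our own precaution against floating-point results that are not exactly whole numbers.

```r
# Made-up fractional scores for four products (not data from the book).
x <- c(12.5, 17.3, 9.8, 15.0)

# Scaling by 10 gives whole numbers with the same ranks.
# round() guards against floating-point results such as 26.999999.
x10 <- as.integer(round(x * 10))

# 10 * (20 - x) also gives whole numbers but reverses the ordering,
# so the highest score receives rank 1.
x_inv <- as.integer(round(10 * (20 - x)))

# Adding 1 shifts data containing zeros to strictly positive integers.
z <- c(0L, 2L, 1L)
z1 <- z + 1L

print(rank(x))      # ranks of the original scores
print(rank(x10))    # same ranks after scaling
print(rank(x_inv))  # inverted ranks
print(z1)
```

With four items, the inverted ranks equal 5 minus the original ranks, which is the inversion described for example 4.2.1.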

    In section 3.5 of the book, where the ties in the Tomato example data are randomly assigned, in addition to the ties noted in the text (for consumers 4 and 12) there is also a tie for consumer 22 between the Florade and Momotaro varieties. For this consumer, the tie was eliminated by ranking Momotaro above Florade. Note that in the output above for this example both the broken-tie data and the original data are analysed, producing results that do not materially disagree.

    In section 6.2 of the book (table 6.5, dealing with wine example 6.2) the total number of degrees of freedom is misstated. As the output above for wine example 6.2 shows, the relevant U matrix for this data is not of full rank, so 4 degrees of freedom are "lost" in the analysis. Refer to the discussion near the end of section 3.6 and the paper:

    Brockhoff, P. B., Best, D. J. & Rayner, J. C. W. (2004). Partitioning Anderson's Statistic for Tied Data. Journal of Statistical Planning and Inference, No. 121, pp. 93-111.

    In section 6.6 the book incorrectly states that "For the hot chips data S takes the value 14.9 on 8 degrees of freedom with a p-value 0.06...". As the example 6.7 output shows, the correct value for the CMH analysis Extended Stuart Test Statistic applied to this data is 12.18, which with 8 degrees of freedom corresponds to a p-value of 0.14. The conclusion in the text is largely unchanged though once again the Anderson statistic is more sensitive.
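The two p-values quoted for the hot chips data can be checked with the base R function pchisq, using only the statistic values and degrees of freedom given above. This is a quick check of the chi-squared tail areas, not a rerun of the CMH analysis itself.

```r
# Upper-tail chi-squared p-values on 8 degrees of freedom.
p_corrected <- pchisq(12.18, df = 8, lower.tail = FALSE)  # value from the example 6.7 output
p_book      <- pchisq(14.9,  df = 8, lower.tail = FALSE)  # value printed in the book

print(round(c(corrected = p_corrected, book = p_book), 2))
```

Rounded to two decimals these should reproduce the 0.14 and 0.06 quoted above.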

    In section 7.3 (the Examination Mark dataset example) the statistic V3 is given as negative when it should be positive. In addition, the reader should understand that the p-value of 0.045 given for S4 is in fact correct. It was obtained via parametric bootstrap rather than from the χ² distribution with 4 degrees of freedom, which S4 follows only asymptotically. For the small sample size in this example (n=20) the asymptotics are not yet reliable.

    About halfway through section 7.5 (just after figure 7.2) two orthogonal polynomials are defined; the first should be g1(i) rather than gi(i).

    In the third paragraph of section 7.6 the text reads "To get a statistic with an approximate chi-squared distribution with m-q degrees of freedom we should use (X_P^2 - Sum[V_r^2, r=1,...,m-1]). Often Sum[V_r^2, r=1,...,m-1] is negligible, but this needs checking for each data set." In these expressions the upper limit of the sums is meant to be q rather than m-1, so the expressions should be (X_P^2 - Sum[V_r^2, r=1,...,q]) and Sum[V_r^2, r=1,...,q].
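As a minimal sketch of the corrected expression, the function below subtracts the first q squared components from the Pearson statistic and refers the result to a chi-squared distribution with m-q degrees of freedom. The function name residual_test and the input values are hypothetical and chosen only for illustration; they do not come from the book.

```r
# Residual statistic: Pearson statistic minus the sum of the first q
# squared components, referred to chi-squared with m - q degrees of freedom.
residual_test <- function(xp2, v, m) {
  q <- length(v)                  # number of components removed
  stat <- xp2 - sum(v^2)
  df <- m - q
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  list(statistic = stat, df = df, p.value = p)
}

# Made-up inputs, purely for illustration.
out <- residual_test(xp2 = 10, v = c(1, 2), m = 6)
print(out)
```

Printing sum(v^2) alongside the result is an easy way to carry out the per-data-set check that the quoted text asks for.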

    In section 7.7 the correct value of the residual is R=4.48 (as shown in example 7.7) rather than 3.49. For this case, where the parameter estimate is essentially found by fitting location, we recommend "attributing" the zero df component to location (as in the second GOF analysis shown in example 7.7), which produces a p-value of 0.81141559, though the conclusion is essentially unchanged wherever this is attributed.

    In section 7.8 the value X_P^2 = 6.859 has a p-value of 0.65 rather than 0.81 as in the text (see output for example 7.8). Also, a few expected values in table 7.2 (bacterial cell counts) are incorrect. This table should be:

    Cells   Observed   Expected
    0             56      60.88
    1            104      90.76
    2             80      85.16
    3             62      64.12
    4             42      42.32
    5             27      25.58
    6              9      14.49
    7              9       7.85
    8              5       4.09
    9              3       2.07
    10             2       1.02
    11             1       1.66
    Total        400        400
    TABLE 7.2: COUNTS OF BACTERIAL CELLS PER MICROSCOPE FIELD IN A MILK FILM
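The corrected table can be checked against the quoted statistic with a short base R calculation. The observed and expected values are copied from the table above. The choice of 9 degrees of freedom for the p-value is our assumption, since this page does not state the degrees of freedom used.

```r
# Observed and expected counts copied from the corrected table 7.2.
observed <- c(56, 104, 80, 62, 42, 27, 9, 9, 5, 3, 2, 1)
expected <- c(60.88, 90.76, 85.16, 64.12, 42.32, 25.58,
              14.49, 7.85, 4.09, 2.07, 1.02, 1.66)

# Both columns should total 400.
print(c(sum(observed), sum(expected)))

# Pearson statistic: sum of (O - E)^2 / E over the 12 classes.
xp2 <- sum((observed - expected)^2 / expected)
print(round(xp2, 3))

# ASSUMPTION: 9 degrees of freedom (not stated on this page).
p <- pchisq(xp2, df = 9, lower.tail = FALSE)
print(round(p, 2))
```

The statistic should come out close to the 6.859 quoted for section 7.8.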

    In section 7.9 the components V_{r,s} are defined incorrectly: the divisor should be sqrt(n) rather than n, giving V_{r,s} = Sum[g_r(y_{i,1}) g_s(y_{i,2}), i=1,...,n] / sqrt(n).
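A small numerical sketch shows the effect of the sqrt(n) divisor for the first component. Here g1 is assumed to be the standardised linear polynomial computed from the observed values, which is our simplification for illustration and may differ from the orthonormal polynomials used in the book. The paired data are made up.

```r
# ASSUMPTION: g1 is the standardised linear polynomial, orthonormal
# with respect to the observed values of each variable.
g1 <- function(y) (y - mean(y)) / sqrt(mean((y - mean(y))^2))

# Made-up paired data (not from the book).
y1 <- c(2, 4, 5, 7, 9)
y2 <- c(1, 3, 2, 6, 8)
n <- length(y1)

# Divisor is sqrt(n), as in the corrected definition.
V11 <- sum(g1(y1) * g1(y2)) / sqrt(n)

print(V11)
print(sqrt(n) * cor(y1, y2))  # agrees with V11 for this choice of g1
```

Under this assumption V11 equals sqrt(n) times the sample correlation, whereas dividing by n would return the correlation itself.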

    We welcome information about potential misprints and other errors. Please email details to pbb@imm.dtu.dk


    Last updated: 5 April, 2005.