Word list
Finn Årup Nielsen
Neurobiology Research Unit,
Rigshospitalet
and
Abstract:
Word list with short explanations in the areas of neuroinformatics and
statistics.

 abundance matrix
 A data matrix
that
contains actual numbers of occurrences or proportions
[Kendall, 1971] according to
[Mardia et al., 1979, exercise 13.4.5].
 activation function
 The nonlinear
function in the output of the unit in a neural network. Can be a
threshold function, a piecewise linear function or a sigmoidal
function, e.g., hyperbolic tangent or logistic sigmoid.
If the activation function is on the output of the neural network it
can be regarded as a link function.
 active learning
 1: The same as focusing [MacKay, 1992b].
2: supervised learning [Haykin, 1994]
 adaptive principal components extraction
 APEX. Artificial neural
network structure with feedforward and lateral connections to
compute principal components [Kung and Diamantaras, 1990].
 adjacency matrix
 A binary and square matrix
describing the
connections in a graph consisting of nodes.
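As an illustration, an adjacency matrix can be built from a list of links. The sketch below is only one possible construction; the function name `adjacency_matrix` and the four-node example graph are assumptions made for illustration and are not taken from the source.

```python
def adjacency_matrix(n_nodes, edges, directed=False):
    """Return a binary, square adjacency matrix as a list of lists.

    Element [i][j] is 1 if there is a connection from node i to node j.
    """
    A = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        A[i][j] = 1
        if not directed:
            # An undirected graph gives a symmetric matrix
            A[j][i] = 1
    return A


# Hypothetical chain graph with four nodes: 0 - 1 - 2 - 3
edges = [(0, 1), (1, 2), (2, 3)]
A = adjacency_matrix(4, edges)
for row in A:
    print(row)
```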
 Akaike's Information criteria
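 AIC. Also called ``information
criteria A''
$$\mathrm{AIC} = -2 \ln L(\hat{\theta}) + 2P$$
where $L(\hat{\theta})$ is the maximized likelihood and $P$ is the number of parameters of the model.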
 analysis of covariance
 A type of (univariate) analysis of
variance where some of the independent variables are
supplementary/non-interesting (usually confounds or nuisance
variables) and are used to explain
variation in the dependent variable [Pearce, 1982].
 anti-Hebbian learning
 Modeling by employing a constraint.
 asymmetric divergence
 Equivalent to the relative entropy.
 author cocitation analysis
 Analysis of the data formed when a
(scientific) paper cites two different authors.
The analysis can be performed with, e.g., cluster analysis
[McCain, 1990]. An overview of author cocitation analysis
is available in [Lunin and White, 1990].
 autoassociation
 Modeling with the input the same as the
output.
 backpropagation
 1: The method to find the
(first-order) derivative of a multilayer neural network.
2: The method of adjusting (optimizing) the parameters in a multilayer
neural network.
 Bayes factor
 A ratio of evidences
$$B_{12} = \frac{p(D \mid H_1)}{p(D \mid H_2)} \qquad (2)$$
where $D$ is the data and $H_1$
and $H_2$
are hypotheses, models or
hyperparameters.
See, e.g., [Kass and Raftery, 1995]. Attributed to Turing and
Jeffreys.
 Bayesian information criterion
 (BIC) Also called Schwarz
(Bayesian) (information) criterion (SBC)
 bias
 1: A threshold unit in a neural network.
2: A model's inability to model the true system.
3: The difference between the mean estimated model and
the true model. 4: The difference between the mean of an estimator and
the true value.
 bias-variance tradeoff
 The compromise between the simplicity
of the model (which causes bias) and complexity (which causes
problems for estimation and results in variance) [Geman et al., 1992].
 binary matrix
 A matrix with elements as either one or zero.
Also called a (0,1)-matrix.
 bootstrap
 A resampling scheme that samples with replacement from
the sample. It is useful for assessing the accuracy of an
estimate. See, e.g., [Zoubir and Boashash, 1998].
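A minimal sketch of the idea, assuming the quantity of interest is the standard error of an estimator: the sample is resampled with replacement many times, and the spread of the re-computed estimates is used as an accuracy measure. The function name `bootstrap_standard_error`, the number of resamples and the example data are all illustrative assumptions.

```python
import random
import statistics


def bootstrap_standard_error(sample, estimator, n_resamples=2000, seed=0):
    """Estimate the standard error of `estimator` by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        # Draw n objects with replacement from the original sample
        resample = [rng.choice(sample) for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


# Hypothetical data
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
se_boot = bootstrap_standard_error(sample, statistics.mean)

# For the mean, the textbook formula s / sqrt(n) is available for comparison
se_formula = statistics.stdev(sample) / len(sample) ** 0.5
print(se_boot, se_formula)
```

For the mean the two numbers should be of similar size; the bootstrap becomes more interesting for estimators (e.g., the median) where no simple formula exists.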
 Burt matrix
 A product matrix of an indicator matrix $\mathbf{Z}$
[Burt, 1950] [Jackson, 1991, p. 225]:
$$\mathbf{B} = \mathbf{Z}^\mathsf{T} \mathbf{Z}$$
 canonical correlation analysis
 A type of multivariate analysis.
 canonical variable analysis
 A set of different multivariate
analyses.
 canonical variate analysis
 Canonical correlation
analysis for discrimination, i.e., with categorical variables.
 central moment
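 $k$'th-order sample central moment
$$m_k = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})^k \qquad (5)$$
where $\bar{x}$ is the sample mean of the $N$ observations $x_n$.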
 cluster
 1: In SPM99 a region in a thresholded SPM whose voxels
are connected. 2: In cluster analysis a set of voxels (or other
objects) assigned together and associated with a ``center''.
 cluster analysis
 An unsupervised method to group data points.
A specific method is K-means.
 clustering
 1: The tendency of data points to be unequally
distributed in, e.g., space or time.
2: cluster analysis.
 coefficient of variance
 The standard deviation normalized by
the mean
$$\text{coefficient of variance} = \frac{\sigma}{\mu} \qquad (6)$$
Sometimes found with the abbreviations COV or CoV.
 cognition
 1: Any mental process.
2: A mental process that is not sensorimotor or emotional.
3: The process involved in knowing, or the act of
knowing (Encyclopaedia Britannica Online)
 complete likelihood
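 (Complete-data likelihood)
The joint probability density of observed and unobserved variables
$$p(\mathbf{X}, \mathbf{Y} \mid \boldsymbol{\theta}) \qquad (7)$$
where $\mathbf{X}$ is observed and $\mathbf{Y}$ is hidden
(data/parameters) and
$\boldsymbol{\theta}$ is the parameters.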
 conditional (differential) entropy
$$h(x \mid y) = -\iint p(x, y) \ln p(x \mid y) \, dx \, dy \qquad (8)$$
 conditional probability density function
$$p(x \mid y) = \frac{p(x, y)}{p(y)} \qquad (9)$$
Here the probability density for $x$ given $y$.
 confound
 (usual meaning) A nuisance variable that is correlated with the
variable of interest.
 conjugate prior
 (Also ``natural conjugate prior'') An
informative prior ``which have a functional form which integrates
naturally with data measurements, making the
inferences have an analytically convenient form''
[MacKay, 1995].
Used for regularization.
 consistent
 An estimator is consistent if the estimate converges (in probability)
to the true value as more data (objects) are gathered.
 contrast
 In (general) linear modeling: A vector of weights that sum to zero,
defining a linear combination of parameters.
Vectors associated with estimable linear combinations of the
parameters whose weights do not sum to zero are sometimes also called
`contrasts', against this definition [Nichols, 2002].
 cost function
 The function that is optimized.
Can be developed via maximum likelihood from a distribution
assumption between the target and the model output.
Other names are Lyapunov function (dynamical systems),
energy function (physics),
Hamiltonian (statistical mechanics),
objective function (optimization theory),
fitness function (evolutionary biology).
[Hertz et al., 1991, pages 21-22].
 cross-entropy
$$H(p, q) = -\int p(x) \ln q(x) \, dx \qquad (10)$$
Equivalent to the sum of the relative entropy and
the entropy (of the distribution the relative entropy is measured
with respect to).
Can also be regarded as the average negative log-likelihood, e.g.,
with $q(x)$
as a modeled density and $p(x)$
as the true
unknown density [Bishop, 1995, pages 58-59].
 cross-validation
 Empirical method to assess model performance,
where the data set is split in two: one set that is used to estimate
the parameters of the model and the other set that is used to
test the model.
 data matrix
 A matrix $\mathbf{X}(N \times P)$
in which $N$
$P$-dimensional multivariate measurements are represented, see,
e.g.,
[Mardia et al., 1979, sec. 1.3].
 dependence
 In the statistical sense: If two
random variables are
dependent then one of the random variables conveys information about
the value of the other.
While correlation is only related to the probability density
function with the two first moments,
dependency is related to all the moments.
 dependent variable
 The variable to be explained/predicted from
the independent variable. The output of the model.
Often denoted $y$.
 design matrix
 A matrix containing the ``independent'' variables
in a multivariate regression analysis.
 differential entropy
 Entropy for continuous distributions
$$h(x) = -\int p(x) \ln p(x) \, dx \qquad (11)$$
Often just called ``entropy''.
 directed divergence
 Equivalent to relative entropy according to
[Haykin, 1994].
 eigenimage
 An eigenvector associated with a principal component
from a principal component analysis that can be interpreted as an
image or volume.
 empirical Bayes
 Bayesian technique where the prior is specified
from the sample information
 entropy
 A measure for the information content or
degree of
disorder.
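 elliptic distribution / elliptically contoured distribution
 A
family of distributions with elliptical contours.
$$p(\mathbf{x}) = |\boldsymbol{\Sigma}|^{-1/2} \, g\!\left( (\mathbf{x} - \boldsymbol{\mu})^\mathsf{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \qquad (12)$$
where $g$ is a non-negative scalar function.
The family contains the Gaussian distribution, the
multivariate $t$ distribution, the contaminated normal,
multivariate Cauchy and multivariate logistic distributions.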
 estimation
 The procedure to find the model or the parameters of
the model. In the case of parameters: a single value for each
parameter in, e.g., maximum likelihood estimation, or a distribution
of the parameters in Bayesian technique.
 evidence
 The probability of the data given the
model or hyperparameters [MacKay, 1992a].
``Likelihood for hyperparameters'' or
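``likelihood for models'', e.g.,
$p(D \mid H)$.
Found by integrating over the parameters of the model
$$p(D \mid H) = \int p(D \mid \boldsymbol{\theta}, H) \, p(\boldsymbol{\theta} \mid H) \, d\boldsymbol{\theta} \qquad (13)$$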
This is the same as ``integrated likelihood'' or ``marginal
likelihood''.
 expectation maximization
 A special group of optimization
methods for models with unobserved data [Dempster et al., 1977].
 explorative statistics
 Statistics with the aim of generating
hypotheses rather than testing hypotheses.
 feedforward neural network
 A nonlinear model.
 finite impulse response (model)
 A linear model with a finite
number of input lags and a single output.
 fixed effect model
 Model where the parameters are not random
[Conradsen, 1984, section 5.2.2]. See also random effects model and mixed effects model.
 F-masking
 Reduction of the number of voxels analyzed using an
F-test.
 Frobenius norm
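 Scalar descriptor of a matrix
[Golub and Van Loan, 1996, equation 2.3.1].
The same as the square root of the sum of the squared singular values
[Golub and Van Loan, 1996, equation 2.5.7].
$$\|\mathbf{A}\|_F = \sqrt{\sum_{i} \sum_{j} |a_{ij}|^2} = \sqrt{\sum_{i} \sigma_i^2} \qquad (14)$$
where $a_{ij}$ are the elements and $\sigma_i$ the singular values of $\mathbf{A}$.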
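As a small numerical check of the element-wise definition, the sketch below computes the Frobenius norm of a diagonal matrix, for which the singular values can be read off directly as the absolute values of the diagonal. The helper name `frobenius_norm` and the example matrix are illustrative assumptions.

```python
import math


def frobenius_norm(A):
    """Square root of the sum of the squared elements of a matrix (list of lists)."""
    return math.sqrt(sum(a * a for row in A for a in row))


# Diagonal example: the singular values are 3 and 4,
# so the norm should equal sqrt(3**2 + 4**2)
A = [[3.0, 0.0],
     [0.0, 4.0]]
norm = frobenius_norm(A)
from_singular_values = math.sqrt(3.0 ** 2 + 4.0 ** 2)
print(norm, from_singular_values)
```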
 full width half maximum
 Used in specification of filter width,
related to the standard deviation $\sigma$ of a Gaussian kernel by
$$\mathrm{FWHM} = \sqrt{8 \ln 2} \, \sigma \approx 2.35 \, \sigma \qquad (15)$$
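The conversion between FWHM and the standard deviation of a Gaussian kernel is easy to get wrong by a factor, so a short sketch may help. The function names are illustrative assumptions; the check at the end verifies that an unnormalized Gaussian has dropped to half its maximum at a distance of FWHM/2 from the center.

```python
import math

# FWHM = sqrt(8 ln 2) * sigma for a Gaussian kernel
FWHM_PER_SIGMA = math.sqrt(8.0 * math.log(2.0))


def sigma_to_fwhm(sigma):
    return FWHM_PER_SIGMA * sigma


def fwhm_to_sigma(fwhm):
    return fwhm / FWHM_PER_SIGMA


def gaussian(x, sigma):
    """Unnormalized Gaussian kernel with maximum 1 at x = 0."""
    return math.exp(-x * x / (2.0 * sigma * sigma))


# Example: an 8 mm FWHM smoothing kernel (hypothetical value)
sigma = fwhm_to_sigma(8.0)
half_height = gaussian(8.0 / 2.0, sigma)
print(sigma, half_height)
```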
 functional integration
 The notion that brain regions interact or
combine in solving a mental task.
Brain data in this connection is usually modeled with
multivariate statistical methods [Friston, 1997].
 functional segregation
 The issue in analyses of
brain data with univariate statistical methods [Friston, 1997].
 functional volumes modeling
 Meta-analysis in Talairach space,
a term coined in [Fox et al., 1997].
 Gauss-Newton (method)
 Newton-like optimization method that uses
the ``inner product'' Hessian.
 general linear hypothesis
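 Hypotheses associated with the multivariate regression model of the
form [Mardia et al., 1979, section 6.3]
$$\mathbf{C}_1 \mathbf{B} \mathbf{M}_1 = \mathbf{D} \qquad (16)$$
where $\mathbf{B}$ is the parameter matrix and $\mathbf{C}_1$, $\mathbf{M}_1$ and $\mathbf{D}$ are given matrices.
In many cases the hypothesis is of a simple type
testing only for differences between the rows
$$\mathbf{C}_1 \mathbf{B} = \mathbf{0} \qquad (17)$$
In the univariate case it becomes yet simpler
$$\mathbf{c}^\mathsf{T} \boldsymbol{\beta} = 0 \qquad (18)$$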
 general linear model
 A type of multivariate regression
analysis, usually where the independent variables are a design
matrix.
 generalized additive models
 A group of nonlinear models
proposed by [Hastie and Tibshirani, 1990]:
Each input variable is put through a nonlinear function. All
transformed input variables are then used in a multivariate linear
regression, usually with a logistic sigmoid function on the
output. In the notation of [Bishop, 1995]:
$$y = g\!\left( \sum_{i} \phi_i(x_i) + w_0 \right) \qquad (19)$$
where $\phi_i$ are the nonlinear functions and $g$ is the output function.
 generalized inverse
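 An ``inverse'' of a square or rectangular
matrix.
If $\mathbf{A}^-$ denotes the generalized inverse of $\mathbf{A}$ then it
will satisfy some of the following Penrose equations
(Moore-Penrose conditions)
[Golub and Van Loan, 1996, section 5.5.4] and [Ben-Israel and Greville, 1980]:
$$\mathbf{A} \mathbf{A}^- \mathbf{A} = \mathbf{A} \qquad (20)$$
$$\mathbf{A}^- \mathbf{A} \mathbf{A}^- = \mathbf{A}^- \qquad (21)$$
$$(\mathbf{A} \mathbf{A}^-)^* = \mathbf{A} \mathbf{A}^- \qquad (22)$$
$$(\mathbf{A}^- \mathbf{A})^* = \mathbf{A}^- \mathbf{A} \qquad (23)$$
where $^*$ denotes the conjugate transpose, i.e., the transposed matrix if
$\mathbf{A}$ is real.
A matrix that satisfies all equations is uniquely determined and
called the Moore-Penrose inverse and (often) denoted as
$\mathbf{A}^+$.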
 generalized least squares
 1: Regression with correlated
(non-white) noise, i.e., where the noise covariance matrix is not
diagonal, see, e.g.,
[Mardia et al., 1979, section 6.6.2]. 2: The
Gauss-Newton optimization method.
 generalized linear model
 An ``almost'' linear model where the
output of the model has a link function for modeling
non-Gaussian distributed data.
 Good-Turing frequency estimation
 A type of regularized frequency
estimation [Good, 1953].
Often used in word frequency analysis [Gale and Sampson, 1995].
 Gram matrix
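 A Gram matrix is Hermitian and constructed from a data matrix $\mathbf{X}(N \times P)$
with $N$ $P$-dimensional vectors $\mathbf{x}_n$
$$\mathbf{G} = \mathbf{X} \mathbf{X}^\mathsf{H} \qquad (24)$$
i.e., the elements of the Gram matrix
$g_{nm} = \langle \mathbf{x}_n, \mathbf{x}_m \rangle$
are the
dot products between all possible pairs of vectors
$\mathbf{x}_n$.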
 gyrification index
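 An index for the degree of folding of the cortex
$$\mathrm{GI} = \frac{\text{length of complete contour}}{\text{length of outer contour}} \qquad (25)$$
There is no folding for
$\mathrm{GI} = 1$ and folding for
$\mathrm{GI} > 1$.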
 hard assignment
 Assignment of a data point to one (and only
one) specific component, e.g., cluster in cluster analysis or
mixture component in mixture modeling.
 Hebbian learning
 Estimation in a model where the magnitude of a
parameter is determined by how much it is ``used''.
 hemodynamic response function
 The coupling between neural
and vascular activity.
 heteroassociation
 Modeling where the input and output are
different, e.g., ordinary regression analysis.
 homogeneity
 Stationarity under translation.
 hyperparameter
 A parameter in a model that is used in the
estimation of the model but has no influence on the response of the
estimated model if changed.
An example of a hyperparameter is weight decay
 identification
 Estimation
 ill-posed
 A problem is ill-posed if the singular values of the
associated matrix gradually decay to zero
[Hansen, 1996], cf. rank deficient.
 Imax
 Algorithms maximizing the mutual information (between
outputs) [Becker and Hinton, 1992].
 incidence matrix
 A binary matrix with size (nodes $\times$ links)
[Mardia et al., 1979, p. 383]:
.
Also called indicator matrix.
 incomplete likelihood
 A marginal density marginalized over hidden data/parameters.
 indicator matrix
 A data matrix with elements one or zero
[Jackson, 1991, p. 224]:
.
 information criterion
 Includes Akaike's information criteria (AIC), Bayesian information
criterion (BIC), (Spiegelhalter's) Bayesian deviance information
criterion (DIC), ...
 inion
 The external occipital protuberance of the skull
(Webster). Used as marker in EEG. Opposite the nasion.
See also: nasion and periauricular points.
 integrated likelihood
 The same as the evidence.
 International Consortium for Brain Mapping
 (Abbreviation: ICBM)
Group of research institutes.
They have developed a widely used brain template known as the ICBM or
MNI template.
 inversion
 In the framework of input-system-output: To find the
input from the system and the output.
 isotropy
 Stationarity under rotation.
 Karhunen-Loève transformation
 The same as principal
component analysis. The word is usually used in communication theory.
 kernel density estimation
 Also known as Parzen windows and
probabilistic neural network.
 K-means
 Cluster analysis algorithm with hard assignment.
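A minimal sketch of the algorithm for one-dimensional data, assuming squared Euclidean distance and user-supplied initial centers. The function name `kmeans_1d`, the fixed iteration count and the toy data are illustrative assumptions; practical implementations add multiple restarts and a convergence test.

```python
def kmeans_1d(points, centers, n_iterations=20):
    """K-means for scalar data with hard assignment of each point to one center."""
    centers = list(centers)
    assignments = []
    for _ in range(n_iterations):
        # Hard assignment: each point belongs to exactly one (the nearest) center
        assignments = [
            min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            for x in points
        ]
        # Update step: each center becomes the mean of its assigned points
        for k in range(len(centers)):
            members = [x for x, a in zip(points, assignments) if a == k]
            if members:  # keep the old center if no point was assigned
                centers[k] = sum(members) / len(members)
    return centers, assignments


# Hypothetical data with two well-separated groups
points = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
centers, assignments = kmeans_1d(points, centers=[0.0, 6.0])
print(centers, assignments)
```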
 k-nearest neighbor
 A classification technique
 knowledge
 Information that has been placed in a context.
 Kullback-Leibler distance
 Equivalent to relative
entropy.
 kurtosis
 Normalized fourth order central moment.
The univariate Fisher kurtosis is defined as
$$\gamma_2 = \frac{\mu_4}{\mu_2^2} - 3 \qquad (28)$$
where $\mu_4$ and $\mu_2$ are the fourth and second central moment,
respectively.
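The following sketch computes sample central moments and, from them, the Fisher kurtosis (fourth central moment over the squared second, minus three) and the related skewness. The function names and the two toy data sets are assumptions for illustration; sample moments with divisor $N$ are used rather than any bias-corrected variant.

```python
def central_moment(x, k):
    """k'th-order sample central moment with divisor N."""
    mean = sum(x) / len(x)
    return sum((xi - mean) ** k for xi in x) / len(x)


def skewness(x):
    """Third central moment normalized by the second to the power 3/2."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def fisher_kurtosis(x):
    """Fourth central moment normalized by the squared second, minus 3."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0


# Evenly spread values: flat distribution, expected to give negative kurtosis
flat = [float(i) for i in range(101)]
# Mostly zeros with two extreme values: heavy tails, positive kurtosis
heavy = [0.0] * 98 + [-10.0, 10.0]

print(fisher_kurtosis(flat), fisher_kurtosis(heavy), skewness(flat))
```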
 latent class analysis
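 Also called latent class decomposition.
A decomposition of a contingency table (multivariate categorical data)
into latent classes, see, e.g., [Hofmann, 2000].
The observed variables are often called manifest variables.
$$P(x_1, \ldots, x_M) = \sum_{k} P(k) \prod_{m=1}^{M} P(x_m \mid k) \qquad (29)$$
where $k$ indexes the latent classes and $x_1, \ldots, x_M$ are the manifest variables.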
 latent semantic indexing
 Also ``latent semantic analysis''. Truncated singular value decomposition
applied on a bag-of-words data matrix [Deerwester et al., 1990].
 lateral orthogonalization
 Update method (``anti-Hebbian
learning'') for connections between the units in the
same layer in a neural network
which imposes orthogonality between the different units, i.e., a kind
of deflation technique.
Used for connectionistic variations of singular value decomposition,
principal component analysis and partial least squares among
others, see, e.g., [Hertz et al., 1991, pages 209-210] and
[Diamantaras and Kung, 1996, section 6.4].
 learning
 In the framework of input-system-output: To find the
system from (a set of) inputs and outputs.
In some uses the same as estimation and training.
 leave-one-out
 A cross-validation scheme where each data point
in turn is kept in the validation/test set while the rest is used
for training the model parameters.
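To make the scheme concrete, the sketch below applies it to a deliberately trivial model: predicting each held-out value by the mean of the remaining values. The model choice, the function name `leave_one_out_mse` and the data are assumptions made only for illustration.

```python
def leave_one_out_mse(y):
    """Leave-one-out estimate of the mean squared error of a 'predict the mean' model."""
    errors = []
    for i in range(len(y)):
        # Data point i forms the test set; the rest is the training set
        train = y[:i] + y[i + 1:]
        prediction = sum(train) / len(train)  # "training" the model
        errors.append((y[i] - prediction) ** 2)
    return sum(errors) / len(errors)


# Hypothetical data
y = [1.0, 2.0, 3.0, 4.0]
loo = leave_one_out_mse(y)
print(loo)
```

With $N$ data points the model is estimated $N$ times, which is why leave-one-out is mostly used when estimation is cheap or the data set is small.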
 likelihood
 A function where the data is fixed and the
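parameters are allowed to vary
$$L(\boldsymbol{\theta}) = p(D \mid \boldsymbol{\theta}) \qquad (30)$$
where $D$ is the data and $\boldsymbol{\theta}$ the parameters.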
 link function
 A (usually monotonic) function on the output of a
linear model that is used in generalized linear models to model
non-Gaussian distributions.
 lix
 Number for the readability of a text [Björnsson, 1971].
 logistic (sigmoid) (function)
 A monotonic function used as a link function to convert a variable
in the range
$]-\infty, \infty[$ to the range $[0, 1]$ suitable for
interpretation as a probability
$$g(x) = \frac{1}{1 + \exp(-x)} \qquad (31)$$
 manifold
 A nonlinear subspace in a high dimensional space. A
hyperplane is an example of a linear manifold.
 Mann-Whitney test
 A nonparametric test for a translational
difference between two distributions by the application of a rank
transformation. Essentially the same as the Wilcoxon rank-sum test,
though another statistic is used: the `Mann-Whitney statistic'.
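The relation between the two statistics can be sketched as follows: the Mann-Whitney statistic counts, over all pairs, how often a value from the first sample exceeds a value from the second, and it differs from the Wilcoxon rank sum only by a constant that depends on the sample size. The function names and data are illustrative assumptions; ties are handled by half counts and midranks.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x: number of pairs with x_i > y_j (ties count 1/2)."""
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


def rank_sum(x, y):
    """Wilcoxon rank sum of sample x in the pooled sample, using midranks for ties."""
    pooled = sorted(x + y)

    def midrank(v):
        first = pooled.index(v) + 1  # rank of the first occurrence (1-based)
        count = pooled.count(v)
        return first + (count - 1) / 2.0

    return sum(midrank(v) for v in x)


# Hypothetical samples
x = [1.1, 2.3, 3.0]
y = [2.0, 4.5, 5.2, 6.1]
u_x, u_y = mann_whitney_u(x, y)
w_x = rank_sum(x, y)

# The two statistics differ only by the constant n_x (n_x + 1) / 2
print(u_x, w_x - len(x) * (len(x) + 1) / 2)
```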
 marginal likelihood
 The same as the evidence and the
``integrated likelihood''.
 marked point process
 A point process where each point has an extra attribute apart from
its location, e.g., a value for its magnitude.
 Markov chain Monte Carlo
 Sampling
technique in simulation and Bayesian statistics.
 mass-univariate statistics
 Univariate statistics when applied
to several variables.
 matching prior
 A prior in Bayesian statistics that is selected so posterior
credible sets resemble frequentist probabilities.
 maximal eigenvector
 The eigenvector associated with the largest
eigenvalue.
 maximum a posteriori
 Maximum likelihood estimation ``with prior''.
 maximum likelihood
 A statistical estimation principle with
optimization of the likelihood function
 M-estimation
 Robust statistics.
 Metropolis-Hastings algorithm
 Sampling technique for Markov
chain Monte Carlo [Metropolis et al., 1953].
 mixed effects model
 Type of ANOVA that consists
of random and fixed effects
[Conradsen, 1984, section 5.3].
See also fixed effects model and random effects model.
 Moore-Penrose inverse
 A generalized inverse that satisfies all
of the ``Penrose equations'' and is uniquely defined
[Penrose, 1955].
 multiple regression analysis
 A type of multivariate regression
analysis where there is only one response variable.
 multivariate analysis
 Statistics with more than one variable in
each data set,
as opposed to univariate statistics.
 multivariate analysis of variance
 (MANOVA) Type of analysis of
variance with multiple measures for each `object' or experimental
design.
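One-way MANOVA can be formulated as
[Mardia et al., 1979, section 12.2]
$$\mathbf{x}_{ij} = \boldsymbol{\mu} + \boldsymbol{\tau}_j + \boldsymbol{\epsilon}_{ij} \qquad (32)$$
where
$\mathbf{x}_{ij}$ is observations (`responses' or `outcomes'),
$\boldsymbol{\mu}$ is the general effect,
$\boldsymbol{\tau}_j$ are the
condition effects and
$\boldsymbol{\epsilon}_{ij}$ is the noise often assumed
to be independent Gaussian distributed.
The most common test considers the difference in condition effect
$$H_0: \boldsymbol{\tau}_1 = \ldots = \boldsymbol{\tau}_k \qquad (33)$$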
 multivariate regression analysis
 A type of linear multivariate analysis using the following model
$$\mathbf{Y} = \mathbf{X} \mathbf{B} + \mathbf{U} \qquad (34)$$
$\mathbf{Y}$ is an observed matrix and $\mathbf{X}$ is a known
matrix. $\mathbf{B}$ is the parameters and $\mathbf{U}$ is the noise
matrix.
The model is either called multivariate regression model (if $\mathbf{X}$ is observed) or general
linear model (if $\mathbf{X}$ is ``designed'')
[Mardia et al., 1979, chapter 6].
 multivariate regression model
 A type of multivariate
regression analysis where the known matrix ($\mathbf{X}$) is observed.
 mutual information
 Originally called
``information rate'' [Shannon, 1948].
 nasion
 The bridge of the nose. Opposite the inion.
See also: inion, and periauricular points.
Used as reference point in EEG.
 neural network
 A model inspired by biological
neural networks.
 neurological convention
 Used in connection with transversal or
axial images of the brain to denote that the left side of the image
corresponds to the left side of the brain, as opposed to the ``radiological
convention''.
 noninformative prior
 A prior with little effect on the
posterior, e.g., a uniform prior or Jeffreys' prior.
Often improper, i.e., not normalizable.
 nonparametric (model/modeling)
 1: A nonparametric model is a
model where the parameters do not have a direct physical meaning.
2: A model with no direct parameters. [Rasmussen and Ghahramani, 2001]:
``[...] models which we do not necessarily know the roles played by
individual parameters, and inference is not primarily targeted at
the parameters themselves, but rather at the predictions made by
the models''.
 novelty
 ``Outlierness''. How ``surprising'' an object is.
 nuisance
 A variable of no interest in the modeling that
makes the estimation of the variable of interest more difficult.
See also confound.
 object
 A single ``data point'' or ``example''.
A single instance of an observation of one or more variables.
 optimization
 Used to find the point estimate of a
parameter. ``Optimization'' is usually used when the estimation requires
iterative parameter estimation, e.g., in connection with nonlinear
models.
 ordination
 The same as multidimensional scaling.
 orthogonal
 For matrices: Unitary matrix with no
correlation among the (either column or row) vectors.
 orthonormal
 For matrices: Unitary matrix with no correlation
among the (either column or row) vectors.
 pairwise interaction point process
 A spatial point process that is a specialization of the Gibbsian
point process [Ripley, 1988, page 50]
 partial least squares (regression)
 Multivariate analysis
technique usually with multiple response variables
[Wold, 1975]. Much used in chemometrics.
 Parzen window
 A type of probability density function model
[Parzen, 1962] where a window (a kernel) is placed at
every object. The name is used in the pattern recognition
literature; the method is more commonly known as kernel density estimation.
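A minimal sketch for scalar data, assuming a Gaussian window of fixed width: one kernel is centered on every object and the density estimate is the average of the kernels. The function name `parzen_density`, the window width and the three example objects are illustrative assumptions.

```python
import math


def parzen_density(x, objects, width):
    """Parzen window density estimate at x with a Gaussian kernel of given width."""
    norm = 1.0 / (len(objects) * width * math.sqrt(2.0 * math.pi))
    # One window is placed at every object; the estimate is their average
    return norm * sum(math.exp(-0.5 * ((x - o) / width) ** 2) for o in objects)


# Hypothetical one-dimensional objects
objects = [0.0, 1.0, 3.0]
density_at_one = parzen_density(1.0, objects, width=0.5)
density_far_away = parzen_density(10.0, objects, width=0.5)
print(density_at_one, density_far_away)
```

The window width acts as a smoothing hyperparameter: a small width gives a spiky estimate, a large width an oversmoothed one.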
 penalized discriminant analysis
 (Linear) discriminant analysis
with regularization.
 perceptron
 A (multilayer) feedforward neural network
[Rosenblatt, 1962].
 periauricular
 See also: nasion, inion
 periodogram
 The power spectrum of a finite signal,
$|X(f)|^2$ up to normalization, where $X(f)$
is the complex frequency spectrum.
The term is also used for the power spectrum of a
rectangular-windowed signal.
 polysemy
 The notion of a single word having several meanings.
The opposite of synonymy.
 preliminary principal component analysis
 Principal component
analysis made prior to a supervised modeling, e.g., an artificial
neural network analysis or canonical variate analysis.
 prediction
 In the framework of input-system-output: To find the
output from the input and system.
 principal component analysis
 (PCA) An unsupervised multivariate
analysis that identifies an orthogonal transformation and remaps
objects to a new subspace with the transformation
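[Pearson, 1901], [Mardia et al., 1979, page 217].
$$\mathbf{Y} = (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}^\mathsf{T}) \mathbf{G} \qquad (43)$$
where the columns of $\mathbf{G}$ are the eigenvectors of the sample covariance matrix.
An individual element in $\mathbf{Y}$ is called a score. The $k$'th
column in $\mathbf{Y}$ is called the $k$'th principal component.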
 principal component regression
 Regularized regression by
principal component analysis [Massy, 1965].
 principal coordinate analysis
 A method in multidimensional
scaling. Similar to principal component analysis on a distance
matrix if the distance measure is Euclidean
[Mardia et al., 1979, section 14.3].
 principal covariate regression
 Multivariate analysis technique
[de Jong and Kiers, 1992].
 principal manifold
 Generalization of principal
curves, a nonlinear version of principal component analysis.
[DeMers and Cottrell, 1993]
 prior
 Distribution of parameters before observations have been made. Types
of priors are: uniform, noninformative, Jeffrey's, reference, matching,
informative, (natural) conjugate.
 probabilistic neural network
 A term used to denote kernel density
estimation [Specht, 1990].
 probabilistic principal component analysis
 Principal component
analysis with modeling of an isotropic noise
[Tipping and Bishop, 1997]. The same as sensible principal
component analysis.
 profile likelihood
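 A likelihood where some of the variables (e.g., nuisance variables)
are maximized out [Berger et al., 1999, equation 2]
$$\hat{L}(\theta) = \sup_{\lambda} L(\theta, \lambda) \qquad (44)$$
where $\theta$ is the parameter of interest and $\lambda$ is the nuisance parameter.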
 radiological convention
 Used in connection with transversal or
axial images of the brain to denote that the right side of the image
corresponds to the left side of the brain, as opposed to the ``neurological
convention''.
 random effects model
 Type of ANOVA where the parameters (on
the ``first'' level) are regarded as random
[Conradsen, 1984, section 5.2.3]. See also fixed
effects model and mixed effects model.
 rank
 The dimension of the subspace spanned by the vectors in a matrix.
 rank deficient
 A matrix is said to be rank deficient if there
is a well-defined gap between large and small singular values
[Hansen, 1996], cf. ill-posed.
 regularization
 The method of stabilizing the model estimation.
 relative entropy
 A distance measure between two distributions
[Kullback and Leibler, 1951].
$$K(p \| q) = \int p(x) \ln \frac{p(x)}{q(x)} \, dx \qquad (45)$$
Other names are cross-entropy,
Kullback-Leibler distance (or information criterion), asymmetric
divergence, directed divergence.
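The asymmetry, and the decomposition of the cross-entropy into relative entropy plus entropy, can be checked numerically. The sketch below uses discrete distributions (sums instead of the integrals used for densities), which is an assumption made for simplicity; the function names and the two example distributions are likewise illustrative.

```python
import math


def entropy(p):
    """Entropy of a discrete distribution (natural logarithm)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Average negative log-probability under q when the truth is p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler distance) from p to q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distributions over three outcomes
p = [0.5, 0.4, 0.1]
q = [0.3, 0.3, 0.4]

print(relative_entropy(p, q), relative_entropy(q, p))        # not symmetric
print(cross_entropy(p, q), relative_entropy(p, q) + entropy(p))  # equal
```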
 responsibility
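 In cluster analysis and mixture modeling the
weight of assignment to a particular cluster $k$ for a specific data
point $\mathbf{x}_n$.
With probabilistic modelling this can be interpreted as a posterior
probability
$$P(k \mid \mathbf{x}_n) = \frac{p(\mathbf{x}_n \mid k) \, P(k)}{\sum_{k'} p(\mathbf{x}_n \mid k') \, P(k')} \qquad (46)$$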
 restricted canonical correlation analysis
 (RCC) Canonical
correlation analysis with restriction on the parameters, e.g.,
non-negativity [Das and Sen, 1994, Das and Sen, 1996].
 robust statistics
 Statistics designed to cope with outliers.
 run
 A part of an experiment consisting of several measurements,
e.g., several scans. The measurements in a run are typically done
with a fixed frequency. Multiple runs can be part of a session and a
run might consist of one or more trials or events.
 saliency map
 A map of which inputs are important for predicting
the output.
 saturation recovery
 Type of MR pulse sequence.
 self-organizing learning
 The same as unsupervised learning.
 sensible principal component analysis
 Principal component
analysis with modeling of an isotropic noise
[Roweis, 1998]. The same as probabilistic
principal component analysis [Tipping and Bishop, 1997]
 session
 A part of the experiment: An experiment might consist
of multiple sessions with multiple subjects and every session
contains one or more runs.
 sigmoidal
 S-shaped. Often used about the
nonlinear function in an artificial neural network.
 skewness
 The left/right asymmetry of a distribution.
The usual definition is
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} \qquad (47)$$
where $\mu_3$ and $\mu_2$ are the third and second central moment.
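Multivariate skewness can be defined as
[Mardia et al., 1979, p. 21, 148+]
$$b_{1,p} = \frac{1}{n^2} \sum_{r=1}^{n} \sum_{s=1}^{n} g_{rs}^3, \qquad g_{rs} = (\mathbf{x}_r - \bar{\mathbf{x}})^\mathsf{T} \mathbf{S}^{-1} (\mathbf{x}_s - \bar{\mathbf{x}})$$
where $\bar{\mathbf{x}}$ is the sample mean and $\mathbf{S}$ the sample covariance matrix.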
 slice timing correction
 In fMRI: Correction for the difference
in sampling between slices in a scan.
Slices can be acquired interleaved/noninterleaved and
ascending/descending.
 softmax
 A vector function that is a generalization of the
logistic sigmoid activation function suitable to transform the
variables in a vector from the interval
$]-\infty, \infty[$ to $[0, 1]$ so they can be used as probabilities [Bridle, 1990]
$$g(x_i) = \frac{\exp(x_i)}{\sum_{j} \exp(x_j)} \qquad (50)$$
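A short sketch of the function, including a check that the two-variable case with one input fixed at zero reduces to the logistic sigmoid. Subtracting the maximum before exponentiating is a common numerical safeguard and does not change the result; the function names and example values are illustrative assumptions.

```python
import math


def softmax(x):
    """Map a vector of real values to positive values that sum to one."""
    shift = max(x)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(xi - shift) for xi in x]
    total = sum(exps)
    return [e / total for e in exps]


def logistic(x):
    """Logistic sigmoid function."""
    return 1.0 / (1.0 + math.exp(-x))


probs = softmax([2.0, 1.0, -1.0])
print(probs, sum(probs))

# With two variables and the second fixed at zero,
# the first softmax output equals the logistic sigmoid
print(softmax([1.5, 0.0])[0], logistic(1.5))
```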
 sparseness
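 The sparseness of a matrix is the ratio of the number of
non-zero elements to the total number of elements.
Another definition is a function of the $L_1$ and $L_2$ norm
[Hoyer, 2004, p. 1460]
$$\mathrm{sparseness}(\mathbf{x}) = \frac{\sqrt{n} - \left( \sum_i |x_i| \right) / \sqrt{\sum_i x_i^2}}{\sqrt{n} - 1} \qquad (51)$$
where $n$ is the number of elements in $\mathbf{x}$.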
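The two definitions behave quite differently, which a small sketch can show. Here the norm-based measure is written as $(\sqrt{n} - L_1/L_2)/(\sqrt{n} - 1)$ for a vector with $n$ elements; it is one for a vector with a single non-zero element and zero when all elements are equal. The function names and test vectors are illustrative assumptions.

```python
import math


def nonzero_fraction(x):
    """Number of non-zero elements relative to the total number of elements."""
    return sum(1 for xi in x if xi != 0) / len(x)


def norm_based_sparseness(x):
    """Sparseness as a function of the L1 and L2 norms of a (non-zero) vector."""
    n = len(x)
    l1 = sum(abs(xi) for xi in x)
    l2 = math.sqrt(sum(xi * xi for xi in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)


one_hot = [0.0, 0.0, 3.0, 0.0]   # a single non-zero element
constant = [1.0, 1.0, 1.0, 1.0]  # all elements equal

print(nonzero_fraction(one_hot), norm_based_sparseness(one_hot))
print(nonzero_fraction(constant), norm_based_sparseness(constant))
```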
 spatial independent component analysis
 Type of independent
component analysis in functional neuroimaging
[Petersson et al., 1999]. See also temporal independent component analysis.
 statistic
 A value extracted from a data set, such as the
empirical mean or the empirical variance.
 statistical parametric images
 A term used by Peter T. Fox et
al. to denote the images that are formed by statistical
analysis of functional neuroimages.
 statistical parametric mapping
 The process of getting
statistical parametric maps: Sometimes just denoting voxelwise
t-tests, other times ANCOVA GLM modeling with random fields modeling,
and sometimes also including the preprocessing: realignment, spatial
normalization, filtering, ...
 statistical parametric maps
 A term used by Karl J. Friston and
others to denote the images that are formed by statistical
analysis of functional neuroimages, especially those formed from the
program SPM.
 sub-Gaussian
 A distribution with negative kurtosis, e.g., the
uniform distribution. Also called
`platykurtic'.
 sufficient
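 A statistic (function)
$t(\mathbf{x})$ is
sufficient if it is ``enough'' to estimate the parameters of the
model ``well'', or more specifically: enough to describe the
likelihood function within a scaling factor.
The likelihood can then be factorized:
$$p(\mathbf{x} \mid \boldsymbol{\theta}) = h(\mathbf{x}) \, g(t(\mathbf{x}), \boldsymbol{\theta}) \qquad (52)$$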
 super-Gaussian
 A distribution with positive kurtosis, i.e., a
heavy-tailed distribution such as the Laplace distribution.
Also called `leptokurtic'.
 supervised (learning/pattern recognition)
 Estimation of a model
to estimate a ``target'', e.g., classification (the target is the
class label) or regression (the target is the dependent variable)
estimation with labeled data.
 synonymy
 The notion that several words have the same meaning.
The opposite of polysemy.
 system
 The part of the physical world under
investigation. Interacts with
the environment through input and output.
 system identification
 In the framework of input-system-output:
To find the system from (a set of) inputs and outputs.
The same as learning, although system identification
usually refers to parametric learning.
 temporal independent component analysis
 Type of independent
component analysis in functional neuroimaging
[Petersson et al., 1999].
See also spatial independent component analysis.
 test set
 Part of a data set used to test the performance (fit)
of a model. If the estimate should be unbiased the test set should
be independent of the training and validation set.
 time-activity curve
 The curve generated in connection with
dynamic positron emission tomography (PET) images
 total least squares
 Multivariate analysis estimation technique
 training
 Term used in connection with neural networks to denote
parameter estimation (parameter optimization). Sometimes called
learning.
 training set
 Part of a data set used to fit the parameters of the
model (not the hyperparameters). See also test and validation set.
 trial
 An element of a psychological experiment usually
consisting of a stimulus and a response.
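 truncated singular value decomposition
 Singular value
decomposition of a matrix $\mathbf{X}$
where only a number
of components, say
$K < \mathrm{rank}(\mathbf{X})$, are maintained
$$\mathbf{X}_K = \mathbf{U} \mathbf{L}_K \mathbf{V}^\mathsf{T} \qquad (53)$$
where the diagonal is
$\mathbf{L}_K = \mathrm{diag}(l_1, \ldots, l_K, 0, \ldots, 0)$.
The truncated SVD matrix is the rank-$K$ matrix with minimum
2-norm and Frobenius norm of the difference between all rank-$K$
matrices and $\mathbf{X}$.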
 univariate statistics
 Statistics with only one response variable, as
opposed to multivariate analysis.
 unsupervised learning
 Learning with only one set of data, i.e.,
there is no target involved.
Cluster analysis and principal component analysis are usually regarded as
unsupervised.
 variational Bayes
 An extension of the EM algorithm where both the hidden variables and
the parameters are associated with probability density functions
(likelihood and posteriors).
With observed , hidden and parameters


(54) 
...
 voxel
 A 3-dimensional pixel. The smallest picture element in a
volumetric image.
 validation set
 Part of a data set used to tune
hyperparameters.
 vector space model
 In information retrieval: Representation of
a document as a vector where each element is (usually) associated
with a word [Salton et al., 1975].
The same as the bag-of-words representation.
 weights
 E.g., the model parameters of a neural network.
 white noise
 Noise that is independent (in the time dimension).
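 z-score
 Also called ``standard score'' and denotes a random
variable transformed so the mean is zero and the standard
deviation is one. For a normal distributed random variable the
transformation is:
$$z = \frac{x - \mu}{\sigma} \qquad (55)$$
where $\mu$ and $\sigma$ are the mean and standard deviation of $x$.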
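In practice the mean and standard deviation are usually not known and are replaced by sample estimates. The sketch below does this with the population-style standard deviation (divisor $N$); that choice, the function name `z_scores` and the data are assumptions for illustration.

```python
import statistics


def z_scores(x):
    """Standardize values: subtract the sample mean, divide by the standard deviation."""
    mean = statistics.mean(x)
    sd = statistics.pstdev(x)  # divisor N; statistics.stdev would use N - 1
    return [(xi - mean) / sd for xi in x]


# Hypothetical data with mean 5 and standard deviation 2
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_scores(x)
print(z)
print(statistics.mean(z), statistics.pstdev(z))
```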
 Becker and Hinton, 1992

Becker, S. and Hinton, G. E. (1992).
A self-organizing neural network that discovers surfaces in
random-dot stereograms.
Nature, 355(6356):161-163.
 Ben-Israel and Greville, 1980

Ben-Israel, A. and Greville, T. N. E. (1980).
Generalized Inverses: Theory and Applications.
Robert E. Krieger Publishing Company, Huntington, New York, reprint
edition.
 Berger et al., 1999

Berger, J. O., Liseo, B., and Wolpert, R. L. (1999).
Integrated likelihood methods for eliminating nuisance parameters
(with discussion).
Statistical Science, 14:1-28. http://ftp.isds.duke.edu/WorkingPapers/9701.ps. CiteSeer:
http://citeseer.ist.psu.edu/berger99integrated.html.
 Bishop, 1995

Bishop, C. M. (1995).
Neural Networks for Pattern Recognition.
Oxford University Press, Oxford, UK. ISBN 0198538642.
 Björnsson, 1971

Björnsson, C. H. (1971).
Læsbarhed.
GEC, Copenhagen.
 Bridle, 1990

Bridle, J. S. (1990).
Probabilistic interpretation of feedforward classification network
outputs, with relationships to statistical pattern recognition.
In Fogelman Soulié, F. and Hérault, J., editors, Neurocomputing: Algorithms, Architectures and Application, pages 227-236.
Springer-Verlag, New York.
 Burt, 1950

Burt, C. (1950).
The factorial analysis of qualitative data.
British Journal of Statistical Psychology (Stat. Sec.),
3:166-185.
 Conradsen, 1984

Conradsen, K. (1984).
En Introduktion til Statistik.
IMSOR, DTH, Lyngby, Denmark, 4. edition.
In Danish.
 Das and Sen, 1994

Das, S. and Sen, P. K. (1994).
Restricted canonical correlations.
Linear Algebra and its Applications, 210:29-47.
http://www.sciencedirect.com/science/article/B6V0R463GRJ43/2/2570d4ea8024361049d6def4f4bb9716.
 Das and Sen, 1996

Das, S. and Sen, P. K. (1996).
Asymptotic distribution of restricted canonical correlations and
relevant resampling methods.
Journal of Multivariate Analysis, 56(1):1-19.
DOI: 10.1006/jmva.1996.0001. http://www.sciencedirect.com/science/article/B6WK945NJFKB14/2/44c1e203190336d7ebbb18bc09cda8b7.
 de Jong and Kiers, 1992

de Jong, S. and Kiers, H. A. L. (1992).
Principal covariates regression.
Chemometrics and Intelligent Laboratory Systems, 14:155-164.
 Deerwester et al., 1990

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R.
(1990).
Indexing by latent semantic analysis.
Journal of the American Society for Information Science,
41(6):391-407. http://www.si.umich.edu/~furnas/POSTSCRIPTS/LSI.JASIS.paper.ps.
CiteSeer:
http://citeseer.ist.psu.edu/deerwester90indexing.html.
 DeMers and Cottrell, 1993

DeMers, D. and Cottrell, G. W. (1993).
Nonlinear dimensionality reduction.
In Hanson, S. J., Cowan, J. D., and Lee Giles, C., editors, Advances in Neural Information Processing Systems: Proceedings of the 1992
Conference, pages 580-587, San Mateo, CA. Morgan Kaufmann Publishers.
NIPS5.
 Dempster et al., 1977

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977).
Maximum likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society, Series B, 39:1-38.
 Diamantaras and Kung, 1996

Diamantaras, K. I. and Kung, S.-Y. (1996).
Principal Component Neural Networks: Theory and Applications.
Wiley Series on Adaptive and Learning Systems for Signal Processing,
Communications, and Control. Wiley, New York.
ISBN 0471054364.
 Fox et al., 1997

Fox, P. T., Lancaster, J. L., Parsons, L. M., Xiong, J.-H., and Zamarripa, F.
(1997).
Functional volumes modeling: Theory and preliminary assessment.
Human Brain Mapping, 5(4):306-311. http://www3.interscience.wiley.com/cgibin/abstract/56435/START.
 Friston, 1997

Friston, K. J. (1997).
Basic concepts and overview.
In SPMcourse, Short course notes, chapter 1. Institute of
Neurology, Wellcome Department of Cognitive Neurology. http://www.fil.ion.ucl.ac.uk/spm/course/notes.html.
 Gale and Sampson, 1995

Gale, W. A. and Sampson, G. (1995).
Good-Turing frequency estimation without tears.
Journal of Quantitative Linguistics, 2:217-237.
 Geman et al., 1992

Geman, S., Bienenstock, E., and Doursat, R. (1992).
Neural networks and the bias/variance dilemma.
Neural Computation, 4(1):1-58.
 Golub and Van Loan, 1996

Golub, G. H. and Van Loan, C. F. (1996).
Matrix Computations.
Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins
University Press, Baltimore, Maryland, third edition.
ISBN 0801854148.
 Good, 1953

Good, I. J. (1953).
The population frequencies of species and the estimation of
population parameters.
Biometrika, 40(3 and 4):237-264.
 Hansen, 1996

Hansen, P. C. (1996).
Rank-Deficient and Discrete Ill-Posed Problems.
Polyteknisk Forlag, Lyngby, Denmark. ISBN 8750207849.
Doctoral Dissertation.
 Hastie and Tibshirani, 1990

Hastie, T. J. and Tibshirani, R. J. (1990).
Generalized Additive Models.
Chapman & Hall, London.
 Haykin, 1994

Haykin, S. (1994).
Neural Networks.
Macmillan College Publishing Company, New York.
ISBN 0023527617.
 Hertz et al., 1991

Hertz, J., Krogh, A., and Palmer, R. G. (1991).
Introduction to the Theory of Neural Computation.
Addison-Wesley, Redwood City, California, 1st edition.
Santa Fe Institute.
 Hofmann, 2000

Hofmann, T. (2000).
Learning the similarity of documents: An information geometric
approach to document retrieval and categorization.
In Solla, S. A., Leen, T. K., and Müller, K.-R., editors, Advances in Neural Information Processing Systems 12, pages 914-920,
Cambridge, Massachusetts. MIT Press. ISSN 1049-5258. ISBN 0262194503.
 Hoyer, 2004

Hoyer, P. O. (2004).
Non-negative matrix factorization with sparseness constraints.
Journal of Machine Learning Research, 5:1457-1469.
http://www.jmlr.org/papers/volume5/hoyer04a/hoyer04a.pdf.
 Jackson, 1991

Jackson, J. E. (1991).
A user's guide to principal components.
Wiley Series in Probability and Mathematical Statistics: Applied
Probability and Statistics. John Wiley & Sons, New York.
ISBN 0471622672.
 Kass and Raftery, 1995

Kass, R. E. and Raftery, A. E. (1995).
Bayes factors.
Journal of the American Statistical Association,
90(430):773-795. A review of Bayes factors.
 Kendall, 1971

Kendall, D. G. (1971).
Seriation from abundance matrices.
In Hodson, F. R., Kendall, D. G., and Tautu, P., editors, Mathematics in the Archaeological and Historical Sciences, pages 215-251.
Edinburgh University Press.
 Kullback and Leibler, 1951

Kullback, S. and Leibler, R. A. (1951).
On information and sufficiency.
Annals of Mathematical Statistics, 22:79-86.
 Kung and Diamantaras, 1990

Kung, S. Y. and Diamantaras, K. I. (1990).
A neural network learning algorithm for adaptive principal component
extraction (APEX).
In International Conference on Acoustics, Speech, and Signal
Processing, pages 861-864. Albuquerque, NM.
 Lunin and White, 1990

Lunin, L. F. and White, H. D. (1990).
Author cocitation analysis, introduction.
Journal of the American Society for Information Science,
41(6):429-432.
 MacKay, 1992a

MacKay, D. J. C. (1992a).
Bayesian interpolation.
Neural Computation, 4(3):415-447.
 MacKay, 1992b

MacKay, D. J. C. (1992b).
Information-based objective functions for active data selection.
Neural Computation, 4(4):590-604. ftp://wol.ra.phy.cam.ac.uk/pub/www/mackay/selection.nc.ps.gz. CiteSeer:
http://citeseer.ist.psu.edu/47461.html.
 MacKay, 1995

MacKay, D. J. C. (1995).
Developments in probabilistic modelling with neural
networks - ensemble learning.
In Kappen, B. and Gielen, S., editors, Neural Networks:
Artificial Intelligence and Industrial Applications. Proceedings of the 3rd
Annual Symposium on Neural Networks, Nijmegen, Netherlands, 14-15 September
1995, pages 191-198, Berlin. Springer.
 Mardia et al., 1979

Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979).
Multivariate Analysis.
Probability and Mathematical Statistics. Academic Press, London.
ISBN 0124712525.
 Massy, 1965

Massy, W. F. (1965).
Principal component analysis in exploratory data research.
Journal of the American Statistical Association, 60:234-256.
 McCain, 1990

McCain, K. W. (1990).
Mapping authors in intellectual space: A technical overview.
Journal of the American Society for Information Science,
41(6):433-443.
 Metropolis et al., 1953

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and
Teller, E. (1953).
Equation of state calculations by fast computing machines.
Journal of Chemical Physics, 21:1087-1092.
 Nichols, 2002

Nichols, T. E. (2002).
Visualizing variance with percent change threshold.
Technical note, Department of Biostatistics, University of Michigan,
Ann Arbor, Michigan, USA. http://www.sph.umich.edu/fnistat/PCT/PCT.pdf.
 Parzen, 1962

Parzen, E. (1962).
On the estimation of a probability density function and mode.
Annals of Mathematical Statistics, 33:1065-1076.
 Pearce, 1982

Pearce, S. C. (1982).
Analysis of covariance.
In Kotz, S., Johnson, N. L., and Read, C. B., editors, Encyclopedia of Statistical Sciences, volume 1, pages 61-69. John Wiley &
Sons. ISBN 0471055468.
 Pearson, 1901

Pearson, K. (1901).
On lines and planes of closest fit to systems of points in space.
The London, Edinburgh and Dublin Philosophical Magazine and
Journal of Science, 2:559-572.
 Penrose, 1955

Penrose, R. (1955).
A generalized inverse for matrices.
Proc. Cambridge Philos. Soc., 51:406-413.
 Petersson et al., 1999

Petersson, K. M., Nichols, T. E., Poline, J.-B., and Holmes, A. P. (1999).
Statistical limitations in functional neuroimaging. I.
Non-inferential methods and statistical models.
Philosophical Transactions of the Royal Society - Series B -
Biological Sciences, 354(1387):1239-1260.
 Rasmussen and Ghahramani, 2001

Rasmussen, C. E. and Ghahramani, Z. (2001).
Occam's razor.
In Leen, T. K., Dietterich, T. G., and Tresp, V., editors, Advances in Neural Information Processing Systems, Boston, MA. MIT Press.
http://nips.djvuzone.org/djvu/nips13/RasmussenGhahramani.djvu.
NIPS13.
 Ripley, 1988

Ripley, B. D. (1988).
Statistical inference for spatial processes.
Cambridge University Press, Cambridge, UK.
ISBN 0521352347.
 Rosenblatt, 1962

Rosenblatt, F. (1962).
Principles of Neurodynamics.
Spartan, New York.
 Roweis, 1998

Roweis, S. (1998).
EM algorithms for PCA and SPCA.
In Jordan, M. I., Kearns, M. J., and Solla, S. A., editors, Advances in Neural Information Processing Systems 10: Proceedings of the 1997
Conference. MIT Press. http://www.gatsby.ucl.ac.uk/~roweis/papers/empca.ps.gz. ISBN 0262100762.
 Salton et al., 1975

Salton, G., Wong, A., and Yang, C. S. (1975).
A vector space model for automatic indexing.
Communications of the ACM, 18:613-620.
 Shannon, 1948

Shannon, C. E. (1948).
A mathematical theory of communication.
Bell System Technical Journal, 27:379-423, 623-656.
 Specht, 1990

Specht, D. F. (1990).
Probabilistic neural networks.
Neural Networks, 3(1):109-118.
 Tipping and Bishop, 1997

Tipping, M. E. and Bishop, C. M. (1997).
Probabilistic principal component analysis.
Technical Report NCRG/97/010, Neural Computing Research Group, Aston
University, Aston St, Birmingham, B4 7ET, UK.
 Wold, 1975

Wold, H. (1975).
Soft modeling by latent variables, the nonlinear iterative partial
least squares approach.
In Gani, J., editor, Perspectives in Probability and Statistics,
Papers in Honour of M. S. Bartlett. Academic Press.
 Zoubir and Boashash, 1998

Zoubir, A. M. and Boashash, B. (1998).
The bootstrap and its application in signal processing.
IEEE Signal Processing Magazine, pages 56-76. An article
introducing the bootstrap that rather closely follows the Efron and
Tibshirani bootstrap book.
Finn Årup Nielsen
2010-04-23