% scivdp.m generates and analyzes nearly collinear data.
% The key concepts--singular value decomposition and variance decomposition
% proportions--are explained in D. Belsley, E. Kuh, and R. Welsch, Regression
% Diagnostics (1980) and D. Belsley, Conditioning Diagnostics (1991). Some
% of the Matlab code used here is taken from bkw.m, written by J. LeSage.

% Step 1: generate nxk design matrix X with two near dependencies

n=100;
k=5;
randn('state',0)               % Return random number generator to
                               % original state to facilitate
                               % replication. (Legacy syntax, kept so
                               % that the same draws are reproduced.)
X=randn(n,k);
X(:,1)=ones(n,1);                        % Column 1 is the intercept.
X(:,3)=X(:,1)+X(:,2)+randn(n,1)*0.05;    % Near dependence among columns 1, 2, 3.
X(:,5)=X(:,4)+randn(n,1)*0.05;           % Near dependence between columns 4 and 5.

% Step 2: analyze design matrix

X2 = X.^2;
collen = sum(X2).^.5;          % Lengths of columns of design matrix
                               % (named collen so that the built-in
                               % function length is not shadowed).
lm = ones(n,1)*collen;
scaledX = X./lm;               % Scaled design matrix has columns of unit length.
[U,D,T] = svd(scaledX,0);      % Singular value decomposition (economy size)
D                              % Print the diagonal matrix D.
T                              % Print eigenvectors of scaledX'*scaledX.
mu = diag(D)                   % Put the singular values of scaled X in a vector.
mumax = mu(1)                  % svd returns singular values in decreasing order.
mumaxvec = mumax*ones(k,1)
etatilde = mumaxvec./mu        % Scaled condition indexes of X. Each large condition
                               % index corresponds to a near linear dependence.
t2 = T'.*T'                    % Row j holds the squared elements of eigenvector j.
mu2 = mu.^2                    % Eigenvalues of scaledX'*scaledX.
mu2mat = mu2*ones(1,k)
tom = t2./mu2mat               % Element (j,i): contribution of singular value j
                               % to the variance of coefficient i.
vif = sum(tom)'                % Variance inflation factors (uncentered): the
                               % diagonal of inv(scaledX'*scaledX).
vifmat = ones(k,1)*vif'
vdp = tom./vifmat              % Variance decomposition proportions.
sum(vdp)                       % Check that proportions sum to one.
vdptable = [etatilde vdp]      % Named vdptable so that the built-in table is not shadowed.

% Each row of vdptable corresponds to one singular value, and each column
% after the first corresponds to one column of X.
% A large element in the first column indicates a near linear dependence.
% Large elements to its right indicate which variables are involved in this
% near dependence.
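
% Step 3 (illustrative addition, not part of the original scivdp.m):
% read the diagnostics with a rule of thumb.
%
% This is a minimal sketch of how the final table might be read
% mechanically. A commonly cited rule of thumb in the diagnostics
% literature referenced above treats a scaled condition index of about 30
% or more as a sign of a near dependence, and proportions above 0.5 in
% that row as marking the variables involved. The cutoffs etacut and
% vdpcut, the small hand-made matrix Xd, and every name ending in "d" are
% assumptions made for this example so that it runs on its own; they do
% not come from the original script.

Xd = [1 1 2.01; 1 2 2.99; 1 3 4.02; 1 4 4.98; 1 5 6.01; 1 6 6.99];
                               % Column 3 is nearly column 1 + column 2.
[nd,kd] = size(Xd);
scaledXd = Xd./(ones(nd,1)*sqrt(sum(Xd.^2)));  % Columns of unit length.
[Ud,Dd,Td] = svd(scaledXd,0);  % Singular value decomposition
mud = diag(Dd);                % Singular values, largest first.
etad = mud(1)./mud;            % Scaled condition indexes.
tomd = (Td'.*Td')./((mud.^2)*ones(1,kd));
vdpd = tomd./(ones(kd,1)*sum(tomd));  % Rows: singular values; columns: variables.

etacut = 30;                   % Assumed cutoff for a large condition index.
vdpcut = 0.5;                  % Assumed cutoff for a large proportion.
for j = find(etad >= etacut)'
    fprintf('Condition index %.1f involves variables: %s\n', ...
            etad(j), mat2str(find(vdpd(j,:) > vdpcut)));
end

% To apply the same reading to the data generated in Step 1, replace etad
% and vdpd in the loop by etatilde and vdp. The cutoffs are judgment calls
% and can be changed to suit the application.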