/* This Gauss program generates and analyzes nearly collinear data.
** The key concepts--singular value decomposition and variance decomposition
** proportions--are explained in D. Belsley, E. Kuh, and R. Welsch, Regression
** Diagnostics (1980) and D. Belsley, Conditioning Diagnostics (1991). */

/* Step 1: generate nxk design matrix X with two near dependencies */
n = 100;
k = 5;
rndseed 1;                     /* Reset generator's seed to original state
                               ** to facilitate replication. */
x=rndn(n,k);
x[.,1]=ones(n,1);
x[.,3]=x[.,1]+x[.,2]+rndn(n,1)*0.05;
x[.,5]=x[.,4]+rndn(n,1)*0.05;
x2 = x.^2;
length = sumc(x2)^.5;
lm = ones(n,1)*length';
scaledx = x./lm;

/* Step 2: analyze design matrix */
print on;
{U,D,T} = svd2(scaledX);       /* Singular value decomposition of the design matrix */
D;                             /* Print the diagonal matrix D */
T;                             /* Print eigenvectors of X'X */
mu = diag(D);                  /* Put the singular values of X in vector mu */
mumax = mu[1];                 /* Assign maximum singular value to mumax */
mumaxvec = mumax*ones(k,1);
eta = mumaxvec./mu;            /* Condition indices of X. Each large condition index
                               ** corresponds to a near linear dependence */
t2 = T'.*T';
mu2 = mu.^2;                   /* Eigenvalues of X'X */
mu2mat = mu2*ones(1,k);
tom = t2./mu2mat;
vif = sumc(tom);               /* Variance inflation factors */
vifmat = ones(k,1)*vif';
vdp = tom./vifmat;             /* Variance decomposition proportions */
sumc(vdp);                     /* Check that proportions sum to one */
table = eta~vdp;
/* A large element in the first column indicates a near linear dependence.
** Large elements to its right indicate which variables are involved in
** this near dependence */
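
/* Step 3: illustrative sketch, not part of the original program.
** It shows one way the table built above might be read mechanically.
** Assumptions: the thresholds below are the usual rules of thumb from
** Belsley, Kuh, and Welsch (condition index above 30, and two or more
** variance decomposition proportions above 0.5 in the same row); adjust
** them to taste. The names etacut, vdpcut, nbig, and flag are hypothetical
** and do not appear in the original program. */
etacut = 30;
vdpcut = 0.5;
table;                         /* Print condition indices and proportions */
nbig = sumc((vdp .> vdpcut)'); /* Per row: number of proportions above vdpcut */
flag = (eta .> etacut) .* (nbig .>= 2);
flag;                          /* 1 marks a row that meets both thresholds */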