In statistics, the projection matrix (P), sometimes also called the influence matrix or hat matrix (H), maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). In the language of linear algebra, the projection matrix is the orthogonal projection onto the column space of the design matrix X.[5][6] H is called the hat matrix and is central in regression analysis: the residuals, like the fitted values ŷᵢ, can be expressed as linear combinations of the observed values yⱼ (Frank Wood, fwood@stat.columbia.edu, Linear Regression Models, Lecture 11).

The projection matrix has a number of useful algebraic properties. A symmetric idempotent matrix such as H is called a perpendicular projection matrix, and the eigenvalues of H are all either 0 or 1. The residual maker M = I − H is likewise symmetric (Mᵀ = M) and idempotent (M² = M). The diagonal entries hᵢᵢ of H are the leverages; the minimum value of hᵢᵢ is 1/n for a model with a constant term. These properties of the hat matrix are of importance in, for example, assessing the amount of leverage or influence that yⱼ has on ŷᵢ, which is related to the (i, j)-th entry of the hat matrix, and hence in identifying observations which have a large effect on the results of a regression. The present article derives and discusses the hat matrix and gives an example to illustrate its usefulness; Section 2 defines the hat matrix and derives its basic properties.
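These algebraic properties can be checked numerically. The sketch below is a minimal illustration, assuming NumPy and an invented design matrix; it builds H and confirms symmetry, idempotency, and that every eigenvalue is (numerically) 0 or 1.

```python
import numpy as np

# A small design matrix: intercept column plus one predictor (illustrative data).
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X^T X)^{-1} X^T.
H = X @ np.linalg.inv(X.T @ X) @ X.T

symmetric = np.allclose(H, H.T)        # H is symmetric
idempotent = np.allclose(H @ H, H)     # H is idempotent
eigvals = np.linalg.eigvalsh(H)        # eigenvalues of a symmetric matrix
near_01 = bool(np.all(np.minimum(np.abs(eigvals), np.abs(eigvals - 1.0)) < 1e-10))
```

Any full-column-rank X would do here; the specific numbers are not from the text.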
Many types of models and techniques are subject to this formulation; a few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering. For such fits, the generalized degrees of freedom (GDF) is defined to be the sum of the sensitivities of each fitted value ŷᵢ to perturbations in its corresponding observation yᵢ. (The term "hat matrix" is due to John W. Tukey, who introduced the technique about ten years earlier.)

For the linear model, the least-squares estimate is β̂ = (XᵀX)⁻¹Xᵀy. A vector that is orthogonal to the column space of a matrix is in the null space of the matrix transpose; therefore, since y − ŷ is orthogonal to the column space of X, the fitted vector ŷ = Hy is the orthogonal projection of y onto that column space. A basic diagnostic idea follows: use the hat matrix to identify outliers in X.

PRACTICE PROBLEMS (solutions provided below)
(1) Let A be an n × n idempotent matrix. Prove that det(A) is equal to either 0 or 1.
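As a sanity check on the normal-equations formula, this sketch (assuming NumPy; the data are simulated purely for illustration) compares β̂ = (XᵀX)⁻¹Xᵀy with the result of NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(20)

# Normal-equations estimate: beta_hat = (X^T X)^{-1} X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

agree = np.allclose(beta_hat, beta_ref)
```

Solving the normal equations with `solve` avoids forming an explicit inverse, which is the usual numerical practice.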
If the vector of response values is denoted by y and the vector of fitted values by ŷ, then ŷ = Hy. Since our model will usually contain a constant term, one of the columns in the design matrix X is a column of all ones; this column should be treated exactly the same as any other column, and it allows one to analyze the effects of adding an intercept term to a regression.

The singular value decomposition X = UΣVᵀ illuminates the structure of H. In particular, U is a set of eigenvectors for XXᵀ, V is a set of eigenvectors for XᵀX, and the non-zero singular values of X are the square roots of the non-zero eigenvalues of both XXᵀ and XᵀX.

The projection matrix can be used to define the effective degrees of freedom of the model: tr(H) = p, the number of independent parameters of the linear model. (Similarly, the effective degrees of freedom of a spline model is estimated by the trace of its smoother matrix S in ŷ = Sy.)

The estimate b = β̂ = (XᵀX)⁻¹Xᵀy is a linear combination of the elements of y, so these estimates are normal if y is normal, and approximately normal in general. Moreover, β̂ is an unbiased estimator of β:
E(β̂) = E((XᵀX)⁻¹Xᵀy) = (XᵀX)⁻¹XᵀE(y) = (XᵀX)⁻¹XᵀXβ = β.
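The SVD connection and the trace property can be illustrated directly. In this sketch (assuming NumPy and an arbitrary made-up design matrix of full column rank), the hat matrix equals UUᵀ for the thin SVD of X, and tr(H) equals the number of columns p.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(8), rng.standard_normal((8, 2))])  # n = 8, p = 3

H = X @ np.linalg.inv(X.T @ X) @ X.T

# Thin SVD: X = U S V^T with U of shape (n, p).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

h_from_svd = np.allclose(H, U @ U.T)               # H = U U^T
trace_is_p = bool(np.isclose(np.trace(H), X.shape[1]))  # tr(H) = p
```

The identity H = UUᵀ holds whenever X has full column rank, which the random columns here satisfy with probability one.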
The leverage of observation i is the value of the i-th diagonal term, hᵢᵢ, of the hat matrix H. Each point of the data set tries to pull the ordinary least squares (OLS) line towards itself, and points whose x values lie far from the mean pull hardest. [Figure: three of the data points, the smallest x value, an x value near the mean, and the largest x value, are labeled with their corresponding leverages; the two x values furthest from the mean have the largest leverages (0.176 and 0.163), while the x value closest to the mean has a smaller leverage (0.048).]

Properties of the leverages hᵢᵢ:
1. 0 ≤ hᵢᵢ ≤ 1 (can you show this?);
2. Σᵢ hᵢᵢ = p, so the average leverage is h̄ = p/n (show it).

The residual vector is given by e = (I − H)y, with variance–covariance matrix V = (I − H)σ², where I is the identity matrix of order n. If λ is an eigenvalue of H, then λ² = λ and hence λ ∈ {0, 1}; by the properties of a projection matrix, H has p = rank(X) eigenvalues equal to 1, and all other eigenvalues are equal to 0.

Exercise: show that H1 = 1 for the multiple linear regression case (p − 1 > 1), where 1 denotes the column of ones. (Useful fact: (A + B)ᵀ = Aᵀ + Bᵀ; the transpose of a sum is the sum of transposes.)

Theorem (general solution via the pseudoinverse). Let A ∈ ℝ^{m×n} and b ∈ ℝ^m, and suppose that AA⁺b = b. Then any vector of the form x = A⁺b + (I − A⁺A)y, where y ∈ ℝⁿ is arbitrary, is a solution of Ax = b. [References: "Data Assimilation: Observation influence diagnostic of a data assimilation system"; "Proof that trace of 'hat' matrix in linear regression is rank of X".]
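A minimal numerical illustration of the leverage properties, assuming NumPy and invented x values (one of them deliberately far from the mean):

```python
import numpy as np

x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 8.0])  # last point is far from the mean
n = x.size
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverages = np.diag(H)

# 1/n <= h_ii <= 1 for a model with a constant term, and sum(h_ii) = p.
bounds_ok = bool(np.all((leverages >= 1.0 / n - 1e-12) & (leverages <= 1.0 + 1e-12)))
sum_is_p = bool(np.isclose(leverages.sum(), p))
max_at_outlier = int(np.argmax(leverages)) == n - 1  # far-out x has the largest leverage
```

The far-out point at x = 8.0 dominates the leverages, mirroring the labeled-points example above.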
(2) Let A be an n × n idempotent matrix. Show that its eigenvalues are all either 0 or 1. (Hint: if Av = λv, then λ²v = A²v = Av = λv.)

Matrix operations on block matrices can be carried out by treating the blocks as matrix entries. For example, if there are large blocks of zeros in a matrix, or blocks that look like an identity matrix, it can be useful to partition the matrix accordingly. There are a number of applications of such a decomposition, including computing with H without explicitly forming it, since H is n × n and might be too large to fit into computer memory.

We call H the "hat matrix" because it puts the hat on y: writing the fitted values as Ŷ = Xb = X(XᵀX)⁻¹XᵀY gives Ŷ = HY with H = X(XᵀX)⁻¹Xᵀ. The complementary matrix M = I − P, where P is the projection onto the linear space spanned by the columns of X, is sometimes referred to as the residual maker matrix, since MY is the vector of residuals. However, the symmetry and idempotency of the hat matrix do not always carry over; in locally weighted scatterplot smoothing (LOESS), for example, the hat matrix is in general neither symmetric nor idempotent.
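The residual maker can be sketched numerically as well (assuming NumPy; the data are simulated for illustration): M = I − H annihilates the columns of X, and My reproduces the ordinary least-squares residuals.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.arange(1.0, 11.0)
X = np.column_stack([np.ones(10), x])
y = 1.0 + 0.5 * x + rng.standard_normal(10)

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(10) - H                 # the residual maker M = I - H

kills_X = np.allclose(M @ X, 0.0)  # M X = 0: M annihilates the column space of X
resid = M @ y                      # residuals produced directly by M
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
same_resid = np.allclose(resid, y - X @ beta_hat)
```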
Practical applications of the projection matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying influential observations, i.e. observations which have a large effect on the results of a regression. Throughout, p is the number of coefficients in the regression model, and n is the number of observations.

Hat matrix properties:
• The hat matrix is symmetric: Hᵀ = H.
• The hat matrix is idempotent: H² = H·H = H.
More generally, one can prove that if AᵀA = A, then A is a symmetric idempotent matrix.

The same scalar bookkeeping appears in the Gauss–Markov argument: for linear estimators qᵀβ̂ and kᵀy of the same quantity, both are scalars, so Var(qᵀβ̂ − kᵀy) = Var(qᵀβ̂) + Var(kᵀy) − 2 Cov(qᵀβ̂, kᵀy), and the covariance factors out as a single term.

When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters are β̂ = (XᵀX)⁻¹Xᵀy, and therefore the projection matrix (and hat matrix) is given by H = X(XᵀX)⁻¹Xᵀ. The above may be generalized to the cases where the weights are not identical and/or the errors are correlated: if the covariance matrix of the errors is Ψ, then β̂ = (XᵀΨ⁻¹X)⁻¹XᵀΨ⁻¹y and H = X(XᵀΨ⁻¹X)⁻¹XᵀΨ⁻¹.

Suppose the design matrix can be decomposed by columns as X = [A B]. Define the projection operator P{A} = A(AᵀA)⁻¹Aᵀ and its complement M{A} = I − P{A} (and likewise M{X} = I − P{X}). Then the projection matrix can be decomposed as follows:[9] P{X} = P{A} + P{M{A}B}, i.e. one first projects onto the columns of A and then onto the part of B orthogonal to A. Let 1 be the first column vector of the design matrix X (the column of ones); taking A = 1 in this decomposition is one way to analyze the effect of adding an intercept term.
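As an illustration of the influence diagnostics mentioned above, the following sketch computes Cook's distance from the residuals and leverages via the standard closed form Dᵢ = eᵢ² hᵢᵢ / (p s² (1 − hᵢᵢ)²). It assumes NumPy; the data are constructed (not from the text) so that one high-leverage point is shifted off an otherwise exact line.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0])
n = x.size
X = np.column_stack([np.ones(n), x])
p = X.shape[1]
y = 1.0 + 2.0 * x
y[-1] += 15.0                      # shift the high-leverage point off the line

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
e = y - H @ y                      # residuals via the hat matrix
s2 = e @ e / (n - p)               # usual residual variance estimate
cooks_d = e ** 2 * h / (p * s2 * (1.0 - h) ** 2)

most_influential = int(np.argmax(cooks_d))  # index of the shifted point
```

Note how the shifted point has a modest residual (the line is pulled toward it) yet a very large Cook's distance, because its leverage is close to 1.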
For any matrix A with linearly independent columns, the orthogonal projection onto its column space is P{A} = A(AᵀA)⁻¹Aᵀ. The least-squares fitted values are ŷ = Xβ̂ = X(XᵀX)⁻¹Xᵀy = Py, which confirms that P (that is, H) is a projection matrix. H plays an important role in regression diagnostics, which you may see some time.

In deriving the least-squares estimator one minimizes u′u, the sum of squared residuals. Notice here that u′u is a scalar or number (such as 10,000), because u′ is a 1 × n matrix and u is an n × 1 matrix, and the product of these two matrices is a 1 × 1 matrix.

The aim of regression analysis is to explain Y in terms of X through a functional relationship like Yᵢ = f(Xᵢ, εᵢ). The n × 1 vector of ordinary predicted values of the response variable is ŷ = Hy, where the n × n prediction or hat matrix H is given by H = X(X′X)⁻¹X′. Kutner et al. call this the hat matrix because it makes ŷ, the predicted y, out of y.
The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation. The variable Y is generally referred to as the response variable, and additional information about the samples is available in the form of the design matrix X (as above); in mixed-model applications, X may be a large sparse matrix of dummy variables for the fixed-effect terms. The model can be written as y = Xβ + ε, where X is a matrix of explanatory variables (the design matrix), β is a vector of unknown parameters to be estimated, and ε is the error vector.

A vector orthogonal to the column space of X lies in the null space of Xᵀ; this subspace inclusion criterion follows essentially from the definition of the range of a matrix. Since ŷ is usually pronounced "y-hat", the projection matrix earns its nickname, the hat matrix. The covariance matrix of β̂ is Cov(β̂) = σ²(XᵀX)⁻¹. We can also use the SVD of X, as above, to unveil the properties of the hat matrix.
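The identity Cov(β̂) = σ²(XᵀX)⁻¹ reduces to a one-line matrix fact: with A = (XᵀX)⁻¹Xᵀ and Var(ε) = σ²I, Cov(β̂) = A(σ²I)Aᵀ = σ²AAᵀ, and AAᵀ collapses to (XᵀX)⁻¹. This sketch (assuming NumPy, with a random illustrative design matrix) checks that collapse numerically.

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(12), rng.standard_normal((12, 2))])

# A = (X^T X)^{-1} X^T is the linear map taking y to beta_hat.
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T

# A A^T = (X^T X)^{-1} X^T X (X^T X)^{-1} = (X^T X)^{-1}.
collapses = np.allclose(A @ A.T, XtX_inv)
```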
Some facts of the projection matrix in this setting are summarized as follows:[4] P is idempotent (P·P = P) and symmetric, its eigenvalues are all either 0 or 1, and tr(P) equals the rank of X. In some analyses there are several different P matrices that depend on different sets of variables, but the same algebra applies to each. We call H the "hat matrix" because it turns Y's into Ŷ's.
A useful multivariate theorem: for every n × n matrix A, the determinant of A equals the product of its eigenvalues. Combined with the fact that the eigenvalues of an idempotent matrix are all either 0 or 1, this solves practice problem (1): det(A) is then a product of 0's and 1's, and hence is itself 0 or 1.
To verify that the hat matrix is symmetric and idempotent, note first that Z′Z is symmetric for any matrix Z, and so therefore is (Z′Z)⁻¹. Taking Z = X,
Hᵀ = (X(XᵀX)⁻¹Xᵀ)ᵀ = X((XᵀX)⁻¹)ᵀXᵀ = X(XᵀX)⁻¹Xᵀ = H,
H² = X(XᵀX)⁻¹(XᵀX)(XᵀX)⁻¹Xᵀ = X(XᵀX)⁻¹Xᵀ = H.
Recall also that M = I − P, where P is the projection onto the linear space spanned by the columns of X; M inherits both properties, being symmetric (M′ = M) and idempotent (M² = M), and MY is the vector of residuals.
In some derivations, it is convenient to take the first derivative of the least-squares objective in matrix form and read H off directly from the resulting normal equations. To summarize the hat matrix properties: 1. the hat matrix is symmetric; 2. the hat matrix is idempotent; 3. tr(H) = Σᵢ hᵢᵢ = p; 4. H1 = 1 for a model with a constant term; and 5. when the covariance matrix of the errors is Ψ rather than σ²I, the generalized hat matrix X(XᵀΨ⁻¹X)⁻¹XᵀΨ⁻¹ takes its place.
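Finally, the practice facts above can be sanity-checked numerically. This sketch (assuming NumPy; the design matrix is arbitrary and illustrative) verifies that det(H) is 0 or 1 and that H1 = 1 when the model contains a constant column.

```python
import numpy as np

# A projection matrix built from an arbitrary illustrative design matrix.
rng = np.random.default_rng(5)
X = np.column_stack([np.ones(6), rng.standard_normal((6, 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Practice problem (1): the determinant of an idempotent matrix is 0 or 1.
# Here H has rank 2 < 6, so det(H) should be (numerically) 0.
d = np.linalg.det(H)
det_is_0_or_1 = min(abs(d), abs(d - 1.0)) < 1e-9

# Property H1 = 1: the constant column is reproduced exactly.
ones = np.ones(6)
h1_is_1 = np.allclose(H @ ones, ones)
```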