dpnp.cov

dpnp.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None, *, dtype=None)

Estimate a covariance matrix, given data and weights.

For full documentation refer to numpy.cov.

Parameters:
  • m ({dpnp.ndarray, usm_ndarray}) -- A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.

  • y ({None, dpnp.ndarray, usm_ndarray}, optional) --

    An additional set of variables and observations. y has the same form as that of m.

    Default: None.

  • rowvar (bool, optional) --

    If rowvar is True, then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations (see the short sketch after this parameter list).

    Default: True.

  • bias (bool, optional) --

    Default normalization is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof.

    Default: False.

  • ddof ({None, int}, optional) --

    If not None, the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average. See the notes for the details.

    Default: None.

  • fweights ({None, dpnp.ndarray, usm_ndarray}, optional) --

    1-D array of integer frequency weights; the number of times each observation vector should be repeated. It is required that fweights >= 0. However, the function will not raise an error when fweights < 0 for performance reasons. A short sketch after this parameter list illustrates the effect of frequency weights.

    Default: None.

  • aweights ({None, dpnp.ndarray, usm_ndarray}, optional) --

    1-D array of observation vector weights. These relative weights are typically large for observations considered "important" and smaller for observations considered less "important". If ddof=0, the array of weights can be used to assign probabilities to observation vectors. It is required that aweights >= 0. However, the function will not raise an error when aweights < 0 for performance reasons.

    Default: None.

  • dtype ({None, str, dtype object}, optional) --

    Data-type of the result. By default, the returned data type will be at least a floating point type, based on the capabilities of the device on which the input arrays reside.

    Default: None.
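
As a minimal sketch of the rowvar, bias, and ddof parameters (using only the equivalences documented above, which follow numpy.cov):

>>> import dpnp as np
>>> x = np.array([[0, 1, 2], [2, 1, 0]])
>>> # with rowvar=False the variables are read from the columns,
>>> # so passing the transposed data gives the same result
>>> bool(np.allclose(np.cov(x), np.cov(x.T, rowvar=False)))
True
>>> # bias=True normalizes by N, which is the same as ddof=0
>>> bool(np.allclose(np.cov(x, bias=True), np.cov(x, ddof=0)))
True
>>> # the default unbiased estimate corresponds to ddof=1
>>> bool(np.allclose(np.cov(x), np.cov(x, ddof=1)))
True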
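
Frequency weights count an observation vector multiple times, exactly as if the corresponding column were repeated in the data, while aweights act as the relative weights described above. A small sketch of the fweights behavior:

>>> import dpnp as np
>>> x = np.array([[0., 1., 2.], [2., 1., 0.]])
>>> f = np.array([1, 2, 1])  # count the middle observation twice
>>> x_rep = np.array([[0., 1., 1., 2.], [2., 1., 1., 0.]])
>>> bool(np.allclose(np.cov(x, fweights=f), np.cov(x_rep)))
True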

Returns:

out -- The covariance matrix of the variables.

Return type:

dpnp.ndarray

See also

dpnp.corrcoef

Normalized covariance matrix.

Notes

Assume that the observations are in the columns of the observation array m and let f = fweights and a = aweights for brevity. The steps to compute the weighted covariance are as follows:

>>> import dpnp as np
>>> m = np.arange(10, dtype=np.float32)
>>> f = np.arange(10) * 2
>>> a = np.arange(10) ** 2.0
>>> ddof = 1
>>> w = f * a
>>> v1 = np.sum(w)
>>> v2 = np.sum(w * a)
>>> m -= np.sum(m * w, axis=None, keepdims=True) / v1
>>> cov = np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)

Note that when a == 1, the normalization factor v1 / (v1**2 - ddof * v2) goes over to 1 / (np.sum(f) - ddof) as it should.
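
As a consistency check on the steps above, the manually computed value can be compared with a direct call that passes the same weights (a small sketch reusing f, a, and ddof from above; since m was demeaned in place, a fresh copy of the data is needed):

>>> m2 = np.arange(10, dtype=np.float32)  # un-demeaned copy of the original data
>>> bool(np.allclose(cov, np.cov(m2, fweights=f, aweights=a, ddof=ddof)))
True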

Examples

>>> import dpnp as np
>>> x = np.array([[0, 2], [1, 1], [2, 0]]).T

Consider two variables, \(x_0\) and \(x_1\), which correlate perfectly, but in opposite directions:

>>> x
array([[0, 1, 2],
       [2, 1, 0]])

Note how \(x_0\) increases while \(x_1\) decreases. The covariance matrix shows this clearly:

>>> np.cov(x)
array([[ 1., -1.],
       [-1.,  1.]])

Note that element \(C_{0,1}\), which shows the correlation between \(x_0\) and \(x_1\), is negative.

Further, note how x and y are combined:

>>> x = np.array([-2.1, -1,  4.3])
>>> y = np.array([3,  1.1,  0.12])
>>> X = np.stack((x, y), axis=0)
>>> np.cov(X)
array([[11.71      , -4.286     ], # may vary
       [-4.286     ,  2.14413333]])
>>> np.cov(x, y)
array([[11.71      , -4.286     ], # may vary
       [-4.286     ,  2.14413333]])
>>> np.cov(x)
array(11.71)
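
The dtype keyword selects the data type of the result; otherwise the result type depends on the capabilities of the device, as noted above. A small sketch:

>>> np.cov(x, y, dtype=np.float32).dtype
dtype('float32')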