131.170.154.29 [30:00:00:05] "GET /logos/small_gopher.gif HTTP/1.0" 200 935
131.170.154.29 [30:00:00:06] "GET /logos/small_ftp.gif HTTP/1.0" 200 124
port11.annex1.naples.net [30:00:00:06] "GET /icons/ok2-0.gif HTTP/1.0" 200 231
131.170.154.29 [30:00:00:09] "GET /logos/us-flag.gif HTTP/1.0" 200 2788
131.170.154.29 [30:00:00:17] "GET /icons/ok2-0.gif HTTP/1.0" 200 231

This data is parsed using a Python script that accumulates the number of bytes received in each 2 minute window, which gives the following time series plot. There are no patterns evident to the naked eye, except a linear rising trend between 200 and 350 and a similar linear decreasing trend after that.
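The original parsing script is not shown, so the following is only a minimal sketch of how such an aggregation could look. It assumes each log line has the layout `host [DD:HH:MM:SS] "request" status bytes`, as in the sample at the top; the names `LOG_LINE` and `bytes_per_window` are illustrative.

```python
import re
from collections import defaultdict

# Assumed line layout, inferred from the sample log lines:
#   host [DD:HH:MM:SS] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \[(?P<day>\d+):(?P<hh>\d+):(?P<mm>\d+):(?P<ss>\d+)\] '
    r'"(?P<request>.*)" (?P<status>\d+) (?P<size>\d+|-)$'
)

WINDOW_SECONDS = 120  # 2 minute window


def bytes_per_window(lines, window=WINDOW_SECONDS):
    """Return {window_index: total_bytes}; lines that do not match are skipped."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue
        # Convert the day/hour/minute/second stamp to a running second count.
        seconds = ((int(match["day"]) * 24 + int(match["hh"])) * 60
                   + int(match["mm"])) * 60 + int(match["ss"])
        # Some logs write "-" when no byte count is available (assumption).
        size = 0 if match["size"] == "-" else int(match["size"])
        totals[seconds // window] += size
    return dict(totals)


sample = [
    '131.170.154.29 [30:00:00:05] "GET /logos/small_gopher.gif HTTP/1.0" 200 935',
    '131.170.154.29 [30:00:00:06] "GET /logos/small_ftp.gif HTTP/1.0" 200 124',
    'port11.annex1.naples.net [30:00:00:06] "GET /icons/ok2-0.gif HTTP/1.0" 200 231',
    '131.170.154.29 [30:00:00:09] "GET /logos/us-flag.gif HTTP/1.0" 200 2788',
    '131.170.154.29 [30:00:00:17] "GET /icons/ok2-0.gif HTTP/1.0" 200 231',
]
series = bytes_per_window(sample)
```

All five sample requests fall inside the same 2 minute window, so `series` holds a single entry with their byte counts summed.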

There are some obvious outliers in the data. These outliers are important when we talk about simulating network traffic, as the design under consideration should be able to handle the peak load. But as far as trending is concerned, these outliers must be filtered out.
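The post does not say how the outliers were filtered. One simple option, shown here purely as an assumption, is to drop points that lie more than a chosen number of median absolute deviations (MAD) from the median; the function name and the threshold of 5 are illustrative choices.

```python
from statistics import median


def filter_outliers(values, threshold=5.0):
    """Keep values within `threshold` median absolute deviations of the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: at least half of the points are identical.
        return list(values)
    return [v for v in values if abs(v - med) / mad <= threshold]


# Made-up byte counts with one obvious spike.
window_bytes = [10, 12, 11, 13, 12, 500, 11]
filtered = filter_outliers(window_bytes)
```

The median-based rule is used here because a mean and standard deviation would themselves be distorted by the spikes being removed.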

Here is a plot with the outliers filtered:

Before we try to fit the above series into a mathematical formula, we will discuss some of the basics required.

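**Autocorrelation**

Autocorrelation measures how strongly the current value of a series, $x_t$, is correlated with its lagged value $x_{t-h}$. In mathematical terms, the autocorrelation at lag $h$ can be expressed as follows ($E$ is the expected value operator, $\mu$ is the mean and $\sigma^2$ the variance of the series):

$$\rho_h = \frac{E[(x_t - \mu)(x_{t-h} - \mu)]}{\sigma^2}$$

Note that this assumes that the series is weakly stationary:

- The mean of the series stays constant with **t**
- The variance remains constant with **t**
- The correlation between $x_t$ and $x_{t-h}$ does not vary with **t**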
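As a rough illustration, the autocorrelation of a finite sample can be estimated by replacing the expectations with sample averages. The sketch below uses the common estimator that divides every lag by the lag-0 sum of squares; the function name `sample_acf` is an assumption.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag of a list of numbers."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    total = sum(c * c for c in centered)  # n times the sample variance
    acf = []
    for h in range(max_lag + 1):
        cov = sum(centered[t] * centered[t - h] for t in range(h, n))
        acf.append(cov / total)
    return acf


acf = sample_acf([1, 2, 3, 4, 5], max_lag=2)
```

By construction the value at lag 0 is always 1, since an observation is perfectly correlated with itself.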

**Autoregressive Models**

This statistical model expresses the present value of a variable as a linear function of its previous values.

An AR(1) (autoregressive model of order 1) can be represented as:

$$x_t = \delta + \phi_1 x_{t-1} + w_t$$

A general autoregressive model of order $p$ can be written as:

$$x_t = \delta + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t$$

The constants $\phi_1, \dots, \phi_p$ are the autoregressive coefficients and $w_t$ is a random error term, normally distributed with zero mean and constant variance $\sigma_w^2$. The errors are assumed to be independent of the past values of the series.

We will discuss some properties of the AR(1) model:

**Mean:**

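The mean of the time series represented by the AR(1) model $x_t = \delta + \phi_1 x_{t-1} + w_t$ can be calculated by taking expectations on both sides (the error $w_t$ has zero mean):

$$E(x_t) = \delta + \phi_1 E(x_{t-1})$$

With the assumption that the series is stationary we have $E(x_t) = E(x_{t-1}) = \mu$, so:

$$\mu = \delta + \phi_1 \mu$$

On solving for μ we get:

$$\mu = \frac{\delta}{1 - \phi_1}$$

**Variance:**

Since the error $w_t$, with variance $\sigma_w^2$, is independent of $x_{t-1}$:

$$\mathrm{Var}(x_t) = \phi_1^2 \, \mathrm{Var}(x_{t-1}) + \sigma_w^2$$

Again we use the assumption that the series is stationary, which gives $\mathrm{Var}(x_t) = \mathrm{Var}(x_{t-1})$. On solving we get:

$$\mathrm{Var}(x_t) = \frac{\sigma_w^2}{1 - \phi_1^2}$$

This requires $|\phi_1| < 1$.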

**Autocorrelation function (ACF):**

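We assume the mean of the data to be 0. This happens when δ = 0, so the model reduces to $x_t = \phi_1 x_{t-1} + w_t$. The values of the variances, covariances and correlations are not affected by the specific value of the mean.

Let $\gamma_h = E(x_t x_{t+h})$ be the covariance at a lag of *h* and $\rho_h = \gamma_h / \gamma_0$ be the corresponding correlation, where $\gamma_0 = \mathrm{Var}(x_t)$.

*Covariance and correlation between observations one time period apart*

$$\gamma_1 = E(x_t x_{t+1}) = E[x_t(\phi_1 x_t + w_{t+1})] = \phi_1 \gamma_0, \qquad \rho_1 = \frac{\gamma_1}{\gamma_0} = \phi_1$$

Covariance of observations *h* time periods apart (using $E(x_t w_{t+h}) = 0$ for $h \geq 1$):

$$\gamma_h = E(x_t x_{t+h}) = E[x_t(\phi_1 x_{t+h-1} + w_{t+h})] = \phi_1 \gamma_{h-1}$$

So $\gamma_h = \phi_1^h \gamma_0$ and $\rho_h = \phi_1^h$. Thus the ACF decreases exponentially in magnitude when plotted against the lag *h*.

The autocorrelation plot for a simulated AR(1) model tails off exponentially with the lag but shows some perturbations. These are due to sampling error (the plot uses 1000 samples). The plot tends to the expected ideal as the number of samples is increased.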
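The exponential tail can be checked numerically. For an AR(1) coefficient `phi`, the theoretical autocorrelation at lag `h` is `phi ** h`. The sketch below simulates such a series and compares the sample ACF with that value; the coefficient 0.6, the sample size and the seed are arbitrary assumptions, not values from the post.

```python
import random


def simulate_ar1(phi, n, delta=0.0, sigma=1.0, seed=0):
    """Simulate x_t = delta + phi * x_{t-1} + w_t with Gaussian errors w_t."""
    rng = random.Random(seed)
    x = delta / (1.0 - phi)  # start at the stationary mean
    out = []
    for _ in range(n):
        x = delta + phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    total = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - h] for t in range(h, n)) / total
        for h in range(max_lag + 1)
    ]


phi = 0.6
series = simulate_ar1(phi, n=20000)
acf = sample_acf(series, max_lag=3)
for h, value in enumerate(acf):
    print(f"lag {h}: sample {value:.3f}, theory {phi ** h:.3f}")
```

With fewer samples the sample values scatter more widely around the theoretical curve, which is the perturbation effect described above.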

**Moving Average Models.**

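In these models the shock (error) from previous observations is propagated as the series progresses.

*1st order MA model or MA(1)*

$$x_t = \mu + w_t + \theta_1 w_{t-1}$$

*General MA model of order q*

$$x_t = \mu + w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q}$$

Here the $w_t$ are independent, normally distributed errors with zero mean and variance $\sigma_w^2$, and $\theta_1, \dots, \theta_q$ are the moving average coefficients. We shall now discuss the properties of the moving average model of order 1.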

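**Mean**

$$E(x_t) = \mu + E(w_t) + \theta_1 E(w_{t-1}) = \mu$$

**Variance**

$$\mathrm{Var}(x_t) = \sigma_w^2 + \theta_1^2 \sigma_w^2 = \sigma_w^2 (1 + \theta_1^2)$$

**Autocorrelation function (ACF):**

As previously defined, we first calculate the covariance of observations *h* time periods apart for the MA(1) model $x_t = \mu + w_t + \theta_1 w_{t-1}$:

$$\gamma_h = E[(w_t + \theta_1 w_{t-1})(w_{t-h} + \theta_1 w_{t-h-1})] = E[w_t w_{t-h} + \theta_1 w_t w_{t-h-1} + \theta_1 w_{t-1} w_{t-h} + \theta_1^2 w_{t-1} w_{t-h-1}]$$

When h = 1, the above equation yields $\gamma_1 = \theta_1 \sigma_w^2$, because only the term $\theta_1 w_{t-1} w_{t-h}$ pairs an error with itself. The condition for independent random variables is:

$$E(w_j w_k) = E(w_j) E(w_k), \quad j \neq k$$

As the mean of the random variable is zero, the expected value $E(w_j w_k) = 0$ for $j \neq k$. Therefore the ACF shows a peak of $\rho_1 = \theta_1 / (1 + \theta_1^2)$ when h = 1 and is zero for all larger lags.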

The ACF for a moving average model of order one is shown below.

*(Do not be confused by the unit value at lag 0: an observation is always perfectly correlated with itself.)*
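The single spike can also be reproduced by simulation. For an MA(1) coefficient `theta`, the theoretical autocorrelation is `theta / (1 + theta ** 2)` at lag 1 and zero at larger lags. The coefficient 0.7, the sample size and the seed in this sketch are arbitrary assumptions.

```python
import random


def simulate_ma1(theta, n, mu=0.0, sigma=1.0, seed=0):
    """Simulate x_t = mu + w_t + theta * w_{t-1} with Gaussian errors w_t."""
    rng = random.Random(seed)
    previous = rng.gauss(0.0, sigma)
    out = []
    for _ in range(n):
        current = rng.gauss(0.0, sigma)
        out.append(mu + current + theta * previous)
        previous = current
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    total = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - h] for t in range(h, n)) / total
        for h in range(max_lag + 1)
    ]


theta = 0.7
acf = sample_acf(simulate_ma1(theta, n=20000), max_lag=3)
expected_lag1 = theta / (1 + theta ** 2)
print(f"lag 1: sample {acf[1]:.3f}, theory {expected_lag1:.3f}")
print(f"lag 2: sample {acf[2]:.3f}, theory 0.000")
```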

**Partial Autocorrelation Function (PACF)**

This function measures the conditional correlation between observations, given that certain other observations are already accounted for. Think about how regression models are interpreted. Consider the two models:

$$y = \beta_0 + \beta_1 x^2$$

$$y = \beta_0 + \beta_1 x + \beta_2 x^2$$

In the first model, $\beta_1$ represents the linear dependency between y and x². In the second model, $\beta_2$ represents the linear dependency between y and x² with the dependency on x already accounted for. In general these two coefficients will not be the same.

In general, a PACF of order *h* can be represented as the conditional correlation between $x_t$ and $x_{t-h}$, conditional on the observations lying between *t* and *t – h*. This means that these observations have already been accounted for.

Consider a third order PACF:

$$\frac{\mathrm{Cov}(x_t, x_{t-3} \mid x_{t-1}, x_{t-2})}{\sqrt{\mathrm{Var}(x_t \mid x_{t-1}, x_{t-2}) \, \mathrm{Var}(x_{t-3} \mid x_{t-1}, x_{t-2})}}$$

**Statistical Implications of PACF**

For an AR model, the PACF shuts off after the order of the model. This means that for an AR model of order *two*, the PACF will have two spikes and turn off after that (in practice, small insignificant perturbations remain). This is evident in the PACF plot for the model:

The same is not the case for an MA model: instead of shutting off, the PACF tapers to zero. Consider the PACF for the model

Both the ACF and the PACF help us understand the nature of the series and choose the correct model for it.
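The shut-off and tapering behaviour can be reproduced from theoretical autocorrelations. The sketch below computes partial autocorrelations with the Durbin-Levinson recursion, which is one standard way to obtain the PACF from the ACF and is not necessarily how the plots in this post were produced. The inputs are exact ACF values for an AR(1) with coefficient 0.5 and for an MA(1) whose lag-1 autocorrelation is 0.4; both are made-up examples.

```python
def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations for lags 0..max_lag via Durbin-Levinson.

    rho[h] is the autocorrelation at lag h, with rho[0] == 1.
    """
    pacf = [1.0]
    prev = []  # prev[j] holds the coefficient phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


# AR(1) with coefficient 0.5: ACF is 0.5 ** h, PACF shuts off after lag 1.
ar1_pacf = pacf_from_acf([0.5 ** h for h in range(5)], max_lag=4)

# MA(1) with lag-1 autocorrelation 0.4: ACF is zero beyond lag 1,
# but the PACF only tapers towards zero.
ma1_pacf = pacf_from_acf([1.0, 0.4, 0.0, 0.0, 0.0], max_lag=4)
```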

**Network Traffic Model**

Now that we have understood the basics, we can leverage the same in the modeling of network traffic data that was discussed in the beginning. The first step is to plot the autocorrelation function for the data.

The dotted red lines show the significance level for the correlation values. The above plot shows that the correlations are significant at all lags. This hints at a trend in the series: the overall trend masks the correlations of the actual perturbations. To model the data correctly we need to *de-trend* it. The first step is to remove any linear trend by taking the first difference of the series:

$$\Delta x_t = x_t - x_{t-1}$$
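As a small illustration of why differencing removes a linear trend, here is a sketch in Python (the post itself used R; the function name and the sample values are made up):

```python
def first_difference(series):
    """Return the list of x[t] - x[t-1]; the result is one element shorter."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# A purely linear trend becomes a constant after one difference.
trend = [3, 5, 7, 9, 11]
print(first_difference(trend))

# A trend plus perturbations leaves the perturbations around a constant level.
noisy_trend = [3, 6, 6, 10, 11]
print(first_difference(noisy_trend))
```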

This is how the series looks after the first difference:

Now we plot the ACF for the above series and see whether we have been successful in removing the trend component of the correlation.

This shows a very large peak at unit lag and values below significance for the rest of the lags. This hints at an MA(1) model for the differenced series. But we should also look at the PACF in order to detect any autoregressive nature in the data. Here is the PACF for the first-difference series.

The PACF output shows positive conditional correlations up to a lag of 9, but the first two are about 50% larger than the rest. Thus we will model our first-difference series with an ARMA(2,1) model.
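For reference, an ARMA(2,1) model combines two autoregressive terms with one moving average term. Writing the first-difference series as $\Delta x_t$, its general form is sketched below. The symbols $\delta$, $\phi_1$, $\phi_2$ and $\theta_1$ are notation assumed here, and their values have to be estimated from the data:

$$\Delta x_t = \delta + \phi_1 \Delta x_{t-1} + \phi_2 \Delta x_{t-2} + w_t + \theta_1 w_{t-1}$$

Setting $\theta_1 = 0$ gives a pure AR(2) model, and setting $\phi_1 = \phi_2 = 0$ gives a pure MA(1) model.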

*Blue: actual series*

*Orange: fitted data*

The model can be written using the calculated coefficients as:

After the data is fitted to the model, we should also investigate the nature of the residuals. A residual is defined as the deviation of the fitted data from the actual data. For a model to be adequate, the residuals should not have any significant correlation. Here is the ACF plot for the residuals of our model:

In the above ACF plot we see that there is no significant correlation between the residuals, which is a sign of a good fit. The histogram of residuals shows that they are lognormally distributed; this distribution is important when the model is used for future prediction.
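A residual check of this kind can be sketched as follows. The sketch assumes the usual approximate 95% band of ±1.96/√n for the autocorrelations of white noise, which may or may not be exactly the band drawn in the plots; the function names and test series are made up.

```python
import math


def residual_acf(residuals, max_lag):
    """Sample autocorrelation of the residuals for lags 0..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    total = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - h] for t in range(h, n)) / total
        for h in range(max_lag + 1)
    ]


def significant_lags(residuals, max_lag, z=1.96):
    """Return the lags (>= 1) whose autocorrelation lies outside +/- z/sqrt(n)."""
    bound = z / math.sqrt(len(residuals))
    acf = residual_acf(residuals, max_lag)
    return [h for h in range(1, max_lag + 1) if abs(acf[h]) > bound]


# A strongly patterned "residual" series: correlated at lag 2 only.
patterned = [1.0, 1.0, -1.0, -1.0] * 25
print(significant_lags(patterned, max_lag=3))
```

An empty list for the residuals of a fitted model would be consistent with the "no significant correlation" reading of the ACF plot.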

**Possible improvements**

- **Accounting for seasonal variations**: Network traffic patterns tend to depend on parameters like the time of the day, month or year. For example, a payroll website is more likely to receive traffic at the end of the month. These variations can be accounted for by using seasonal models.
- **Variable volatility**: We have assumed constant volatility for our model, but because network traffic data is highly variable and spiky, better results can be obtained by accounting for changes in the volatility.

The graphs and analysis have been done using R. Feel free to ask questions on how they were implemented.

**Basic Steps**

Let us consider a matrix:

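We follow a similar but simpler procedure to the *Gauss-Jordan method*.

**Step 1: Upper Triangular Matrix**

Only the elements below the pivot element are reduced to zero, by subtracting the right multiple of the **“pivot row”** from each row below it.

After iterating over each pivot element we get an *upper triangular matrix*:

$$\begin{bmatrix} a'_{11} & a'_{12} & \cdots & a'_{1N} \\ 0 & a'_{22} & \cdots & a'_{2N} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a'_{NN} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} = \begin{bmatrix} b'_1 \\ b'_2 \\ \vdots \\ b'_N \end{bmatrix}$$

Each element in the above equation is shown with a “prime”, signifying that the element has changed during the transforms.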

**Step 2: Back-Substitution**

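The name back-substitution comes from the fact that the last equation involves a single unknown and is trivial to solve (primes denote the elements after the reduction to upper triangular form):

$$x_N = \frac{b'_N}{a'_{NN}}$$

This value can be “**back-substituted**” into the previous equation to get the value of $x_{N-1}$:

$$x_{N-1} = \frac{1}{a'_{N-1,N-1}} \left[ b'_{N-1} - a'_{N-1,N} \, x_N \right]$$

which further gives $x_{N-2}$, and so on.

The typical back-substitution step can be represented as:

$$x_i = \frac{1}{a'_{ii}} \left[ b'_i - \sum_{j=i+1}^{N} a'_{ij} x_j \right]$$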
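The two steps can be put together in a short Python sketch. This is an illustration, not the textbook routine: the function name `gauss_solve` and the 3 x 3 test system are made up, and a row swap (partial pivoting) has been added to avoid dividing by a zero pivot, which the description above does not cover.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gaussian elimination with back-substitution.

    a is an N x N list of lists and b a list of length N; copies are modified.
    """
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]

    # Step 1: reduce to an upper triangular matrix.
    for k in range(n):
        # Added safeguard (assumption): bring the largest entry of column k
        # on or below the diagonal into the pivot position.
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Zero out the entries below the pivot.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]

    # Step 2: back-substitution, from the last equation upwards.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - known) / a[i][i]
    return x


solution = gauss_solve(
    [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]],
    [8.0, -11.0, -3.0],
)
```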

**Performance Considerations**

Strictly in terms of complexity, both Gauss-Jordan elimination and Gaussian elimination with back-substitution are $O(N^3)$ algorithms. The latter is faster because it performs fewer operations in the innermost loops. The difference arises because Gauss-Jordan elimination reduces every row other than the pivot row, whereas Gaussian elimination with back-substitution reduces only the rows below the pivot (the result is a triangular matrix). This reduces the number of multiplications and additions in the innermost loops by a factor of 3 (roughly $N^3/3$ of each instead of $N^3$). The factor drops to 1.5 if the calculation of the inverse is avoided in Gauss-Jordan elimination.
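The factor of 3 can be made concrete by counting how often the innermost loop body runs in each method. The loop bounds below are a simplified model and an assumption of this sketch: Gauss-Jordan is taken to update all N columns of the N - 1 other rows for every pivot (as when the inverse is built in place), while Gaussian elimination only touches the part of the matrix below and to the right of the pivot.

```python
def gauss_jordan_inner_steps(n):
    """Inner-loop executions when every non-pivot row is fully updated."""
    count = 0
    for pivot in range(n):
        for row in range(n):
            if row != pivot:
                count += n  # all n columns of this row are updated
    return count


def gaussian_elimination_inner_steps(n):
    """Inner-loop executions when only rows below the pivot are updated."""
    count = 0
    for pivot in range(n):
        for row in range(pivot + 1, n):
            count += n - (pivot + 1)  # only columns right of the pivot
    return count


n = 50
gj = gauss_jordan_inner_steps(n)
ge = gaussian_elimination_inner_steps(n)
print(gj, ge, gj / ge)
```

The ratio approaches 3 as `n` grows, since the counts behave like N³ and N³/3 respectively.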


This post explains one of the basic building blocks for solving linear algebraic equations. Consider a set of equations:

$$\begin{aligned} a_{11} x_1 + a_{12} x_2 + \dots + a_{1M} x_M &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \dots + a_{2M} x_M &= b_2 \\ &\;\;\vdots \\ a_{N1} x_1 + a_{N2} x_2 + \dots + a_{NM} x_M &= b_N \end{aligned}$$

This is a system of M unknowns and N equations. Each variable can be thought of as a degree of freedom and each equation as a constraint. Think about a three-variable situation, like the position of a person in 3-D coordinates. Without any constraints, the person has three *degrees of freedom*, in the x, y and z directions. If we are given three equations describing the position (each linear equation in x, y and z represents a plane in 3-D), we can pinpoint the coordinates in 3-D space.

**Validation**

- If M > N, the number of unknowns is greater than the number of equations. The system is said to be *underdetermined* and generally has infinitely many solutions. The solution space can be restricted by techniques such as *Compressed Sensing*.
- If M < N, the number of equations is greater than the number of unknowns and the system is said to be *overdetermined*. Here the general approach is to find the best-fit solution (i.e. the one that minimizes the R.M.S. error over all equations).
- If M = N, the system is *consistent* if the following caveats are satisfied:
    - No row should be a linear combination of the other rows; this leads to *row degeneracy*.
    - The equations should not all contain certain variables only in exactly the same linear combination; this leads to *column degeneracy*.

Both of these conditions effectively remove a constraint, and thus the system becomes indeterminable.

**Pivoting**

In order to obtain more accurate results and reduce round-off errors, a technique called *Pivoting* is used. Pivoting is applied while a matrix is being reduced to its *row echelon form*.

**What is row echelon form?**

A matrix is said to be in row echelon form if:

- All non-zero rows are above the zero rows.
- The first non-zero number in a row from the left, called the *leading coefficient* or *pivot*, should be strictly to the right of the leading coefficient of the row above it.
- All entries in a column below a leading coefficient must be zero.

Here is an example of a matrix in row echelon form:
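As a hedged illustration (the matrices below are made-up examples, not the one from the original post), the three conditions can be checked mechanically with a short Python sketch. The third condition follows from the first two, so the code only tracks the position of each leading coefficient.

```python
def leading_index(row):
    """Column index of the first non-zero entry, or None for a zero row."""
    for j, value in enumerate(row):
        if value != 0:
            return j
    return None


def is_row_echelon(matrix):
    """Check the row echelon conditions listed above."""
    previous_lead = -1
    seen_zero_row = False
    for row in matrix:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
            continue
        # A non-zero row below a zero row, or a pivot that is not strictly
        # to the right of the one above, breaks the form.
        if seen_zero_row or lead <= previous_lead:
            return False
        previous_lead = lead
    return True


echelon = [[1, 2, 3], [0, 0, 4], [0, 0, 0]]
not_echelon = [[1, 2, 3], [4, 5, 6], [0, 0, 7]]
print(is_row_echelon(echelon), is_row_echelon(not_echelon))
```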

Pivoting can be done in two ways:

- *Partial Pivoting*: The algorithm selects the element with the largest absolute value in the current column and swaps rows so that it lies on the diagonal.
- *Complete Pivoting*: The algorithm scans the whole matrix for the element with the largest absolute value and swaps both rows and columns to place the pivot on the diagonal.
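The difference between the two searches can be sketched as follows. The function names and the matrix are illustrative assumptions; `k` is the index of the diagonal position currently being filled, and only rows (and, for complete pivoting, columns) from `k` onwards are searched.

```python
def partial_pivot_row(a, k):
    """Row index (>= k) holding the largest absolute value in column k."""
    return max(range(k, len(a)), key=lambda r: abs(a[r][k]))


def complete_pivot(a, k):
    """(row, column) of the largest absolute value in the submatrix a[k:][k:]."""
    n = len(a)
    candidates = ((r, c) for r in range(k, n) for c in range(k, n))
    return max(candidates, key=lambda rc: abs(a[rc[0]][rc[1]]))


matrix = [
    [1.0, 2.0, 3.0],
    [4.0, -9.0, 6.0],
    [-7.0, 8.0, 5.0],
]
print(partial_pivot_row(matrix, 0))  # searches column 0 only
print(complete_pivot(matrix, 0))     # searches the whole matrix
```

Partial pivoting then needs a single row swap, while complete pivoting may need both a row swap and a column swap, and the column swap has to be remembered because it reorders the unknowns.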

**The Algorithm**

We will use an example matrix, taken from the textbook *Numerical Recipes in C++*, to illustrate this algorithm.

The equation we aim to solve is:

The algorithm takes two inputs, the matrix A (*coefficient matrix*) and B (*solution vector*). The inverse of the matrix is returned in A and the vector of unknowns is returned in B.

**Step 1: Finding the Pivot Element**

In the first step the algorithm iterates through the matrix and finds the element with the largest absolute value, which becomes the pivot. In the first iteration of our example the pivot is the largest element of the last row. It comes out to be five and is in the first column, so there is no need for a column swap; its row only needs to be swapped with the first row. This swap is recorded in two book-keeping arrays storing the actual position of the pivot, so that the result can be restored.

The next time the algorithm searches for a pivot element, it excludes the row and column of the previous pivot from the search.

**Step 2: Normalizing the row**

Before we go through this step we need to understand why it actually works. Using our transformations we are basically converting the matrix A into the identity matrix I. Therefore,

$$A \cdot x = b \;\Rightarrow\; I \cdot x = b'$$

where $b' = A^{-1} \cdot b$ is the transformed solution vector.

As we are using the equation $A \cdot A^{-1} = I$ to determine the inverse of the matrix, we store the result back in A.

This step can be further subdivided into two sub-steps.

The first sub-step is to normalize the pivot row by the pivot element, so now our matrix equation looks like:

For the inverse:

The solution vector also gets transformed as:

The next sub-step is to reduce each element below the pivot to zero by subtracting the right multiple of the first row:

and similar transforms are applied to the solution vector.

We will discuss certain parts of the second iteration, as they are slightly different from the first.

While iterating for the second column, the largest element is found at

Here there is no need for swapping, as the pivot is found on the diagonal itself.

At the end we have done the pivoting for all columns and have reduced our matrix, but we need to account for the shuffling that we have done. Let us say that our book-keeping arrays are:

Let us take the first case:

As the row and column numbers are not the same, there is an initial swap that needs to be restored, so we swap the corresponding pair of columns. A row operation on the input appears as a column operation on its inverse, which explains why columns are swapped instead of rows.
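The whole procedure (pivot search over the remaining rows and columns, row swap, normalization, elimination and the final column restore) can be put together in a Python sketch modeled on the steps described in this post. It is not the textbook's code: the function name `gauss_jordan`, the book-keeping names `row_index` and `col_index`, and the 3 x 3 test system are all assumptions made for illustration.

```python
def gauss_jordan(a, b):
    """Gauss-Jordan elimination with full pivoting, in place.

    On return, a holds the inverse of the input matrix and b the solution.
    a is an N x N list of lists, b a list of N right-hand-side values.
    """
    n = len(a)
    row_index = [0] * n   # row where the i-th pivot was found
    col_index = [0] * n   # column where the i-th pivot was found
    used = [False] * n    # columns (and diagonal rows) already pivoted

    for i in range(n):
        # Step 1: search the not-yet-used rows and columns for the pivot.
        big, prow, pcol = 0.0, 0, 0
        for r in range(n):
            if used[r]:
                continue
            for c in range(n):
                if not used[c] and abs(a[r][c]) >= big:
                    big, prow, pcol = abs(a[r][c]), r, c
        used[pcol] = True
        # Swap rows so that the pivot lands on the diagonal.
        if prow != pcol:
            a[prow], a[pcol] = a[pcol], a[prow]
            b[prow], b[pcol] = b[pcol], b[prow]
        row_index[i], col_index[i] = prow, pcol
        if a[pcol][pcol] == 0.0:
            raise ValueError("matrix is singular")

        # Step 2a: normalize the pivot row (the pivot slot stores the inverse).
        pivot_inv = 1.0 / a[pcol][pcol]
        a[pcol][pcol] = 1.0
        a[pcol] = [value * pivot_inv for value in a[pcol]]
        b[pcol] *= pivot_inv

        # Step 2b: eliminate the pivot column from every other row.
        for r in range(n):
            if r != pcol:
                factor = a[r][pcol]
                a[r][pcol] = 0.0
                a[r] = [x - factor * p for x, p in zip(a[r], a[pcol])]
                b[r] -= factor * b[pcol]

    # Undo the row swaps as column swaps on the inverse, in reverse order.
    for i in range(n - 1, -1, -1):
        if row_index[i] != col_index[i]:
            for row in a:
                row[row_index[i]], row[col_index[i]] = (
                    row[col_index[i]], row[row_index[i]])
    return a, b


original = [[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
inverse, x = gauss_jordan([row[:] for row in original], [5.0, 3.0, 4.0])
```

Multiplying `original` by `inverse` should give the identity matrix up to round-off, which is a convenient way to check both the elimination and the final column restore.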
