Artificial Intelligence is the computing center of Energy Logserver. We have created a module that makes it easy to apply mathematical algorithms to the collected data. The main task of Artificial Intelligence is to calculate future values based on historical ones. The prediction may be carried out by means of averaging mechanisms, by fitting linear functions, or by learning from and classifying the data.

Prediction calculation can be used to determine future disk space demand, to establish a reference level for measuring the response time of an application, or to forecast the trend in the number of users working in the application.

The Intelligence module was prepared based on Apache Spark, one of the best libraries of machine learning algorithms. Some of the algorithms have been prepared as proprietary solutions.

The idea of the AI module is to expose the mathematical algorithms to the user in such a way that several fields can influence how new values are calculated and how the algorithm runs. In the user's window we can choose the number of samples on which the algorithm works and the range of data that the calculations should return.

Due to the characteristics of the implemented algorithms, it is important to ensure good-quality data covering the longest possible period of the production environment. The larger the data set, the higher the probability of a correct forecast.

## Available artificial intelligence algorithms and their characteristics

The AI algorithms prepared in the Intelligence module can be divided into two categories:

- Classification algorithms
- Regression algorithms

**Classification algorithms** allow you to assign a data class based on many available features. For example, based on data about terrain, soil, rainfall, etc., it is possible to determine for which type of forest the conditions are best. These algorithms do not take into account the time at which the data was created, i.e. they do not analyze the data history.

**Multi Layer Perceptron (MLP)**

is an artificial neural network (ANN) algorithm. It uses supervised learning, i.e. the class that the algorithm is being taught to recognize is known in advance. During learning, the data is divided into a training set and a test set. It is possible to create a network with up to 3 hidden layers (it can be extended with further layers, but this has a huge impact on processing time). Good practices for calculating the appropriate number of neurons in the individual hidden layers of the ANN have been implemented. The algorithm is suitable for detecting known event classes.
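The exact sizing rule used by the module is not published; as a minimal sketch, one common heuristic is to interpolate the layer widths geometrically between the input and output dimensions. The function below is a hypothetical illustration, not the module's actual implementation:

```python
def hidden_layer_sizes(n_inputs, n_outputs, n_layers=2):
    """Suggest neuron counts for up to 3 hidden layers.

    Hypothetical heuristic: each hidden layer's width interpolates
    geometrically between the input and output widths, so the network
    narrows gradually toward the output layer.
    """
    if not 1 <= n_layers <= 3:
        raise ValueError("the module supports 1 to 3 hidden layers")
    sizes = []
    for i in range(1, n_layers + 1):
        # geometric interpolation between input and output dimensions
        size = round(n_inputs * (n_outputs / n_inputs) ** (i / (n_layers + 1)))
        sizes.append(max(size, n_outputs))
    return sizes
```

For example, a network with 16 input features and 2 output classes would get three hidden layers of decreasing width, which matches the "pyramid" shape often recommended in practice.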

**Regression algorithms**, on the other hand, allow predicting the value of a feature based on its historical values. For example, based on historical data on the use of disk resources, we can predict their use in the future. Some of these algorithms look for trends and, based on them, prepare predictions.

**Simple Moving Average (SMA) / Geometric Moving Average (GMA).**

These algorithms calculate an average of the "feature to analyze from search" value. In detail, the algorithm iteratively calculates the average for subsequent time windows.

Simple Moving Average

Input data:

- time frame = 1 hour (one hour)
- max probes = 1 (hour)
- feature to analyze from search = RTA (round trip time)
- max predictions = 40 (hours)

The algorithm iteratively calculates the average server response time for subsequent periods by automatically increasing the time window - time frame.

The values calculated in this way may serve as a reference level when comparing new samples, assessing whether they deviate significantly from the prediction or fall within the error margin.
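The iteration described above can be sketched in a few lines of plain Python. This is one plausible reading of the description (the window grows by one time frame per prediction step and the prediction is the average over that window); the function name and exact windowing are assumptions, not the module's internals:

```python
def sma_predictions(history, max_predictions):
    """Sketch of Simple Moving Average prediction (assumed semantics).

    `history` holds the analysed feature (e.g. RTA) aggregated per
    time frame. For each future period the time window widens by one
    frame and the prediction is the average over that window.
    """
    preds = []
    for step in range(1, max_predictions + 1):
        # widen the window by one time frame on every iteration
        window = history[-step:] if step <= len(history) else history
        preds.append(sum(window) / len(window))
    return preds
```

Comparing a newly arrived sample against the corresponding entry of `preds` gives the reference-level check mentioned above.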

**Simple Moving Average Line**

Input data:

- time frame = 1 hour (one hour)
- max probes = 40 (hours)
- feature to analyze from search = RTA (round trip time)
- max predictions = 40 (hours)

The difference between these algorithms is that the Simple Moving Average Line function takes the values calculated in the previous step into account in its next iterations.
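The feedback loop that distinguishes the two variants can be sketched as follows. Again this is an assumed reading of the description, not the module's implementation; the function name is hypothetical:

```python
def sma_line_predictions(history, max_probes, max_predictions):
    """Sketch of Simple Moving Average Line (assumed semantics).

    Like SMA, but each predicted value is appended to the series and
    participates in the averaging for the next iteration.
    """
    series = list(history)
    preds = []
    for _ in range(max_predictions):
        window = series[-max_probes:]
        value = sum(window) / len(window)
        preds.append(value)
        series.append(value)  # feed the prediction back into the series
    return preds
```

Because predictions are fed back in, the output converges toward a smooth line rather than echoing only the raw history.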

**Linear Regression Shift**

The next type of estimation of future values is Linear Regression Shift (LRS). This algorithm tries to find the straight line that best fits the distribution of the analyzed data in a given range, i.e. it finds the function y = ax + b. To learn the value of the analyzed feature ("feature to analyze from search"), the algorithm analyzes its value from the previous period indicated in the time frame parameter, i.e. the previous value is used to estimate the value for the next period.

Input data:

- time frame = 1 hour (one hour)
- max probes = 40 (hours)
- feature to analyze from search = RTA (round trip time)
- max predictions = 40 (hours)

A variation of the LRS algorithm is **Linear Regression Shift Trend (LRST)**

Linear regression with a time shift, in which the x value for the function y = Ax + B is estimated from the whole set of results. This algorithm uses previously estimated values to estimate future ones. The linear regression looks for the function y = Ax + B using the least squares method (it searches for the line on or near which the most data points lie). As a result of the calculations, the values of A and B are kept in the index, which allows the user to define an alert that fires when the slope of the prediction line (parameter A) is large.
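The refit-and-extend loop can be sketched as below. The helper and its name are illustrative assumptions based on the description (fit the whole series, append the estimate, refit so later estimates build on earlier ones):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = A*x + B."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def lrst_predictions(history, max_predictions):
    """Sketch of LRST (assumed semantics): fit y = A*x + B to the whole
    series, append the estimate, and refit, so that each later estimate
    builds on the previous ones. Returns the predictions and the final
    A and B, which the module keeps in the index for alerting."""
    series = list(history)
    for _ in range(max_predictions):
        a, b = fit_line(range(len(series)), series)
        series.append(a * len(series) + b)
    return series[len(history):], a, b
```

An alert condition on the stored slope could then be as simple as `abs(a) > limit`.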

**Random Forest Regression Shift (RFRS)**

The Random Forest Regression Shift is an example of a non-linear regression algorithm that learns from the historical course of a series and tries to map it into the future. Random forest regression is an advanced mechanism that groups values and, based on the history of the graph, creates a course that illustrates the future as well as possible. An important feature of this algorithm is that it tries to mimic the historical trend while maintaining the features typical of specific days or working hours. Thanks to this, comparing real values with the presented prediction allows us to detect anomalies relative to the normal behavior of our infrastructure.

Input data:

- time frame = 1 hour (one hour)
- max probes = 5 (hours)
- feature to analyze from search = RTA (round trip time)
- max predictions = 40 (hours)

When we slightly change the number of samples to be learned from, we get a better or worse fit to our data.

Input data:

- time frame = 1 hour (one hour)
- max probes = 20 (hours)
- feature to analyze from search = RTA (round trip time)
- max predictions = 40 (hours)

The algorithm estimates the value of the storage_cpu feature for subsequent periods by automatically increasing the time window (time frame).

For the above conditions:

- in the first step, the algorithm calculates the average for a time frame of 1 day; aggregated data for the last 5 days is taken for analysis
- in the second step, it calculates the average for a new time frame of 2 days; aggregated data for 5 two-day periods (10 days) is taken for analysis
- in the next step, it calculates the average for a new time frame of 3 days; aggregated data for 5 three-day periods (15 days) is taken for analysis
- in subsequent steps, the n-day value keeps increasing, and the estimate is made for the next period
- in the last step, the average for the final time frame of 180 days is calculated; aggregated data for 5 180-day periods (900 days) is taken for analysis
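The arithmetic of these steps is easy to reproduce: at step n the time frame is n days and max probes (here 5) aggregated periods are analysed, covering 5 × n days of history. A small helper (a hypothetical name, just restating the steps above) makes the data requirement explicit:

```python
def analysis_spans(max_probes=5, max_steps=180):
    """For each step n, return (time frame in days, days of history
    needed), i.e. max_probes aggregated periods of n days each."""
    return [(n, max_probes * n) for n in range(1, max_steps + 1)]
```

Running it confirms that the final step alone requires 900 days of history, which is why a long, good-quality data history matters so much for this prediction.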

As can be seen in the above example, this confirms the earlier note about the long data history necessary for prediction.

**Trend**

This is a solution of our own design that allows you to draw a trend line for the "feature to analyze from search" feature. Based on the entered data, the algorithm finds the coefficients a and b of a linear function and, on this basis, predicts future values.

An example of the algorithm's operation:

- time frame = Day (one day)
- max probes = 5 (days)
- feature to analyze from search = storage_usage
- max predictions = 30 (days)
- threshold = -1

The algorithm takes the values of the first and last records. On this basis, it calculates the directional coefficient a of the function y = ax + b, and then the coefficient b. In subsequent steps, successive prediction periods are substituted for x until the "max predictions" value is reached.
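The steps above can be sketched directly. The exact x-indexing is an assumption (x = 0 at the first record, predictions continuing from x = n), and the function name is illustrative:

```python
def trend_predictions(samples, max_predictions):
    """Sketch of the Trend calculation (assumed semantics): the slope a
    comes from the first and last records only, b from the first record,
    and successive prediction periods are substituted for x."""
    n = len(samples)
    # directional coefficient from the first and last records
    a = (samples[-1] - samples[0]) / (n - 1)
    b = samples[0]
    # x = 0 is the first record; predictions continue from x = n
    return [a * (n + k) + b for k in range(max_predictions)]
```

For 5 days of storage_usage samples and max predictions = 30, this extends the two-point trend line 30 days forward, which is exactly what the top chart in the prediction window shows.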

The image shows two windows with charts. The bottom window shows the actual historical values for the feature, and the top graph shows the trend line with the prediction of values for the next 30 days.

The Trend algorithm can also work in the mode of searching for the moment when the user-defined limit is exceeded.

An example of the algorithm's operation:

- time frame = Day (one day)
- max probes = 5 (days)
- feature to analyze from search = storage_usage
- max predictions = 30 (days)
- threshold = 100
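In this mode the algorithm walks the same trend line and reports when the user-defined limit would be exceeded. A hedged sketch under the same indexing assumptions as before (the function name is hypothetical):

```python
def periods_until_threshold(samples, threshold, max_predictions):
    """Sketch of Trend's threshold mode (assumed semantics): extend the
    two-point trend line forward and return the number of future periods
    until it reaches the threshold, or None if the threshold is not
    reached within max_predictions periods."""
    n = len(samples)
    a = (samples[-1] - samples[0]) / (n - 1)
    b = samples[0]
    for k in range(1, max_predictions + 1):
        # x = n - 1 is the last record; x = n - 1 + k is k periods ahead
        if a * (n - 1 + k) + b >= threshold:
            return k
    return None
```

With threshold = 100 on a rising storage_usage trend, the returned value answers the practical question "in how many days will the disk fill up?".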