A Brief Introduction to Artificial Intelligence
Artificial intelligence is a very broad term that covers all concepts and algorithms that may simulate, partially replace or boost human thinking and behavior. Everything from smart lights, internet card payments, smart cars and chatbots to predictions and pattern detection in data can be considered part of the artificial intelligence phenomenon. Most algorithms work by analyzing long series of data, including encoded speech and images. In this article, I would like to show you some nice examples.
Mythological note: Because of its simple syntax, the Python programming language is used by many AI libraries as an interface for the programmer or the data scientist. In Greek mythology, Python was a serpent living under the sanctuary of Delphi, who was killed by Apollo, the god of prophecy. The gases from the serpent’s decaying body gave inspiration to Pythia, the high priestess of Apollo.
Cyber Security and Time Window
In cyber security, you need to analyze long series of data in real time, such as logs from network components, user activity in all kinds of applications and so on. The analysis should detect defined suspicious patterns, such as port scans, multiple failed user logins or brute force attacks, and create alerts so that system administrators can react quickly, discover security incidents and prevent data loss or denial of service. Such patterns are usually detected using a time window.
A time window splits the data series into parts of a fixed duration, such as one minute, five minutes, one hour or one day. It then aggregates the events in each part and observes what kinds of events, and how many of them, match the defined security pattern. The time windows are created in real time, and there is pressure to use as little memory and processing power as possible. The time window is thus an example of an artificial intelligence algorithm applied in the area of cyber security.
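As a minimal sketch of the idea, the following Python snippet groups events into fixed one-minute windows and raises an alert when a user has too many failed logins within one window. The event format, the function names and the threshold of three are assumptions made for illustration only; a production system would process a live stream and expire old windows to save memory.

```python
from collections import Counter

def count_per_window(events, window_seconds):
    """Count events per (window start, user) for fixed-size time windows."""
    counts = Counter()
    for timestamp, user in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, user)] += 1
    return counts

def find_alerts(events, window_seconds=60, threshold=3):
    """Return the (window start, user) pairs that reach the threshold."""
    counts = count_per_window(events, window_seconds)
    return sorted(key for key, n in counts.items() if n >= threshold)

# Hypothetical failed-login events: (seconds since start, user name).
failed_logins = [(5, "alice"), (12, "bob"), (20, "alice"),
                 (48, "alice"), (65, "bob"), (130, "alice")]

alerts = find_alerts(failed_logins)
print(alerts)  # [(0, 'alice')]
```

Only the counters are kept in memory here, not the events themselves, which is one simple way to respect the memory pressure mentioned above.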
User Behavior Analysis in Telecommunications
In all kinds of situations, you need to monitor the activity of entities such as users and devices to know whether they behave or work as expected. In telecommunications, the monitored devices are transmitters, which also provide information about the users or SIM cards connected to them. In addition to time windows, so-called session windows are used to alert administrators when one of the transmitters stops sending events and thus needs to be repaired or replaced. The activity of users or SIM cards can be monitored as well, and even their geographic location can be approximated simply from the history of their connections to different transmitters.
In cyber security, this kind of analysis is called UEBA, which stands for user and entity behavior analytics. Like the time window, it is based on the detection of predefined patterns, but instead of the time dimension it focuses on a specific entity.
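The session window idea can be sketched in a few lines of Python. In this illustration, a session is a run of events from one entity with no pause longer than a chosen gap, and an entity is reported as silent when its last event is older than that gap. The transmitter names, the timestamps and the 60-second gap are hypothetical.

```python
def split_sessions(timestamps, gap):
    """Split timestamps into sessions; a pause longer than `gap` starts a new one."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions

def silent_entities(last_seen, now, gap):
    """Return the entities whose last event is older than `gap`."""
    return sorted(name for name, t in last_seen.items() if now - t > gap)

# Hypothetical event timestamps (in seconds) per transmitter.
events = {"tx-1": [0, 30, 55, 300, 320], "tx-2": [10, 40]}

sessions = {name: split_sessions(ts, gap=60) for name, ts in events.items()}
last_seen = {name: max(ts) for name, ts in events.items()}
silent = silent_entities(last_seen, now=330, gap=60)

print(sessions["tx-1"])  # [[0, 30, 55], [300, 320]]
print(silent)            # ['tx-2']
```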
Genetic Programming and Approximation of Complex Tasks
Genetic programming is based on evolutionary algorithms, which approximate a solution to a (mathematical) problem that would be hard or impossible to solve in an acceptable amount of time using deterministic algorithms. An evolutionary algorithm starts from an initial population of candidate solutions, which is created using some idea of how the final solution may look. The algorithm then creates multiple generations one by one, applying reproduction (crossover of genes) and mutation. In every generation, the fitness of each individual is evaluated against the assignment of the task or problem, and only a certain number of individuals with the highest fitness are selected for reproduction. It is necessary to avoid getting stuck in solutions that are only locally optimal, meaning the ones that look like the final solution only when compared with similar solutions. That is why random mutation is applied, and why some of the individuals with lower fitness, or their offspring, may also make it to the next generations. Evolutionary algorithms are thus very sensitive to their initial parametrization and often require multiple runs with different settings.
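To make the loop of selection, crossover and mutation more concrete, here is a small sketch in plain Python. It solves a toy problem (maximize the number of ones in a list of bits), and all parameter values are arbitrary assumptions. For brevity it keeps only the fitter half of the population as parents, so it does not include the survival of weaker individuals mentioned above.

```python
import random

def evolve(genome_length=20, population_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Evolve bit lists towards all ones and return the best individual."""
    rng = random.Random(seed)
    fitness = sum  # toy fitness: the number of 1-bits in the genome

    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]

    for _ in range(generations):
        # Selection: keep the fitter half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]

        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            # Crossover: take the head of one parent and the tail of the other.
            cut = rng.randint(1, genome_length - 1)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)

        population = parents + children

    return max(population, key=fitness)

best = evolve()
print(sum(best), "ones out of", len(best))
```

Changing `mutation_rate`, `population_size` or `seed` is an easy way to observe how sensitive the result is to the initial settings.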
Genetic programming goes even further and creates programs or, more precisely, trees of function calls. Each individual represents one possible tree, and its fitness is higher the more closely the result of the function calls matches the expected output of the program or application. Thus, with every generation the tree can grow to create the final program representation. Genetic programming can be used to find a solution for parsing long and non-standard messages from cyber security systems or to parametrize neural networks. If you need a quick approximation of a program that you will later evaluate and rewrite to be more optimal and complex, you can definitely use genetic programming.
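The snippet below illustrates only the representation and the fitness part of genetic programming, not the evolution itself. A program is modeled as a nested tuple of function calls, and its fitness is the negative total error against the expected outputs. The tuple format, the function set and the target function x*x + 1 are assumptions chosen for this sketch.

```python
import operator

# Functions that may appear as inner nodes of a tree (an assumed, tiny set).
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate(tree, x):
    """Evaluate a tree: 'x' is the input, numbers are constants,
    and tuples are (function name, left subtree, right subtree)."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    name, left, right = tree
    return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))

def fitness(tree, samples):
    """Negative total absolute error; 0 means a perfect match."""
    return -sum(abs(evaluate(tree, x) - expected) for x, expected in samples)

# Expected behavior of the program we are looking for: f(x) = x*x + 1.
samples = [(x, x * x + 1) for x in range(-3, 4)]

candidate_a = ("add", ("mul", "x", "x"), 1)  # x*x + 1
candidate_b = ("add", "x", 1)                # x + 1

print(fitness(candidate_a, samples))  # 0
print(fitness(candidate_b, samples))  # -28
```

An evolutionary loop would then swap subtrees between individuals (crossover) and replace random subtrees (mutation), preferring trees with a fitness closer to zero.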
For Python, there is a nice comprehensive library for evolutionary algorithms and genetic programming called DEAP.
Finding the Shortest Path on the Map using Contraction Hierarchies
Contraction hierarchies is an algorithm that preprocesses map data, modeled as a graph of edges (roads) and nodes (junctions), by adding shortcuts. When the preprocessing is done, people can query the program and find the shortest path between any two points on the map in a fraction of a second. The algorithm removes (contracts) the nodes of the graph one by one and, for each pair of neighbors of the removed node, checks whether a shortcut between them has to be created, with a length equal to the sum of the two edges that previously led through the removed node. The shortcut is added only if no other path of the same or shorter length connects the two neighbors; the algorithm then continues by contracting other nodes.
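The contraction of a single node can be sketched as follows. This is a simplified illustration on a small undirected graph stored as a dictionary of dictionaries; the graph, the node names and the helper functions are made up for this example, and real implementations also decide the order of contraction and keep the removed nodes in a hierarchy for the query phase.

```python
import heapq
from itertools import combinations

def shortest_distance(graph, source, target, excluded):
    """Dijkstra distance from source to target that ignores the `excluded` node."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # outdated queue entry
        for neighbor, weight in graph[node].items():
            if neighbor == excluded:
                continue
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return float("inf")

def contract(graph, node):
    """Remove `node` and add the shortcuts needed to preserve all distances."""
    shortcuts = {}
    for u, w in combinations(sorted(graph[node]), 2):
        via = graph[node][u] + graph[node][w]
        # A shortcut is needed only if no other path is as short as the one via `node`.
        if shortest_distance(graph, u, w, excluded=node) > via:
            shortcuts[(u, w)] = via
    for neighbor in list(graph[node]):
        del graph[neighbor][node]
    del graph[node]
    for (u, w), weight in shortcuts.items():
        graph[u][w] = graph[w][u] = weight
    return shortcuts

# Hypothetical road network: junction -> {neighboring junction: road length}.
roads = {
    "A": {"B": 2, "C": 3},
    "B": {"A": 2, "C": 2, "D": 4},
    "C": {"A": 3, "B": 2},
    "D": {"B": 4},
}

added = contract(roads, "B")
print(added)  # {('A', 'D'): 6, ('C', 'D'): 6}
```

No shortcut is created between A and C, because the direct road of length 3 is shorter than the path of length 4 through B.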
Customizable contraction hierarchies add a customization phase between the preprocessing and the query, which makes it possible to quickly include information about the current traffic and so on. The customization phase updates the weights that the edges and shortcuts carry into the final query.
For C++, there is a library for route planning utilizing contraction hierarchies called RoutingKit.
Probabilistic Models and Weather Forecast
Probabilistic and statistical models are algorithms that usually detect so-called seasonal patterns in data and try to predict how the data series is likely to continue. The season in this case is not only a season of the year; it can be any recurring period, such as an hour, a day, a week, a month or several years, or a combination of them. Probabilistic models are especially relevant when trying to predict possible future anomalies in the data, the activities of users or the seasonal cycles of weather.
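A very simple way to illustrate a seasonal pattern is to average the values observed at the same position within each season and to repeat this profile into the future. The sketch below does exactly that; the numbers and the season length of three are invented, and real models (such as those in the libraries named in this section) also handle trends and uncertainty.

```python
def seasonal_profile(series, season_length):
    """Average the values observed at each position within the season."""
    buckets = [[] for _ in range(season_length)]
    for index, value in enumerate(series):
        buckets[index % season_length].append(value)
    return [sum(bucket) / len(bucket) for bucket in buckets]

def forecast(series, season_length, steps):
    """Continue the series by repeating its average seasonal profile."""
    profile = seasonal_profile(series, season_length)
    start = len(series)
    return [profile[(start + i) % season_length] for i in range(steps)]

# Hypothetical measurements with a season of three steps.
measurements = [10, 20, 30, 12, 22, 32, 11, 21, 31]

prediction = forecast(measurements, season_length=3, steps=4)
print(prediction)  # [11.0, 21.0, 31.0, 11.0]
```

A new value that lies far from the profile at its position in the season would be a candidate anomaly.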
Statistical models can predict trends in data and can be based on simple statistical approaches such as linear or polynomial regression. When you have a representative amount of data, you can try different types of regression models to find the one that best matches the trends contained in your dataset.
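As an illustration of the simplest case, linear regression with one input can be computed directly with the least squares formulas, without any library. The data points below are made up so that they lie exactly on the line y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations following the trend y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # predicted value for x = 5: 11.0
```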
For Python, there are many probabilistic libraries that even combine probabilistic models with machine learning. See scikit-learn, Prophet, TensorFlow Probability, Probabilistic Torch etc.
Machine Learning, Deep Learning and Neural Networks
Machine learning is the most visible part of artificial intelligence and offers promising results. It is based on models that are repeatedly trained and tested to produce expected outputs. It may include simple decision trees built from the training dataset, that is, if conditions on some extracted features (the color of a fruit, for instance) that produce results such as “the red fruit is an apple”. However, the core part of machine learning today is deep learning (layered models loosely inspired by the structure of the human brain) with its neural networks, which have different architectures depending on their type and area of use.
Neural networks are composed of layers of connected nodes, where each node consists of inputs, an activation function and outputs. The neural network model is trained to minimize a predefined loss function, and its weights are adjusted by an optimizer. Let me mention some of the types of neural networks. There are convolutional neural networks (CNN), which extract features from data and are today used mainly for image processing and classification tasks.
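The smallest possible illustration of these terms is a single node. The sketch below trains one node with two inputs and a sigmoid activation function to reproduce the logical OR, using plain gradient descent as the optimizer and cross-entropy as the loss. The task, the learning rate and the number of epochs are assumptions for this example; real networks stack many such nodes into layers and rely on libraries for the gradients.

```python
import math

def sigmoid(z):
    """Activation function that squeezes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, inputs):
    """Output of one node: the activation of the weighted sum of its inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train(samples, epochs=2000, learning_rate=0.5):
    """Gradient descent on the cross-entropy loss of a single node."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # For sigmoid with cross-entropy, the gradient with respect to
            # the weighted sum is simply (prediction - target).
            error = predict(weights, bias, inputs) - target
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Training data for the logical OR of two inputs.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train(samples)
outputs = [round(predict(weights, bias, inputs)) for inputs, _ in samples]
print(outputs)  # [0, 1, 1, 1]
```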
Then there are recurrent neural networks (RNN), which contain feedback connections so that the output of one step influences the next. They are used mainly for sequence tasks, such as predicting how a data series, a text or speech continues. One of the most useful recurrent neural networks is the long short-term memory network (LSTM), whose nodes include memory and are able to retain information across long gaps in data series, which is especially useful in cyber security.
Neural networks work with tensors, which are multidimensional arrays of data with a given shape and data type. I do not want to go into much detail; instead, I would like to provide some more examples from this area.
For Python and C++, there are libraries for neural networks and machine learning such as TensorFlow with Keras and PyTorch.
Plant Disease Detection
Plant disease detection is based on datasets of pictures of healthy and diseased plants. Convolutional neural networks are therefore used, which extract features from the pictures to detect patterns that may signify that the plant is diseased. Each picture is transformed into a tensor, that is, a numeric representation with pixels as its elements, and passed to the neural network model for training and testing. However, the model can be extended to process not only images, but also descriptions of the plants given by researchers or farmers.
The descriptions can be processed by more complex networks such as graph convolutional networks (GCN), if the keywords from the descriptions are organized into a graph showing relations between features, such as “brown leaves” connected with “fungi”.
Malware detection is a cyber security task where all kinds of neural networks, such as RNN, LSTM and CNN, can be used. Malware is software designed to infiltrate a system or an application in order to spy on the user or damage the device. What I want to mention here is that malware in a computer can be compared to an illness in a plant, so we can approach the malware detection task in the same way as plant disease detection.
We can get datasets of clean and infected files and transform them into “pictures” or, more precisely, into the same kind of tensors that would be obtained from decoding a picture. We can read a file byte by byte and transform each byte into a number, similar to a pixel in a picture. We would have the same data type (integer number) and a fixed “height” and “width” of the tensor. Then we define multiple CNN layers whose output is a numeric classification result, namely the probability that the file is infected by malware.
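The transformation of a file into such a “picture” can be sketched without any library. The function below turns raw bytes into a grid of integers with a fixed height and width; the truncation of long inputs, the zero padding of short ones and the tiny 2 x 4 size are assumptions made for illustration, and the sample bytes are invented.

```python
def bytes_to_tensor(data, height=4, width=4):
    """Arrange bytes (integers 0..255) as `height` rows of `width` values."""
    size = height * width
    values = list(data[:size])            # truncate inputs that are too long
    values += [0] * (size - len(values))  # pad short inputs with zeros
    return [values[row * width:(row + 1) * width] for row in range(height)]

# A few made-up bytes standing in for the content of a file.
sample = b"MZ\x90\x00\x03"

tensor = bytes_to_tensor(sample, height=2, width=4)
print(tensor)  # [[77, 90, 144, 0], [3, 0, 0, 0]]
```

For a real file, the bytes could be obtained with `open(path, "rb").read()`, and the resulting grid would then be converted to the tensor type of the chosen neural network library.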
Transformers, Speech Recognition and Painting Pictures
Transformers are more complicated neural networks that use an encoder-decoder architecture and the self-attention mechanism. Basically, it means that the transformer takes into account the position of a word in a sentence or of a region in an image, and weighs how strongly each element relates to all the others. The encoder first transforms the original pictures or texts into a pool of features or tokens, while the decoder creates an image or a text from these prepared tokens.
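The core of self-attention can be shown in plain Python. In this simplified sketch, every token vector is compared with all the others using a scaled dot product, the scores are turned into weights with softmax, and each output is a weighted mix of all vectors. Real transformers additionally apply learned projections (queries, keys and values), positional information and multiple attention heads, which are left out here; the three token vectors are invented.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        mixed = [sum(w * value[i] for w, value in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical token vectors; the first and the third are identical.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

outputs, weights = self_attention(tokens)
print([round(w, 3) for w in weights[0]])
```

The first token gives the same weight to itself and to the identical third token, and a smaller weight to the second one.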
In this way, you can instruct the encoder part to expect a picture as the input and the decoder part to produce a text output – or vice versa. You can even instruct the encoder to encode a text from one language into tokens and the decoder part to transform these tokens to text of another language. Using transformers, you can paint pictures from text or transform human audio speech into text. Transformers simply transform.
For inspiration, look at ViTModel, which applies a transformer encoder to images: https://huggingface.co/docs/transformers/model_doc/vit or the famous DALL-E mini, which transforms text into images: https://huggingface.co/spaces/dalle-mini/dalle-mini
There are many ways to build chatbots, which reply to the messages of a user. Chatbots need to detect patterns in the user’s question, discover the user’s intent and form a reply that makes sense. Chatbots can be based on the transformer architecture or on simpler tokenization and embedding, which splits a sentence into words and creates numeric representations of them.
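The simpler approach can be sketched as follows. The snippet splits a message into lowercase words, turns it into a vector of word counts over a small vocabulary, and picks the intent whose keywords overlap most with the message. The intents and keywords are invented for this example, and the count vector is only a crude stand-in for learned embeddings.

```python
import re

# Hypothetical intents with the keywords that signal them.
INTENT_KEYWORDS = {
    "greeting": {"hello", "hi", "hey"},
    "opening_hours": {"open", "hours", "close"},
}

# All known keywords in a fixed order, used for the numeric representation.
VOCABULARY = sorted(set().union(*INTENT_KEYWORDS.values()))

def tokenize(text):
    """Split a message into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())

def to_vector(text):
    """Numeric representation: how often each vocabulary word occurs."""
    tokens = tokenize(text)
    return [tokens.count(word) for word in VOCABULARY]

def detect_intent(text):
    """Pick the intent with the most matching keywords, or 'unknown'."""
    tokens = set(tokenize(text))
    scores = {intent: len(tokens & words)
              for intent, words in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(detect_intent("Hi there!"))          # greeting
print(detect_intent("When do you open?"))  # opening_hours
print(to_vector("Hi, are you open?"))      # [0, 0, 0, 1, 0, 1]
```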
Prediction of Chaos and Nuclear Fusion
A promising use case for artificial intelligence is the prediction of chaos. In mathematics, chaotic systems are described by non-linear functions that are very sensitive to initial conditions. Examples include the spreading of forest fires, the climatic system of the Earth, trends in society or the control of nuclear fusion, which could deliver cheap energy to everyone.
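The sensitivity to initial conditions can be demonstrated with the logistic map, a classic textbook example of a chaotic non-linear function (it is used here only as an illustration and is not one of the systems listed above). Two runs start with values that differ by one ten-millionth, yet after a few dozen steps they have nothing in common.

```python
def logistic_map(x, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    trajectory = [x]
    for _ in range(steps):
        x = r * x * (1.0 - x)
        trajectory.append(x)
    return trajectory

first = logistic_map(0.2, 50)
second = logistic_map(0.2 + 1e-7, 50)

# Distance between the two runs at every step.
gaps = [abs(a - b) for a, b in zip(first, second)]
print("initial gap:", gaps[0])
print("largest gap:", max(gaps))
```

This rapid growth of a tiny initial difference is what makes long-term prediction of such systems so difficult.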
The plasma in which nuclear fusion is happening is controlled using large magnets. The state of the plasma changes rapidly, which is why reinforcement learning, that is, learning through feedback, is used. The solution to this problem can utilize neural networks, but it is not mandatory. DeepMind open-sourced a part of their simulation for the nuclear fusion control problem, which is available here: https://github.com/deepmind/deepmind-research/tree/master/fusion_tcv