22 June 2017

**Numb3rs** was a **TV crime series** whose storylines revolved around numbers and mathematics. One of its aims was to promote numeracy among the general public, and its content was checked by a team of **real mathematicians** to ensure its accuracy. Each episode covered specific mathematical themes. For example, the episode “Traffic” [1] considered how to determine whether a sequence of events is random or not, together with how to choose road widths to optimize traffic flow.

In what follows, we take a ‘shallow dive’ into two types of **data-generating systems**, “**chaotic**” and “**random**” (for a deeper dive, see the references), with some practical examples.

**Chaotic**

**Chaotic systems** are often defined as **dynamic systems** that start from a set of **initial conditions** and are **highly sensitive to any change** in those conditions. Examples include the **relative positions and states of several balls on a billiard table** and the **starting position of a double pendulum**. Although chaotic systems appear **dynamically unstable**, their sequences of events are in fact generated **deterministically**: the same initial conditions always produce the same evolution.

Two familiar examples of **chaotic systems** are found in **meteorology** and **stock markets**. Moving averages of time series are typically used as short-term predictors. However, an apparently small cause can have large effects, “spooking” the market or changing the evolution of a weather front.
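As a minimal sketch of the moving-average predictor mentioned above, the Python function below forecasts each value as the mean of the previous `window` observations. The function name, the window size and the sample readings are illustrative assumptions, not taken from any IRIS system.

```python
from collections import deque


def moving_average_forecast(series, window):
    """Forecast each value as the mean of the previous `window` observations.

    Entry i of the result is the forecast for series[i], or None when fewer
    than `window` earlier values are available.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    forecasts = []
    recent = deque(maxlen=window)  # sliding window holding the latest observations
    for value in series:
        forecasts.append(sum(recent) / window if len(recent) == window else None)
        recent.append(value)  # the oldest value drops out automatically
    return forecasts


if __name__ == "__main__":
    # Hypothetical daily readings (e.g. relative humidity in %), for illustration only
    readings = [70, 72, 71, 75, 80, 78, 77]
    print(moving_average_forecast(readings, window=3))
```

A short window reacts quickly to recent changes but is noisier; a longer one is smoother but lags behind sudden shifts, which is exactly where small causes with large effects escape this kind of predictor.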

**Chaos is potentially predictable** when, over a sufficiently long period, its outcomes follow a stable statistical distribution that does not depend on the starting state. **Chaos becomes unpredictable** when we lack a long historical record and the initial state of the system is itself uncertain.

A simple way of modelling **chaotic behaviour** when the initial state is known is the “**Logistic Map**”, based on a second-degree polynomial. The state of the system is represented by a number *x* between 0 and 1, which evolves in discrete time steps *n*. At each step, the state is updated according to:

$$x_{n+1} = r\,x_n\,(1 - x_n)$$

where *r* is a parameter, usually taken between 0 and 4 so that *x* stays within [0, 1].
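The update rule can be iterated directly. This Python sketch (the function name `logistic_map` and the chosen values of *r* and $x_0$ are illustrative) returns the trajectory $x_0, x_1, \dots$ for a given *r*:

```python
def logistic_map(r, x0, steps):
    """Return [x_0, x_1, ..., x_steps] for the update x_{n+1} = r * x_n * (1 - x_n)."""
    if not 0.0 <= x0 <= 1.0:
        raise ValueError("x0 should lie in [0, 1]")
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1.0 - x))
    return xs


if __name__ == "__main__":
    # r = 2.8: converges to the fixed point 1 - 1/r (about 0.642857)
    print(logistic_map(2.8, 0.2, 200)[-1])
    # r = 3.2: ends up alternating between two values (a period-2 cycle)
    print(logistic_map(3.2, 0.2, 200)[-4:])
```

With *r* = 2.8 the trajectory settles on a single value; with *r* = 3.2 it ends up oscillating between two values, the simple long-run behaviour described next.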

For some values of *r*, the behaviour of $x_n$ is relatively simple: for large *n*, $x_n$ settles on a single value or oscillates between a finite set of values. However, for most values of *r* beyond about *3.57*, the long-term behaviour of the system is highly dependent on the initial condition (that is, the starting value $x_0$).
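To see this sensitivity in practice, the sketch below (parameter values are illustrative assumptions) runs the map twice from starting points that differ by only $10^{-10}$ and measures how far apart the two trajectories drift. For *r* = 4.0 the gap grows until it is comparable to *x* itself, whereas for *r* = 2.8 both runs converge to the same value.

```python
def logistic_map(r, x0, steps):
    """Return [x_0, ..., x_steps] for x_{n+1} = r * x_n * (1 - x_n)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1.0 - x))
    return xs


def divergence(r, x0, delta, steps):
    """Absolute gap between two runs started at x0 and at x0 + delta, step by step."""
    a = logistic_map(r, x0, steps)
    b = logistic_map(r, x0 + delta, steps)
    return [abs(p - q) for p, q in zip(a, b)]


if __name__ == "__main__":
    chaotic = divergence(r=4.0, x0=0.2, delta=1e-10, steps=100)
    regular = divergence(r=2.8, x0=0.2, delta=1e-10, steps=100)
    print("r = 4.0, largest gap:", max(chaotic))
    print("r = 2.8, final gap:", regular[-1])
```

In the chaotic regime a measurement error in $x_0$ of one part in ten billion is enough to make long-range prediction meaningless, which is why the initial state matters so much.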

At **IRIS**, we have worked on several projects that depend on the weather. For example, one of our **H2020** projects, **RICE GUARD**, seeks to predict the appearance of a **rice disease** called **Rice Blast**, whose onset often depends on meteorological readings such as moving averages of humidity, temperature and dew point over time.

**Random**

**Random** (sometimes called stochastic) **processes** are inherently unpredictable. In contrast with **chaos**, two successive executions of a random process will generally give different sequences, even if the initial state is the same.
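The contrast can be illustrated in a few lines of Python. Re-running the logistic map from the same start reproduces its sequence exactly, whereas two runs drawing from `random.SystemRandom` (which reads operating-system entropy and is used here only as a stand-in for a genuinely random source) differ. Note that ordinary seeded pseudo-random generators such as `random.Random(seed)` are themselves deterministic: the same seed always gives the same sequence.

```python
import random


def logistic_sequence(r, x0, steps):
    """Deterministic (possibly chaotic) sequence from the logistic map."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def random_sequence(steps, rng):
    """Sequence of uniform values in [0, 1) drawn from the given generator."""
    return [rng.random() for _ in range(steps)]


if __name__ == "__main__":
    # Same initial state, same chaotic sequence
    print(logistic_sequence(3.9, 0.2, 50) == logistic_sequence(3.9, 0.2, 50))  # True
    # Entropy-based generator: two runs differ with overwhelming probability
    system_rng = random.SystemRandom()
    print(random_sequence(50, system_rng) == random_sequence(50, system_rng))
    # A seeded pseudo-random generator is reproducible by design
    print(random_sequence(5, random.Random(7)) == random_sequence(5, random.Random(7)))  # True
```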

For example, the results of **tossing a coin** or the outcomes of a **lottery** should be randomly distributed. So, if we train a **predictive model** on a sample of ten thousand coin tosses, with the toss outcome as the output, its accuracy should be close to 50%, i.e. no better than guessing. Likewise, a model trained to predict ranges of winning lottery numbers should do no better than chance. A scientific study was even conducted to test the popular legend that buttered toast tends to fall buttered side down. A more classical example is Brownian motion: the random motion of particles suspended in a fluid (a liquid or a gas), resulting from their collisions with the molecules of the fluid.
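The coin-toss argument can be checked by simulation. The sketch below is a hypothetical setup, not an experiment reported in this article: it simulates 10,000 fair tosses with Python's `random` module, “trains” a very simple model that predicts the next toss from the previous one using the first half of the data, and measures its accuracy on the second half. Whatever pattern the model picks up in training, its test accuracy stays close to 0.5.

```python
import random
from collections import Counter


def toss_coins(n, rng):
    """Simulate n fair coin tosses: 1 = heads, 0 = tails."""
    return [rng.randint(0, 1) for _ in range(n)]


def fit_previous_toss_model(tosses):
    """'Train' a model: for each previous outcome, predict the most frequent next outcome."""
    counts = {0: Counter(), 1: Counter()}
    for prev, nxt in zip(tosses, tosses[1:]):
        counts[prev][nxt] += 1
    # Default to heads (1) if an outcome never appeared in training
    return {prev: (c.most_common(1)[0][0] if c else 1) for prev, c in counts.items()}


def accuracy(model, tosses):
    """Fraction of tosses (after the first) that the model predicts correctly."""
    hits = sum(model[prev] == nxt for prev, nxt in zip(tosses, tosses[1:]))
    return hits / (len(tosses) - 1)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    tosses = toss_coins(10_000, rng)
    train, test = tosses[:5_000], tosses[5_000:]
    model = fit_previous_toss_model(train)
    print(f"test accuracy: {accuracy(model, test):.3f}")
```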

**Randomness** can, however, be highly useful for data modelling. For example, by applying **Monte Carlo methods** [2] we can find a combination of inputs that corresponds to one or more target output values of a data model. In a **Monte Carlo method**, we generate random numbers following a given distribution, for example a **Gaussian** (Normal) distribution defined by a mean and a standard deviation [3]. With a Gaussian, values close to the mean are generated more often than values in the tails: about 68% of them fall within one standard deviation of the mean. We draw a **Gaussian** random value for each input to the data model, and repeat until the model produces an output close to the required target (plus or minus a given tolerance). This technique can be used, for example, to calibrate machine parameters for a complex production process.
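A minimal sketch of this search in Python, assuming a hypothetical two-input `process_model` (a simple linear formula standing in for a trained data model) and made-up means, standard deviations, target and tolerance. It is a plain random search driven by Gaussian draws from `random.Random.gauss`, not the Metropolis algorithm described in [2].

```python
import random


def process_model(temperature, pressure):
    """Hypothetical stand-in for a data model of a production process."""
    return 0.5 * temperature + 2.0 * pressure


def monte_carlo_calibrate(model, input_specs, target, tolerance, rng, max_iter=100_000):
    """Draw Gaussian inputs until the model output lies within `tolerance` of `target`.

    `input_specs` maps each input name to its (mean, standard deviation).
    Returns (inputs, output), or None if no draw succeeds within `max_iter` tries.
    """
    for _ in range(max_iter):
        # One Gaussian draw per model input
        inputs = {name: rng.gauss(mu, sigma) for name, (mu, sigma) in input_specs.items()}
        output = model(**inputs)
        if abs(output - target) <= tolerance:
            return inputs, output
    return None


if __name__ == "__main__":
    specs = {"temperature": (180.0, 10.0), "pressure": (5.0, 0.5)}
    result = monte_carlo_calibrate(process_model, specs, target=101.0, tolerance=0.5,
                                   rng=random.Random(1))
    print(result)
```

Centring each Gaussian on the current operating point keeps the search near realistic settings, while the standard deviation controls how far it is allowed to wander; the `max_iter` cap stops the loop when the target is out of reach.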

**Random and chaotic behaviour** also appears in areas such as **image recognition**, and has to be compensated for when a **learner** builds a **data model**. For example, at **IRIS** we have developed several commercial devices based on **infrared spectroscopy**, such as **VISUM Palm**, which performs in-situ analysis of different raw materials, and **HYPERA**, which detects foreign bodies by **Hyperspectral Imaging**. In this type of measurement, noise can come from the product being scanned (a pizza, a piece of chicken, a pharmaceutical tablet) or be generated by the instrumentation itself. Particle scattering can also behave **chaotically**. Various techniques are used to improve the **signal-to-noise ratio** and avoid undesirable effects, such as the Savitzky-Golay filter [4] and Multiplicative Scatter Correction (MSC), among others.
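As a rough illustration of both corrections, the NumPy sketch below implements a Savitzky-Golay smoother (a least-squares polynomial fit over a sliding window, evaluated at the window centre) and a basic MSC (each spectrum regressed on the mean spectrum, then corrected for offset and slope). This article does not describe how IRIS's devices apply these filters; the function names, window size, polynomial order and synthetic spectra are assumptions, and SciPy users would normally reach for `scipy.signal.savgol_filter` rather than hand-built weights.

```python
import numpy as np


def savitzky_golay_weights(window, polyorder):
    """Weights that fit a degree-`polyorder` polynomial to `window` points by
    least squares and return its value at the central point."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and greater than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Column j of the design matrix holds offsets**j
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps the window values to the fitted constant term
    return np.linalg.pinv(design)[0]


def savitzky_golay_smooth(y, window=11, polyorder=2):
    """Smooth y; the first and last window // 2 points are dropped."""
    weights = savitzky_golay_weights(window, polyorder)
    # Reversing the weights makes np.convolve apply them in window order
    return np.convolve(np.asarray(y, dtype=float), weights[::-1], mode="valid")


def msc(spectra, reference=None):
    """Multiplicative Scatter Correction: fit row = a + b * reference and
    return (row - a) / b for every row (spectrum) of `spectra`."""
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, row in enumerate(X):
        b, a = np.polyfit(ref, row, deg=1)  # polyfit returns slope first, then intercept
        corrected[i] = (row - a) / b
    return corrected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wavelengths = np.linspace(0.0, 3.0, 200)
    clean = 2.0 + np.sin(wavelengths)
    # Synthetic spectra with different additive/multiplicative scatter, plus noise
    spectra = np.array([a + b * clean for a, b in [(0.1, 1.2), (-0.2, 0.8), (0.3, 1.0)]])
    noisy = spectra + rng.normal(0.0, 0.05, spectra.shape)
    smoothed = np.array([savitzky_golay_smooth(s, window=11, polyorder=2) for s in noisy])
    corrected = msc(smoothed)
    print(corrected.shape)  # (3, 190)
```

For simplicity the smoother discards `window // 2` points at each edge; `savgol_filter` instead handles the edges according to its `mode` argument.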

*Acknowledgements:* Thanks to Idoia Martí and Laura Rodriguez of IRIS’s Science Dept. for their help with some of the content in this article.

References:

[1] http://numb3rs.wolfram.com/303/demonstrations.html

[2] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller, Equation of State Calculations by Fast Computing Machines, The Journal of Chemical Physics, Vol. 21, No. 6, pp. 1087–1092, 1953.

[3] D. B. Thomas, W. Luk, P. H. W. Leong, J. D. Villasenor, Gaussian Random Number Generators, ACM Computing Surveys, Vol. 39, No. 4, Article 11, 2007.

[4] A. Savitzky, M. J. E. Golay, Smoothing and Differentiation of Data by Simplified Least Squares Procedures, Analytical Chemistry, Vol. 36, No. 8, pp. 1627–1639, 1964.
