How noise can help us see the world more clearly
The title sounds like a contradiction. However, as humans we have an enormous amount of data flowing through our sensory system – most of which is filtered out by our brain, writes Loubna Bouarfa, CEO of OKRA Technologies.
For example, as you read this article your brain will throw away 99 per cent of the data it receives through your nervous system and keep only information that supports your current thoughts – one could look at this as your pre-existing beliefs.
The filtering out of information is performed unconsciously by everyone as part of our survival conditioning. Imagine the nuggets of information that our brain filters out because they don’t fit with its current thoughts.
There are so many new insights and discoveries we miss by staying stagnant in our existing beliefs. This tunnel vision is known as overtraining in the field of pattern recognition or overfitting in the field of machine learning, and it is normally an undesired state as it narrowly fits the model around existing data.
As such, the model loses the ability to deal correctly with new events or data that are – even slightly – different. In the real world, people and situations are always slightly different, so this is a real problem for the successful application of pattern recognition and machine learning.
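The effect is easy to demonstrate. In the toy sketch below (my illustration, not from the article), a very flexible model fits a handful of noisy observations perfectly, yet handles new observations of the same underlying pattern far worse than a simpler model does:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy observations of a simple underlying pattern.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)

# New, slightly different observations of the same pattern.
x_new = np.linspace(0.03, 0.97, 50)
y_new = np.sin(2 * np.pi * x_new)

def new_data_error(degree):
    # Fit a polynomial of the given degree to the training data,
    # then measure its squared error on the new data.
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)

# A degree-9 polynomial passes through all ten training points
# (overtraining), but generalises far worse to the new data than
# a simpler degree-3 model does.
print(new_data_error(9), new_data_error(3))
```

The overtrained model has narrowly memorised the noise in its training data, which is exactly the "tunnel vision" described above.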
There are many examples of overtraining in human interpretation in the real world. For example, with the increasing populism and fake news phenomena, we can convince people of many things that aren’t true by providing them with more data that support their own fears and beliefs.
Our brains are optimised to seek confirmation of existing patterns and avoid threats through ‘fear learning.’ Is this situation good or bad? Is it safe or dangerous? If a situation is judged bad or dangerous, the brain’s threat response pumps us up physically with stress hormones to deal with the threat – and the brain stores a memory of it as a data point to protect us in the future.
We also see examples of overtraining in psychology such as different forms of addiction and anxiety disorders. For example, a person with a dog phobia is likely to believe that all dogs are dangerous.
To re-frame those patterns in human brains, there are techniques such as Cognitive Behavioural Therapy (CBT), where a patient is deliberately exposed to friendly dogs to re-frame their thinking and add the belief that “most dogs are friendly.” This process of re-framing an overtrained model/brain is what we call regularisation in the field of machine learning.
Regularisation is used to stabilise an overtrained model that fits too narrowly to the data used for training. A great blog on regularisation – ‘37 Steps’ – is published by the inspiring professor Bob Duin of Delft University of Technology.
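In model terms, one common form of regularisation simply penalises extreme parameter values so the model cannot contort itself around every training point. A minimal ridge (L2) regression sketch – my own example, not from Duin’s blog:

```python
import numpy as np

rng = np.random.default_rng(1)

# Few noisy training points, and a flexible degree-9 polynomial model.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_new = np.linspace(0.05, 0.95, 50)
y_new = np.sin(2 * np.pi * x_new)

def design(x, degree=9):
    # Polynomial feature matrix (columns x^9 ... x^0).
    return np.vander(x, degree + 1)

def ridge_fit(lam):
    # Ridge regularisation: the lam * I term penalises large
    # coefficients; lam = 0 recovers the unregularised fit.
    A = design(x_train)
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]),
                           A.T @ y_train)

def new_data_error(w):
    return np.mean((design(x_new) @ w - y_new) ** 2)

# A small penalty stabilises the overtrained model: its error on
# new data drops compared with the unregularised fit.
print(new_data_error(ridge_fit(1e-3)), new_data_error(ridge_fit(0.0)))
```

The penalty plays the role CBT plays for the phobia patient: it stops the model clinging too tightly to the exact data it happened to see.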
In this blog I would argue that regularisation is not a learning problem but a recognition problem, and needs to be treated as such in machine learning applications – and for the best results it needs to be performed under real-world conditions.
I’ll illustrate this using the dog phobia example. Imagine a patient who experienced a traumatic event with a dog in the past; the resulting belief – “dogs are dangerous” – is stored in their brain permanently as highly significant and worth remembering.
This brain function developed early in our evolutionary history because having a healthy dose of fear keeps us safe from dangerous situations that might reduce our chances of survival.
The permanent storage of fear memories explains why relapses occur frequently in phobia patients. During CBT, a new memory – say, “most dogs are friendly” – is formed. But this new memory is constrained to a specific context: a friendly dog in the therapy room.
In that context, the prefrontal cortex – the rational side of the brain – stops the brain from retrieving the old fear memory. However, when the patient encounters a new context in the real world, such as a dog in a park, the brain retrieves the fear memory that “all dogs are dangerous.”
This illustrates that regularisation through CBT in a controlled environment – the therapy room – is not sufficient to give the patient the generalisation ability that allows their rational brain to make realistic decisions.
A great example of how regularisation at the representation stage can work in a real-world environment and lead to behavioural change is the classic 1993 movie Groundhog Day, where repeated learning experiences result in positive outcomes.
Bill Murray plays Phil Connors – an unpleasant, arrogant person who also happens to be a TV weatherman. Phil is sent on assignment to the small town of Punxsutawney, Pennsylvania, to cover the weather for the annual Groundhog Day festival. It’s an assignment he doesn’t want and puts little effort into, resulting in an inaccurate weather forecast.
When an unforecast blizzard hits town, Phil and his crew have no way out. Phil decides not to make the most of it or take part in the Groundhog Day festival with everyone else and, bad-tempered, goes to bed.
He wakes up to find that what he experienced yesterday is exactly what he experiences today – again and again and again. He is stuck in a time loop. From being confused and puzzled, Phil becomes desperate and depressed. He tries to kill himself a few times, unsuccessfully. Then Phil reaches a turning point and decides to make the most of his situation.
He learns new skills — becoming a superb pianist, an ice sculptor and fluent in French. As Phil continues with life in the loop, he ultimately learns to evolve as a person and be happy. He makes friends and falls in love. He decides to marry and live the rest of his life in Punxsutawney.
When Phil wakes up the next day, having made this decision, he discovers he has escaped the loop but his learnings stay with him forever.
The learning phenomenon of Groundhog Day is informative: Phil is repeatedly exposed to the same experiences, and the first day illustrates the ingrained behavioural pattern he keeps repeating – based on data stored in his memory.
His rational interpretation was conditioned by the cultural and societal norms of what defines good and bad, successful and unsuccessful, and so on.
Through his repeated observations of Groundhog Day, Phil was exposed to example after example of the same day, and through these experiences he was able to form a better understanding of events and of his response to them. Exposed to the same context over and over again, he understands and can judge the situation better, allowing him to let go of previously conditioned responses.
The repeated similar situations of Groundhog Day resemble the concept of invariants in pattern recognition; invariants are similar variations of the same observation that can be found under real-world conditions.
The addition of noise to the original data (first Groundhog Day) is equivalent to enlarging the dataset by additional observations for the same context.
Adding random noise, however, wouldn’t make sense. It is more effective to enlarge the dataset by variations of observations that fit the real world context, in this case different scenarios for Groundhog Day as a consequence of Phil’s new behaviour. These new observations are called invariants.
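As a sketch of what this looks like in practice (my example: a 1-D signal task where, by assumption, small time shifts do not change a signal’s meaning), invariant augmentation enlarges the dataset with shifted copies of each observation rather than with random noise:

```python
import numpy as np

def augment_with_shifts(signals, shifts=(-2, -1, 1, 2)):
    # Each shifted copy is an invariant: a realistic variation of the
    # same observation, carrying the same label as the original.
    augmented = list(signals)
    for signal in signals:
        for k in shifts:
            augmented.append(np.roll(signal, k))
    return augmented

# One original observation becomes five: itself plus four invariants.
original = [np.sin(np.linspace(0, 2 * np.pi, 32))]
enlarged = augment_with_shifts(original)
print(len(enlarged))
```

The key design choice is that the variations come from a priori knowledge of the task (here, the assumed shift-invariance), not from arbitrary random perturbation.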
If we look at machine learning today, regularisation is mostly performed at the generalisation stage. This usually involves using simpler models, such as linear models, to avoid overfitting; in the case of complex models, regularisation is performed by stopping the learning before completion, for instance by limiting the number of optimisation steps.
This prevents overtraining; however, it doesn’t result in more accurate learning. As previously explained, it makes more sense to augment the data using invariants instead of trying to change the course of interpretation – for example, by adding friendly-dog experiences in normal, real-world situations. The addition of invariants to the original data is equivalent to enlarging the dataset with additional observations that allow a better representation of the situation.
Defining invariants in pattern recognition (PR) is, however, much more straightforward than in machine learning, as PR deals mostly with the representation of a single spatio-temporal signal collected in a consistent fashion, such as an image, video or audio feed. Machine learning, by contrast, focuses less on representation and mostly on the interpretation of a combination of data points and signals to solve a particular problem.
There are typically further challenges around limited features and sparse, unstructured, missing or incorrect data. To be able to use more invariants in ML, we need more progressive approaches that combine the concept of invariants with the reasoning part of ML.
For instance, in PR, the noise tolerated when defining invariant observations is based on which distortions are acceptable because they are invariant to human perception.
In ML, to define invariant observations for a specific domain, such as a particular healthcare diagnosis, we must use a priori knowledge to specify what is invariant to the interpretation of the data and what is genuinely different for that particular diagnostic task.
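As a hypothetical sketch (the record fields, labels and task are invented for illustration), a priori knowledge that the reporting unit of a measurement is irrelevant to a diagnosis can be used to generate interpretation-invariant observations:

```python
# A priori domain knowledge: glucose may be reported in mg/dL or
# mmol/L, and the choice of unit is invariant to the diagnostic task.
GLUCOSE_MG_DL_PER_MMOL_L = 18.0  # approximate conversion factor

def unit_invariants(record):
    """Return the original record plus a unit-converted variant.

    Both observations carry the same label: to the interpretation
    task they are the same measurement, differently encoded.
    """
    converted = dict(record)
    converted["glucose"] = record["glucose"] / GLUCOSE_MG_DL_PER_MMOL_L
    converted["glucose_unit"] = "mmol/L"
    return [record, converted]

obs = {"glucose": 126.0, "glucose_unit": "mg/dL", "label": "positive"}
for variant in unit_invariants(obs):
    print(variant["glucose"], variant["glucose_unit"], variant["label"])
```

What is *not* invariant – say, the actual measured level – must be left untouched; the a priori knowledge draws exactly that line between tolerated variation and meaningful difference.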
ML holds great promise to provide more accurate, fast and consistent outcomes in real-world situations. However, only by giving models enough training data covering interpretation-invariant situations will they be accurate enough for life-and-death applications such as healthcare.
In situations where limited training data is available, it is essential to use a priori knowledge to better represent invariants of the same observations in ML prior to learning. Who would have thought that noise could help us see the world more clearly?