The example I often use when explaining this in basic terms is asking a programmer to write you a 'cat detection program'. That is a set of code, written by the human programmer based on their own understanding of what constitutes a 'cat'. Given any picture, this program follows the fixed set of rules laid down by the human programmer and outputs 'cat' or 'not cat'. That is how traditional computer programming works.
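To make that concrete, a traditional rule-based detector might look something like the sketch below. The feature names and rules are entirely hypothetical; the point is that every rule is hand-written by the programmer based on what they believe makes a picture a cat.

```python
# A hypothetical rule-based 'cat detector': every rule is hand-authored
# by the programmer. The features ('has_fur', 'ear_shape', etc.) are
# purely illustrative assumptions, not a real image-processing API.

def detect_cat(picture):
    """Return 'cat' or 'not cat' using fixed, human-written rules."""
    if picture.get("has_fur") and picture.get("has_whiskers"):
        if picture.get("ear_shape") == "pointed":
            return "cat"
    return "not cat"

print(detect_cat({"has_fur": True, "has_whiskers": True, "ear_shape": "pointed"}))
print(detect_cat({"has_fur": False, "has_whiskers": False, "ear_shape": "round"}))
```

However the rules are phrased, the logic is frozen at the moment the programmer writes it: any cat that falls outside those rules is misclassified forever.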
With data science, or more specifically machine learning, instead of telling the computer how to do something (i.e. determine whether a picture is of a cat or not), we simply tell it what we want. Instead of a programmer thinking, 'what do I know about a cat?' and writing code based on that, it is a case of using many thousands of pictures of 'cats' and 'not-cats'.
We lead the computer to configure itself to tell the difference, and it trains itself accordingly. Instead of a 'program' we get a 'model', which outputs an opinion on the picture. No human has ever told the computer what a 'cat' is. The computer has simply configured itself automatically to do the desired job.
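As a minimal sketch of this idea, here is a toy nearest-centroid classifier in plain Python. The feature vectors and labels are invented for illustration (a real system would train on pixel data and far more examples), but notice that no rule about what a 'cat' is appears anywhere in the code: the model's behaviour comes entirely from the labelled examples it is given.

```python
# A toy 'model' learned from labelled examples rather than hand-written
# rules. The numeric features and tiny dataset are invented assumptions
# purely for illustration.

def train(examples):
    """Average the feature vectors for each label (nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def predict(model, features):
    """Output the label whose centroid is closest to the input."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: dist(model[label]))

# Each 'picture' is reduced to two made-up numeric features.
training_data = [
    ([0.9, 0.8], "cat"), ([0.8, 0.9], "cat"),
    ([0.1, 0.2], "not cat"), ([0.2, 0.1], "not cat"),
]
model = train(training_data)
print(predict(model, [0.85, 0.75]))  # cat
print(predict(model, [0.15, 0.25]))  # not cat
```

Swap in a different set of labelled examples and the very same code produces a different classifier: the knowledge lives in the data, not the program.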
I should stress that these concepts are not new. The techniques used in machine learning are, in many cases, 20 years old. Three key recent changes have made machine learning much more relevant.
First, we now have huge amounts of digital data to train on. Second, thanks to advances in hardware, the computations these algorithms rely on can be performed far more efficiently. Finally, new classes of algorithms have emerged.
Due to these factors, machines can not only teach themselves to do things, but for certain classes of problems machine learning produces results that are orders of magnitude better than those of a human programmer attempting the same task.
These are usually problems with complex variables and many possibilities. It is very hard for a human to convey the philosophical idea of a 'cat', with all its possible features and variations, to a computer. However, if a computer sees millions of pictures of cats and configures itself based on those millions of pictures, it turns out it can do a much better job.
But enough on cats, for now.
I believe there are classes of problems where this approach is vastly superior to the traditional ways of doing things, which many people find quite surprising. These are often problems involving the categorisation of complex inputs, or finding patterns and correlations in complex sets of data.
Based on this, some obvious practical use cases in the financial services sector suggest themselves. These include using data science to look for patterns in asset or fund performance.