Dougal Featherstone tells us about Frosha’s aspirations for Data Pitch and beyond.


Describe your Data Pitch challenge idea


Frosha gives customer data practical value. Companies know a lot about their customers but this data is typically held in silos. The real value of the data lies in combining these silos, but in practice this is blocked because data structures differ across systems and the data can contain small errors or be incomplete. This means the data’s value remains locked away. This is a Natural Language Processing (NLP) problem because the data is so ambiguous. People’s names contain geographical names, addresses contain people’s names, and company names contain variants of both.

At Frosha we are building an NLP solution that structures customer data into a single format, a process known as “entity reconciliation”. Our solution handles data from various writing systems and countries in multiple languages. Thanks to our machine learning approach, it is self-learning as well! With Frosha, companies can finally access the value in their customer data.


What does the idea set out to achieve?


Frosha solves the two main problems in customer data today: ensuring that it fits a standard format, and then matching records across silos.
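
To make those two steps concrete, here is a minimal Python sketch. It is not Frosha’s method: it swaps the machine learning approach for plain string similarity from the standard library, and the field names, sample records and the 0.85 threshold are all assumptions made up for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(record):
    """Step 1: map a silo-specific record onto one shared (hypothetical) format."""
    # Different silos may use different field names for the same thing.
    name = record.get("name") or record.get("full_name") or ""
    city = record.get("city") or record.get("town") or ""

    def clean(text):
        # Strip accents, lowercase, and collapse repeated whitespace.
        decomposed = unicodedata.normalize("NFKD", text)
        no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
        return " ".join(no_accents.lower().split())

    return {"name": clean(name), "city": clean(city)}


def similarity(a, b):
    """Average string similarity over the shared fields, from 0.0 to 1.0."""
    scores = [SequenceMatcher(None, a[k], b[k]).ratio() for k in ("name", "city")]
    return sum(scores) / len(scores)


def match(silo_a, silo_b, threshold=0.85):
    """Step 2: pair each record in silo_a with its best match in silo_b, if any."""
    pairs = []
    for i, rec_a in enumerate(silo_a):
        norm_a = normalize(rec_a)
        best_j, best_score = None, 0.0
        for j, rec_b in enumerate(silo_b):
            score = similarity(norm_a, normalize(rec_b))
            if score > best_score:
                best_j, best_score = j, score
        # Only accept the best candidate if it clears the (assumed) threshold.
        if best_j is not None and best_score >= threshold:
            pairs.append((i, best_j))
    return pairs


# Invented sample data: the same customer stored differently in two silos.
crm = [{"name": "Hugo  Müller", "city": "Köln"}]
billing = [
    {"full_name": "hugo muller", "town": "Koln"},
    {"full_name": "Anna Jansen", "town": "Amsterdam"},
]
print(match(crm, billing))
```

Here the first CRM record is paired with the first billing record, so the script prints [(0, 0)]. A comparison of every record against every other record like this would not scale to millions of records, which is part of why the problem is harder than it looks.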


What makes your idea different or unique?


Traditionally, firms have solved this problem manually or with rule-based systems. Manual solutions are time-consuming. A single record may take only 10 seconds, but data volumes keep growing and datasets with millions of records are not uncommon, which also makes manual solutions expensive. Earlier attempts at automated solutions were rule-based (e.g. always treating the string “Hugo” as a first name), but these no longer scale to modern data volumes, need different rulesets per language and country, and require constant manual tuning as new data is encountered. Our machine learning solution completely bypasses these issues.
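
A quick back-of-the-envelope calculation shows how the manual effort adds up. The 10 seconds per record comes from the answer above; the dataset size of one million records and the eight-hour working day are assumptions chosen only for illustration.

```python
SECONDS_PER_RECORD = 10  # figure quoted in the interview


def manual_effort(records, seconds_per_record=SECONDS_PER_RECORD, hours_per_day=8):
    """Return (hours, working_days) needed to process `records` by hand."""
    total_seconds = records * seconds_per_record
    hours = total_seconds / 3600
    working_days = hours / hours_per_day  # assumes an eight-hour working day
    return hours, working_days


# Assumed dataset size: one million records.
hours, days = manual_effort(1_000_000)
print(f"{hours:,.0f} hours, about {days:,.0f} working days")
```

Under those assumptions, one person would need roughly 2,778 hours, or about 347 working days, for a single pass over the data.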


Where did the idea come from?


Our founding team have been active in the customer data sector for over a decade and have maintained a keen interest in AI since their university days. There have been three big changes recently: advances in AI techniques, an explosion in data volumes (90% of the world’s data was created in the last two years!) and a changing business environment (e.g. GDPR introducing legal conditions on data use and storage). The conjunction of these three changes means the time is right for a deep machine learning solution to customer data problems.


What excites you about the challenge you applied for?


We love the idea of validating our ideas at the pan-European level, especially in terms of having a single solution that works for all languages and countries, as well as for the sheer volume of records involved. We are also really excited about the opportunity to collaborate with, and learn from, some of the greatest experts in the fields of data and business! Data is a crucial part of the modern economy. We are looking forward to getting access to new datasets, and we hope to pay it forward by becoming a foundation of clean data on which others can build the next generation of innovative applications.


How did your team meet?


We met through the Rockstart Accelerator program in Amsterdam and, strangely enough, at a rowing club!


What’s the best thing about working with data?


The magic of seeing a computer model learn and improve, just by feeding it more data, until it generates startlingly accurate results.