- What does it claim to do?
- Substantiation of claims & potential issues
- How might the end-user assess effectiveness?
- What form does it take?
- The creators
What does it claim to do?
Squirro’s ‘GDPR Solution’ aims to facilitate companies’ responses to data subject requests through identification, classification, and analysis of personal data stored across the organisation.
Claimed essential features
- Automatically finds GDPR requests in emails and identifies the associated time limits for responses.
- Uses augmented intelligence to collect, analyse, and classify data across an organisation.
- Provides a GDPR dashboard for reviewing and managing data within an organisation.
“Inbound request triaging: The Squirro GDPR solution can connect to generic inboxes and identify GDPR requests automatically. These are then classified and transferred to a GDPR workbench dashboard that triggers a counter showing the days left to answer the request.”
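The triaging step described in this quote can be illustrated with a minimal sketch. It is not Squirro's implementation, which is not publicly documented. The keyword list, the `Email` class, and the function names are hypothetical, and the response window is approximated as 30 days, whereas Article 12(3) GDPR sets it at one month from receipt, extendable in some cases.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical keyword list; a production classifier would be trained, not hard-coded.
REQUEST_KEYWORDS = (
    "subject access request",
    "right of access",
    "right to erasure",
    "delete my data",
    "gdpr",
)

# Assumption: the one-month window of Art. 12(3) GDPR is approximated as 30 days.
RESPONSE_WINDOW_DAYS = 30


@dataclass
class Email:
    received: date
    subject: str
    body: str


def looks_like_gdpr_request(email: Email) -> bool:
    """Flag an email if its subject or body contains any request keyword."""
    text = f"{email.subject} {email.body}".lower()
    return any(keyword in text for keyword in REQUEST_KEYWORDS)


def days_left(email: Email, today: date) -> int:
    """Days remaining before the (approximated) response deadline."""
    deadline = email.received + timedelta(days=RESPONSE_WINDOW_DAYS)
    return (deadline - today).days


inbox = [
    Email(date(2019, 3, 1), "Subject access request",
          "Please send me all data you hold about me."),
    Email(date(2019, 3, 4), "Invoice 1042", "Payment is due next week."),
]
today = date(2019, 3, 12)

# The "workbench" here is simply a list of flagged requests with their counters.
workbench = [
    (email.subject, days_left(email, today))
    for email in inbox
    if looks_like_gdpr_request(email)
]
print(workbench)
```

Keyword matching of this kind would miss requests phrased in unexpected ways, which is one reason the accuracy of the real classifier matters for the deadline counter to be reliable.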
“Squirro’s GDPR solution offers out-of-the-box sets of connectors to search for and collate data from across an organization, delivering it to you in an easy-to-read GDPR dashboard” (Squirro GDPR Solution; archived)
Automated cognitive search and smart classification: locates and classifies data according to user’s preference. The Squirro dashboard presents results according to such classification. (Squirro GDPR Solution; archived)
Data review process: “The entire GDPR application is powered by Augmented Intelligence, which learns as it goes, enabling it to manage an organization’s data even more effectively. This is especially valuable in the review process, which provides best-response recommendations via the GDPR dashboard. Squirro can highlight any results that should be moved to a report, as well as those that need to be reviewed confidentially, or which can’t be handed out or deleted for regulatory reasons.” (Squirro GDPR Solution; archived)
“The fully automated result list and a dashboard that may be reviewed or/and be handed out to the customer data subject [it is not clear whether the speaker means the data subject].” (quote from video: ‘GDPR Webinar Squirro AdNovum Mar122019’)
Claimed rationale and benefits
- To enable fast responses to GDPR requests.
- To avoid liability under the GDPR.
- To reduce work involved in manually assessing data processing.
Claimed design choices
- Gathers data from internal data sources such as CRM and ERP systems, email, files, and archives.
- Creates a ‘fingerprint’ of data subjects to search for their personal data amongst gathered sources.
Data is gathered via connectors to the most commonly used internal data sources (CRM systems; ERP systems; email; file servers, archives, and SharePoint).
Squirro claims to turn unstructured data into structured, and therefore machine-readable, data. “In the case of a GDPR request, we create a fingerprint [set of attributes which describe a person] of the profile of the person that is asking for their data and we do this by enhancing the information that we automatically gathered. This fingerprint profile allows us to search automatically across all the data sources.” (quote from video: ‘GDPR Webinar Squirro AdNovum Mar122019’)
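To make the ‘fingerprint’ idea concrete, the following is a minimal sketch of how a set of attributes describing a person might be used to search several data sources. It is an illustration under stated assumptions, not Squirro's method: the `Fingerprint` class, the `search_sources` function, the attribute names, and the sample data are all hypothetical, and plain case-insensitive substring matching stands in for whatever matching the product actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical set of attributes describing one data subject."""
    names: frozenset
    emails: frozenset
    customer_ids: frozenset = frozenset()

    def terms(self) -> set:
        """All attribute values, lower-cased for case-insensitive matching."""
        return {t.lower() for t in self.names | self.emails | self.customer_ids}


def search_sources(fingerprint: Fingerprint, sources: dict) -> list:
    """Return (source, doc_id, matched_terms) for each document that
    mentions at least one attribute of the fingerprint."""
    terms = fingerprint.terms()
    hits = []
    for source_name, documents in sources.items():
        for doc_id, text in documents.items():
            lowered = text.lower()
            matched = sorted(t for t in terms if t in lowered)
            if matched:
                hits.append((source_name, doc_id, matched))
    return hits


# Illustrative data only: two "sources", each mapping document ids to text.
sources = {
    "crm": {
        "c-1": "Contact: Jane Doe, jane.doe@example.com",
        "c-2": "Contact: John Smith",
    },
    "email": {
        "m-7": "Hi, customer K-4411 asked about her invoice.",
    },
}

fp = Fingerprint(
    names=frozenset({"Jane Doe"}),
    emails=frozenset({"jane.doe@example.com"}),
    customer_ids=frozenset({"K-4411"}),
)

hits = search_sources(fp, sources)
for hit in hits:
    print(hit)
```

Even this toy version shows the two sides of the approach: the email document is only linked to the data subject because the customer identifier was added to the profile, and exact matching of this kind would miss misspellings or nicknames, so the completeness of the result depends on the quality of the fingerprint.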
Substantiation of claims & potential issues
- The aggregation into ‘fingerprint’ profiles of data subjects’ personal data that is otherwise separate may increase compliance requirements under the GDPR.
- If too much reliance is placed on the system, any errors or incompleteness in its search for personal data could open the client to unforeseen liability.
- True compliance with the relevant legal provisions may require more than the system is capable of; there is a risk clients consider their compliance obligations to be met simply by using Squirro’s system.
There are no scientific articles describing the techniques and tools developed by Squirro. Some background detail on general Squirro stack components is given in the following whitepaper Doing the Heavy Lifting – The Squirro-Added Value for Your Data Science Team – 2019 (archived):
- Squirro aims to provide solutions for GDPR-based data subject access requests (SARs).
- “libNLP is a proprietary natural language processing (understanding) library developed at Squirro and powered by machine learning. (…) libNLP is structured as a pipeline where a user can specify a sequence of steps to load and transform unstructured data to then be classified, clustered, etc, and then ultimately saved either to disk (CSV or JSON format) or in Squirro… The results of the libNLP pipeline can then be screened for quality using provided analyzers… The pipeline configuration is specified in JSON format.” (libNLP; archived)
- “Squirro relies on Elasticsearch to store data efficiently.
- On top of that, many functionalities connected to Natural Language Processing are offered. With Squirro it is easy to apply NLP data transformation and extraction such as part of speech tagging or keywords extraction.
- Another key piece of the Squirro-platform is the Machine Learning Service. Squirro offers MLOps capabilities to train, deploy and maintain ML models very efficiently and with minimal effort.
- Squirro relies on cutting edge data encoding strategies such as Bert or Doc2Vec that transform the document in a ‘machine-readable’ format by also considering information such as meaning and semantics of the input. Efficient encoding is fundamental in to be very effective in machine learning.
Afterward, in Squirro two machine learning strategies can be applied:
Unsupervised learning: we rely on state-of-the-art cluster algorithms such as DBSCAN to group items together or to find outliers.
Supervised learning: a full range of supervised learning strategies are available in Squirro: they span from traditional methods (Random Forest, …) to Deep Learning (LSTM, …)
- Moreover, Squirro offers MLOps capabilities. Once a model is trained with Squirro it takes two simple clicks to deploy it within a different Squirro instance.
- Another fundamental element is model maintenance. It is very common that models degenerate over time, a process known as Conceptual Drift. To avoid that, Squirro uses a set of solutions that range from drift identification to drift prevention with constants and automatic retraining.” (Doing the Heavy Lifting – The Squirro-Added Value for Your Data Science Team – 2019; archived).
- No technical information is available on the training set or techniques used to construct the GDPR classifier for incoming requests. It is also not known how personal information such as names, email addresses, etc. is recognised in unstructured documents.
- Squirro, Doing the Heavy Lifting – The Squirro-Added Value for Your Data Science Team (2019) (archived)
How might the end-user assess effectiveness?
What form does it take?
Squirro’s GDPR application does not seem to be tailored to each user and, in that sense, it is off the shelf.
Legal tech company
Background of developers
EU law (EEA)
Target legal domains
Data Protection law