Data Processing #

Introduction #

uman.ai is aware that your data is sensitive and handles it with the utmost care. This page outlines exactly how uman.ai processes and stores your data. The general idea is that uman.ai privately stores only the relevant pieces of your data, so that it can be searched in a smart and safe way.

If you decide to stop using the application, all of your data is removed from our storage layers within one hour.

General #

When data arrives in the uman.ai Virtual Private Cloud (VPC), it is processed and stored inside that VPC. uman.ai does not depend on any third-party services for processing and storing data, except for Google Cloud and its Virtual Private Cloud services.

To retrieve the data securely, we fetch the refresh and access tokens from our secure token vault and access the information using OAuth 2.0 over HTTPS, as explained in Integrations.
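As an illustration of the retrieval step, the sketch below builds an HTTPS request that carries an OAuth 2.0 access token as a bearer token. It is a minimal sketch and not uman.ai's implementation: the `TokenPair` type, the `build_authorized_request` helper, and the example URL are hypothetical, and the token vault lookup is left out.

```python
from dataclasses import dataclass
from urllib.request import Request


@dataclass(frozen=True)
class TokenPair:
    """Hypothetical container for the tokens read from the token vault."""
    access_token: str
    refresh_token: str


def build_authorized_request(url: str, tokens: TokenPair) -> Request:
    """Build (but do not send) a request that carries the access token."""
    if not url.startswith("https://"):
        raise ValueError("only HTTPS endpoints are allowed")
    # Bearer token usage: the access token goes in the Authorization header.
    return Request(url, headers={"Authorization": f"Bearer {tokens.access_token}"})


# Example with made-up values; nothing is sent over the network here.
request = build_authorized_request(
    "https://crm.example.com/api/customers",
    TokenPair(access_token="access-123", refresh_token="refresh-456"),
)
```

The refresh token is not sent with regular requests; in OAuth 2.0 it is only exchanged at the token endpoint when the access token has expired.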

Data can be stored in three storage layers, all encrypted on disk with AES-256 using Google-managed keys:

  1. Object storage: used for storing blobs, e.g. images of content pages
  2. Persistent disk storage: used for indexing relevant information in order to be able to search on it
  3. Relational database: used for storing typical application backend information, e.g. users, permissions

In the sections below, we go into greater detail on how particular data is processed and stored.

Customer Relationship Management (CRM) data #

To make your content easily searchable by customer, uman.ai reads your list of customers and stores it on the uman.ai side, so that customers can be cross-matched when your content is synchronized. See Integrations for the list of supported CRM systems.

Data processing #

Upon receipt of the customer data, we first try to enrich it via the Clearbit Enrichment API. This only happens when we receive a domain name (like example.com) for the customer: in that case, we call the Clearbit Enrichment API with that domain name over HTTPS using the OAuth protocol. The information that uman.ai receives from Clearbit contains details such as industry and headcount.
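The conditional enrichment described above can be sketched as follows. This is an assumption-laden illustration: `enrich_customer` and the `lookup` callable are hypothetical names, and `lookup` stands in for the actual Clearbit call, whose request details are not shown here.

```python
from typing import Callable, Optional


def enrich_customer(customer: dict, lookup: Callable[[str], Optional[dict]]) -> dict:
    """Return a copy of the customer, enriched only when a domain name is present."""
    domain = customer.get("domain")
    if not domain:
        # No domain name received: skip enrichment entirely.
        return dict(customer)
    details = lookup(domain) or {}
    # Fields already present on the customer win over enrichment details.
    return {**details, **customer}


# Example with a fake lookup in place of the real enrichment service.
def fake_lookup(domain: str) -> Optional[dict]:
    return {"industry": "Software"} if domain == "example.com" else None


enriched = enrich_customer({"name": "Acme", "domain": "example.com"}, fake_lookup)
```

Passing the lookup in as a parameter keeps the decision logic (enrich only when a domain is known) separate from the network call.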

Subsequently, the customer data is used to tag content with the relevant customers. This happens when content is synchronized with the uman.ai platform. The backend synchronization workers that do this are dedicated to your tenant and your data only.
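The matching logic itself is not described here. As one possible illustration, the sketch below tags a piece of content with every customer whose name appears in its text as a whole word or phrase. The `tag_content` function and the matching rule are assumptions, not the platform's actual tagging method.

```python
import re


def tag_content(text: str, customer_names: list[str]) -> list[str]:
    """Return the customers whose name occurs in the text (case-insensitive)."""
    tags = []
    for name in customer_names:
        # Require non-word characters (or the text boundary) around the name,
        # so that "Acme" does not match inside "Acmeology".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            tags.append(name)
    return tags


tags = tag_content("Proposal for Acme Corp, Q3 renewal", ["Acme Corp", "Globex"])
```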

Data storage #

As mentioned above, the customer data is stored in the relational database, encrypted using Google-managed keys. In addition, customer labels can be attached to content pieces for search purposes. That data is stored, also encrypted using Google-managed keys, on a disk volume attached to the Elasticsearch cluster. The data in Elasticsearch is stored in indices, and each tenant has its own index. For more information on Elasticsearch indices, see https://www.elastic.co/blog/what-is-an-elasticsearch-index.
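To illustrate the index-per-tenant layout, the sketch below derives a separate index name for each tenant. The naming scheme, the `content` prefix, and the `tenant_index_name` helper are hypothetical; the only outside fact relied on is that Elasticsearch index names must be lowercase.

```python
import re


def tenant_index_name(tenant_id: str, prefix: str = "content") -> str:
    """Derive a per-tenant index name (hypothetical scheme: <prefix>-<tenant>)."""
    normalized = tenant_id.strip().lower()
    # Keep to a conservative character set so the result is a valid index name.
    if not re.fullmatch(r"[a-z0-9][a-z0-9_-]*", normalized):
        raise ValueError(f"unsupported tenant id: {tenant_id!r}")
    return f"{prefix}-{normalized}"


index_a = tenant_index_name("Tenant-A")
index_b = tenant_index_name("tenant-b")
```

Because every query is issued against exactly one tenant's index, a search for one tenant cannot return documents that belong to another.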

Document Management System (DMS) data #

To make content searchable, a connection with your document management system is required to retrieve the content that has been selected for synchronization with uman.ai. See Integrations for the list of supported document management systems.

We also use the content's metadata to enhance the search experience. One example is the folder path where the content is stored, which can contain information relevant to search. Another example is the users related to the content (e.g. created by, last modified by).

Data processing #

Upon receipt of the content data and metadata, we perform several processing actions within the uman.ai Virtual Private Cloud:

  • information extraction (e.g. extract text)
  • information enrichment (e.g. assign tags)
  • image creation (one image per content page)

All of these processing actions rely on internal services only, so the data does not leave the Virtual Private Cloud.
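The three processing actions can be pictured as one local pipeline. The sketch below is purely illustrative: `ProcessedContent`, `process_content`, the substring-based tagging, and the object key format are all assumptions, and real text extraction and page rendering are replaced by trivial stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessedContent:
    text: str
    tags: list[str] = field(default_factory=list)
    page_image_keys: list[str] = field(default_factory=list)


def process_content(content_id: str, pages: list[str], known_tags: list[str]) -> ProcessedContent:
    """Run extraction, enrichment, and per-page image naming without external calls."""
    # 1. Information extraction: here the page texts are simply joined.
    text = "\n".join(pages)
    # 2. Information enrichment: assign every known tag that occurs in the text.
    lowered = text.lower()
    tags = [tag for tag in known_tags if tag.lower() in lowered]
    # 3. One image per content page: only the (hypothetical) object keys are computed.
    keys = [f"{content_id}/page-{number}.png" for number in range(1, len(pages) + 1)]
    return ProcessedContent(text=text, tags=tags, page_image_keys=keys)


result = process_content("doc-1", ["Pricing overview", "Security whitepaper"], ["pricing", "legal"])
```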

Data storage #

The different pieces of content data are stored separately, each in a storage layer that is fit for purpose. The extracted and enriched information, together with the metadata, is stored in the relational database. The subset of that information that is relevant to search is stored in Elasticsearch as well.

The images of the content pages are stored in object storage. For the content pages that match a search query, uman.ai temporarily grants access to these images via signed URLs so that the pages can be rendered in the frontend.

These signed URLs are valid for a limited time only (30 seconds), and each of them points to a single content page.
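The general idea behind a short-lived signed URL can be shown with a small HMAC-based sketch. This is not the signing scheme used by Google Cloud Storage or by uman.ai; the secret, the `sign_url` and `is_valid` helpers, and the example host are hypothetical and only demonstrate the two properties above: a 30-second lifetime and a signature bound to one page.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only
TTL_SECONDS = 30


def _signature(page_key: str, expires: int) -> str:
    payload = f"{page_key}:{expires}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def sign_url(base_url: str, page_key: str, now: Optional[float] = None) -> str:
    """Sign a URL for one page image; base_url is assumed to have no path."""
    issued_at = now if now is not None else time.time()
    expires = int(issued_at) + TTL_SECONDS
    query = urlencode({"expires": expires, "signature": _signature(page_key, expires)})
    return f"{base_url}/{page_key}?{query}"


def is_valid(url: str, now: Optional[float] = None) -> bool:
    """Check that the signature matches this exact page and has not expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    page_key = parts.path.lstrip("/")
    current = now if now is not None else time.time()
    return hmac.compare_digest(signature, _signature(page_key, expires)) and current <= expires


url = sign_url("https://storage.example.com", "tenant-a/doc-1/page-3.png")
```

Because the page key is part of the signed payload, changing the path to another page invalidates the signature, and the expiry check rejects the URL once the 30 seconds have passed.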