xVector is a collaborative platform for building data applications. It is powered by MetaGraph, an intelligence engine that keeps track of all resources powering data applications. Businesses can connect, explore, experiment with algorithms, and drive outcomes rapidly. A single pane of glass enables data engineers, data scientists, business analysts, and users to extract value from data collaboratively.
Data Applications comprise all resources and related actions in creating value from data. The actions performed in a Data Application include connecting to various data sources, profiling the datasets for quality issues and anomalies, enriching data for further analysis, exploring the datasets to derive insights, mining patterns with advanced analytics and models, communicating the outputs to drive outcomes, and observing the applications for further enhancements and improvements.
The following sections describe each of the resources that constitute a data application. Each resource performs a specific function, enabling efficient division of labor and collaboration.
Workspace - Workspace provides a convenient way to visualize and organize the interactions across resources such as data sources, datasets, reports, and models that power a DataApp. Business analysts, data scientists, and data engineers have a single pane of glass to collaborate and version their work.
Read more: Workspace
Datasource - Enterprise data is available in files, databases, object stores, and cloud warehouses or is accessible via APIs in various applications. Datasource allows users to bring data from multiple sources for further processing. Data sources can be kept in sync on a scheduled or on-demand basis. In addition to first-party data, enterprises now have access to a large amount of third-party data to enhance the analysis.
Read more: Datasource.
Dataset - Once the data is available, users can transform and refine it using an enrichment process. Enrichment comprises functions to profile, detect anomalies, join other datasets, run a regression, classify or segment based on clustering algorithms, or manually edit numerical, text, or image data. Users can perform all actions that enable the training of models and their applications.
The delineation between a data source and a dataset allows for data traceability.
Read more: Dataset.
Model - Models enable the application of an appropriate analytical lens to the data. Supervised AI models, such as regression and classification, or unsupervised models, such as clustering, allow users to tease out patterns in tabular, image, or textual data. Time series models allow for forecasting. Users can extract entities, identify relevant topics, or understand sentiment in textual data.
Read more: Models
Reports - Reports provide a canvas for visualizing and exploring data. Users can build interactive dashboards, interrogate models, slice and dice the data, and drill down into details, allowing businesses to collaborate and refine their understanding of the underlying data.
Read more: Reports
Data Destination - Users can act on insights by automating the writing of outputs to execution systems. For example, if they identify customers who are likely to churn, they can send this data to a CRM system to deliver discounts to those customers or remediate with other actions.
Read more: Data Destination.
The xVector platform allows rapid prototyping using a draft server in the design phase. Business users and analysts can collaborate to define the data application and hand it off to the data scientists and engineers to further refine and tune for performance during the operational phase.
Users can collaborate on each resource, such as reports, datasets, and models. Users can edit, view, or comment on a resource based on their permissions. Users just need an email to start collaboration.
User groups allow for organizing and easier sharing across users.
Users can control the visibility/scope of a given resource. Making the resources public makes them visible to all users across the cluster. Each resource has a defined URL, enabling ease of sharing.
The user will need to start the draft driver in the workspace before using it. The icon to start the draft driver is on the top right of the workspace, along with other admin icons. Once the draft driver is running, users can join the session and use it. While in session, a dedicated driver for that workspace is provided, enabling users to do rapid data exploration and analysis without waiting for resource provisioning. This allows for rapid prototyping with business users. Resources created in the draft driver are marked as draft and are present only in memory. Once prototyping is done, these resources can be operationalized and materialized. Upon materialization, they persist and become available as regular resources.
DataApps are managed with the same rigor as other software applications. Versioning ensures their stability and manageability. Once a resource such as a workspace, dataset, model, or report is assigned a version, consuming applications can be guaranteed a consistent interface.
Users can experiment endlessly in draft mode. Once they like the output, they can publish the findings/resources with a version. Any further changes result in a newer version.
Versioning allows for experimentation and stability while building data applications.
All the resources used to build a DataApp need updates. Therefore, each resource has an update_policy, which can be OnDemand, OnEvent(*), OnSchedule, or Rules.
These policies allow users to configure a flexible and optimal way to reflect data updates. Resources can use rules to model dependencies across different resources; for example, the user might want to update a dataset only after all the upstream datasets are updated, with each dataset potentially having a different update frequency.
The synchronization process is triggered when a data source is updated. The system notifies all the dependent resources, which take the appropriate action based on the update_policy settings.
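As a purely illustrative sketch of how such policies might be expressed (the field names, values, and rule syntax below are hypothetical and are not the platform's actual configuration schema), a Rules-based policy that waits for upstream datasets and a scheduled policy could look like this:

```python
# Hypothetical sketch only - the keys, values, and rule syntax here are
# illustrative and are not xVector's actual update_policy schema.
dataset_update_policy = {
    "type": "Rules",  # one of OnDemand, OnEvent, OnSchedule, Rules
    "rules": [
        # Update this dataset only after every upstream dataset has finished updating,
        # even though each upstream dataset may refresh on a different frequency.
        {"when": "all_updated", "resources": ["orders_dataset", "inventory_dataset"]},
    ],
}

report_update_policy = {
    "type": "OnSchedule",
    "schedule": "0 6 * * *",  # e.g. refresh every day at 6 AM (cron-style, assumed)
}
```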
As the complexity increases due to the scale and variety of operations, manually reviewing the application for exceptions is unwieldy and potentially error-prone. Observability makes it manageable; the system detects anomalies based on rules and machine learning. Users can define alerts based on data updates, threshold rules, or anomalies.
Users can monitor datasets, models, and reports by authoring alert rules. Alert rules are of the following types:
Threshold-based - for example, if the revenue > a given value, notify the user/user group.
Update-based - if the underlying resource, such as a dataset, is updated, the user/user group subscribing to the alert rule is notified.
Anomaly-based - machine learning algorithms detect anomalies and notify the subscribers.
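To make the three rule types concrete, here is a hypothetical sketch; the structure and field names are illustrative only and do not reflect the platform's actual alert schema:

```python
# Illustrative only - not the platform's actual alert rule format.
threshold_alert = {
    "type": "threshold",
    "resource": "weekly_sales_dataset",  # assumed resource name
    "condition": "revenue > 1_000_000",  # notify when revenue exceeds a value
    "notify": ["finance-analysts"],      # subscribing user group
}

update_alert = {
    "type": "update",
    "resource": "weekly_sales_dataset",  # fires whenever the dataset is updated
    "notify": ["reporting-team"],
}

anomaly_alert = {
    "type": "anomaly",                   # ML-detected anomalies notify subscribers
    "resource": "weekly_sales_dataset",
    "notify": ["data-engineering"],
}
```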
Read more: Observability
Governance involves managing data used on the platform throughout its lifecycle, maintaining its value and integrity. It ensures the data is complete as required, secure, and compliant with the relevant regulations with an audit trail of activities. It provides accurate and timely data for informed decisions.
Read more: Governance (coming soon).
Login
One can log in to the platform by clicking on https://xui.xvectorlabs.com/
Users must enter their email and password on the login page and click the Login button. After logging in, the home page displays a list of resources available by default or shared by other users. For a first-time user, this list will be empty. To start, click the Add button at the top right corner of the page to create a workspace.
It is recommended to go through the documents in the Concepts section to understand the different resources and then build an app in the created workspace.
The App Store contains publicly available workspaces. Users can use these already-created apps to accelerate their process. Users can also publish their workspaces as Apps.
All available apps can be accessed by clicking ‘Apps’ on the home page.
Name - Name of the workspace
Description - A description of the workspace
Once in the Workspace, xVector provides a list of options on the top right of the screen.
In today's data-driven world, enterprise data is scattered across diverse landscapes - files, databases, object stores, cloud warehouses, and APIs embedded within various applications. At xVectorlabs, transforming this fragmented information into actionable insights begins with Data Sources.
Data Sources are the gateway, allowing users to connect to, import, and synchronize data from multiple origins. Whether the data resides in structured files, dynamic APIs, or sophisticated cloud storage systems, users can configure and execute a connector to bring it into xVectorlabs as a data source. A rich catalog of connectors, periodically updated by xVectorlabs, ensures compatibility with an ever-expanding array of systems. Missing a connector? Users can reach out to connectors@xvectorlabs.com, and a new one can be developed quickly to meet their needs.
Once connected, the process doesn’t stop at simply importing data. Updates from source systems are seamlessly UPSERTED, reflecting real-time changes while preserving the historical timeline of values. Bulk data import is supported with the OVERWRITE option. This meticulous synchronization ensures traceability, enabling businesses to trust the integrity and provenance of their data.
xVectorlabs simplifies the journey from raw data to actionable insights, offering users the tools to acquire, refine, and analyze data confidently. With regular updates to its connectors catalog and robust metadata management, the platform ensures that businesses can harness the full potential of their data ecosystem - turning scattered information into cohesive narratives that drive impactful decisions.
Available connectors based on type
Data sources have several settings that are useful and, in some cases, necessary. They are described below.
xVector automatically infers metadata using sampling techniques while creating a datasource. It is recommended that the metadata be reviewed carefully and any corrections made if required. Setting the metadata correctly is a crucial step.
Available settings in metadata:
The name of the column as it appears in the source datasource
The new name that is given to a column in the dataset when it is copied from the datasource
Enter any description for a column. This description will be available to other users with access to the datasource and is copied into datasets created from this data source.
xVector automatically infers the data type of the column using a sampling technique. It is recommended that the data type be reviewed for any potential errors. xVector infers int, float, string, and date-type columns automatically.
The format is applicable to data types such as datetime and currency. For datetime data, one can choose from different format options like ‘YYYY-MM-DD’, ‘DD/MM/YYYY HH:MM:SS’, etc. The data itself is not changed; this setting only affects how values are displayed.
You can choose an appropriate semantic type for the column. For example, if the column's data type is int, then semantic types such as SSN, zip code, etc., will be available. These settings are used in visualization.
Choose from the options provided. This is again used for visualization and modeling purposes. In xVector, metadata is used extensively throughout the platform.
It is recommended that the default setting be kept. If SKIP HISTOGRAM is false (the default), xVector generates a histogram for the column along with the other profile information. You can set it to True in cases where the cardinality of the column is very large and may result in a heavy computation load.
The NULLABLE setting defaults to TRUE. If the column value cannot be null, change the setting to FALSE; turning the slider off sets it to FALSE. xVector will throw a warning if a non-nullable column is found to have null values during the connection process.
Set the column as either a dimension or a measure. Again, this information is used in visualization and modeling.
Profile
Data profiling involves examining data to assess its structure, content, and quality. This process calculates various statistical values, including minimum and maximum values, the presence of missing records, and the frequency of distinct values in categorical columns. It can also identify correlations between different attributes in the data. Data profiling can be automated by setting the profile option to true or performed manually as needed.
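The platform computes these statistics for you, but for readers who want to see the equivalent logic, the following pandas sketch reproduces the same kind of profile outside the platform (the file name is a placeholder):

```python
import pandas as pd

df = pd.read_csv("my_datasource_sample.csv")  # placeholder file name

numeric_stats = df.describe()                      # min, max, mean, quartiles per numeric column
missing_counts = df.isna().sum()                   # number of missing records per column
distinct_counts = df.nunique()                     # cardinality of each column
top_categories = {c: df[c].value_counts().head()   # most frequent values in categorical columns
                  for c in df.select_dtypes("object")}
correlations = df.select_dtypes("number").corr()   # correlations between numeric attributes
```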
A CSV-type datasource is used when one needs to bring CSV data from a local system or machine. Follow the below process for creating this datasource-
2. Choose the CSV data source type from the available list.
3. Go through the steps:
a. Configure
b. Advanced
c. Preview Data - display a sample of data
d. Column Metadata - view the automatically inferred metadata and modify it if needed.
Scrolling to the right gives the following:
e. Table Options -> (Record Key, Partition Key, ...)
4. Click on Save
An S3 datasource is used to bring data from the AWS S3 bucket. One needs to have an AWS access key and a secret key for getting the data from the S3 bucket.
Follow the below process for creating this datasource-
2. Choose the S3 data source type from the available list.
3. Go through the steps:
a. Configure
b. Advanced
c. Preview Data - display a sample of data
d. Metadata - view the automatically inferred metadata and modify it if needed.
Scrolling to the right gives the following screenshot:
e. Table Options - Record Key, Partition Key
4. Click on Save
A status screen will come up. One can track the status of the process. On completion, it will redirect to the workspace page.
Data is the foundation of every insightful analysis, yet its raw form often lacks the structure and clarity needed for effective decision-making. Within xVectorlabs, raw data enters the system from various sources like transactional databases, APIs, logs, or third-party integrations. However, before it can be leveraged for exploratory analysis, machine learning, or operational reporting, it must be refined and structured into a more usable format.
This journey from raw data to a structured dataset involves multiple steps, ensuring accurate and insightful information. The process begins with ingestion, where data is pulled into the system (as seen in the Datasource document), followed by profiling and metadata definition, which help uncover patterns, inconsistencies, and key attributes. Following this, enrichment techniques enhance the data, making it more relevant and applicable for downstream applications.
As data flows into the system, it becomes a Dataset - a structured, enriched version ready for analysis and modeling. This transformation begins with profiling, where metadata is extracted, and each column is classified based on its data type, statistical type (categorical or numerical), and semantic type (e.g., email, URL, or phone number). This metadata provides context, shaping downstream processes like exploratory analysis or machine learning model training.
The enrichment process brings data to life. Users can leverage an extensive toolkit to:
This step bridges the gap between raw data and actionable insights, enabling users to prepare datasets that fuel the development of robust models and their real-world applications.
The clear distinction between a Data Source and a Dataset isn’t just about workflow organization - it’s about traceability and trust. By maintaining the data lineage, users can always trace back to the source, ensuring transparency in processing and confidence in decision-making.
Building and Enriching Datasets
Users can create datasets from the data sources; metadata is copied along with the data. They can apply various transformations to the data, and the resulting data is persisted with an appropriate policy, such as OVERWRITE or UPSERT. Users can define the pipelines based on their requirements. If they need to access the update timeline, UPSERT would be an appropriate update policy. Furthermore, users can set up synchronization properties that suit the use case, such as on-demand, schedule, or event.
A dataset is derived from a data source and transformed to meet the needs of the DataApp. Data sources are immutable, establishing the provenance of various computations to the source systems.
Users enrich the dataset using a series (flow) of functions (actions) such as filters and aggregates. They can also apply trained models to compute new columns. For example, they can apply a churn classifier to the latest order data to identify customers who are likely to churn. Users validate and save the logic. Once saved, they can apply the changes to the original dataset and materialize the new dataset on disk using the Materialize action. Users must be cognizant of actions that change the dataset's structure, such as joining or aggregating; if the actions change the structure, OVERWRITE is the recommended update policy.
While enrichment functions and our generative AI agent can suffice for business analysts, data engineers might prefer to transform data programmatically. Custom functions allow advanced users to author functions to alter the dataset programmatically.
Or choose Create Dataset from the menu options of a Datasource.
2. Configure
Datasets are of the following types,
3. Advanced
4. Select Workspace - select a workspace for the dataset.
Once the dataset is created, there are several options to explore and enrich it. These options can be accessed from two different places: the workspace and the dataset page. Details are below.
Click on the vertical ellipses or kebab menu of the dataset tile in the workspace. Following are the options:
View
To view the data. Takes you to the dataset screen.
Update Profile
To update the profile. This needs to be run each time a dataset is created or updated in order to view the profile.
View Profile
To view the profile of the data. This can be viewed only after the “update profile” is run at least once after creating or updating the dataset.
Create copy
Creates a copy of the data
Generate Report
Generates an AI-powered report dashboard. This can be used as a starting point by the user, and the reports can then be edited.
Generate Exploratory Report
Generates an AI-powered report to explore the dataset.
Generate Model
Generates an AI-powered model by choosing the feature columns and automatically optimizing the training parameters based on your prompt. This can be used as the starting point, and the user can update parameters/metrics to generate more experiments/runs.
Activity Logs
Shows the list of activities that occurred on that dataset
In the Workspace, click on the ellipses on the dataset tile (the tile with the green icon) to see the following options:
Map Data
This maps column metadata from source to target (used in data destinations).
Sync
Synchronizes the dataset with the data source.
Update
Updates the data as per source
Materialize
To persist the dataset in the filesystem
Publish
To publish the data. This assigns a version to it; once published, the dataset has a guaranteed interface.
Export
Writes the data to a target system (points to data destination)
Settings
Opens the settings tab for the dataset
Delete
Option to delete the dataset
Double-click the dataset tile in the workspace, or click the View option in the vertical ellipses menu of the dataset tile, to get to the dataset view page.
Below is the screenshot of the dataset view page and the description of all the features that appear on the top right of the page starting from left to right.
Presence
Shows which user is in the workspace. There could be more than one user at a given time.
GenAI
The ability for users to ask data-related questions in natural language and get an automatic response. For example, the user could ask the question, “How many unique values does the age column have?”. GenAI would respond with the number of unique values for that column.
Driver (play button)
Starts or shuts down the dml driver for the dataset
Edit Table
Ability to search or filter each of the columns in the dataset
Data Enrichment
Option to add an xflow action or view the enrichment history. A more detailed description of each of the enrichment functions is provided below.
Profile and Metadata Report
Data profiling involves examining data to assess its structure, content, and overall quality. This process calculates various statistical values, including minimum and maximum values, the presence of missing records, and the frequency of distinct values in categorical columns. It can also identify correlations between different attributes in the data. Data profiling can be automated by setting the profile option to true or performed manually as needed.
Write back
The user can write the data to a target system
Reviews and Version control
Ability to add reviewers and publish different versions of the resource
Action Logs
Shows the logs of action taken on that dataset
Alerts
Option to create, update, or subscribe to an alert. A more detailed description of alerts can be found here.
Comments
Users can add comments to collaborate with other users.
Settings
The dataset is enriched using a series (flow) of functions such as aggregates and filters. Users can also apply trained models to compute new columns. Advanced users can author custom functions (*) to manipulate the data.
Or
The following sections will describe each of the enrichment functions.
This action performs a calculation on a set of values and returns a single value, such as SUM, AVERAGE, etc.
Follow these steps to perform aggregate:
Example:
We will aggregate German Credit Data. We want to get the sum of credit and total duration grouped by different columns (risk, purpose, housing, job, sex).
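For reference, the equivalent aggregation expressed in pandas might look like the sketch below; the column names (credit_amount, duration) and file name are assumptions about the German Credit Data schema, and the platform performs the same operation through the Aggregate action rather than code.

```python
import pandas as pd

credit = pd.read_csv("german_credit.csv")  # placeholder file name; column names assumed

aggregated = (
    credit.groupby(["risk", "purpose", "housing", "job", "sex"], as_index=False)
          .agg(total_credit=("credit_amount", "sum"),
               total_duration=("duration", "sum"))
)
```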
The “author” function can be used when writing a custom function to be applied to a dataset.
Follow these steps to perform the author function:
Example:
For example, we will use bank marketing campaign data. We want to categorize the individuals into three groups - student, adult, and senior - depending on age. For this, we will author a custom function and run it on the dataset.
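A custom function for this categorization could resemble the pandas sketch below; the exact signature the platform expects from an authored function is not reproduced here, and the age boundaries and file name are assumptions.

```python
import pandas as pd

def categorize_age(df: pd.DataFrame) -> pd.DataFrame:
    """Add an age_group column: student, adult, or senior (boundary handling assumed)."""
    out = df.copy()
    out["age_group"] = pd.cut(
        out["age"],
        bins=[0, 18, 60, 200],                # [0,18) student, [18,60) adult, [60,200) senior
        labels=["student", "adult", "senior"],
        right=False,
    )
    return out

campaign = pd.read_csv("bank_marketing_campaign.csv")  # placeholder file name
campaign = categorize_age(campaign)
```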
This action is used to change the data type of a column in the dataset.
Follow these steps to perform data type change:
Example
For example, we will use autoinsurance_churn_with_demographic data. There is a column “Influencer” stored as a String, and we will update it to an integer.
This action is used to delete rows of a dataset based on some condition.
Follow these steps to delete rows -
Example
We have one dataset - medical_transcription_with_entities with column ‘predicted_value’. We will delete rows having an empty list ([]) in the predicted_value column.
This option is used to remove columns from the dataset.
Follow these steps to delete columns -
Example
We have one dataset - ‘datatype update on autoinsurance_churn with demographic’. We will delete columns - ‘has_children’ and ‘length_of_residence’ from this dataset.
This option is used to remove missing values from the dataset.
Follow these steps to remove missing values -
Example
We have one dataset - ‘datatype update on autoinsurance_churn with demographic’. We will delete rows where the ‘latitude’ and ‘longitude’ column values are not present.
The explode function converts an array/list of items into rows. This function increases the number of rows in the dataset.
Follow these steps to perform explode:
Example
We have one dataset - medical_transcription_with_entities with column ‘medical_speciality’. We will perform an explode action on medical_speciality to extract values from the list. This will increase the number of rows.
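In pandas terms, the explode action is equivalent to the sketch below; the assumption that the list column arrives as text and needs parsing (and the file name) may not hold for your data.

```python
import ast
import pandas as pd

transcripts = pd.read_csv("medical_transcription_with_entities.csv")  # placeholder file name

# If the list column was stored as text (e.g. "['Cardiology', 'Radiology']"), parse it first.
transcripts["medical_speciality"] = transcripts["medical_speciality"].apply(ast.literal_eval)

# One output row per list element; the number of rows increases accordingly.
exploded = transcripts.explode("medical_speciality")
```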
This action is used to replace null values in a dataset with a specified value.
Follow these steps to perform fillna-
Example
We have autoinsurance_churn_data joined with demographic data. This has a City column with null values. Let’s replace the null values with the string ‘Not Available’.
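The equivalent pandas logic is a one-liner; the file name below is a placeholder.

```python
import pandas as pd

churn = pd.read_csv("autoinsurance_churn_with_demographic.csv")  # placeholder file name

# Replace missing City values with a placeholder string.
churn["City"] = churn["City"].fillna("Not Available")
```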
The filter function is used to extract specific data from a dataset based on a set of criteria.
Follow these steps to perform the filter:
Example
We will use Walmart Sales data to perform filter operation. We will filter records for Store 1.
This action is used to join two datasets based on a particular column.
Follow these steps to perform the join:
Example
For example, we will be performing a join operation on the autoinsurance_churn dataset with the individuals_demographic dataset. Here, the left dataset is autoinsurance_churn, the right dataset is individuals_demographic, and the column for joining is individual_id, present in both datasets.
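Expressed in pandas, the same join looks like the sketch below; the left-join choice and file names are assumptions, since the platform lets you pick the join type in the action's options.

```python
import pandas as pd

churn = pd.read_csv("autoinsurance_churn.csv")             # left dataset (placeholder file name)
demographics = pd.read_csv("individuals_demographic.csv")  # right dataset (placeholder file name)

# Join on individual_id, which is present in both datasets; join type assumed to be left.
joined = churn.merge(demographics, on="individual_id", how="left")
```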
This action extracts data from JSON objects into structured columns.
Follow these steps to perform json_normalize:
Example
For example, we will be performing json_normalize on Clothing E-commerce Reviews data. This dataset contains JSON data in the predicted_value column. We will extract the ‘polarity’ and ‘sentiment’ keys from the JSON data to create new columns.
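A pandas sketch of the same extraction is shown below; it assumes the predicted_value column holds JSON text with polarity and sentiment keys, and the file name is a placeholder.

```python
import json
import pandas as pd

reviews = pd.read_csv("clothing_ecommerce_reviews.csv")  # placeholder file name

# Parse the JSON column, then promote the 'polarity' and 'sentiment' keys to new columns.
parsed = reviews["predicted_value"].apply(json.loads)
extracted = pd.json_normalize(parsed.tolist())[["polarity", "sentiment"]]
reviews = pd.concat([reviews, extracted], axis=1)
```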
This action is used to apply a trained model to the dataset. For this function, the trained model needs to be deployed and running. This produces a new column with values predicted by the model.
Follow these steps to perform the model apply:
Example
We will perform model_apply on Clothing e-commerce review data. A model for analyzing sentiments has already been deployed, and we will use this to predict the sentiments on the reviews data.
This action creates a new column in the dataset based on the provided expression. One can create a new column based on the values of other existing columns.
Follow these steps to add a new column:
Example
For example, we are using German_Credit_Dataset. This contains a column ‘Age’ which is numerical. We will create a new column ‘Age_Category’, where we categorize individuals as
Student (age < 18), Adult (18 < age < 60), and Senior (age > 60) using the ‘Age’ column.
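The same new-column expression can be sketched with numpy's select; how ages of exactly 18 or 60 are bucketed is an assumption, since the example leaves those boundaries open, and the file name is a placeholder.

```python
import numpy as np
import pandas as pd

credit = pd.read_csv("german_credit.csv")  # placeholder file name

# Ages of exactly 18 fall into Adult and exactly 60 into Senior here - an assumption.
conditions = [credit["Age"] < 18, credit["Age"] < 60]
credit["Age_Category"] = np.select(conditions, ["Student", "Adult"], default="Senior")
```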
A pivot function is a data transformation tool used to reorganize data in a table from rows to columns. This function requires selecting a pivot column (a categorical column), value columns (numerical columns), and an aggregate function. The pivot column's values become the new columns in the pivoted table.
Follow these steps to perform the pivot-
Example
For example, we will apply pivot action on trend chart data. Here, we want to know the total volume of products in different regions grouped by company and brand.
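As a point of comparison, pandas' pivot_table performs the same reshaping; the column names below (region, company, brand, volume) and the file name are assumptions about the trend chart data.

```python
import pandas as pd

trend = pd.read_csv("trend_chart_data.csv")  # placeholder file name; column names assumed

# Region values become columns, company/brand remain as rows, and volumes are summed.
pivoted = trend.pivot_table(
    index=["company", "brand"],
    columns="region",
    values="volume",
    aggfunc="sum",
).reset_index()
```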
The Split column function can be used to split string-type columns based on a regular expression.
Follow these steps to perform the split:
Example
For example, we will be performing a split action on autoinsurance_churn with demographic data. It contains a column ‘home market value’ which shows a range of values as a string (2000 - 3000). We will split this on ‘-’ to get the lower range and higher range values.
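A pandas sketch of the same split is below; the column name home_market_value and the file name are assumptions about how the data is stored.

```python
import pandas as pd

churn = pd.read_csv("autoinsurance_churn_with_demographic.csv")  # placeholder file name

# Split "2000 - 3000" style strings on '-' into lower and higher range values.
parts = churn["home_market_value"].str.split("-", n=1, expand=True)
churn["home_value_low"] = pd.to_numeric(parts[0].str.strip(), errors="coerce")
churn["home_value_high"] = pd.to_numeric(parts[1].str.strip(), errors="coerce")
```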
Union is used to add rows to the dataset
Follow these steps to perform union-
Example
We are applying union on sales data for store 5 and appending records from sales data for store 1. This will result in a dataset with sales data for stores 1 and 5.
Unpivoting is the process of reversing a pivot operation on data. It takes data that's been summarized into columns and spreads it back out into rows.
Follow these steps to perform unpivot:
Example
For example, we will apply the unpivot action on pivoted trend chart data.
Upsert is used to add rows that are not duplicates to the dataset. The difference between upsert and union is that union appends all the rows from the new dataset to the existing dataset. Upsert, on the other hand, will append only those rows that are not already there in the existing dataset.
Follow these steps to perform upsert-
Example
We are applying upsert on sales data for store 5 and appending records from sales data for store 1. This will result in a dataset with sales data for stores 1 and 5.
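Conceptually, upsert behaves like the pandas sketch below: existing rows win when the record key collides. The record key columns and file names are assumptions.

```python
import pandas as pd

store5 = pd.read_csv("sales_store_5.csv")  # existing dataset (placeholder file name)
store1 = pd.read_csv("sales_store_1.csv")  # new rows to upsert (placeholder file name)

record_key = ["Store", "Date"]  # assumed record key columns

# Append store 1 rows, keeping the existing store 5 row whenever the key already exists.
upserted = (
    pd.concat([store5, store1], ignore_index=True)
      .drop_duplicates(subset=record_key, keep="first")
)
```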
This action is used to perform statistical operations such as rank, row number, etc. on a dataset and returns results for each row individually. This allows a user to perform calculations on a set of rows preceding or following the current row, within a result set. This is in contrast to regular aggregate functions, which operate on entire groups of rows. The window is defined using the over clause (in SQL), which specifies how to partition and order the rows. Partitioning divides the data into sets, while ordering defines the sequence within each partition.
Follow these steps to apply the window function:
Example
We will consider Walmart's Weekly Sales data to perform the window function. Here, we want the running sum of weekly sales for each Holiday Flag value, ordered by Date, with the window spanning from the first row to the current row.
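The same running-window calculation can be sketched in pandas (or as a SQL window function); the column names follow the example and the file name is a placeholder.

```python
import pandas as pd

sales = pd.read_csv("walmart_weekly_sales.csv")  # placeholder file name

# Equivalent to SQL: SUM(Weekly_Sales) OVER (PARTITION BY Holiday_Flag ORDER BY Date
#                                            ROWS UNBOUNDED PRECEDING)
sales = sales.sort_values("Date")
sales["running_weekly_sales"] = sales.groupby("Holiday_Flag")["Weekly_Sales"].cumsum()
```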
SQL Editor: Use this to write your own queries.
The following are common options for different types of enrichment functions:
Users can set up alerts based on rules for thresholds or drifts.
Create an alert with the following steps:
Drifts are calculated in the context of models. They are calculated when the dataset is synchronized with the Data Source. The data source should have indices for synchronization.
This is explained further in the Models section.
Models allow users to find patterns in data and make predictions.
Regression models, a supervised learning technique, allow users to predict a value from data. For example, given inventory, advertising spending, and campaign data, a regression model can predict a lift in sales.
Classifiers can help identify different classes in a dataset, an example being the classification of customers who will churn based on order history and other digital footprints left by the customer.
Clustering models enable users to group/cluster based on different dataset attributes. Businesses use clustering models to understand customer behavior by finding different segments based on purchasing behavior.
Time series analysis allows businesses to forecast time-dependent variables such as sales, which helps manage finance and supply chain functions better.
Natural language processing (NLP) and large language models (LLM) can extract entities, identify relevant topics, or understand sentiment in textual data.
These algorithms are sensitive to the data: the distribution underlying the data can change over time, which might lead to performance deterioration of a specific algorithm, so we need a mechanism that allows for selecting the best algorithm.
In the world of xVectorlabs, building models begins with creating drivers. These drivers are the foundation - libraries such as Scikit-learn and XGBoost power the algorithms. Once the driver is crafted, the next step is to create a model, a framework ready to be brought to life with experimentation.
Each model becomes a hub of exploration, where experiments are authored to test various facets of the algorithm. These experiments are, in turn, populated with multiple runs, each a unique attempt to capture and compare the parameters that drive the algorithm's behavior. Imagine tweaking the settings of a regression model—adjusting its learning rate or altering its input features - and observing how these changes shape its performance.
This structured approach organizes models as a composition of drivers, experiments, and runs, creating a seamless flow for experimentation. It allows users to adapt, learn, and refine their models precisely, uncovering insights and pushing the boundaries of their algorithms' achievements.
Once the user picks a model that fits the data best, the model can be deployed to make predictions. Models in production are then continuously monitored for performance. Anomalous behaviors are quickly identified and notified for further action.
The platform provides a comprehensive set of model drivers curated based on industry best practices. Advanced users can bring their algorithms by authoring custom drivers.
A user can create multiple experiments under a model. An experiment includes one or more runs. Under each experiment, various parameters with available drivers can be tried on different datasets.
On updating any input parameters and triggering “re-train”, a new run under that experiment gets created.
Different runs under an experiment can be compared using selected performance metrics.
One can view a list of all experiments on the experiment view page that opens on viewing a model.
Experiments can create multiple runs with different input parameters and performance metrics as output. Based on the metric, one can be chosen for the final model.
This aims to enable a user to experiment with different model drivers, datasets, and parameters to achieve the expected performance metric for a model before deploying.
One can view a list of all runs on the run view page that opens on viewing an experiment.
Example
There are options at the end of each Run which are described below (icons from left to right):
View
Drifts
Performance
Runs under experiment(s) are powered by underlying libraries and algorithms defined in model drivers. For example, a statistical or machine learning library such as Scikit-learn can be used for a regression model. The platform provides a comprehensive set of model drivers for business analysts and advanced users. In addition, a data scientist can author custom drivers.
Users can author custom drivers for different model types like regression, classification, clustering, time series, etc. If the requirements do not fit into any one type of model, users can choose the ‘app’ type to author their custom driver.
Creating a new driver
Follow these steps to author a new driver:
A task will be triggered to start a Jupyter server for authoring drivers. Once resources are allocated, click on the Launch Jupyter Server button. This will open a Jupyter notebook where users can author drivers.
In the notebook, five files that are mentioned below will be present:
Users need to write the algorithm in the train.ipynb file with the provided format. Also, modify the predict.ipynb file as required. This will be used in getting predictions. To define training or input parameters, open the config.json file in edit mode and write in the format provided.
Options on notebook-
An xVector option is present in the menu bar. Use this for different actions-
Select dataset - click on it and select the dataset for the driver. It will update the config.json file with the metadata of the selected dataset.
Register - Once the driver is authored (the train.ipynb, predict.ipynb, and config.json files have been modified correctly), it must be registered to make the driver available for use in models. Before registering, make sure all five files are present with the same names as provided.
Shutdown - this is to shut down the running Jupyter server.
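The exact notebook format xVector expects is described only as "the provided format" and is not reproduced in this document. Purely as an orientation aid, the training and prediction logic inside a simple regression driver might resemble the scikit-learn sketch below; all file names, config keys, and structure are assumptions.

```python
# Illustrative sketch only - not the actual train.ipynb / predict.ipynb / config.json format.
import json

import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# --- training step (what train.ipynb might conceptually do) ---
with open("config.json") as f:
    config = json.load(f)                      # training parameters and dataset metadata (assumed keys)

data = pd.read_csv(config["dataset_path"])     # assumed key
features, target = config["features"], config["target"]

model = LinearRegression(**config.get("params", {}))
model.fit(data[features], data[target])
joblib.dump(model, "model.joblib")

# --- prediction step (what predict.ipynb might conceptually do) ---
model = joblib.load("model.joblib")
new_records = pd.read_csv("new_records.csv")   # placeholder input
predictions = model.predict(new_records[features])
```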
Below are some models that can be created in xVector.
Regression is a set of techniques for uncovering relationships between features (independent variables) and a continuous outcome (dependent variable). It's a supervised learning technique, meaning the algorithm learns from labeled data where the outcome is already known. The goal is to use relationships between features to predict the outcome of new data.
Follow the below steps to create a regression model
This will start the model creation process (allocating resources, training the model, and saving output). This results in a model with experiment(s) and run(s) under it. Users can view model runs and performance metrics once the training is complete.
Example
We will train a model on Weekly Sales Data to understand the relationship between the weekly sales and other columns. This trained model can then be used to predict sales, given the values for input columns on which the model has been trained.
Click on Add and choose Models.
Choose Regression
Model Tab
Configure Tab
Features Tab
Parameters Tab
Advanced Tab
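For readers who want to relate these tabs to familiar code, the Weekly Sales regression example corresponds roughly to the scikit-learn sketch below; the feature columns and file name are assumptions, and on the platform this is all done through the tabs above rather than code.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

sales = pd.read_csv("weekly_sales.csv")  # placeholder file name; feature columns assumed

features = ["Fuel_Price", "CPI", "Holiday_Flag"]
X_train, X_test, y_train, y_test = train_test_split(
    sales[features], sales["Weekly_Sales"], test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
```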
Classification categorizes data into predefined categories or classes based on features or attributes. This uses labeled data for training. Classification can be used for spam filtering, image recognition, fraud detection, etc.
Follow the below steps to create a classification model
This will start the model creation process (allocating resources, training the model, and saving output). This results in a model with experiment(s) and run(s) under it. One can view model runs and performance metrics once the training is complete.
Example
For example, we will train a classifier model on the auto insurance churn dataset to find whether a given individual with details will churn.
Click on Add and choose Models
Choose Classification
Model details
Configure details
Features Details
Parameters details
Advanced details
Clustering is an unsupervised learning technique that uses unlabeled data. The goal of clustering is to identify groups (or clusters) within the data where the data points in each group are similar and dissimilar to data points in other groups. It can be used for customer segmentation in marketing, anomaly detection in fraud analysis, or image compression.
Follow the below steps to create a clustering model
This will start the model creation process (allocating resources, training the model, and saving output). This results in a model with experiment(s) and run(s) under it. One can view model runs and their performance metrics once the training is complete.
Example
For example, we will train a clustering model on the online retail store dataset to identify and understand customer segments based on purchasing behaviors to improve customer retention and maximize revenue.
Click on Add and choose Models.
Choose Clustering
Model details
Configure details
Features details
Parameters details
Advanced details
Time series analysis is a technique used to analyze data points collected over time. It's specifically designed to understand how things change over time. The core objective of time series analysis is to identify patterns within the data, such as trends (upward or downward movements), seasonality (recurring fluctuations based on time of year, day, etc.), and cycles (long-term, repeating patterns). By understanding these historical patterns, time series analysis can be used to forecast future values. This is helpful in various applications like predicting future sales, stock prices, or energy consumption.
Follow the below steps to create a time series model
This will start the model creation process (allocating resources, training the model, and saving output). This results in a model with experiment(s) and run(s) under it. One can view model runs and performance metrics once the training is complete.
Example
For example, we will train a time series model using Weekly Sales Data for Store_1. This takes the date column as input and forecasts weekly_sales.
Click on Add and choose Models.
Choose Timeseries
Model details
Configure details
Parameters details
Advanced details
Sentiment Analysis is the process of computationally identifying and classifying the emotional tone of a piece of text. It's a subfield of natural language processing (NLP) used to understand the attitude, opinion, or general feeling expressed in a text.
Follow the below steps to create a sentiment analysis model
This will start the model creation process (allocating resources, training the model [if the driver is not pre-trained], and saving output). This results in a model with experiment(s) and run(s) under it. One can view model runs and performance metrics once the training is complete.
Example
We will create a model to find reviews' sentiment in Clothing E-Commerce Reviews data. For this, we are using a pre-trained driver.
Click on Add and choose Models
Choose Classification
Model details
Configure details
Features Details
This step is skipped for sentiment analysis
Parameters Details
Advanced details
Entity Recognition is a sub-task within Natural Language Processing (NLP) that identifies and classifies essential elements in text data. It helps identify key information pieces like names, places, organizations, etc. NER automates this process by finding these entities and assigning them predefined categories.
Follow the below steps to create an entity recognition model
This will start the model creation process (allocating resources, training the model, and saving output). This results in a model with experiment(s) and run(s) under it. One can view model runs and performance metrics once the training is complete.
Example
For example, we will create a model to extract named entities from medical transcription samples.
Click on Add and choose Models.
Choose Entity Recognition
Model details
Configure details
Features Details
This step is skipped for entity recognition
Parameters Details
Advanced details
Topic modeling helps analyze extensive collections of text data to discover hidden thematic patterns. The algorithm scans the text data, looking for frequently occurring words and phrases that appear together. These word clusters suggest thematic connections.
Follow the below steps to create a topic model
This will start the model creation process (allocating resources, training the model, and saving output). This results in a model with experiment(s) and run(s) under it. One can view model runs and performance metrics once the training is complete.
Example
For example, we will use Clothing E-Commerce review data and create a model to extract topics from the reviews.
Click on Add and choose Models.
Choose Topic Modeling
Model details
Configure details
Parameter details
Advanced details
To update the details of a model, one needs to open the settings pane for the model, modify the data, and click ‘Save’. The settings pane can be opened by selecting the ‘Settings’ option available in the menu options of model cards in the workspace.
A model can be deleted by clicking on the ‘Delete’ in the menu options of model cards in the workspace.
Reports transform data into stories. They offer a dynamic canvas where one can visualize and explore information, crafting interactive dashboards that bring data to life. Businesses can collaborate effectively and uncover deeper insights by delving into models, slicing and dicing data, and drilling into details. These reports serve as a bridge for data exploration and communication, helping stakeholders align their understanding of analytical facets.
Users organize their stories in sheets. Sheets comprise various visual and interactive components to enable a rich collaborative experience. In addition, collaboration can be extended within and beyond the organization by embedding the reports in multiple applications.
Users can analyze their data with various key widgets, such as scorecards, charts, tables, and filters.
Exploring data is dynamic, and slicing and dicing are core components. By breaking data into smaller segments, users can uncover patterns, trends, and insights that would otherwise remain hidden. Whether filtering by time, geography, products, or channels, this capability enables users to focus on what truly matters. Filters further enrich this process by providing precise control.
Interactivity empowers users to engage with their data in meaningful ways. Features like Column Strips, Date Aggregator Strips, and Input Values further enhance the interactive experience by introducing flexibility and customization.
Reports become more than just data displays when they provide context-rich narratives. Features like the Link Component ensure seamless navigation between components while preserving context. Drill-down charts empower users to dig deeper into aggregate data, and right-click menus streamline the exploration of related insights, offering intuitive pathways to uncover hidden stories.
In xVectorlabs, reports aren’t just static outputs; they are evolving stories, empowering users to craft compelling narratives, uncover hidden insights, and foster collaboration that drives impactful decisions.
Reports are used for data exploration and creating visualizations.
Reports help get a better understanding of the data and build dashboards to gain insights into business goals and operations.
Users organize their stories in sheets. Sheets comprise various visual and interactive components to enable a rich collaborative experience. Users can create multiple sheets in a report; each sheet can consist of components like line charts, pie charts, data filters, etc. Users can rename and rearrange created sheets.
Drivers are needed to view and interact with live data on report components. All reports use a default driver unless configured to use a dedicated driver.
The layout allows users to choose canvas sizes from available options, or they can also create custom ones. One can set the width and height of the canvas. The layout also provides an option for enabling snapping that helps arrange the components in a sheet.
Themes make it easy to configure several settings for the report at once. One can choose from a list of themes or edit them according to their requirements.
From a workspace, click on Add and choose Reports
Click on the +Component button and choose the component type from the available options.
Available components:
Each component below will use terms like measures and dimensions, which are explained in the Datasource document.
A scorecard is a visual summary of key performance indicators (KPIs) that helps stakeholders quickly assess performance against goals. It provides a way to monitor progress over time, identify trends, and identify areas where adjustments may be needed.
Typically, scorecards have primary and secondary metrics that are tracked, which are described below:
Primary Metrics: The most important metric(s) that directly measure success or failure for a given objective.
Secondary Metrics (Supporting Indicators): Additional metrics that provide context or explain trends in the primary metric. They help understand why performance is changing.
Follow these steps to create a scorecard -
Example
We will create a scorecard on Weekly Sales Data. We want to check the total sales for a given period and compare them against the previous period.
Created Scorecard
Steps to create a scorecard:
Click on Add Component (➕)
Choose Scorecard
Enter Details in the data and format tab
Data tab
Format tab
Format tab configurations
Charts visually represent the data, which helps in understanding trends, patterns, and relationships. They are like mini-reports that use bars, lines, and pies to convey information visually.
Types of Charts
A line chart is used to display trends between continuous numeric values.
Follow these steps to create a line chart -
Example
We are using weekly sales data to create a line chart. This is to analyze changes in weekly sales with changes in fuel prices.
Created Line chart
Click on Add Component (➕) and choose chart.
Data Tab
Format Tab
This is used to represent data by rectangular bars. The length or height of each bar is proportional to a value associated with the category it represents.
Follow these steps to create a bar chart -
Example
We are using weekly sales data to create a bar chart. This is to visualize total sales by different stores.
Created bar chart
Data Tab
Format Tab
Combo charts combine line and bar chart types into a single view. This allows you to display different aspects of your data simultaneously, helping you identify trends, relationships, and insights that might be missed in separate charts.
Follow these steps to create a combo chart -
Example
For example, we are using bank_marketing_campaign_data. We want to visualize how the average balance and the total call duration vary with age.
Created combo chart
Data Tab
Format Tab
Time series charts are a type of line chart designed to visualize trends and patterns over time.
Follow these steps to create a time series chart -
Example
We select Weekly Sales Data to create a time series chart. Here, we want to visualize change in average weekly sales over a given period and compare this with the previous period's sales
Created time series chart
Data Tab
Format Tab
An area chart uses both lines and shaded areas to depict how data points change over time or another numeric variable. Area charts are well-suited for showcasing trends and visualizing the accumulation of values over time.
Follow these steps to create an area chart -
Example
We are using bank marketing campaign data to create an area chart. This is to analyze the total yearly balance distribution by age.
Created an area chart
Data Tab
Format Tab
A bubble chart is used to represent three dimensions of data using circles (bubbles). Two variables determine the bubble's position on the x and y axes, and the third variable specifies the size of the bubble.
Follow these steps to create a bubble chart -
Example
For example, we use the ‘German Credit data categorized by age’ dataset. Here, we will see the duration of the call, which will depend on the purpose and credit amount.
Created Bubble Chart
Data Tab
Format Tab
A pie chart is a circular chart representing portions of a whole. Pie charts work best for categorical data, where the data points fall into distinct groups or slices.
Follow these steps to create a pie chart -
Example
We will create a pie chart using bank marketing campaign data. This is to visualize the number of term subscriptions by job types.
Created pie chart
Data Tab
Format Tab
This presents information in a grid format with rows and columns, making it easy to analyze multiple data points for various categories.
Follow these steps to create a table -
Example
We will create a table from Online Retail Data. This is developed to analyze the average quantity of products and total customers present in different countries.
Created Table
Click on Add Component (➕) and
Choose Table
Data Tab
Format Tab
It is used to summarize and organize the data in a way that is easier to understand. One can define which data goes into rows and which goes into columns in the pivot table. For example, one could see total sales by product (products in rows) or by region (regions in rows).
Follow these steps to create a pivot table -
Example
We will create a table from Online Retail Data. This is designed to analyze how the total price of products has changed over time (InvoiceDate) for different countries.
Created Pivot Table
Click on Add Component (➕) and
Choose Pivot Table
Data Tab
Format Tab
A funnel chart is divided into horizontal sections, with the widest section at the top and the narrowest at the bottom. Each section represents a stage in a process, like steps in a sales funnel or even the application process for a job.
Follow these steps to create a funnel chart-
Example
Using “Online Retail Data”, we want to see how the total quantity of different products is distributed across various countries.
Created funnel
Click on Add Component (➕)
Choose Funnel
Data Tab
Format Tab
A treemap displays hierarchical data using nested rectangles. It is handy for showing part-to-whole relationships and identifying how different categories contribute to a larger whole.
Follow these steps to create a tree map:
Example
For example, we will use the ‘auto insurance churn’ dataset to create a treemap to understand the total income in a hierarchical order of state, county, and city.
Created treemap
Click on Add Component (➕)
Choose Treemap
Data Tab
Format Tab
Sankey depicts flows between different stages or categories. It uses arrows to represent these flows, with the arrow's width corresponding to the flow's magnitude.
Follow these steps to create a Sankey diagram-
Example
Using auto insurance churn data, we created a Sankey diagram to visualize the income flow to different counties based on marital status.
Created Sankey
Click on Add Component (➕)
Choose Sankey
Data Tab
Format Tab
It is used to represent hierarchical data in a circular structure.
Follow these steps to create a sunburst-
Example
We will use the ‘auto insurance churn’ dataset. We want to view how curr_ann_amt is distributed among different counties with churn and marital status.
Created Sunburst
Click on Add Component (➕)
Choose Sunburst
Data Tab
Format Tab
This helps in adding an image to the report sheets.
Follow these steps to create an image-
Example
Created image
It is used to add some text fields in the report sheets.
Follow these steps to add a text component-
Example
Created text
Configuration Tab
Link provides an option to open another sheet or report.
Follow these steps to add a link component-
Example
For example, we create a link component in the report for Clothing E-commerce reviews with sentiment data. The link component shows the model used for extracting topics from the reviews data.
Created Link
Click on Add Component (➕)
Choose Link
Data Tab
Format Tab
This is used to organize different report components into tabs.
Follow these steps to create a tab component-
Example
We will use Weekly Sales Data to create a tab component. We have already made three components - one line chart and two bar charts. Then, make a tab component with three tabs. Drag the already-created charts into the tabs one by one.
Created Tabs
Data Tab
Format Tab
We will now look at a few model visualizations that can be created in xVectorLabs:
Regression:
We will create "Sensitivity Analysis" and "What If Scenarios" within the Regression model.
Also, a business case exploring the Regression model can be found here.
Time Series:
A business case exploring the Time Series model can be found here.
A trained model takes multiple feature values (column values) and produces an output.
Sensitivity analysis enables users to analyze the output of a model by varying values of feature(s).
Users can vary one feature/column over a range and get a list of predicted values, or vary two features/columns over a range and get a table of predicted values.
A model needs to be deployed first to use sensitivity analysis. This model will be used to get the predictions.
Follow these steps to create a sensitivity analysis component
Example
We will use the Sales Prediction model for sensitivity analysis. For this, we vary two numerical features (or columns), Fuel Price and CPI, over 80% to 140% to analyze how the sales are impacted. The selected 80% to 140% will vary by steps of 20% when the selected step is 0.2. This shows how sensitive sales are to fuel price or CPI, which are the input features.
One thing to note is that the model needs to be deployed to create the report.
Created sensitivity analysis component
Click on Add Component (➕)
Choose Model Components -> Sensitivity analysis
Data Tab
Format Tab
A trained model takes multiple feature values (column values) and produces an output. A user can provide a specific value to each feature/column and get the model output using the What-If Scenario.
A model needs to be deployed first to use the What-If Scenario. This model will be used to get the predictions.
Follow these steps to create a What-If Scenario component
Example
In this example, we predict sales price when we change the value of a particular feature like fuel price, holidays, or stores.
The SHAP image shows the importance of the features.
Created What If Scenario component
Click on Add Component (➕)
Choose Model Components -> What If Scenario
Data Tab
Format Tab
This will create a rectangle shape on the sheet.
Follow these steps to create a rectangle shape
Created Rectangle
This will create a circle shape on the sheet.
Follow these steps to create a circle shape
Created Circle
Applies date ranges by specific dates, months, or years to selected report components.
Follow these steps to create a date range filter-
Example
We are creating a date range filter on a line chart component created using Weekly Sales Data. The line chart displays the change in Weekly Sales over Fuel Price. We will apply a date range filter to limit the output to a fixed period.
Created Filter
Data Tab
Format Tab
Filters data by various entities such as products, channels, and geography. Applying entity filters on report components that use entity-type datasets is recommended.
Entity filters can be applied in view mode.
Follow these steps to create an entity filter:
Example
Using the auto insurance churn dataset, we have created a table with columns - state, county, and city (an entity with hierarchy) and selected income and curr_ann_amount as measures. We now want to apply filters to the entities, which are state, county, and city. To apply the entity filter, we created one filter and mapped the entity column to the table component.
Created Entity
Click on Add Component (➕)
Choose Filters and then Entity
Data Tab
Format Tab
A date aggregator helps users apply date aggregate functions such as day/week/month/quarter for time-based datasets and view the reports with the appropriate level of granularity.
Follow these steps to create a date aggregator-
Example
Using the auto insurance churn dataset, we have created a table with the columns state, county, city, and cust_orig_date, and selected income and curr_ann_amount as measures. We now want to apply date aggregation by Month, Quarter, Year, or Year Month.
Created Date aggregator
Click on Add Component (➕)
Choose Filters and then Date Aggregator
Data tab
Format Tab
A filter collection lets users view and manage all applied filter values on the current report sheet.
Follow these steps to create a filter collection-
Example
Using the auto insurance churn dataset, we have created a table with the columns state, county, city, and cust_orig_date, and selected income and curr_ann_amount as measures. We have applied entity, date aggregator, and input slider filters to this table. Creating a filter collection displays all the filter values applied to the table component.
Created filter collection
Format Tab
The ‘Duplicate in sheets’ option carries filters over from one sheet to another. Select the target sheets under the Data tab in the ‘Duplicate in sheets’ section; the filters will then be applied to all selected sheets.
Data tab of filters
Dimension lets you break down the chart values (lines/bars) by each category value present in the selected dimension column. It is a way to further group the data by a category of choice.
Example
For the bank_marketing_campaign dataset, to find the total number of individuals with term deposit subscriptions by job type, create a bar chart with job on the x-axis and y (the flag for term deposit subscription) on the y-axis. To further see how individuals in each job are distributed by marital status, add ‘marital status’ to the dimension field to visualize the required distribution.
Chart without dimension:
After adding dimension
A variable holds a value populated from a dataset or from user input. Variables can be used in custom and advanced expressions. A variable is either predefined, tied to an input slider's value, derived from a dataset using an expression, or user-defined. Variables are scoped to a report and can be used in any sheet of the report in which they were created.
Follow these steps to create a variable-
To use the created variable, add a custom-type measure to the required chart, enter a name for the measure, and write an expression using that variable. Variable values are accessed in expressions using @<variable_name>.
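For instance, assuming the max_sales variable created in the example below, a hypothetical custom measure expression such as-
sum(Weekly_Sales) * 100 / @max_sales
would plot each group's total weekly sales as a percentage of the stored maximum.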
Example-
Here, we will create a variable named ‘max_sales’, which will store the maximum value of Weekly_Sales in Weekly Sales Data.
Creating a variable-
From a report, click on the add variable icon.
Enter details for the variable.
Writing expression for computing variable-
Syntax: COMPUTE(column=column_name, filters=[], order='asc', limit=int)
column_name: a column from the selected dataset. Aggregate functions can also be applied to columns, for example max(column_name), sum(column_name), and avg(column_name). Supported functions: max(), sum(), avg(), min(), count().
filters: a list of filters. Each filter can be a simple expression such as between(), isin(), notin(), or a similarly supported conditional. Filters can be grouped with brackets. Currently, the AND operator is written as "&" and the OR operator as "|".
Example-
COMPUTE(column=colName, filters=[colABC<20, ( between(col123, 20, 50) & isin(colXYZ, options) )])
Aggregate functions take a single parameter, either a column or a case statement.
Example - avg(case(col_name > 20, col_name, None))
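As an illustration, the max_sales variable used later in this section could plausibly be defined with an expression along these lines (a sketch only; the order and limit values are placeholders):
COMPUTE(column=max(Weekly_Sales), filters=[], order='asc', limit=1)
A hypothetical conditional aggregate combining a case statement with a filter might look like this (the 20000 threshold, Store column, and options list are illustrative):
COMPUTE(column=avg(case(Weekly_Sales > 20000, Weekly_Sales, None)), filters=[isin(Store, options)], order='asc', limit=1)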
Using the created variable-
A bar chart displaying the weekly sales of each store is created. Now, we will add one more measure on the y-axis that displays the difference between the maximum weekly sales and the sum of weekly sales for that particular store. For this, we will use a custom measure with the expression-
@max_sales-sum(Weekly_Sales).
Here @max_sales is the created variable and Weekly_Sales is a column in the dataset.
In view mode:
Expressions allow complex formulas to be authored and reused. Formulas can draw on a rich set of functions, adding more dynamism to reports.
Expressions are defined and used much like variables; the difference is that expressions are evaluated only when used. Existing variables can be used in other expressions.
Follow these steps to create an expression-
Example
For example, we will create a line chart with expressions using Weekly Sales Data. Suppose we want to display the date on the x-axis and normalized sales data on the y-axis. Let’s assume we normalize the weekly sales value using the operation below-
( (sum(Weekly_Sales) - min(Weekly_Sales)) / (max(Weekly_Sales) - min(Weekly_Sales)) ) * 100
For this, we will define two variables, max_sales and min_sales, to store the maximum and minimum weekly sales values. Then, using these variables, we will create an expression for the scaling factor part-
100 / (@max_sales - @min_sales).
Created Expression
Now, we can define a custom column using this expression to get the normalized sales data.
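For reference, the fully expanded custom measure for the y-axis, written directly with the two variables, would look like this (an illustrative sketch; in practice the saved scaling-factor expression is reused rather than repeated)-
( sum(Weekly_Sales) - @min_sales ) * ( 100 / (@max_sales - @min_sales) )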
Created line chart
Data Tab
Context Strings provide the mechanism to use values from other sibling components. This applies only to the Scorecard component.
This can be used to create a scorecard with values from other created scorecards.
Example
We created two scorecards using the ‘auto insurance churn’ dataset: one displays the total estimated income of the individuals (income_sc), and the other shows the total amount the customers paid (amount_sc). Now, we will create a third scorecard that shows the difference between the total income and the total amount paid, using a context string that takes values from the other two scorecards.
Context string -> income_sc.primary - amount_sc.primary
Data Tab
Breakdown charts visualize how a larger dataset or distribution can be segmented into smaller, more manageable charts. They are used to analyze the distribution across different categories.
Example
Using bank_marketing_campaign_data, suppose we want to know the number of individuals subscribing to a term deposit by job type, with a separate plot for each education level. We can choose the ‘education’ field under the breakdown dimension field, which creates one plot per education level. The plots can be further formatted using the Format tab.
Format Tab
The drill-down feature allows users to explore the data in greater detail directly from the report component itself. Starting from a high-level overview, users can dive deeper into specific areas of interest. Charts with drill-down enabled allow deep dives into various aggregates.
Available drill-down output options-
To use the ‘Show as table’ option, one needs to define the table columns first in the data tab of the report component.
Example
We have created a bar chart using bank_marketing_campaign data to get the number of term deposit subscribers by different job types.
Now, we want to analyze one of the job categories by marital status. We can do this by right-clicking on the respective bar (blue-collar), choosing the drill-down bar chart, and then selecting the ‘marital’ column.
Output
Users can drill down the generated chart further into multiple levels as required. For example, in the above case, we drilled down from blue-collar (job) to marital status and can continue to the education and then the loan level.
Example:
Show as Table Option
First, define the columns in the data tab. Here, we select marital and education.
Drill down as table output.
One can view all options for the component by clicking on the menu options ( ፧ ) present at the top right corner of each component.