
Models

Models allow users to find patterns in data and make predictions. Because algorithms are sensitive to the data distribution, which can change over time and degrade performance, xVector provides a structured approach to experimentation and model management.

In xVector, building models begins with drivers: libraries such as Scikit-learn and XGBoost that implement the underlying algorithms. Once a driver is set up, the next step is to create a model, a framework for experimentation. Each model becomes a hub of exploration where experiments are authored, and each experiment is populated with multiple runs, each a unique attempt to capture and compare parameters.

Once the user picks the model that fits the data best, it can be deployed to make predictions. Models in production are then continuously monitored for performance; anomalous behavior is quickly identified and flagged for further action.

A user can create multiple experiments under a model, and each experiment includes one or more runs. Within an experiment, different parameters and available drivers can be tried on different datasets. Updating any input parameter and triggering “re-train” creates a new run under that experiment. Runs within an experiment can be compared using selected performance metrics.

Each run records its input parameters and the resulting performance metrics as output; based on those metrics, one run can be chosen for the final model. This lets users experiment with different drivers, datasets, and parameters until the expected performance metric is achieved before deploying.

Runs under experiments are powered by underlying libraries and algorithms defined in drivers. The platform provides a comprehensive set of model drivers for business analysts and advanced users. In addition, data scientists can author custom drivers.

Users can author custom drivers for different model types. If requirements don’t fit any defined type, users can choose the ‘app’ type.

  1. From a workspace, click on Add and choose Models.
  2. Choose Author Estimator from the available list.
  3. Enter details: Name, Type, base driver (optional), whether it’s pretrained, and scope.

A Jupyter server is launched where users can author the driver. The environment contains five files: train.ipynb, predict.ipynb, xvector_estimator_utils.py, config.json, and requirements.txt. Write the training algorithm in train.ipynb and modify predict.ipynb to produce predictions. Training parameters are defined in config.json.
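The pieces might fit together roughly as follows. This is a hedged sketch, not the actual file contents: the config.json keys (`predictor_column`, `fit_intercept`), the `model.pkl` artifact name, and the toy data are illustrative assumptions.

```python
# Illustrative sketch of how train.ipynb and predict.ipynb might cooperate.
# config.json keys and the model.pkl artifact name are hypothetical.
import json, os, pickle, tempfile
from sklearn.linear_model import LinearRegression

workdir = tempfile.mkdtemp()

# config.json: training parameters (hypothetical keys)
config = {"predictor_column": "y", "fit_intercept": True}
with open(os.path.join(workdir, "config.json"), "w") as f:
    json.dump(config, f)

# train.ipynb: read parameters, fit the algorithm, persist the artifact
X, y = [[1], [2], [3], [4]], [2, 4, 6, 8]
params = json.load(open(os.path.join(workdir, "config.json")))
model = LinearRegression(fit_intercept=params["fit_intercept"]).fit(X, y)
with open(os.path.join(workdir, "model.pkl"), "wb") as f:
    pickle.dump(model, f)

# predict.ipynb: load the artifact and produce predictions
model = pickle.load(open(os.path.join(workdir, "model.pkl"), "rb"))
preds = model.predict([[5]])
print(round(float(preds[0]), 2))
```

The separation matters: train.ipynb writes the artifact once, while predict.ipynb only loads and serves it.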

Notebook options under the xVector menu: Select dataset (updates config.json with metadata), Register (makes the driver available for use in models), and Shutdown (stops the Jupyter server).

For all model types, the general workflow is:

  1. From a workspace, click on Add and choose Models.
Workspace Add menu showing Models option
  2. Choose the model type from the available options.
Model type selection showing all available model types
  3. Configure the model following the type-specific steps described below.

Regression is a supervised learning technique for uncovering relationships between features (independent variables) and a continuous outcome (dependent variable). The goal is to use these relationships to predict the outcome of new data.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Predictor Column (target column for prediction)
  3. Features — Select all columns used for the regression
Regression model configuration with predictor column selection
Regression model feature selection

Experiment view — Shows the experiment with runs and performance metrics:

Regression experiment view showing runs

Run results — Model output with coefficients and metrics:

Regression run results and metrics

Predictions — Apply the trained model to generate predictions:

Regression model predictions output
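Under the hood, a regression run does roughly what the steps above configure: select a predictor column and feature columns, fit a driver algorithm, and report coefficients and metrics. A minimal sketch using Scikit-learn (the column names and data are illustrative, not from the platform):

```python
# Sketch of a regression run: predictor column, feature columns,
# fitted coefficients, and a performance metric. Toy data only.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rows = [
    {"sqft": 1000, "rooms": 2, "price": 200},
    {"sqft": 1500, "rooms": 3, "price": 300},
    {"sqft": 2000, "rooms": 3, "price": 390},
    {"sqft": 2500, "rooms": 4, "price": 500},
]
features, predictor = ["sqft", "rooms"], "price"  # hypothetical columns

X = [[r[c] for c in features] for r in rows]
y = [r[predictor] for r in rows]

model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))
print("coefficients:", dict(zip(features, model.coef_.round(3))))
print("r2:", round(r2, 3))
```

The coefficients and R² shown here correspond to what the run results view reports after training.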

Classification is a supervised learning technique that categorizes data into predefined classes based on features or attributes using labeled training data.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Predictor Column (the column containing the label or class)
  3. Features — Select all columns used for classification
Classification model configuration
Classification model feature selection
Classification experiment view
Classification run results and metrics
Classification model predictions
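A classification run follows the same shape as the configuration above: the predictor column holds the class labels, and the selected features drive the fit. A hedged sketch with Scikit-learn (labels and feature values are invented for illustration):

```python
# Sketch of a classification run: the predictor column contains labels,
# and the model learns to assign them to new rows. Toy data only.
from sklearn.linear_model import LogisticRegression

X = [[0.1], [0.3], [0.7], [0.9]]        # one feature column
y = ["low", "low", "high", "high"]      # hypothetical predictor column

clf = LogisticRegression().fit(X, y)
preds = list(clf.predict([[0.2], [0.8]]))
print(preds)
```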

Clustering is an unsupervised learning technique that groups data points based on similarity of attributes. There is no predictor column — the algorithm discovers natural groupings.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver
  3. Features — Select all columns used for clustering
Clustering model configuration
Clustering model feature selection
Clustering experiment view
Clustering run results
Clustering model output showing segments
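Because there is no predictor column, a clustering run only consumes the selected feature columns and emits a segment label per row. A minimal sketch with k-means (the points and number of clusters are illustrative):

```python
# Sketch of a clustering run: no labels in, segment assignments out.
from sklearn.cluster import KMeans

points = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]]
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = list(km.labels_)  # segment id for each input row
print(labels)
```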

Time series analysis allows businesses to forecast time-dependent variables such as sales, helping manage finance and supply chain functions.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Date Column and the Forecast Column
  3. Features — Select relevant features (optional depending on driver)
Timeseries model configuration with date and forecast columns
Timeseries model feature selection
Timeseries experiment view
Timeseries run results with forecast output
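Conceptually, a time series run maps the Date Column to an ordered index and fits the Forecast Column against it. The trivial trend model below is only a sketch of that idea; real drivers use proper forecasting algorithms, and the monthly sales figures are invented:

```python
# Sketch of a time series run: a date column ordered into an index,
# a forecast column fitted against it, and a one-step-ahead forecast.
from datetime import date
from sklearn.linear_model import LinearRegression

dates = [date(2024, m, 1) for m in range(1, 7)]   # hypothetical Date Column
sales = [100, 110, 120, 130, 140, 150]            # hypothetical Forecast Column

X = [[i] for i in range(len(dates))]              # time index from the dates
model = LinearRegression().fit(X, sales)
forecast = float(model.predict([[len(dates)]])[0])  # next period
print(round(forecast))
```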

Sentiment Analysis identifies and classifies the emotional tone of text data. It’s a subfield of NLP used to understand the attitude, opinion, or general feeling expressed in text.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Text Data Column
  3. Features — This step can be skipped for sentiment analysis drivers (especially pre-trained ones)
Sentiment analysis model configuration with text column selection
Sentiment analysis feature selection
Sentiment analysis experiment view
Sentiment analysis run results
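At its simplest, a sentiment run takes the Text Data Column and emits a tone per row. Real sentiment drivers (especially pre-trained ones) use NLP models; the toy word-lexicon below only illustrates the input/output shape, and the lexicon itself is invented:

```python
# Toy lexicon-based sketch of sentiment classification.
# Real drivers use pre-trained NLP models, not word lists.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "poor", "hate"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great product"))
print(sentiment("bad and poor service"))
```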

Named Entity Recognition (NER) is a subfield of NLP that identifies and classifies essential elements in text data such as names, locations, organizations, dates, and other entities.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Text Data Column
  3. Features — This step can be skipped for entity recognition drivers
Entity recognition model configuration
Entity recognition feature selection
Entity recognition experiment view
Entity recognition run results
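The run's job is to pull typed entities out of each row of the Text Data Column. Real NER drivers use trained language models; the pattern-matching sketch below only illustrates the kind of output (entities with types), with invented text and deliberately crude rules:

```python
# Crude pattern-based sketch of entity extraction. Real NER drivers
# use trained language models rather than regular expressions.
import re

text = "Acme Corp signed the deal on 2024-05-01 in Berlin."

dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
orgs = re.findall(r"\b[A-Z][a-z]+ (?:Corp|Inc|Ltd)\b", text)

print({"DATE": dates, "ORG": orgs})
```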

Topic modeling helps analyze extensive collections of text data to discover hidden thematic patterns. The algorithm scans text data, identifying clusters of frequently co-occurring words and grouping them into coherent topics.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Text Data Column
  3. Features — Select relevant columns
Topic modeling configuration with text column selection
Topic modeling feature selection
Topic modeling experiment view
Topic modeling run results showing discovered topics

The app type lets users build a custom model that doesn’t fall into any of the defined types. Users can define a driver that combines different algorithms.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Predictor Column
  3. Features — Select all columns used by the custom driver
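A custom driver that combines algorithms can often be expressed as a pipeline of steps. A hedged sketch (the particular steps here, scaling followed by ridge regression, are an arbitrary illustration, not a prescribed app-type driver):

```python
# Sketch of an app-type driver combining algorithms as a pipeline:
# a preprocessing step chained with an estimator. Steps are illustrative.
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
pipe.fit([[1], [2], [3]], [1, 2, 3])
pred = float(pipe.predict([[4]])[0])
print(round(pred, 1))
```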

The experiment view page lists all experiments under a model. Options include comments, a timeline of action history, and adding new experiments.

Experiment view page showing list of experiments

The run view page lists all runs for an experiment. Available options:

  • Comments — Comment on runs
  • Timeline — View action history
  • Add a new run — Click the (+) icon
  • View — View the model output report
  • Drifts — Display drift report of input data
  • Build + Deploy — Build and deploy the model run, creating a prediction endpoint
  • Delete — Delete the run
  • Predict — Test the prediction endpoint with sample data
  • Copy URL — Copy the model prediction URL
  • Shutdown — Shut down the deployed model
  • Token — Authentication token for the model predict API
Run view page showing list of runs with options
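Once a run is built and deployed, the Copy URL and Token options supply what a client needs to call the prediction endpoint. The sketch below only constructs such a request; the URL, payload shape, and Bearer-style authorization header are assumptions to substitute with the values the platform provides:

```python
# Sketch of calling a deployed run's prediction endpoint.
# The URL, payload shape, and auth header format are hypothetical;
# use the run's Copy URL and Token options for the real values.
import json
import urllib.request

url = "https://xvector.example.com/models/predict"   # from Copy URL
token = "YOUR_MODEL_TOKEN"                           # from Token
payload = {"rows": [{"sqft": 1200, "rooms": 3}]}     # hypothetical schema

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request to the live endpoint.
print(req.get_method(), req.get_header("Authorization"))
```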

View — Model output report:

Run detail view showing model output

Drifts — Data drift analysis:

Run drifts report showing data distribution changes

Performance — Model performance metrics:

Run performance metrics view

To update model details, open the settings pane by selecting Settings from the menu options of model cards in the workspace. Modify the details and click Save.

Model settings accessed from workspace card menu
Model settings pane for updating details

A model can be deleted by clicking on Delete in the menu options of model cards in the workspace.

Model card menu showing the Delete option