
Models

Models allow users to find patterns in data and make predictions. Because algorithms are sensitive to the data distribution, which can change over time and degrade performance, xVector provides a structured approach to experimentation and model management.

In xVector, building models begins with drivers: libraries such as Scikit-learn and XGBoost that implement the underlying algorithms. Once a driver is set up, the next step is to create a model, a framework for experimentation. Each model becomes a hub of exploration where experiments are authored, and each experiment is populated with multiple runs, each a unique attempt to capture and compare parameters.

Once the user picks the model that fits the data best, it can be deployed to make predictions. Models in production are then continuously monitored for performance; anomalous behavior is quickly identified and flagged for further action.

A user can create multiple experiments under a model, and each experiment includes one or more runs. Within an experiment, different parameters and available drivers can be tried on different datasets. Updating any input parameter and triggering “re-train” creates a new run under that experiment. Runs within an experiment can be compared using selected performance metrics.

Each run records its input parameters and the resulting performance metrics as output; based on those metrics, one run can be chosen for the final model. This lets users experiment with different drivers, datasets, and parameters until the expected performance metric is achieved before deploying.

Runs under experiments are powered by underlying libraries and algorithms defined in drivers. The platform provides a comprehensive set of model drivers for business analysts and advanced users. In addition, data scientists can author custom drivers.

Users can author custom drivers for different model types. If requirements don’t fit any defined type, users can choose the ‘app’ type.

  1. From a workspace, click on Add and choose Models.
  2. Choose Author Estimator from the available list.
  3. Enter details: Name, Type, base driver (optional), whether it’s pretrained, and scope.

A Jupyter server is launched where users can author the driver. The environment contains five files: train.ipynb, predict.ipynb, xvector_estimator_utils.py, config.json, and requirements.txt. Write the training algorithm in train.ipynb and modify predict.ipynb to produce predictions. Training parameters are defined in config.json.
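The pieces might fit together roughly as follows. This is a hedged sketch, not the actual file contents: the config.json keys (`predictor_column`, `fit_intercept`), the `model.pkl` artifact name, and the toy data are illustrative assumptions.

```python
# Illustrative sketch of how train.ipynb and predict.ipynb might cooperate.
# config.json keys and the model.pkl artifact name are hypothetical.
import json, os, pickle, tempfile
from sklearn.linear_model import LinearRegression

workdir = tempfile.mkdtemp()

# config.json: training parameters (hypothetical keys)
config = {"predictor_column": "y", "fit_intercept": True}
with open(os.path.join(workdir, "config.json"), "w") as f:
    json.dump(config, f)

# train.ipynb: read parameters, fit the algorithm, persist the artifact
X, y = [[1], [2], [3], [4]], [2, 4, 6, 8]
params = json.load(open(os.path.join(workdir, "config.json")))
model = LinearRegression(fit_intercept=params["fit_intercept"]).fit(X, y)
with open(os.path.join(workdir, "model.pkl"), "wb") as f:
    pickle.dump(model, f)

# predict.ipynb: load the artifact and produce predictions
model = pickle.load(open(os.path.join(workdir, "model.pkl"), "rb"))
preds = model.predict([[5]])
print(round(float(preds[0]), 2))
```

The separation matters: train.ipynb writes the artifact once, while predict.ipynb only loads and serves it.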

Notebook options under the xVector menu: Select dataset (updates config.json with metadata), Register (makes the driver available for use in models), and Shutdown (stops the Jupyter server).

For all model types, the general workflow is:

  1. From a workspace, click on Add and choose Models.
Workspace Add menu showing Models option
  2. Choose the model type from the available options.
Model type selection showing all available model types
  3. Configure the model following the type-specific steps described below.

Regression is a supervised learning technique for uncovering relationships between features (independent variables) and a continuous outcome (dependent variable). The goal is to use these relationships to predict the outcome of new data.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Predictor Column (target column for prediction)
  3. Features — Select all columns used for the regression
Regression model configuration with predictor column selection
Regression model feature selection

Experiment view — Shows the experiment with runs and performance metrics:

Regression experiment view showing runs

Run results — Model output with coefficients and metrics:

Regression run results and metrics

Predictions — Apply the trained model to generate predictions:

Regression model predictions output
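Under the hood, a regression run does roughly what the steps above configure: select a predictor column and feature columns, fit a driver algorithm, and report coefficients and metrics. A minimal sketch using Scikit-learn (the column names and data are illustrative, not from the platform):

```python
# Sketch of a regression run: predictor column, feature columns,
# fitted coefficients, and a performance metric. Toy data only.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rows = [
    {"sqft": 1000, "rooms": 2, "price": 200},
    {"sqft": 1500, "rooms": 3, "price": 300},
    {"sqft": 2000, "rooms": 3, "price": 390},
    {"sqft": 2500, "rooms": 4, "price": 500},
]
features, predictor = ["sqft", "rooms"], "price"  # hypothetical columns

X = [[r[c] for c in features] for r in rows]
y = [r[predictor] for r in rows]

model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))
print("coefficients:", dict(zip(features, model.coef_.round(3))))
print("r2:", round(r2, 3))
```

The coefficients and R² shown here correspond to what the run results view reports after training.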

Classification is a supervised learning technique that categorizes data into predefined classes based on features or attributes using labeled training data.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Predictor Column (the column containing the label or class)
  3. Features — Select all columns used for classification
Classification model configuration
Classification model feature selection
Classification experiment view
Classification run results and metrics
Classification model predictions
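A classification run follows the same shape as the configuration above: the predictor column holds the class labels, and the selected features drive the fit. A hedged sketch with Scikit-learn (labels and feature values are invented for illustration):

```python
# Sketch of a classification run: the predictor column contains labels,
# and the model learns to assign them to new rows. Toy data only.
from sklearn.linear_model import LogisticRegression

X = [[0.1], [0.3], [0.7], [0.9]]        # one feature column
y = ["low", "low", "high", "high"]      # hypothetical predictor column

clf = LogisticRegression().fit(X, y)
preds = list(clf.predict([[0.2], [0.8]]))
print(preds)
```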

Clustering is an unsupervised learning technique that groups data points based on similarity of attributes. There is no predictor column — the algorithm discovers natural groupings.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver
  3. Features — Select all columns used for clustering
Clustering model configuration
Clustering model feature selection
Clustering experiment view
Clustering run results
Clustering model output showing segments
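Because there is no predictor column, a clustering run only consumes the selected feature columns and emits a segment label per row. A minimal sketch with k-means (the points and number of clusters are illustrative):

```python
# Sketch of a clustering run: no labels in, segment assignments out.
from sklearn.cluster import KMeans

points = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]]
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = list(km.labels_)  # segment id for each input row
print(labels)
```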

Time series analysis allows businesses to forecast time-dependent variables such as sales, helping manage finance and supply chain functions.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Date Column and the Forecast Column
  3. Features — Select relevant features (optional depending on driver)
Timeseries model configuration with date and forecast columns
Timeseries model feature selection
Timeseries experiment view
Timeseries run results with forecast output
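Conceptually, a time series run maps the Date Column to an ordered index and fits the Forecast Column against it. The trivial trend model below is only a sketch of that idea; real drivers use proper forecasting algorithms, and the monthly sales figures are invented:

```python
# Sketch of a time series run: a date column ordered into an index,
# a forecast column fitted against it, and a one-step-ahead forecast.
from datetime import date
from sklearn.linear_model import LinearRegression

dates = [date(2024, m, 1) for m in range(1, 7)]   # hypothetical Date Column
sales = [100, 110, 120, 130, 140, 150]            # hypothetical Forecast Column

X = [[i] for i in range(len(dates))]              # time index from the dates
model = LinearRegression().fit(X, sales)
forecast = float(model.predict([[len(dates)]])[0])  # next period
print(round(forecast))
```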

Sentiment Analysis identifies and classifies the emotional tone of text data. It’s a subfield of NLP used to understand the attitude, opinion, or general feeling expressed in text.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Text Data Column
  3. Features — This step can be skipped for sentiment analysis drivers (especially pre-trained ones)
Sentiment analysis model configuration with text column selection
Sentiment analysis feature selection
Sentiment analysis experiment view
Sentiment analysis run results
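At its simplest, a sentiment run takes the Text Data Column and emits a tone per row. Real sentiment drivers (especially pre-trained ones) use NLP models; the toy word-lexicon below only illustrates the input/output shape, and the lexicon itself is invented:

```python
# Toy lexicon-based sketch of sentiment classification.
# Real drivers use pre-trained NLP models, not word lists.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "poor", "hate"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great product"))
print(sentiment("bad and poor service"))
```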

Named Entity Recognition (NER) is a subfield of NLP that identifies and classifies essential elements in text data such as names, locations, organizations, dates, and other entities.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Text Data Column
  3. Features — This step can be skipped for entity recognition drivers
Entity recognition model configuration
Entity recognition feature selection
Entity recognition experiment view
Entity recognition run results
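The run's job is to pull typed entities out of each row of the Text Data Column. Real NER drivers use trained language models; the pattern-matching sketch below only illustrates the kind of output (entities with types), with invented text and deliberately crude rules:

```python
# Crude pattern-based sketch of entity extraction. Real NER drivers
# use trained language models rather than regular expressions.
import re

text = "Acme Corp signed the deal on 2024-05-01 in Berlin."

dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
orgs = re.findall(r"\b[A-Z][a-z]+ (?:Corp|Inc|Ltd)\b", text)

print({"DATE": dates, "ORG": orgs})
```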

Topic modeling helps analyze extensive collections of text data to discover hidden thematic patterns. The algorithm scans text data, identifying clusters of frequently co-occurring words and grouping them into coherent topics.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Text Data Column
  3. Features — Select relevant columns
Topic modeling configuration with text column selection
Topic modeling feature selection
Topic modeling experiment view
Topic modeling run results showing discovered topics

The app type lets users build a custom model that doesn’t fall into any of the defined types. Users can define a driver that combines different algorithms.

Configure steps:

  1. Model — Provide a name for the model
  2. Configure — Experiment name, workspace, select dataset, choose a driver, select the Predictor Column
  3. Features — Select all columns used by the custom driver
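A custom driver that combines algorithms can often be expressed as a pipeline of steps. A hedged sketch (the particular steps here, scaling followed by ridge regression, are an arbitrary illustration, not a prescribed app-type driver):

```python
# Sketch of an app-type driver combining algorithms as a pipeline:
# a preprocessing step chained with an estimator. Steps are illustrative.
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
pipe.fit([[1], [2], [3]], [1, 2, 3])
pred = float(pipe.predict([[4]])[0])
print(round(pred, 1))
```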

The experiment view page lists all experiments under a model. Options include comments, a timeline of action history, and adding new experiments.

Experiment view page showing list of experiments

The run view page lists all runs for an experiment. Available options:

  • Comments — Comment on runs
  • Timeline — View action history
  • Add a new run — Click the (+) icon
  • View — View the model output report
  • Drifts — Display drift report of input data
  • Build + Deploy — Build and deploy the model run, creating a prediction endpoint
  • Delete — Delete the run
  • Predict — Test the prediction endpoint with sample data
  • Copy URL — Copy the model prediction URL
  • Shutdown — Shut down the deployed model
  • Token — Authentication token for the model predict API
Run view page showing list of runs with options
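Once a run is built and deployed, the Copy URL and Token options supply what a client needs to call the prediction endpoint. The sketch below only constructs such a request; the URL, payload shape, and Bearer-style authorization header are assumptions to substitute with the values the platform provides:

```python
# Sketch of calling a deployed run's prediction endpoint.
# The URL, payload shape, and auth header format are hypothetical;
# use the run's Copy URL and Token options for the real values.
import json
import urllib.request

url = "https://xvector.example.com/models/predict"   # from Copy URL
token = "YOUR_MODEL_TOKEN"                           # from Token
payload = {"rows": [{"sqft": 1200, "rooms": 3}]}     # hypothetical schema

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request to the live endpoint.
print(req.get_method(), req.get_header("Authorization"))
```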

View — Model output report:

Run detail view showing model output

Drifts — Data drift analysis:

Run drifts report showing data distribution changes

Performance — Model performance metrics:

Run performance metrics view

To update model details, open the settings pane by selecting Settings from the menu options of model cards in the workspace. Modify the details and click Save.

Model settings accessed from workspace card menu
Model settings pane for updating details

A model can be deleted by clicking on Delete in the menu options of model cards in the workspace.

Model card menu showing the Delete option