Building a Conformal Chatbot in Julia

Conformal Prediction, LLMs and HuggingFace — Part 1

Patrick Altmeyer
Towards Data Science

Large Language Models (LLMs) are all the rage right now. They are used for a variety of tasks, including text classification, question answering, and text generation. In this tutorial, we will show how to conformalize a transformer language model for text classification using ConformalPrediction.jl.

In particular, we are interested in the task of intent classification as illustrated in the sketch below. Firstly, we feed a customer query into an LLM to generate embeddings. Next, we train a classifier to match these embeddings to possible intents. Of course, for this supervised learning problem we need training data consisting of inputs (queries) and outputs (labels indicating the true intent). Finally, we apply Conformal Prediction to quantify the predictive uncertainty of our classifier.

Conformal Prediction (CP) is a rapidly emerging methodology for Predictive Uncertainty Quantification. If you’re unfamiliar with CP, you may want to first check out my 3-part introductory series on the topic starting with this post.

High-level overview of a conformalized intent classifier. Image by author.
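To make the last step concrete, here is a minimal sketch of how a probabilistic classifier can be conformalized with ConformalPrediction.jl. The data and base model below are stand-ins (in our setting, the inputs would be the LLM embeddings and the labels the intents), and the package's MLJ-based interface is assumed; this is not the article's exact code.

using ConformalPrediction
using MLJ

# Stand-in data: in our setting, X would hold the LLM embeddings of the
# customer queries and y the corresponding intent labels.
X, y = make_blobs(1000, 10; centers=77)

# Any probabilistic MLJ classifier can serve as the base model:
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels
conf_model = conformal_model(LogisticClassifier(); method=:simple_inductive, coverage=0.95)

# Fit on training data and generate conformal predictions:
mach = machine(conf_model, X, y)
fit!(mach)
predict(mach, selectrows(X, 1:3))

The key point is that conformal predictions are set-valued: each prediction is a set of plausible labels that covers the true intent with (at least) the specified coverage probability, and the size of the set reflects the model's uncertainty.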

We will use the Banking77 dataset (Casanueva et al., 2020), which consists of 13,083 online banking queries labeled with one of 77 intents. On the model side, we will use the DistilRoBERTa model, a distilled version of RoBERTa (Liu et al., 2019) fine-tuned on the Banking77 dataset.

The model can be loaded from HF straight into our running Julia session using the Transformers.jl package.

This package makes working with HF models remarkably easy in Julia. Kudos to the devs! 🙏

Below we load the tokenizer tkr and the model mod. The tokenizer converts the text into a sequence of integers, which is fed into the model. The model outputs a hidden state, which is passed to a classification head to produce the logits for each class. Finally, the logits are passed through a softmax function to obtain the corresponding predicted probabilities. Below we run a few queries through the model to see how it performs.

# Load model from HF 🤗:
tkr =…
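Since the snippet above is truncated in this version, here is a minimal sketch of what the loading and inference steps might look like. The model id is a placeholder, and the exact field names (e.g. out.logit) are assumptions based on Transformers.jl's hgf string-macro interface, not the article's exact code.

using Transformers
using Transformers.HuggingFace   # provides the hgf"" string macro
using Transformers.TextEncoders  # provides encode
using Flux: softmax

# Load model from HF 🤗 (placeholder model id; any sequence-classification
# checkpoint fine-tuned on Banking77 should work here):
tkr = hgf"mrm8488/distilroberta-finetuned-banking77:tokenizer"
mod = hgf"mrm8488/distilroberta-finetuned-banking77:ForSequenceClassification"

# Run a query through the pipeline:
query = "I lost my card. What should I do?"
enc = encode(tkr, query)      # text -> token ids (plus attention mask)
out = mod(enc)                # forward pass -> logits over the 77 intents
probs = softmax(out.logit)    # logits -> predicted probabilities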