Top AI Engineer Interview Q&A: Zero-Shot, One-Shot and Few-Shot Learning Explained


In almost every AI/ML Engineer interview I have seen, the interviewer asks one common question:

    "Can you explain the difference between Zero-Shot, One-Shot and Few-Shot learning with an example?"

Most candidates know the definition but fail to justify it with a real example or code. This post will help you answer it confidently — from the straight one-liner answer, to the analogy, to working Databricks code you can run yourself.

Quick Overview

Approach    | Examples Needed | Accuracy | Setup Time | Best For
Zero-Shot   | 0               | Moderate | Immediate  | Unknown / dynamic categories
One-Shot    | 1 per class     | Good     | Minutes    | Rare or hard-to-collect data
Few-Shot    | 5–20 per class  | High     | Minutes    | Small labeled dataset available
Fine-Tuning | 1000+ per class | Highest  | Days/Weeks | Production-grade accuracy

Interview Question 1

    "What is Zero-Shot Learning and when would you use it?"

Answer to the Interviewer

Say exactly this: "Zero-Shot Learning is when a model classifies inputs into categories it was never explicitly trained on. It uses its pre-trained language understanding to infer labels purely from their names or descriptions, so no labeled examples are needed at all."

Explanation with Justification:

Imagine you join a new company on Day 1. Your manager asks you to sort customer emails into folders (Billing, Technical Support, and Sales) without any training or knowledge sharing. Because you are experienced, you read each email, understand the context, and file it in the right folder.

Likewise, the model uses its pre-trained knowledge to classify text without ever seeing a labeled example of those categories. That is exactly what Zero-Shot does.

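How it works internally: the zero-shot pipeline used in the exercise relies on Natural Language Inference (NLI). It converts classification into an entailment problem: each candidate label is turned into a hypothesis sentence, the model scores whether the text entails that hypothesis, and the label with the highest score wins.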
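To make the entailment idea concrete, here is a minimal sketch of the two steps: building one hypothesis sentence per label, then normalizing the per-label entailment scores. The template "This example is {}." is the default in the Hugging Face zero-shot pipeline. The helper names are hypothetical, and the logits are made-up numbers standing in for a real NLI model's output.

```python
import math

def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]

def softmax(logits):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

candidate_labels = ["technology", "sports", "finance", "health"]
hypotheses = build_hypotheses(candidate_labels)

# Hypothetical entailment logits, one per hypothesis (NOT real model output).
entailment_logits = [4.2, -1.3, 0.1, -0.8]
scores = softmax(entailment_logits)

# The label whose hypothesis is most strongly entailed wins.
best_score, best_label = max(zip(scores, candidate_labels))
print(hypotheses[0])
print(best_label)
```

Because the scores are normalized across the candidate labels, adding or removing a label changes every score, which is worth mentioning if the interviewer asks why confidence values shift between runs with different label sets.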

When to use Zero-Shot:

  • You have zero labeled data
  • Categories are new or dynamic — like news topics that change weekly
  • You need a quick working prototype without data labeling effort

Coding Exercise - Try in Databricks/VS Code

from transformers import pipeline

# Load zero-shot classification pipeline
classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli"
)

# Input sentence
text = "Databricks released a new feature for real-time ML model serving"

# Categories the model has NEVER been trained on
candidate_labels = ["technology", "sports", "finance", "health"]

# Classify
result = classifier(text, candidate_labels)

# Output
print("Predicted Label :", result["labels"][0])
print("Confidence Score:", round(result["scores"][0], 4))
▶ Output
Predicted Label : technology
Confidence Score: 0.9871
🧪 Try It Yourself: Change text to "The stock market crashed due to rising inflation" and observe how the top label switches to finance without any retraining.
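The pipeline result is a dictionary whose labels and scores lists are sorted from most to least likely, so you can also decide what to do when even the top score is weak. A minimal sketch, assuming a hypothetical helper top_label_or_unknown and an arbitrary 0.5 cutoff; sample_result is hand-written to mimic the output shape and is not real model output.

```python
def top_label_or_unknown(result, threshold=0.5):
    """Return the top label, or "unknown" if its score is below threshold."""
    label, score = result["labels"][0], result["scores"][0]
    return label if score >= threshold else "unknown"

# Hand-written stand-in for a pipeline result (illustrative numbers only).
sample_result = {
    "sequence": "The weather was pleasant over the weekend",
    "labels": ["health", "sports", "technology", "finance"],
    "scores": [0.34, 0.29, 0.21, 0.16],
}

print(top_label_or_unknown(sample_result))       # prints: unknown
print(top_label_or_unknown(sample_result, 0.3))  # prints: health
```

The right cutoff depends on the use case, so treat 0.5 as a starting point to tune against a few real inputs.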

Interview Question 2

    "What is One-Shot Learning? How is it different from Zero-Shot?"

Answer to the Interviewer

Say exactly this: "One-Shot Learning is when a model learns to classify from exactly one labeled example per class. Instead of retraining, a common approach converts that example into an embedding vector and matches each new input to the most similar example using cosine similarity."

Explanation with Justification:

Real-life example: show a child one photo of a mango, and they can correctly identify mangoes in future photos, even ones with different shapes and colors. They do not need 1000 photos. One clear example is enough.

That is One-Shot. The model uses that single example as a reference point and compares every new input against it using embedding similarity.
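If the interviewer asks what "embedding similarity" means, it helps to show the arithmetic on tiny vectors first. A minimal sketch using made-up 3-dimensional vectors; real embedding models output hundreds of dimensions, and the numbers here are purely illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": one reference vector per class (made-up values).
references = {
    "technology": [0.9, 0.1, 0.0],
    "sports":     [0.1, 0.9, 0.1],
}
new_vector = [0.8, 0.2, 0.1]

# Score the new input against every reference and keep the closest one.
scores = {label: cosine_similarity(new_vector, ref) for label, ref in references.items()}
predicted = max(scores, key=scores.get)
print(predicted)  # prints: technology
```

Cosine similarity compares direction rather than length, which is why two texts of very different sizes can still land close together.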

Coding Exercise - Try in Databricks/VS Code

from sentence_transformers import SentenceTransformer
import numpy as np

# Load embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

# ONE example per class — your entire training set
one_shot_examples = {
    "technology" : "Apache Spark is used for big data processing",
    "sports"     : "The football team scored a goal in the final",
    "finance"    : "Stock market saw a sharp decline yesterday",
    "health"     : "The new vaccine reduces fever effectively"
}

# Encode all examples into vectors
labels     = list(one_shot_examples.keys())
embeddings = model.encode(list(one_shot_examples.values()))

# New text to classify
new_text      = "Databricks launched a new AI model serving platform"
new_embedding = model.encode([new_text])[0]

# Find most similar example using cosine similarity
similarities = []
for emb in embeddings:
    score = np.dot(new_embedding, emb) / (
        np.linalg.norm(new_embedding) * np.linalg.norm(emb)
    )
    similarities.append(score)

predicted_label = labels[np.argmax(similarities)]
print("Predicted Label :", predicted_label)
print("Similarity Score:", round(max(similarities), 4))
▶ Output
Predicted Label : technology
Similarity Score: 0.8764
🧪 Try It Yourself: Change new_text to "Federal Reserve increased interest rates by 0.5 percent" and watch it predict finance from just one example, with no model retraining needed.

Interview Question 3

    "What is Few-Shot Learning? When would you prefer it over One-Shot?"

Answer to the Interviewer

Say exactly this: "Few-Shot Learning is when a model learns from a small number of labeled examples, typically 5 to 20 per class. One common approach builds a prototype vector by averaging the embeddings of all examples per class, which gives a more robust representation than a single One-Shot example."

Explanation with Justification

Real-life example: a doctor studying a rare disease is shown 10 patient X-rays with confirmed diagnoses. When the 11th patient arrives, the doctor can diagnose confidently because they have seen enough variety to recognize the pattern.

More examples = better pattern = higher accuracy. That is the key advantage of Few-Shot over One-Shot.
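The advantage is easiest to see on toy numbers. In this minimal sketch, one of three class examples is atypical: if that were your only One-Shot reference, a typical query would score poorly, while the averaged prototype stays close to the typical examples. The 2-dimensional vectors are made up for illustration, and prototype is a hypothetical helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def prototype(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Toy embeddings for one class (made-up values). The third example is atypical.
health_examples = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
query = [0.85, 0.15]

# One-Shot: the score depends entirely on which single example you kept.
one_shot_unlucky = cosine_similarity(query, health_examples[2])
# Few-Shot: averaging pulls the reference back toward the typical examples.
few_shot = cosine_similarity(query, prototype(health_examples))

print(round(one_shot_unlucky, 3), round(few_shot, 3))
```

A well-chosen single example can still beat the prototype on a particular query. The point is that averaging makes the result depend less on which example you happened to pick.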

Coding Exercise - Try in Databricks/VS Code

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# 5 examples per class — few-shot training set
few_shot_examples = {
    "technology": [
        "Apache Spark processes big data at scale",
        "Python is widely used in machine learning",
        "Databricks provides a unified analytics platform",
        "Deep learning models require GPU for training",
        "Kubernetes manages containerized workloads"
    ],
    "sports": [
        "The cricket team won the world cup finals",
        "Real Madrid signed a new striker this season",
        "Olympics 2024 broke multiple world records",
        "The tennis player won the Grand Slam title",
        "Football league season kicks off next week"
    ],
    "finance": [
        "Stock market fell sharply due to inflation",
        "Federal Reserve increased interest rates again",
        "Bitcoin crossed fifty thousand dollars mark",
        "Gold prices hit an all-time high today",
        "Quarterly earnings beat analyst expectations"
    ],
    "health": [
        "New vaccine reduces dengue fever risk significantly",
        "Doctors recommend daily exercise for heart health",
        "Cancer research shows major treatment breakthrough",
        "Mental health awareness is growing worldwide",
        "WHO approved a new drug for malaria treatment"
    ]
}

# Build ONE prototype vector per class by averaging all examples
labels     = []
prototypes = []

for label, examples in few_shot_examples.items():
    embeddings = model.encode(examples)
    prototype  = np.mean(embeddings, axis=0)  # Average = prototype
    labels.append(label)
    prototypes.append(prototype)

prototypes = np.array(prototypes)

# New text to classify
new_text      = "Databricks releases new AutoML feature for data scientists"
new_embedding = model.encode([new_text])[0]

# Cosine similarity vs each prototype
similarities = []
for proto in prototypes:
    score = np.dot(new_embedding, proto) / (
        np.linalg.norm(new_embedding) * np.linalg.norm(proto)
    )
    similarities.append(score)

predicted_label = labels[np.argmax(similarities)]
print("Predicted Label :", predicted_label)
print("Similarity Score:", round(max(similarities), 4))
▶ Output
Predicted Label : technology
Similarity Score: 0.9123
🧪 Try It Yourself: Change new_text to "Patient diagnosed with rare autoimmune disorder" and check that it predicts health. Because the prototype vector averages over more variation, the prediction depends less on any single example than it does in the One-Shot version.

Interview Question 4 - Bonus (Very Commonly Asked)

    "How would you decide which approach to use in a real project?"

Answer to the Interviewer

Say exactly this: "I would base the decision on how much labeled data is available, how quickly I need results, and what accuracy is acceptable for the use case."

Decision Framework — Draw This on the Whiteboard

Start Here → Do you have labeled data?
│
├── NO
│   └──→ Zero-Shot
│        (no examples needed, uses NLI)
│
└── YES
    │
    ├── 1 example per class
    │   └──→ One-Shot
    │        (embedding similarity)
    │
    ├── 5 to 20 examples per class
    │   └──→ Few-Shot
    │        (prototype vectors)
    │
    └── 1000+ examples per class
        └──→ Fine-Tuning
             (full model training)
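If you are asked to turn the framework into code, a small function is enough. This is a minimal sketch with a hypothetical function name. The framework only names 0, 1, 5 to 20, and 1000+ examples per class, so the sketch assumes that everything from 2 to 999 is treated as few-shot; that boundary is an assumption, not part of the framework.

```python
def choose_approach(examples_per_class):
    """Map the number of labeled examples per class to an approach."""
    if examples_per_class < 0:
        raise ValueError("examples_per_class must be non-negative")
    if examples_per_class == 0:
        return "zero-shot"
    if examples_per_class == 1:
        return "one-shot"
    if examples_per_class < 1000:
        # Assumption: counts between the named tiers (2-4, 21-999) fall back to few-shot.
        return "few-shot"
    return "fine-tuning"

for n in (0, 1, 10, 2000):
    print(n, "->", choose_approach(n))
```

In a real project the thresholds would also depend on the accuracy target and the time available, which is why the spoken answer mentions all three factors.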

Real Project Story:

"In a customer support automation project, we received thousands of tickets daily across 8 categories. We had zero labeled data initially, so we started with Zero-Shot to get a working system on Day 1. Over two weeks, the team labeled 10 examples per category, so we switched to Few-Shot, which improved accuracy from 74% to 91%. Eventually, after labeling 2000 examples, we fine-tuned the model and reached 97% accuracy."

This answer shows your understanding of the concepts and the full journey from no data to production - which is exactly what senior interviewers want to hear.

Final Summary - Notes to Remember

Aspect            | Zero-Shot       | One-Shot          | Few-Shot          | Fine-Tuning
Data Needed       | 0               | 1 per class       | 5–20 per class    | 1000+ per class
Training Required | No              | No                | No                | Yes
Accuracy          | Moderate        | Good              | High              | Highest
Setup Time        | Minutes         | Minutes           | Minutes           | Days / Weeks
How it Works      | NLI entailment  | Cosine similarity | Prototype vectors | Gradient descent
Model Type        | bart-large-mnli | MiniLM embeddings | MiniLM embeddings | Task-specific

Happy Learning!!!
