AI comes with a diverse and fast-growing vocabulary.
We prepared the following glossary sheet, which lists the 70 most common and must-know terms, technologies, and techniques used in artificial intelligence today; being aware of them is crucial for anyone working in or around the field.
Let’s briefly discuss them one by one:
A
Attention: Mechanism in transformers that helps the model focus on specific parts of the input sequence (see the sketch below).
Agents: Autonomous entities that perform tasks in environments, often guided by LLMs for decision-making.
Autoregressive Models: Models like GPT that generate text by predicting the next token based on previous ones.
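To make the attention entry concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention, with random toy inputs, no masking, and no learned projections:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: weight each value by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_queries, n_keys) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted sum of value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))  # 3 tokens, 4-dim vectors
print(scaled_dot_product_attention(Q, K, V).shape)     # (3, 4)
```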
B
BERT: A pre-trained transformer model that uses bidirectional attention to understand the context from both directions.
Beam Search: A decoding algorithm that explores multiple candidate paths during sequence generation and keeps the highest-scoring ones (see the sketch below).
Batch Norm: A technique to normalize activations in a neural network, speeding up training and improving stability.
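As a rough illustration of beam search, here is a toy sketch in which a hypothetical `next_token_probs` callable stands in for a real language model:

```python
import math

def beam_search(next_token_probs, beam_width=2, steps=3):
    """Toy beam search: next_token_probs(prefix) -> {token: prob} for a hypothetical model."""
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, prob in next_token_probs(seq).items():
                candidates.append((seq + [token], score + math.log(prob)))
        # keep only the beam_width highest-scoring sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# hypothetical 3-token "vocabulary" with a fixed distribution, for illustration only
fake_model = lambda prefix: {"a": 0.5, "b": 0.3, "c": 0.2}
print(beam_search(fake_model))
```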
C
Cross-attention: A mechanism in transformers where attention is applied across different sequences, such as in translation models.
Chain-of-thought: A reasoning technique where an LLM produces intermediate logical steps between input and output, improving inference on complex tasks.
Causal Language Models: Models that predict the next token in a sequence based only on past tokens (e.g., GPT).
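The defining constraint of causal language models is the causal mask: each position may attend only to itself and earlier positions. A minimal NumPy illustration:

```python
import numpy as np

# Causal (lower-triangular) mask for a 5-token sequence: position i may only
# attend to positions <= i, so the model never "sees" future tokens.
seq_len = 5
mask = np.tril(np.ones((seq_len, seq_len), dtype=int))
print(mask)
```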
D
Diffusion Models: Generative models that reverse the process of adding noise to data to create new samples (see the sketch below).
DALL·E: An OpenAI model that generates images from textual descriptions (text-to-image generation).
DPO (Direct Preference Optimization): A fine-tuning method that optimizes a model directly on human preference comparisons, without training a separate reward model.
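To ground the diffusion entry above, here is a toy sketch of the forward (noising) step, with `alpha_bar_t` standing in for the cumulative noise schedule at step t; a diffusion model is trained to reverse this process:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x0, alpha_bar_t):
    """Forward diffusion step: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise.
    The model learns to predict (and remove) this added noise."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

x0 = np.ones(4)                        # a toy "clean" sample
print(add_noise(x0, alpha_bar_t=0.9))  # mostly signal
print(add_noise(x0, alpha_bar_t=0.1))  # mostly noise
```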
E
Embeddings: Dense vector representations of words or tokens that capture semantic meaning for model inputs (see the sketch below).
Encoder-Decoder Model: A neural network architecture used for tasks like translation, where input sequences are encoded and then decoded into outputs.
Evaluation Metrics: Metrics like accuracy, F1 score, etc., used to evaluate the performance of models.
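Embeddings are typically compared with cosine similarity. Below is a toy sketch using made-up 4-dimensional vectors; real embeddings have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: close to 1 = similar, close to 0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical embeddings, invented purely for illustration
king  = np.array([0.80, 0.65, 0.10, 0.05])
queen = np.array([0.75, 0.70, 0.15, 0.10])
apple = np.array([0.10, 0.05, 0.90, 0.80])
print(cosine_similarity(king, queen))  # high: related meanings
print(cosine_similarity(king, apple))  # low: unrelated meanings
```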
F
Fine-tuning: Adapting pre-trained models to a specific task by training on a smaller, task-specific dataset.
Few-shot Learning: A learning paradigm where models generalize well from only a few examples (see the prompt sketch below).
Flash Attention: An efficient attention mechanism that speeds up processing in transformers by reducing memory usage.
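To illustrate few-shot learning in the prompting sense, here is a hypothetical prompt with two in-context examples that the model is expected to continue for the new input:

```python
# A hypothetical few-shot prompt: two labeled examples are shown in-context,
# and the model is expected to continue the pattern for the final review.
prompt = """Classify the sentiment as positive or negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: positive

Review: It stopped working after a week and support never replied.
Sentiment: negative

Review: Setup took five minutes and it just works.
Sentiment:"""
print(prompt)
```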
G
GPTs (Generative Pretrained Transformers): Language models pre-trained to generate human-like text from prompts.
GANs (Generative Adversarial Networks): A class of models where two networks (generator and discriminator) compete to create realistic data.
Graph RAG: Retrieval-augmented generation (RAG) systems that leverage graphs to retrieve relevant information for generation tasks.
H
Hallucinations: Instances where an LLM generates factually incorrect or nonsensical information.
Hugging Face: A popular platform and community providing pre-trained LLMs and NLP tools.
Human-in-the-loop: A system design where human feedback is integrated to guide model behavior or improve performance.
I
Inference: Making predictions or generating outputs using a trained LLM.
Instruction Tuning: Fine-tuning LLMs to follow specific instructions or perform various tasks based on prompts.
Imitation Learning: A learning method where models learn to mimic expert behavior based on example demonstrations.
J
Joint Embedding Space: A shared representation space where multiple data modalities (e.g., text and images) are embedded.
K
Knowledge Graphs: Graph-based structures that represent relationships between concepts, useful in LLM-powered search and reasoning.
Knowledge Distillation: A technique where a smaller model learns from a larger model, transferring knowledge efficiently.
kNN (k-Nearest Neighbors): A simple, instance-based learning algorithm often used for classification tasks.
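A quick kNN example on made-up 2-D points, using scikit-learn’s `KNeighborsClassifier`:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D points labeled 0 or 1; kNN classifies a new point by majority vote
# among its k closest training points.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[1, 1], [5, 4]]))  # [0 1]
```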
L
LangChain: A framework for building applications powered by language models, facilitating easier chaining of LLM prompts.
LoRA (Low-Rank Adaptation): A technique to fine-tune LLMs efficiently by training small low-rank update matrices instead of the full weights (see the sketch below).
LLaMA (Large Language Model Meta AI): A family of powerful open-source LLMs developed by Meta.
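To make LoRA concrete, here is a minimal NumPy sketch of the low-rank update idea (h = Wx + BAx) with illustrative sizes d = 512 and r = 8; real implementations apply this per attention or MLP projection inside a library such as PEFT:

```python
import numpy as np

# LoRA sketch: instead of updating the full weight matrix W (d x d),
# learn a low-rank update B @ A with rank r << d, so only A and B are trained.
d, r = 512, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))            # frozen pretrained weights
A = rng.normal(size=(r, d)) * 0.01     # trainable, r x d
B = np.zeros((d, r))                   # trainable, d x r (initialized to zero)

x = rng.normal(size=d)
h = W @ x + B @ (A @ x)                # adapted forward pass: h = Wx + BAx

full, lora = d * d, 2 * d * r
print(f"trainable params: {lora} vs {full} ({lora / full:.1%})")
```

Only A and B are trained, so the number of trainable parameters drops by orders of magnitude while the frozen pretrained weights stay untouched.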
M
Multimodal Learning: A method where LLMs learn from and generate across multiple data types (e.g., text, images).
Mixture of Experts: A model architecture that selects different neural network “experts” based on input data.
Multihead Attention: A mechanism in transformers that allows the model to focus on different parts of input simultaneously.
N
Normalization: Techniques like LayerNorm that standardize inputs to stabilize and speed up model training.
Next-token Prediction: The task of predicting the next word or token in a sequence, used in autoregressive models like GPT.
O
One-shot Learning: A learning paradigm where a model learns from a single example to generalize to unseen data.
OpenAI o1 Model: An OpenAI model trained to reason through problems step by step before answering, improving performance on complex tasks such as math, coding, and science.
P
Prompt Engineering: The process of designing prompts to guide LLMs to generate the desired outputs.
PEFT (Parameter Efficient Fine-Tuning): A technique that fine-tunes models while modifying only a few parameters to save computation.
Pre-training: The initial phase of training LLMs on large datasets before fine-tuning them for specific tasks.
Q
Quantization: A method to reduce the size of LLMs by lowering the precision of the weights, making models faster and lighter (see the sketch below).
Qdrant: A vector search database used to store and search through vector embeddings, often used with LLMs.
QLoRA: A fine-tuning method combining quantization and LoRA for efficient adaptation of LLMs.
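A toy sketch of symmetric int8 weight quantization; real schemes add per-channel scales, outlier handling, and 4-bit variants such as those used in QLoRA:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric int8 quantization: store weights as int8 plus one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=5).astype(np.float32)
q, scale = quantize_int8(w)
print(w)
print(dequantize(q, scale))  # close to the original, at a quarter of the memory
```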
R
RAG (Retrieval-Augmented Generation): Models that retrieve relevant information from external sources before generating a response (see the sketch below).
RLHF (Reinforcement Learning from Human Feedback): A method of training LLMs to align better with human preferences using reinforcement learning.
Residual Connections: A mechanism in neural networks that helps gradient flow and improves model training stability.
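A minimal sketch of the retrieve-then-generate flow behind RAG; `retrieve()` and `generate()` below are hypothetical stand-ins (keyword overlap instead of embedding search, and a placeholder instead of an actual LLM call):

```python
def retrieve(query, documents, top_k=2):
    """Naive keyword-overlap retrieval standing in for embedding-based search."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=overlap, reverse=True)[:top_k]

def generate(prompt):
    """Placeholder for an LLM call (e.g., a hosted chat-completion endpoint)."""
    return f"[LLM answer conditioned on: {prompt[:60]}...]"

docs = [
    "Qdrant is a vector database for similarity search.",
    "Transformers rely on self-attention.",
    "RAG retrieves documents before generating an answer.",
]
query = "What does RAG do before generating?"
context = "\n".join(retrieve(query, docs))
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```

In practice, retrieval runs over a vector database (such as Qdrant or Weaviate) and the assembled prompt is sent to an LLM.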
S
Self-supervised Learning: A learning method where models generate their own labels from unlabeled data, often used in LLM pre-training.
Sparse Attention: An attention mechanism that focuses on a subset of tokens, making transformers more scalable.
Similarity Search: The process of finding data points that are similar to a given query, often done using vector embeddings.
T
Transformers: The foundational architecture for LLMs, relying on self-attention mechanisms to process sequences.
Temperature: A hyperparameter that controls the randomness of predictions made by LLMs, affecting diversity in generated text (see the sketch below).
Tokenization: The process of breaking text into smaller units (tokens) for model input in LLMs.
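To see what temperature does, here is a small sketch of temperature-scaled softmax over toy logits:

```python
import numpy as np

def sample_probs(logits, temperature=1.0):
    """Softmax with temperature: low T sharpens the distribution, high T flattens it."""
    z = np.array(logits) / temperature
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.1]
print(sample_probs(logits, temperature=0.5))  # peaked: near-greedy decoding
print(sample_probs(logits, temperature=1.5))  # flatter: more diverse sampling
```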
U
Universal Sentence Encoder: A model that generates fixed-length embeddings for sentences, widely used for similarity and search tasks.
U-Net: A convolutional neural network architecture, originally for image segmentation, widely used as the denoising backbone in diffusion models.
Unstructured Data: Data like text, images, and audio that LLMs process without predefined formats.
V
Vision Transformer (ViT): A transformer model adapted for image data, showing that transformer techniques work beyond text.
Vector Databases: Databases optimized for storing and searching through vector embeddings used by LLMs.
vLLM: A high-throughput serving engine that accelerates large-scale deployment of LLMs.
W
Word Embeddings: Representations of words as dense vectors, allowing LLMs to understand the semantic relationships between words.
Weaviate: A vector database designed for semantic search, often used with LLMs to retrieve relevant documents.
X
XGBoost: An efficient and scalable gradient-boosting algorithm, commonly used in machine learning tasks.
XLNet: A transformer-based model that improves on BERT by using permutation-based autoregressive training, capturing bidirectional context without masked tokens.
Y
YOLO (You Only Look Once): A fast object detection model, sometimes used alongside LLMs in multimodal systems.
YaFSDP: A data parallelism library developed by Yandex that enhances PyTorch’s FSDP with additional optimizations, especially for training LLMs.
Z
Zero-shot Learning: A method where models make predictions for tasks they haven’t been explicitly trained on.
ZeRO (Zero Redundancy Optimizer): An optimization technique that reduces memory usage when training large models by partitioning optimizer states, gradients, and parameters across devices.
That’s a wrap!
With these 70 terms, we have tried to capture the breadth of today’s AI and LLM technologies. Staying familiar with these core ideas is essential not only for developers and researchers but also for anyone looking to understand how AI is shaping the future.
But, of course, a lot has been left out here. Can you add more terms to this list?
Thanks for reading!