Notes: AI

A collection of my notes on AI.

Terms

  • Autoregressive LLM: a large language model that generates text one token at a time, predicting each new token from the sequence of preceding ones (see the generation sketch after this list).
  • Deep learning: a branch of machine learning that uses multilayered neural networks to perform tasks such as classification, regression, and representation learning.
  • GPT: Generative Pre-trained Transformer. It refers to a type of AI model that is first pre-trained on a massive dataset, then fine-tuned to generate human-like text in response to prompts.
  • Inference: the process by which a trained machine learning model applies what it learned during training to new, unseen data to produce an output, such as a prediction, classification, or decision.
  • LLM: large language model.
  • Multimodal: a type of deep learning that integrates and processes multiple types of data, referred to as modalities, such as text, audio, images, or video.
  • Neural network: a computational model inspired by the structure and functions of biological neural networks.
  • Pretraining: the process in which training data is converted into tokens and fed through the neural network, and the model's weights are adjusted so that the correct next tokens receive higher probability (see the pretraining sketch after this list).
  • Retrieval-Augmented Generation (RAG): an AI technique that combines a large language model (LLM) with an external knowledge base to produce more accurate, up-to-date, and contextually relevant responses. RAG systems retrieve information from a specified source, such as a company’s internal documents or the latest news, and then use the LLM to generate an answer grounded in the retrieved data rather than relying solely on its training data (see the RAG sketch after this list).
  • Transformer: a neural network architecture foundational to modern AI, excelling at understanding context and relationships in sequential data like text. Text is converted into numerical representations called tokens, and each token is mapped to a vector via lookup in an embedding table. At each layer, every token is then contextualized against the other (unmasked) tokens in the context window through a parallel multi-head attention mechanism, which amplifies the signal from important tokens and diminishes that from less important ones. Because attention weighs the whole sequence in parallel, transformers train significantly faster than older sequential architectures, enabling breakthroughs in natural language processing (NLP) with models like ChatGPT as well as advances in computer vision and other fields (see the attention sketch after this list).
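
A minimal sketch of autoregressive decoding. The model argument is a hypothetical stand-in for a trained LLM: a callable that takes the tokens so far and returns a probability distribution over the vocabulary for the next token.

    import random

    def generate(model, prompt_tokens, max_new_tokens=50, eos_token=0):
        # Start from the prompt and append one sampled token per step.
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            probs = model(tokens)  # distribution over the whole vocabulary
            # Sample the next token in proportion to its probability.
            next_token = random.choices(range(len(probs)), weights=probs)[0]
            tokens.append(next_token)
            if next_token == eos_token:  # stop at end-of-sequence
                break
        return tokens

Each iteration feeds the growing sequence back into the model, which is why generation cost grows with output length.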
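
A toy stand-in for pretraining, assuming whitespace-separated words as tokens: a bigram model that estimates the probability of the next token given the current one by counting, so observed continuations end up with higher probability. Real pretraining achieves the same effect with gradient descent on a neural network rather than counting.

    from collections import Counter, defaultdict

    def pretrain_bigram(corpus_tokens):
        # Count how often each token follows each other token.
        counts = defaultdict(Counter)
        for cur, nxt in zip(corpus_tokens, corpus_tokens[1:]):
            counts[cur][nxt] += 1
        # Normalize the counts into next-token probabilities.
        return {
            cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()
        }

    model = pretrain_bigram("the cat sat on the mat".split())
    print(model["the"])  # {'cat': 0.5, 'mat': 0.5}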
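
A minimal RAG sketch. The embed and llm_complete arguments are hypothetical stand-ins for an embedding model and an LLM completion call; the retrieval step here is a simple cosine-similarity ranking over whole documents.

    import math

    def cosine_similarity(a, b):
        # Similarity of two embedding vectors, in [-1, 1].
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    def rag_answer(question, documents, embed, llm_complete, top_k=2):
        # Retrieve: rank documents by similarity to the question.
        q_vec = embed(question)
        ranked = sorted(documents,
                        key=lambda d: cosine_similarity(q_vec, embed(d)),
                        reverse=True)
        context = "\n".join(ranked[:top_k])
        # Generate: ask the LLM to answer from the retrieved context
        # rather than from its training data alone.
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return llm_complete(prompt)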
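
A minimal sketch of single-head scaled dot-product attention, the core operation inside a transformer layer. In a real transformer, Q, K, and V come from learned linear projections of the token vectors, and multiple heads run in parallel.

    import numpy as np

    def attention(Q, K, V):
        # Each token's output is a weighted average of all value vectors,
        # with weights derived from query-key similarity.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) similarity matrix
        # Softmax over the key dimension (stabilized by subtracting the max).
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V  # (seq, d_v) contextualized token vectors

    tokens = np.random.randn(3, 4)  # three tokens, 4-dimensional vectors
    print(attention(tokens, tokens, tokens).shape)  # (3, 4)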

People

Interesting things

Last updated: 25/10/2025

Thanks for reading and feel free to give feedback or comments via email (andrew@jupiterstation.net).