Notes: AI

A collection of my notes on AI.

Terms

  • Autoregressive LLM: a large language model that generates text one token at a time, predicting each new token from the sequence of preceding ones (see the generation sketch after this list).
  • Deep learning: a branch of machine learning that uses multilayered neural networks to perform tasks such as classification, regression, and representation learning.
  • GPT: Generative Pre-trained Transformer. It refers to a type of AI model that is first pre-trained on a massive dataset, then fine-tuned to generate human-like text in response to prompts.
  • Inference: the process by which a trained machine learning model applies what it learned during training to new, unseen data to produce an output, such as a prediction, classification, or decision.
  • LLM: large language model.
  • Multimodal: a type of deep learning that integrates and processes multiple types of data, referred to as modalities, such as text, audio, images, or video.
  • Neural network: a computational model inspired by the structure and functions of biological neural networks.
  • Pretraining: the process in which training data is converted into tokens and fed through the neural network, and the model's weights are adjusted so that the correct next tokens receive higher probability (see the pretraining sketch after this list).
  • Retrieval-Augmented Generation (RAG): an AI technique that combines a large language model (LLM) with an external knowledge base to produce more accurate, up-to-date, and contextually relevant responses. RAG systems retrieve information from a specified source, such as a company’s internal documents or the latest news, and then use the LLM to generate an answer grounded in the retrieved data rather than relying solely on its training data (see the RAG sketch after this list).
  • Transformer: a neural network architecture foundational to modern AI, excelling at understanding context and relationships in sequential data like text. Text is converted into numerical representations called tokens, and each token is mapped to a vector via lookup in an embedding table. At each layer, every token is then contextualized against the other (unmasked) tokens in the context window through a parallel multi-head attention mechanism, which amplifies the signal from important tokens and diminishes that from less important ones. Because attention weighs the whole sequence in parallel, transformers train significantly faster than older sequential architectures, enabling breakthroughs in natural language processing (NLP) with models like ChatGPT as well as advances in computer vision and other fields (see the attention sketch after this list).
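
A minimal sketch of autoregressive decoding. The model argument is a hypothetical stand-in for a trained LLM: a callable that takes the tokens so far and returns a probability distribution over the vocabulary for the next token.

    import random

    def generate(model, prompt_tokens, max_new_tokens=50, eos_token=0):
        # Start from the prompt and append one sampled token per step.
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            probs = model(tokens)  # distribution over the whole vocabulary
            # Sample the next token in proportion to its probability.
            next_token = random.choices(range(len(probs)), weights=probs)[0]
            tokens.append(next_token)
            if next_token == eos_token:  # stop at end-of-sequence
                break
        return tokens

Each iteration feeds the growing sequence back into the model, which is why generation cost grows with output length.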
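
A toy stand-in for pretraining, assuming whitespace-separated words as tokens: a bigram model that estimates the probability of the next token given the current one by counting, so observed continuations end up with higher probability. Real pretraining achieves the same effect with gradient descent on a neural network rather than counting.

    from collections import Counter, defaultdict

    def pretrain_bigram(corpus_tokens):
        # Count how often each token follows each other token.
        counts = defaultdict(Counter)
        for cur, nxt in zip(corpus_tokens, corpus_tokens[1:]):
            counts[cur][nxt] += 1
        # Normalize the counts into next-token probabilities.
        return {
            cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()
        }

    model = pretrain_bigram("the cat sat on the mat".split())
    print(model["the"])  # {'cat': 0.5, 'mat': 0.5}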
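
A minimal RAG sketch. The embed and llm_complete arguments are hypothetical stand-ins for an embedding model and an LLM completion call; the retrieval step here is a simple cosine-similarity ranking over whole documents.

    import math

    def cosine_similarity(a, b):
        # Similarity of two embedding vectors, in [-1, 1].
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    def rag_answer(question, documents, embed, llm_complete, top_k=2):
        # Retrieve: rank documents by similarity to the question.
        q_vec = embed(question)
        ranked = sorted(documents,
                        key=lambda d: cosine_similarity(q_vec, embed(d)),
                        reverse=True)
        context = "\n".join(ranked[:top_k])
        # Generate: ask the LLM to answer from the retrieved context
        # rather than from its training data alone.
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return llm_complete(prompt)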
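
A minimal sketch of single-head scaled dot-product attention, the core operation inside a transformer layer. In a real transformer, Q, K, and V come from learned linear projections of the token vectors, and multiple heads run in parallel.

    import numpy as np

    def attention(Q, K, V):
        # Each token's output is a weighted average of all value vectors,
        # with weights derived from query-key similarity.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) similarity matrix
        # Softmax over the key dimension (stabilized by subtracting the max).
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V  # (seq, d_v) contextualized token vectors

    tokens = np.random.randn(3, 4)  # three tokens, 4-dimensional vectors
    print(attention(tokens, tokens, tokens).shape)  # (3, 4)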

People

Interesting things

Last updated: 25/10/2025

Thanks for reading and feel free to give feedback or comments via email (andrew@jupiterstation.net).