Comprehensive reference of technical terms, algorithms, and concepts in vector databases and climate modelling.
Search algorithm that finds k vectors approximately closest to a query vector in high-dimensional space, sacrificing perfect accuracy for computational efficiency. Typically achieves sub-linear query time (roughly O(log n) in practice for graph-based indexes such as HNSW), compared to O(n) distance computations for exhaustive search.
Related Terms:
Coupled Model Intercomparison Project Phase 6. International framework coordinating climate model experiments from 50+ modelling centres worldwide. Provides multi-model ensemble projections (2015-2100) under Shared Socioeconomic Pathway (SSP) scenarios for IPCC assessment reports.
Related Terms:
Self-supervised training paradigm that learns representations by maximising similarity between positive pairs (augmented views of same instance) whilst minimising similarity to negative pairs (different instances). Common frameworks include SimCLR, MoCo, and CLIP.
Related Terms:
Measure of similarity between two non-zero vectors based on the cosine of the angle between them. Scale-invariant and bounded in [-1, 1], with 1 indicating identical direction, 0 orthogonality, and -1 opposite direction. Preferred for normalised embeddings.
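The definition above can be illustrated with a minimal standard-library sketch. The function name `cosine_similarity` and the example vectors are illustrative choices, not part of any particular library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Scaling a vector does not change the result (scale invariance).
same_direction = cosine_similarity([1.0, 2.0], [3.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 5.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
```

The three example pairs land at the upper bound, the midpoint, and the lower bound of the [-1, 1] range, up to floating-point rounding.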
Related Terms:
Density-Based Spatial Clustering of Applications with Noise. Clustering algorithm that groups points with many nearby neighbours whilst marking low-density points as outliers. Requires two parameters: ε (neighbourhood radius) and min_samples (minimum number of points within ε, including the point itself, for a point to count as a core point).
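A compact, unoptimised sketch of the algorithm follows, using brute-force neighbour lookups. The function name `dbscan`, the label convention (-1 for noise), and the sample points are assumptions made for illustration; production code would normally use a spatial index or an existing library.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    def neighbours(i):
        # Indices within eps of point i (the point itself is included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE            # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

With these toy points the two tight groups form clusters 0 and 1, and the isolated point at (50, 50) is labelled as noise.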
Related Terms:
Process of deriving high-resolution climate information from coarse-resolution global climate model (GCM) output. Dynamical downscaling uses regional climate models (RCMs); statistical downscaling uses empirical relationships between large-scale and local variables.
Related Terms:
Dense vector representation of data in continuous space ℝᵈ that preserves semantic similarity. Generated by encoder networks f_θ: X → ℝᵈ trained to map raw inputs (images, text, time-series) to fixed-dimension vectors where similar inputs have small distances.
Related Terms:
Neural network component that transforms high-dimensional input data into lower-dimensional embedding vectors. Common architectures include convolutional neural networks (CNNs) for images, recurrent networks (RNNs/LSTMs) for sequences, and transformers for both.
Related Terms:
Collection of multiple climate model simulations used to characterise uncertainty. Single-model ensembles vary initial conditions; multi-model ensembles (MMEs) combine outputs from different models. Ensemble spread quantifies projection uncertainty.
Related Terms:
Straight-line distance between two points in Euclidean space. For vectors v₁, v₂ ∈ ℝᵈ, computed as square root of sum of squared component differences. Sensitive to scale and magnitude; for vectors with independent components of similar variance, typical distances grow roughly as √d in high dimensions.
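The formula can be written out directly. As a simple deterministic illustration of √d growth, the sketch also measures the distance from the origin to the all-ones vector, which is exactly √d; the function name and the chosen dimensions are illustrative.

```python
import math

def euclidean_distance(v1, v2):
    """Square root of the sum of squared component differences."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

d_small = euclidean_distance([0.0, 0.0], [3.0, 4.0])   # 3-4-5 triangle

# Distance from the origin to the all-ones vector is sqrt(d).
growth = {d: euclidean_distance([0.0] * d, [1.0] * d) for d in (4, 64, 1024)}
```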
Related Terms:
Facebook AI Similarity Search. Open-source library by Meta for efficient similarity search and clustering of dense vectors. Supports multiple index types (Flat, IVF, HNSW, PQ) with CPU and GPU implementations. Optimised for billion-scale datasets.
Related Terms:
Numerical model representing physical processes in atmosphere, ocean, land surface, and cryosphere. Solves discretised equations on 3D grid with typical horizontal resolution 50-100 km. Also called Earth System Model (ESM) when including biogeochemical cycles.
Related Terms:
Hierarchical Density-Based Spatial Clustering of Applications with Noise. Extension of DBSCAN that builds a hierarchy of clusters and extracts stable clusters across multiple density thresholds. More robust to varying density than DBSCAN and removes the need to choose ε; the main parameter is a minimum cluster size.
Related Terms:
Hierarchical Navigable Small World. Graph-based approximate nearest neighbour search algorithm. Constructs multi-layer graph where each layer is a proximity graph and higher layers contain progressively fewer nodes. Query routing starts at top layer and refines through lower layers, giving approximately O(log n) query scaling in practice.
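The layered structure comes from how each inserted vector is assigned a top layer. The sketch below shows only that assignment step, not graph construction or search, using the exponentially decaying rule from the HNSW paper, floor(-ln(u) · 1/ln(M)). The function name, the value M = 16, and the fixed seed are illustrative assumptions.

```python
import math
import random

def sample_max_layer(m, rng):
    """Draw the top layer for a new node; m is the target number of links per node."""
    level_multiplier = 1.0 / math.log(m)
    u = 1.0 - rng.random()           # uniform in (0, 1], avoids log(0)
    return int(-math.log(u) * level_multiplier)

rng = random.Random(0)               # fixed seed so the sketch is repeatable
levels = [sample_max_layer(16, rng) for _ in range(10_000)]

# A node with top layer L is present in layers 0..L, so count nodes with level >= layer.
nodes_per_layer = [
    sum(1 for level in levels if level >= layer)
    for layer in range(max(levels) + 1)
]
```

Every node appears in layer 0, and each higher layer keeps roughly a 1/M fraction of the layer beneath it, which is what lets a query take long hops at the top and fine steps at the bottom.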
Related Terms:
Dot product of two vectors: sum of products of corresponding components. Computationally efficient but sensitive to vector magnitudes. For normalised vectors, equivalent to cosine similarity. Used in maximum inner product search (MIPS).
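The magnitude sensitivity mentioned above shows up clearly in a brute-force maximum inner product search. This is a toy sketch with hypothetical names (`inner_product`, `mips`) and made-up vectors.

```python
def inner_product(a, b):
    """Sum of products of corresponding components."""
    return sum(x * y for x, y in zip(a, b))

def mips(query, corpus):
    """Index of the corpus vector with the largest inner product with the query."""
    return max(range(len(corpus)), key=lambda i: inner_product(query, corpus[i]))

query = [1.0, 0.0]
corpus = [
    [1.0, 0.0],   # same direction as the query, unit length
    [3.0, 4.0],   # different direction, but magnitude 5
]
best = mips(query, corpus)
```

The longer vector wins (inner product 3.0 against 1.0) even though the first vector points exactly along the query, which is why unnormalised inner product and cosine similarity can rank results differently.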
Related Terms:
Two-stage approximate nearest neighbour index. Coarse quantisation via k-means clustering (IVF) narrows search space; product quantisation (PQ) compresses vectors into compact codes. Reduces memory 32-64× but increases query latency and reduces recall vs HNSW.
Related Terms:
Problem of finding k vectors approximately closest to query vector, allowing bounded error in distance computation. Approximate methods trade perfect accuracy (recall < 1.0) for speed, enabling sub-linear query time in high-dimensional spaces.
Related Terms:
Problem of finding k vectors exactly closest to query vector according to specified distance metric. Brute-force exact k-NN computes n distances at O(d) cost each, giving O(nd) time per query for n vectors in d dimensions, which becomes impractical for large n. Approximate methods (k-ANN) enable practical solutions.
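A brute-force baseline makes the linear cost concrete: every corpus vector is scored once per query. The function name `knn_exact` and the sample data are illustrative, and Euclidean distance is assumed as the metric.

```python
import heapq
import math

def knn_exact(query, corpus, k):
    """Return (distance, index) pairs for the k closest corpus vectors, nearest first."""
    scored = ((math.dist(query, v), i) for i, v in enumerate(corpus))
    return heapq.nsmallest(k, scored)

corpus = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
nearest = knn_exact([0.0, 0.1], corpus, k=2)
nearest_ids = [i for _, i in nearest]
```

Results from a routine like this are also what approximate indexes are usually evaluated against when measuring recall.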
Related Terms:
Time required to process a single query, typically measured in milliseconds. Key performance metric for vector databases. Sub-10ms latency enables interactive applications; 50-100ms acceptable for batch processing. Increases with corpus size and dimensionality.
Related Terms:
Abstract high-dimensional space where data is represented as embeddings. Points close in latent space correspond to semantically similar inputs. Learned by encoder networks to capture meaningful structure from raw data.
Related Terms:
Open-source distributed vector database designed for billion-scale similarity search. Supports multiple index types (HNSW, IVF, DiskANN), horizontal scaling across nodes, GPU acceleration, and hybrid search (vector + metadata filtering). Cloud-native architecture.
Related Terms:
Ensemble combining outputs from multiple independent climate models to characterise structural uncertainty. CMIP6 MME includes 50+ models. Ensemble mean often outperforms individual models; spread quantifies inter-model disagreement.
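Ensemble mean and spread are simple summary statistics over model outputs. In the sketch below the model names and warming values are invented purely for illustration, and the sample standard deviation is used as one common choice of spread measure.

```python
import statistics

# Hypothetical end-of-century warming projections (°C), one value per model.
projections = {
    "model_a": 2.0,
    "model_b": 2.5,
    "model_c": 3.0,
    "model_d": 3.5,
    "model_e": 4.0,
}

values = list(projections.values())
ensemble_mean = statistics.mean(values)      # central estimate
ensemble_spread = statistics.stdev(values)   # inter-model disagreement
```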
Related Terms:
Transformation of vectors to unit length (||v|| = 1) by dividing each component by the vector's L2 norm. After normalisation, inner product equals cosine similarity, so cosine search can run on an inner-product index. Removes magnitude information, preserving only directional relationships.
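A short sketch of the transformation, with a check that the plain dot product of two normalised vectors matches their cosine similarity. The function name `l2_normalise` and the vectors are illustrative.

```python
import math

def l2_normalise(v):
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise the zero vector")
    return [x / norm for x in v]

a = l2_normalise([3.0, 4.0])
b = l2_normalise([4.0, 3.0])

# For unit vectors the dot product is the cosine of the angle between them.
dot = sum(x * y for x, y in zip(a, b))
length_a = math.sqrt(sum(x * x for x in a))
```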
Related Terms:
Identification of data points significantly different from majority of dataset. In vector databases, outliers have low density in embedding space (few nearby neighbours). Density-based methods (DBSCAN, HDBSCAN) or distance-based thresholds used for detection.
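One simple distance-based variant flags a point when its k-th nearest neighbour is farther away than a threshold. The function names, the value of k, and the threshold below are assumptions chosen for the toy data, not recommended settings.

```python
import math

def kth_neighbour_distance(points, i, k):
    """Distance from point i to its k-th nearest other point."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1]

def distance_outliers(points, k, threshold):
    """Indices of points whose k-th nearest neighbour is farther than threshold."""
    return [
        i for i in range(len(points))
        if kth_neighbour_distance(points, i, k) > threshold
    ]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
outliers = distance_outliers(points, k=2, threshold=2.0)
```

The four corner points each have two neighbours at distance 1, so only the isolated point is flagged.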
Related Terms:
PostgreSQL extension adding vector similarity search capabilities to relational database. Supports exact and approximate (HNSW, IVFFlat) search with multiple distance metrics. Enables hybrid queries combining vector similarity with SQL predicates. Version 0.5.0+ includes HNSW support.
Related Terms:
Lossy compression technique for high-dimensional vectors. Splits vector into m subvectors, quantises each independently via k-means, stores centroid indices. Reduces memory from 4d bytes (float32) to m×log₂(k) bits whilst enabling approximate distance computation.
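The memory figures can be checked with a little arithmetic. The settings d = 128, m = 8, k = 256 are hypothetical but commonly quoted; the sketch counts only per-vector storage and ignores the shared codebooks.

```python
import math

def float32_bytes(d):
    """Bytes for one uncompressed float32 vector of dimension d."""
    return 4 * d

def pq_code_bits(m, k):
    """Bits per vector for m subquantisers with k centroids each."""
    return m * math.ceil(math.log2(k))

d, m, k = 128, 8, 256                  # hypothetical settings
raw_bytes = float32_bytes(d)           # uncompressed size
code_bytes = pq_code_bits(m, k) // 8   # one byte per subvector when k = 256
compression_ratio = raw_bytes / code_bytes
```

With these settings a 512-byte vector becomes an 8-byte code, a 64× reduction.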
Related Terms:
Open-source vector database written in Rust, optimised for memory efficiency and query performance. Supports HNSW indexing, payload filtering, quantisation, and distributed deployment. Designed for production workloads with strong consistency guarantees.
Related Terms:
Process of mapping continuous values to discrete set, reducing memory and computation. Scalar quantisation reduces precision (float32 → int8); product quantisation clusters subvectors. Introduces approximation error but enables larger-scale deployments.
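Scalar quantisation from float to int8 can be sketched with a single symmetric scale factor per vector. This is one simple scheme among several; the function names and the example vector are illustrative.

```python
def quantise_int8(vector):
    """Symmetric scalar quantisation: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(x) for x in vector)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vector]
    return codes, scale

def dequantise(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]

vector = [0.4, -1.0, 0.3, 0.0]
codes, scale = quantise_int8(vector)
restored = dequantise(codes, scale)

# Rounding to the nearest code bounds the error by half a quantisation step.
max_error = max(abs(a - b) for a, b in zip(vector, restored))
```

Each component shrinks from 4 bytes to 1, and the reconstruction error per component stays within half of `scale`.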
Related Terms:
Computational time required to retrieve k nearest neighbours for a single query vector. Includes distance computations, index traversal, and result ranking. Scales with corpus size (n), dimensionality (d), and k. Target: <10ms for interactive systems.
Related Terms:
High-resolution climate model covering limited geographic domain (e.g., Europe, North America). Driven at boundaries by GCM output. Typical resolution 10-50 km enables representation of mesoscale features and processes (orography, land-sea contrasts) poorly resolved in GCMs.
Related Terms:
Fraction of true nearest neighbours successfully retrieved by approximate search algorithm. Recall@k measures proportion of true k-NN found in approximate k-NN results. Typical targets: 0.95-0.99. Higher recall requires more computation (larger ef_search in HNSW).
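Recall@k reduces to a set overlap between the approximate and exact result lists. The function name and the ID lists below are made up for illustration.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours present in the approximate top-k."""
    true_set = set(exact_ids[:k])
    found = set(approx_ids[:k]) & true_set
    return len(found) / len(true_set)

exact = [7, 2, 9, 4, 1]     # ground truth from exhaustive search
approx = [7, 9, 2, 8, 1]    # result from an approximate index
recall = recall_at_k(approx, exact, k=5)
```

Order within the top k does not matter here: four of the five true neighbours were found, so recall is 0.8.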
Related Terms:
Training paradigm that learns representations from unlabelled data by solving pretext tasks. Contrastive methods (SimCLR, MoCo) learn by distinguishing positive pairs from negatives. Enables training on large datasets without manual annotation.
Related Terms:
Horizontal partitioning of vector corpus across multiple nodes or indices. Geographic sharding divides by spatial region; temporal sharding by time period; hash sharding by vector ID. Reduces per-shard search space, improving query latency at cost of coordination overhead.
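Hash sharding by vector ID can be sketched as follows. SHA-256 is used here only because it is stable across processes, unlike Python's built-in `hash` for strings; the function name `shard_for` and the shard count are illustrative assumptions.

```python
import hashlib

def shard_for(vector_id, num_shards):
    """Map a vector ID to a shard index in [0, num_shards)."""
    digest = hashlib.sha256(str(vector_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Assign 1000 hypothetical vector IDs to 4 shards.
assignments = {}
for vector_id in range(1000):
    assignments.setdefault(shard_for(vector_id, 4), []).append(vector_id)
```

Because the mapping ignores vector content, a query usually has to be sent to every shard and the partial results merged, which is the coordination overhead noted above.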
Related Terms:
Simple Framework for Contrastive Learning of Visual Representations. Self-supervised method that learns embeddings by maximising agreement between differently augmented views of same image. Uses NT-Xent loss with large batch sizes and strong augmentations.
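The NT-Xent (normalised temperature-scaled cross-entropy) loss can be written in plain Python for a tiny batch. The sketch assumes embeddings are stored so that positions 2k and 2k+1 are the two views of instance k; that layout, the temperature value, and the toy embeddings are illustrative choices.

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nt_xent(z, temperature=0.5):
    """Mean NT-Xent loss; z[2k] and z[2k+1] are two views of instance k."""
    n = len(z)
    total = 0.0
    for i in range(n):
        pos = i + 1 if i % 2 == 0 else i - 1
        # Similarities to every other embedding (the anchor itself is excluded).
        logits = {j: _cosine(z[i], z[j]) / temperature for j in range(n) if j != i}
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += log_denominator - logits[pos]
    return total / n

aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]     # views agree
misaligned = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # views disagree
loss_aligned = nt_xent(aligned)
loss_misaligned = nt_xent(misaligned)
```

The loss is lower when the two views of each instance agree and differ from the other instance, which is the behaviour the training objective rewards.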
Related Terms:
Retrieval of items most similar to query according to distance metric in embedding space. Fundamental operation in vector databases. Applications include recommendation systems, duplicate detection, and analogue retrieval in climate science.
Related Terms:
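Shared Socioeconomic Pathways. Scenario framework describing plausible future socioeconomic development and greenhouse gas emissions. Scenarios range from SSP1-1.9 (low emissions) to SSP5-8.5 (high emissions); the numeric suffix is the approximate radiative forcing in W/m² reached by 2100. Used in CMIP6 to drive climate model projections for 21st century.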
Related Terms:
Empirical method deriving high-resolution climate variables from coarse GCM output using statistical relationships. Includes regression methods, weather typing, and machine learning approaches. Computationally cheaper than dynamical downscaling but assumes stationarity of relationships.
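The simplest regression-based variant fits a linear relationship between a coarse predictor and a local observation over a historical period, then applies it to future GCM output. All numbers below are invented for illustration, and real applications use far richer predictors and validation.

```python
def fit_linear(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical historical pairs: GCM grid-cell temperature vs station temperature (°C).
gcm_hist = [10.0, 12.0, 14.0, 16.0]
station_hist = [8.0, 10.5, 13.0, 15.5]
slope, intercept = fit_linear(gcm_hist, station_hist)

# Apply the fitted relationship to hypothetical future GCM values.
# This step is where the stationarity assumption enters.
gcm_future = [18.0, 20.0]
station_future = [slope * g + intercept for g in gcm_future]
```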
Related Terms:
Neural network architecture based on self-attention mechanism. Processes all sequence positions in parallel (vs sequential RNNs), enabling efficient training, although the cost of standard self-attention grows quadratically with sequence length. Vision Transformer (ViT) applies to images via patch embeddings. Foundation of modern language and vision models.
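The core operation, scaled dot-product attention, fits in a few lines of plain Python. The sketch omits the learned query, key, and value projections and multi-head structure, and takes the three matrices as given; function names and inputs are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query, return the attention-weighted average of the value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs

# Two identical keys receive equal weight, so the output is the mean of the values.
out = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[0.0, 0.0], [0.0, 0.0]],
    values=[[1.0, 0.0], [3.0, 0.0]],
)
```

Every query is scored against every key, so each loop iteration is independent of the others; that independence is what allows positions to be processed in parallel.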
Related Terms:
Characterisation of confidence in climate projections. Sources include internal variability (initial condition uncertainty), model uncertainty (structural differences), and scenario uncertainty (future emissions). Ensemble spread under a given scenario estimates internal and model uncertainty; scenario uncertainty is assessed by comparing projections across scenarios.
Related Terms:
Specialised database optimised for storing, indexing, and querying high-dimensional vector embeddings. Supports approximate nearest neighbour search with sub-linear complexity. Examples include Milvus, Qdrant, Pinecone, Weaviate, and pgvector.
Related Terms:
Transformer architecture adapted for image processing. Splits image into fixed-size patches, linearly embeds each patch, adds positional encoding, and processes with transformer encoder. Achieves state-of-the-art performance on image classification when trained on large datasets.
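The patch-splitting step can be sketched on a tiny single-channel image stored as a list of rows. The linear embedding and positional encoding that follow are not shown; the function name and the 4×4 example are illustrative.

```python
def image_to_patches(image, patch_size):
    """Split an H×W image into flattened patch_size×patch_size patches, row-major."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches

# 4×4 image with pixel values 0..15, split into four 2×2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = image_to_patches(image, 2)
```

Each flattened patch plays the role of one token in the sequence passed to the transformer encoder.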
Related Terms: