Tags¶
- Metadata: #topic
- Part of: Machine learning
- Related:
- Includes:
- Additional:
Significance¶
Intuitive summaries¶
Definitions¶
- Mathematical and physics-based theory of artificial intelligence, in particular machine learning, such as deep learning with artificial neural networks or [[reinforcement learning]].
Technical summaries¶
Main resources¶
Landscapes¶
- [[Statistical learning theory]]
- Statistical learning theory - Wikipedia
- Includes: Statistics, [[Functional analysis]]
- Deep Learning
- [[The Principles of Deep Learning Theory]]
- [2106.10165] The Principles of Deep Learning Theory
- Introduction to Deep Learning Theory - YouTube
- A New Physics-Inspired Theory of Deep Learning | Optimal initialization of Neural Nets - YouTube
- Includes: [[Linear algebra]], [[Calculus]] ([[Multivariable Calculus]]), Probability theory, Statistics, [[Differential equations]], Information theory, [[Optimization theory]], Physics ([[Theoretical physics]]) (Statistical mechanics, Quantum mechanics) ([[effective field theory]], [[renormalization group]]), [[Functional analysis]], [[Bayesian statistics]], [[Signal processing]], [[Kernel methods]]
- [1910.00359] Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory
- [[Geometric deep learning]]
- Geometric Deep Learning - Grids, Groups, Graphs, Geodesics, and Gauges
- ICLR 2021 Keynote - "Geometric Deep Learning: The Erlangen Programme of ML" - M Bronstein - YouTube
- GEOMETRIC DEEP LEARNING BLUEPRINT - YouTube
- Includes: [[Group theory]] ([[Symmetry]]), [[Differential geometry]], [[Topology]], [[Harmonic analysis]], [[Functional analysis]], Probability theory, Category theory, [[Algebra]], [[Graph theory]], [[Geometry]], [[Computational geometry]]
- [[Spline Theory of Deep Learning]]
- A Spline Theory of Deep Learning
- Ahmed Imtiaz Humayun on the spline theory of NNs #machinelearning - YouTube
- Includes: [[Spline theory]], [[Approximation theory]], [[Linear algebra]], [[Optimization theory]], Information theory, [[Signal processing]], [[Functional analysis]]
- [[Categorical Deep Learning]]
- [2402.15332] Categorical Deep Learning: An Algebraic Theory of Architectures
- WE MUST ADD STRUCTURE TO DEEP LEARNING BECAUSE... - YouTube
- Generalizing [[Geometric deep learning]], [[Topological data analysis]]
- Includes: Category theory, [[Algebra]], [[Abstract algebra]], [[Group theory]], [[Topology]], [[Universal algebra]], [[Type theory]], [[Linear algebra]], [[Automata theory]], Logic
- [[Singular learning theory]]
- Singular Learning Theory - LessWrong
- singular learning theory in nLab
- Singular Learning Theory - Working Session 1 - YouTube
- Includes: Applying [[Algebraic Geometry]] to [[Statistical learning theory]]
- My Criticism of Singular Learning Theory — LessWrong
- Shard theory
- Shard Theory: An Overview — LessWrong
- Includes: [[Utility theory]], [[Game theory]], [[Optimization theory]], Probability theory
- [[Neural tangent kernel]]
- Neural tangent kernel - Wikipedia
- Includes: [[Kernel methods]], [[Linear algebra]], [[Calculus]], Probability theory, Statistics, [[Optimization theory]], [[Functional analysis]]
- [[Mathematics of adversarial deep learning]]: trained deep networks are often unstable in practice despite the existence of stable neural networks
- Why More Is More (in Artificial Intelligence) | by Manuel Brenner | Towards Data Science
- How Deep Learning Generalizes
- Flat minima: sharp minima tend to generalize more poorly than their flat counterparts, and gradient descent is more likely to end up in flat minima during optimization (see the sketch after this list)
- [[Fundamental limits of neural networks inspired by Gödel and Turing]]
- [[Neural operators]]
- [[The Principles of Deep Learning Theory]]
- Idealizations
- Reverse engineering
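A minimal numpy sketch of the flat-minima idea mentioned in this list: flatness can be probed by perturbing the parameter vector in random directions of fixed norm and measuring how much the loss increases. The toy loss functions, perturbation radius, and sample count below are arbitrary illustrative choices, not taken from any of the works cited above.

```python
import numpy as np

def sharpness(loss_fn, params, radius=0.1, n_samples=50, seed=0):
    """Crude flatness probe: mean and worst-case increase of the loss under
    random parameter perturbations of norm `radius`.
    Sharp minima show large increases; flat minima barely move."""
    rng = np.random.default_rng(seed)
    base = loss_fn(params)
    increases = []
    for _ in range(n_samples):
        d = rng.normal(size=params.shape)
        d *= radius / np.linalg.norm(d)          # scale direction to the chosen radius
        increases.append(loss_fn(params + d) - base)
    return float(np.mean(increases)), float(np.max(increases))

# Toy quadratic "minima" of different curvature around w = 0.
flat_loss  = lambda w: 0.1 * float(w @ w)    # low curvature  -> flat minimum
sharp_loss = lambda w: 10.0 * float(w @ w)   # high curvature -> sharp minimum
w0 = np.zeros(10)
print("flat :", sharpness(flat_loss, w0))    # small loss increases
print("sharp:", sharpness(sharp_loss, w0))   # ~100x larger increases
```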
Contents¶
Deep dives¶
Brain storming¶
- Categorical geometric effective deep learning
Additional resources¶
- Physics-based or physics-informed Machine learning
- Connecting [[Differential equations]] and Deep Learning
- GitHub - Zymrael/awesome-neural-ode: A collection of resources regarding the interplay between differential equations, deep learning, dynamical systems, control and numerical methods.
- [1911.00502] Review: Ordinary Differential Equations For Deep Learning
- Neural networks with infinite layers — Artificial Intelligence, Data Science, Machine learning | by Fra Gadaleta | Medium
- [2002.08071] Dissecting Neural ODEs
- [1908.10920] Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective
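A minimal numpy sketch of the ResNet-as-ODE connection surveyed in the resources above: a residual block h ← h + dt·f(h) is one Euler step of dh/dt = f(h, t), and increasing the number of steps recovers the continuous-depth ("infinite layers") neural-ODE view. The tiny vector field, dimensions, and step counts are illustrative assumptions, not any specific paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, width = 4, 16
W1 = rng.normal(0, 0.5, (dim, width))
W2 = rng.normal(0, 0.5, (width, dim))

def f(h, t):
    """Vector field dh/dt = f(h, t), parametrized by a small MLP (t unused here)."""
    return np.tanh(h @ W1) @ W2

def odeint_euler(f, h0, t0=0.0, t1=1.0, steps=100):
    """Fixed-step Euler integration. With `steps` = N this is exactly an
    N-layer weight-tied residual network; letting N grow approaches the
    continuous-depth neural-ODE limit."""
    h, t = h0.copy(), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t)   # one residual block = one Euler step
        t += dt
    return h

x = rng.normal(size=(3, dim))            # a small batch of inputs
print(odeint_euler(f, x, steps=10))      # coarse: a 10-block ResNet
print(odeint_euler(f, x, steps=1000))    # fine: approaches the ODE solution
```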
Related¶
Related resources¶
Explanation by AI¶
Landscapes by AI¶
Deep dives by AI¶
- Neural networks are biased towards learning simple, generalizing solutions due to the following key reasons:
- The parameter-function map is heavily biased towards simple functions. There is an exponential bias in the parameter-function map of neural networks that strongly favors simple functions. The number of parameter settings that produce a given simple function is much larger than the number producing a complex function[1][3].
- SGD and its variants exhibit an extreme simplicity bias. Standard training procedures like SGD have a strong tendency to find simple models, even in the presence of more complex predictive features in the data. Neural networks can rely exclusively on the simplest features and remain invariant to all complex features[2].
- The simplicity bias is present at initialization, before any training. Even at random initialization, neural networks are more likely to express simple functions than complex ones. Changing this initialization probability is difficult, and the simplicity bias is not very sensitive to the type of initialization used[3].
- Compatible inductive biases lead to better generalization from limited data. When the simplicity bias of the neural network aligns well with the learning task, it facilitates good generalization performance even from a small number of training examples[5].
In summary, the exponentially biased parameter-function map, the extreme simplicity bias of SGD, and the presence of this bias at initialization all contribute to neural networks learning simple patterns that generalize well to unseen data. This simplicity bias is a key factor behind their success[1][2][3].
Citations: [1] [1805.08522] Deep learning generalizes because the parameter-function map is biased towards simple functions [2] [2006.07710] The Pitfalls of Simplicity Bias in Neural Networks [3] Deep Neural Networks are biased, at initialisation, towards simple functions | by Chris Mingard | Towards Data Science [5] Generalization and Inductive Bias in Neural Networks - YouTube
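A minimal numpy sketch of the parameter-function map bias described in [1][3]: randomly sample the weights of a small ReLU network, read off the Boolean function it computes on all inputs, and tabulate how often each function appears. The most frequent functions tend to be the simplest (near-constant) ones. The network sizes, sample count, and the crude complexity proxy are illustrative assumptions, not the setup of the cited papers.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
n_bits, width, n_samples = 5, 32, 20000

# All 2^n_bits binary inputs, mapped to {-1, +1}.
X = np.array([[(i >> b) & 1 for b in range(n_bits)]
              for i in range(2 ** n_bits)]) * 2 - 1

def random_boolean_function():
    """Sample one 2-layer ReLU net and threshold its output on every input."""
    W1 = rng.normal(0, 1 / np.sqrt(n_bits), (n_bits, width))
    b1 = rng.normal(0, 1, width)
    W2 = rng.normal(0, 1 / np.sqrt(width), (width, 1))
    out = np.maximum(X @ W1 + b1, 0) @ W2
    return tuple((out[:, 0] > 0).astype(int))

counts = Counter(random_boolean_function() for _ in range(n_samples))

def crude_complexity(f):
    """Very rough complexity proxy: how balanced the 0/1 outputs are."""
    ones = sum(f)
    return min(ones, len(f) - ones) / len(f)

for f, c in counts.most_common(10):
    print(f"freq={c / n_samples:.4f}  balance~{crude_complexity(f):.3f}")
# The most frequent sampled functions are typically the (near-)constant,
# low-complexity ones: simple functions occupy far more parameter volume.
```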
Neural networks are biased towards learning simple, generalizing solutions despite being highly expressive models capable of fitting even random input-output mappings. This bias towards simplicity can be understood in several ways:
- Low frequency bias: Deep ReLU networks are biased towards learning low frequency functions, meaning they tend to capture global patterns and avoid local fluctuations[4]. Intuitively, this aligns with the observation that over-parameterized networks find simple patterns that generalize across data samples.
- Simplicity bias in the parameter-function map: The mapping from the space of network parameters (weights) to the space of functions (input-output mappings) is heavily biased towards functions with low descriptional complexity[5]. Simple functions are exponentially more likely to occur upon random sampling of network parameters compared to complex functions.
- Constraints and geometry of decision boundaries: By constraining each output of the neural network to be a convex combination of its inputs, certain desirable geometries of the decision boundaries can be achieved[1]. These constraints may guide the network towards learning simpler, generalizing solutions that lie in a constrained subspace of the hypothesis space.
- Optimization towards simple solutions: Even with random initialization, iterative optimization of neural network parameters tends to converge towards simple, explicit solutions[2]. This suggests an inherent bias in the optimization process towards finding simple, generalizing solutions.
In summary, the combination of architectural biases, simplicity bias in the parameter-function map, constraints on decision boundaries, and optimization dynamics all contribute to neural networks' tendency to learn simple, generalizing solutions despite their high expressivity.
Citations: [1] [PDF] Towards Understanding and Improving the Generalization Performance of Neural Networks | Semantic Scholar [2] [2311.07498] Reducing the Need for Backpropagation and Discovering Better Optima With Explicit Optimizations of Neural Networks [3] [2212.09993] Are Deep Neural Networks SMARTer than Second Graders? [4] [1806.08734] On the Spectral Bias of Neural Networks [5] Simplicity bias in the parameter-function map of deep neural networks | Semantic Scholar
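A minimal numpy sketch of the low-frequency / spectral bias described in [4]: a small tanh network trained by full-batch gradient descent on a target with one low-frequency and one high-frequency component fits the low frequency much earlier. The architecture, learning rate, and step counts are arbitrary illustrative choices, not the experiments of the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target signal: one low-frequency and one high-frequency component.
x = np.linspace(0.0, 1.0, 256)[:, None]
y = np.sin(2 * np.pi * 1 * x) + np.sin(2 * np.pi * 10 * x)

# One-hidden-layer tanh network, trained with full-batch gradient descent.
width, lr = 256, 1e-3
W1 = rng.normal(0, 1, (1, width)); b1 = rng.normal(0, 1, width)
W2 = rng.normal(0, 1 / np.sqrt(width), (width, 1)); b2 = np.zeros(1)

def residual_amplitude(residual, k):
    """Amplitude of frequency k (cycles over the interval) left in the residual."""
    return 2 * np.abs(np.fft.rfft(residual[:, 0]))[k] / len(residual)

for step in range(20001):
    h = np.tanh(x @ W1 + b1)            # forward pass
    err = (h @ W2 + b2) - y             # residual = prediction - target
    # Backpropagation for mean-squared error.
    gW2 = h.T @ err; gb2 = err.sum(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = x.T @ dh; gb1 = dh.sum(0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= lr * g / len(x)
    if step % 5000 == 0:
        print(f"step {step:6d}  residual @ k=1: {residual_amplitude(err, 1):.3f}"
              f"  @ k=10: {residual_amplitude(err, 10):.3f}")
# The k=1 residual typically shrinks long before the k=10 residual does:
# the network fits low frequencies first (spectral bias).
```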
AI¶
Additional metadata¶
- processed #processing #toprocess #important #short #long #casual #focus
- Unfinished: #metadata #tags