Tags

Significance

Intuitive summaries

Definitions

Technical summaries

Main resources

Landscapes

Contents

Deep dives

Brainstorming

  • Categorical geometric effective deep learning

Additional resources

Explanation by AI

Landscapes by AI

Deep dives by AI

  • Neural networks are biased towards learning simple, generalizing solutions for several key reasons:

  • The parameter-function map is heavily biased towards simple functions. There is an exponential bias in the parameter-function map of neural networks that strongly favors simple functions: far more parameter settings map to a given simple function than to any given complex function[1][3] (a toy sampling sketch follows this list).

  • SGD and its variants exhibit an extreme simplicity bias. Standard training procedures like SGD have a strong tendency to find simple models, even when more complex predictive features are present in the data. Neural networks can come to rely exclusively on the simplest features and remain invariant to all complex ones[2] (a toy demonstration also follows this list).

  • The simplicity bias is present at initialization, before any training. Even at random initialization, neural networks are more likely to express simple functions than complex ones. Changing this initialization-time probability is difficult, and the simplicity bias is not very sensitive to the type of initialization used[3].

  • Compatible inductive biases lead to better generalization from limited data. When the simplicity bias of the neural network aligns well with the learning task, it facilitates good generalization performance even from a small number of training examples[5].
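A minimal numpy sketch of the sampling argument above: draw random weights for a tiny ReLU network, read off the Boolean function it induces on the 5-dimensional hypercube, and count how often each function appears. The architecture size, sample count, and transition-count complexity proxy are illustrative choices (the cited papers use larger networks and Lempel-Ziv complexity of the truth table); typically a handful of (near-)constant, low-complexity functions dominate the counts.

```python
import numpy as np
from collections import Counter
from itertools import product

rng = np.random.default_rng(0)

# All 32 points of the Boolean cube {0,1}^5.
X = np.array(list(product([0, 1], repeat=5)), dtype=float)

def random_boolean_function(hidden=32):
    """Draw random weights for a tiny ReLU net and return the Boolean function it induces."""
    W1 = rng.normal(0, 1, (5, hidden)); b1 = rng.normal(0, 1, hidden)
    w2 = rng.normal(0, 1, hidden);      b2 = rng.normal(0, 1)
    out = np.maximum(0, X @ W1 + b1) @ w2 + b2            # ReLU hidden layer, linear readout
    return ''.join('1' if o > 0 else '0' for o in out)    # 32-bit truth table

samples = 50_000
counts = Counter(random_boolean_function() for _ in range(samples))

def transitions(bits):
    """Crude complexity proxy: number of 0/1 changes along the truth table (constants score 0)."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

print(f"{len(counts)} distinct functions seen in {samples} samples "
      f"(out of 2**32 possible truth tables)")
for bits, c in counts.most_common(5):
    print(f"freq {c / samples:6.1%}   transitions {transitions(bits):2d}   {bits}")
```

In runs like this, only a tiny and heavily skewed subset of the possible truth tables appears at all, which is the exponential bias described above.

And a toy version of the SGD feature-reliance claim from [2], under assumed details: labels are predictable from both a margin-separable linear coordinate and a concentric-circles pair, a small tanh network is trained with plain gradient descent (a stand-in for SGD), and each feature is then destroyed in turn by permuting it across test examples. The dataset, architecture, and hyperparameters are illustrative, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Each label is predictable from BOTH a simple (linear) and a complex (circular) feature."""
    y = rng.integers(0, 2, n)
    simple = (2 * y - 1) * rng.uniform(0.5, 1.5, n)               # sign alone predicts y
    radius = np.where(y == 1, 2.0, 1.0) + rng.normal(0, 0.05, n)  # circle radius also predicts y
    theta = rng.uniform(0, 2 * np.pi, n)
    complex_feat = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)
    return np.column_stack([simple, complex_feat]), y

Xtr, ytr = make_data(2000)
Xte, yte = make_data(2000)

# One-hidden-layer tanh classifier trained with plain gradient descent on cross-entropy.
H, lr = 64, 0.5
W1 = rng.normal(0, 1, (3, H)); b1 = np.zeros(H)
w2 = rng.normal(0, 1 / np.sqrt(H), H); b2 = 0.0

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))

for _ in range(3000):
    h, p = forward(Xtr)
    d = (p - ytr) / len(ytr)                       # dLoss/dlogits for binary cross-entropy
    gw2, gb2 = h.T @ d, d.sum()
    dz = (d[:, None] * w2[None, :]) * (1 - h ** 2)
    gW1, gb1 = Xtr.T @ dz, dz.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; w2 -= lr * gw2; b2 -= lr * gb2

def accuracy(X, y):
    return np.mean((forward(X)[1] > 0.5) == y)

perm = rng.permutation(len(yte))
no_simple = Xte.copy();  no_simple[:, 0] = Xte[perm, 0]      # destroy the simple feature
no_complex = Xte.copy(); no_complex[:, 1:] = Xte[perm, 1:]   # destroy the complex feature

# Typically: accuracy barely changes without the complex feature but collapses towards
# chance without the simple one, i.e. the net relied almost only on the simple feature.
print(f"clean test accuracy:          {accuracy(Xte, yte):.2f}")
print(f"simple feature randomized:    {accuracy(no_simple, yte):.2f}")
print(f"complex feature randomized:   {accuracy(no_complex, yte):.2f}")
```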

In summary, the exponentially biased parameter-function map, the extreme simplicity bias of SGD, and the presence of this bias already at initialization all contribute to neural networks learning simple patterns that generalize well to unseen data. This simplicity bias is a key factor behind their success[1][2][3].

Citations:
[1] arXiv:1805.08522, Deep learning generalizes because the parameter-function map is biased towards simple functions
[2] arXiv:2006.07710, The Pitfalls of Simplicity Bias in Neural Networks
[3] Chris Mingard, Deep Neural Networks are biased, at initialisation, towards simple functions, Towards Data Science
[5] Generalization and Inductive Bias in Neural Networks, YouTube

Neural networks are biased towards learning simple, generalizing solutions despite being highly expressive models capable of fitting even random input-output mappings. This bias towards simplicity can be understood in several ways:

  1. Low-frequency bias: Deep ReLU networks are biased towards learning low-frequency functions, meaning they tend to capture global patterns and avoid local fluctuations[4]. Intuitively, this aligns with the observation that over-parameterized networks find simple patterns that generalize across data samples (see the sketch after this list).

  2. Simplicity bias in the parameter-function map: The mapping from the space of network parameters (weights) to the space of functions (input-output mappings) is heavily biased towards functions of low descriptional complexity[5]. Simple functions are exponentially more likely to arise from randomly sampled network parameters than complex ones.

  3. Constraints and geometry of decision boundaries: By constraining each output of the neural network to be a convex combination of its inputs, certain desirable geometries of the decision boundaries can be achieved[1]. These constraints may guide the network towards learning simpler, generalizing solutions that lie in a constrained subspace of the hypothesis space.

  4. Optimization towards simple solutions: Even with random initialization, iterative optimization of neural network parameters tends to converge towards simple, explicit solutions[2]. This suggests an inherent bias in the optimization process towards finding simple, generalizing solutions.
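A minimal sketch of the low-frequency bias in item 1, with assumed details (a one-hidden-layer tanh network rather than a deep ReLU one, full-batch gradient descent, and arbitrary sizes, init scales, and learning rate): fit a target made of a 1-cycle and a 10-cycle sinusoid and track the Fourier amplitude of the network's output at those two frequencies during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target on [0, 1): a low-frequency (1 cycle) plus a high-frequency (10 cycles) sinusoid.
N = 256
x = np.linspace(0.0, 1.0, N, endpoint=False)[:, None]
target = np.sin(2 * np.pi * x[:, 0]) + 0.5 * np.sin(20 * np.pi * x[:, 0])

# One-hidden-layer tanh network; sizes, init scales, and learning rate are arbitrary choices.
H = 256
W1 = rng.normal(0.0, 3.0, (1, H))
b1 = rng.uniform(-3.0, 3.0, H)
w2 = rng.normal(0.0, 1.0 / np.sqrt(H), H)
b2 = 0.0
lr = 0.005

def amplitude(signal, cycles):
    """Fourier amplitude of the component with `cycles` periods over the sampled interval."""
    coeffs = np.fft.rfft(signal) / len(signal)
    return 2.0 * np.abs(coeffs[cycles])

for step in range(20001):
    h = np.tanh(x @ W1 + b1)           # hidden activations, shape (N, H)
    pred = h @ w2 + b2                 # network output on the grid

    d = 2.0 * (pred - target) / N      # dLoss/dpred for mean squared error
    gw2, gb2 = h.T @ d, d.sum()
    dz = (d[:, None] * w2[None, :]) * (1.0 - h ** 2)
    gW1, gb1 = x.T @ dz, dz.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; w2 -= lr * gw2; b2 -= lr * gb2

    if step % 4000 == 0:
        print(f"step {step:5d}   "
              f"amp @ 1 cycle = {amplitude(pred, 1):.2f} (target 1.00)   "
              f"amp @ 10 cycles = {amplitude(pred, 10):.2f} (target 0.50)")
```

In runs like this the low-frequency amplitude typically approaches its target long before the high-frequency one does, which is the "global patterns first, local fluctuations later" behaviour described above.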

In summary, the combination of architectural biases, simplicity bias in the parameter-function map, constraints on decision boundaries, and optimization dynamics all contribute to neural networks' tendency to learn simple, generalizing solutions despite their high expressivity.

Citations:
[1] Towards Understanding and Improving the Generalization Performance of Neural Networks (PDF), Semantic Scholar
[2] arXiv:2311.07498, Reducing the Need for Backpropagation and Discovering Better Optima With Explicit Optimizations of Neural Networks
[3] arXiv:2212.09993, Are Deep Neural Networks SMARTer than Second Graders?
[4] arXiv:1806.08734, On the Spectral Bias of Neural Networks
[5] Simplicity bias in the parameter-function map of deep neural networks, Semantic Scholar

AI

Additional metadata

  • processed #processing #toprocess #important #short #long #casual #focus

  • Unfinished: #metadata #tags