The activation function known as ReLU builds complex nonlinear functions across layers of a neural network, making functions that outline flat faces and sharp edges. But how much of the world breaks down into perfect polyhedra? New work explores an alternative activation function that yields curvaceous results.
What’s new: Stanford researchers led by Vincent Sitzmann and Julien Martel used the periodic activation function sin(x) to solve equations with well-defined higher-order derivatives. They showed preliminary success in a range of applications.
Key insight: Training a neural network updates its weights to approximate a particular function. Backprop uses the first derivative to train networks more efficiently than methods such as hill-climbing that explore only nearby values. Higher-order derivatives contain useful information that ReLU can’t express and other activation functions describe poorly. For example, in the range 0 to 1, the values of x and x² are similar, but their derivatives are dramatically different. Sine has better-behaved derivatives.
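To make the x versus x² comparison concrete, here is a small numerical sketch. The grid of 101 points and the variable names are illustrative choices, not something taken from the paper.

```python
import numpy as np

# Sample the interval [0, 1] on an evenly spaced grid (grid size is arbitrary).
xs = np.linspace(0.0, 1.0, 101)

f = xs        # f(x) = x
g = xs ** 2   # g(x) = x^2

# Largest gap between the two functions' values on [0, 1].
max_value_gap = np.max(np.abs(f - g))

# Analytic first derivatives: f'(x) = 1 and g'(x) = 2x.
df = np.ones_like(xs)
dg = 2.0 * xs
max_slope_gap = np.max(np.abs(df - dg))

print("largest difference in value:", max_value_gap)
print("largest difference in slope:", max_slope_gap)
```

On this interval the values never differ by more than about 0.25, while the slopes differ by as much as 1, and the second derivatives (0 and 2) never agree. A model fitted only to values can miss that distinction.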
How it works: Sine networks, which the researchers call SIRENs, are simply neural networks that use sine activation functions. However, they need good initial values.
- A sine network can use layers, regularization, and backprop just like a ReLU network.
- The derivative of a ReLU is a step function, and its second derivative is zero almost everywhere. The derivative of sin(x) is cos(x), which is a shifted sine. Since the derivative of a sine network is another sine network, a sine network can model the derivatives of the data as well as the data itself.
- Since successive layers combine sine functions, their oscillations may become very frequent. Hectic oscillations make training difficult. The researchers avoided this pitfall by generating initialization values that maintain a low frequency.
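The points above can be illustrated with a minimal NumPy sketch of a single sine layer. The function names and the fan-in-scaled uniform bound are assumptions made for illustration, not necessarily the authors' exact initialization scheme. The sketch checks that the layer's derivative is itself a shifted sine, since cos(z) = sin(z + π/2).

```python
import numpy as np

rng = np.random.default_rng(0)

def init_sine_layer(fan_in, fan_out):
    # Hypothetical initialization: a uniform bound that shrinks with fan-in,
    # intended to keep pre-activations (and so oscillation frequency) modest.
    bound = np.sqrt(6.0 / fan_in)
    W = rng.uniform(-bound, bound, size=(fan_out, fan_in))
    b = np.zeros(fan_out)
    return W, b

def sine_layer(x, W, b):
    # One layer: an affine map followed by a sine activation.
    return np.sin(W @ x + b)

def sine_layer_jacobian(x, W, b):
    # d/dx sin(Wx + b) = cos(Wx + b) * W, written as a phase-shifted sine.
    shifted = np.sin(W @ x + b + np.pi / 2)
    return shifted[:, None] * W

W, b = init_sine_layer(fan_in=3, fan_out=4)
x = np.array([0.1, -0.2, 0.3])
J = sine_layer_jacobian(x, W, b)

# Compare against a central finite-difference estimate of the Jacobian.
eps = 1e-6
J_fd = np.stack(
    [(sine_layer(x + eps * e, W, b) - sine_layer(x - eps * e, W, b)) / (2 * eps)
     for e in np.eye(3)],
    axis=1,
)
max_err = np.max(np.abs(J - J_fd))
print("max difference between analytic and numerical Jacobian:", max_err)
```

The same identity applies layer by layer through the chain rule, which is why the derivative of a sine network has the same form as the network itself.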
Results: The authors used sine networks to solve differential equations (where they can learn directly from derivatives), interpret point clouds, and process images and audio. They provide examples and a Colab notebook so you can try it yourself. They demonstrated success in all these domains and provided quantitative evidence for the value of gradients when applied to Poisson image reconstruction. The authors trained models to predict the gradient of an image and compared the quality of generated images after reconstruction using Poisson’s equation. Evaluated on the starfish image above, a sine network achieved 32.91 peak signal-to-noise ratio, a measurement of reconstruction quality, compared with 25.79 for Tanh.
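For readers unfamiliar with the metric, peak signal-to-noise ratio is conventionally computed as 10·log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the two images. A minimal sketch follows. The function name `psnr`, the [0, 1] pixel range, and the synthetic arrays are illustrative assumptions and do not reproduce the authors' evaluation.

```python
import numpy as np

def psnr(reference, reconstruction, max_val=1.0):
    # Mean squared error between the two images.
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    # Standard definition: 10 * log10(MAX^2 / MSE). Higher is better.
    return 10.0 * np.log10(max_val ** 2 / mse)

# Synthetic example: a blank image and a copy that is off by 0.1 everywhere.
reference = np.zeros((8, 8))
reconstruction = np.full((8, 8), 0.1)
score = psnr(reference, reconstruction)
print("PSNR:", score)
```

Because the scale is logarithmic, a gap of several points between two models corresponds to a substantially lower mean squared error for the higher-scoring one.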
Why it matters: ReLUs have been a deep learning staple since 2012. For data that have critical higher-order derivatives, alternatives may improve performance without increasing model complexity.
We’re thinking: ReLUs may be good for drawing the angular Tesla Cybertruck, but sines may be better suited for a 1950 Chevy 3500.