Neural Style Transfer (NST) was introduced by Leon Gatys et al. in 2015 [1]. It applies the style of a reference image to a target image while preserving the target's content, where:
- Style: Textures, colors, and visual patterns across various spatial scales.
- Content: The higher-level macrostructure of the image.
The idea of style transfer is related to texture generation, which has a long history in image processing before the development of its neural counterpart.
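The key idea behind NST is the one fundamental to all deep learning algorithms: define a loss function that specifies the goal, then minimize it.
In high-level terms, the loss function is defined as:

$$\text{loss} = \text{distance}(\text{style}(\text{reference}_{image}) - \text{style}(\text{combination}_{image})) + \text{distance}(\text{content}(\text{original}_{image}) - \text{content}(\text{combination}_{image}))$$

Here:
- $\text{distance}$: A norm function such as the $\text{L}_2$ norm.
- $\text{content}$: A function computing a representation of the image content.
- $\text{style}$: A function computing a representation of the image style.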
Minimizing the loss function ensures:
- $\text{style}(\text{combination}_{image}) \approx \text{style}(\text{reference}_{image})$,
- $\text{content}(\text{combination}_{image}) \approx \text{content}(\text{original}_{image})$.
Gatys et al. found that convolutional neural networks (CNNs) offer a way to mathematically define the $\text{content}$ and $\text{style}$ functions.
The activations from earlier layers in a network contain local information, while activations from higher layers capture increasingly global and abstract information. Thus, the content of an image, which is more global, is found in the upper-layer representations of a CNN.
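Let $F^l$ and $P^l$ denote the activations at an upper layer $l$ for the generated image and the content image, respectively, with $F^l_{ij}$ the activation of the $i$-th feature map at position $j$. The content loss is the squared error between the two representations:

$$\mathcal{L}_{\text{content}} = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

Minimizing this term encourages the generated image to maintain high-level structural similarity to the content image. It relies on the assumption that the upper layers of a CNN effectively "see" the content of the input image.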
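As a rough illustration of the content loss, the sketch below computes half the sum of squared differences between two activation arrays with NumPy. The arrays are random stand-ins for the activations of one upper layer (an assumption for illustration; in practice they would come from a pretrained CNN), and the function name `content_loss` and the 1/2 scaling (the convention of Gatys et al.) are choices made here, not taken from this project's code.

```python
import numpy as np

def content_loss(generated_features, content_features):
    """Half the sum of squared differences between two activation arrays."""
    generated = np.asarray(generated_features, dtype=np.float64)
    content = np.asarray(content_features, dtype=np.float64)
    return 0.5 * float(np.sum((generated - content) ** 2))

# Hypothetical activations of one upper layer: 4 channels on a 3x3 grid.
rng = np.random.default_rng(0)
content_features = rng.standard_normal((4, 3, 3))
generated_features = content_features + 0.1  # uniformly perturbed copy

print(content_loss(content_features, content_features))  # 0.0
print(content_loss(generated_features, content_features))
```

The loss is zero only when both images produce identical activations at the chosen layer, and it grows with the squared size of any mismatch.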
Unlike content loss, which uses a single upper layer, style loss uses multiple layers of a CNN to capture texture patterns across spatial scales.
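To model the style of an image, we compute the Gram matrix of a layer's activations. The Gram matrix captures the correlations between feature maps at layer $l$.
For a feature map $F^l \in \mathbb{R}^{N_l \times M_l}$, the Gram matrix $G^l \in \mathbb{R}^{N_l \times N_l}$ is:

$$G^l_{ij} = \sum_{k} F^l_{ik} F^l_{jk}$$

where $F^l_{ik}$ is the activation of the $i$-th feature map at spatial position $k$. The style loss at layer $l$ compares the Gram matrix $G^l$ of the generated image with the Gram matrix $A^l$ of the style image:

$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

where:
- $N_l = C_l$: Number of feature maps (channels).
- $M_l = H_l \times W_l$: Number of spatial locations.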
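The Gram matrix and the per-layer style loss can be sketched in a few lines of NumPy. This is a minimal sketch, assuming channels-first activations of shape `(C, H, W)` and the normalization by $4 N_l^2 M_l^2$ used by Gatys et al.; the function names `gram_matrix` and `layer_style_loss` and the random input are hypothetical and not part of this project's interface.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) activation array: channel-by-channel inner products."""
    channels = features.shape[0]
    flat = features.reshape(channels, -1)  # shape (N_l, M_l)
    return flat @ flat.T                   # shape (N_l, N_l)

def layer_style_loss(generated_features, style_features):
    """Squared Gram-matrix difference, normalized by 4 * N_l^2 * M_l^2."""
    n_l = generated_features.shape[0]
    m_l = generated_features.shape[1] * generated_features.shape[2]
    diff = gram_matrix(generated_features) - gram_matrix(style_features)
    return float(np.sum(diff ** 2)) / (4.0 * n_l**2 * m_l**2)

# Random stand-in activations: 3 channels on a 4x4 grid.
rng = np.random.default_rng(0)
features = rng.standard_normal((3, 4, 4))

# Shuffle the 16 spatial positions identically in every channel.
perm = rng.permutation(16)
shuffled = features.reshape(3, -1)[:, perm].reshape(3, 4, 4)

print(np.allclose(gram_matrix(features), gram_matrix(shuffled)))  # True
```

Because the Gram matrix sums over spatial positions, rearranging those positions leaves it unchanged, which is why it describes texture statistics rather than layout.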
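The total style loss aggregates contributions across multiple layers:

$$\mathcal{L}_{\text{style}} = \sum_{l} w_l E_l$$

Here, $E_l$ is the style loss at layer $l$ and $w_l$ is the weight of that layer's contribution.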
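The weighted sum over layers is simple enough to show directly. In the sketch below, the layer names and the per-layer loss values are made-up placeholders, and `total_style_loss` is a hypothetical helper; it only illustrates how different choices of $w_l$ shift the emphasis between layers.

```python
def total_style_loss(layer_losses, layer_weights):
    """Weighted sum of per-layer style losses E_l, both keyed by layer name."""
    missing = set(layer_weights) - set(layer_losses)
    if missing:
        raise KeyError(f"no style loss computed for layers: {sorted(missing)}")
    return sum(weight * layer_losses[name] for name, weight in layer_weights.items())

# Illustrative per-layer losses E_l (made-up numbers, ordered shallow to deep).
layer_losses = {"layer_1": 8.0, "layer_2": 4.0, "layer_3": 2.0, "layer_4": 2.0}

# Equal weight on every layer versus all weight on the shallowest layer.
uniform = {name: 0.25 for name in layer_losses}
fine_only = {"layer_1": 1.0, "layer_2": 0.0, "layer_3": 0.0, "layer_4": 0.0}

print(total_style_loss(layer_losses, uniform))    # 4.0
print(total_style_loss(layer_losses, fine_only))  # 8.0
```

Putting all the weight on a shallow layer restricts the style term to fine, local texture, while spreading it across layers also matches larger-scale patterns.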
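The total loss combines content and style losses:

$$\mathcal{L}_{\text{total}} = \alpha \mathcal{L}_{\text{content}} + \beta \mathcal{L}_{\text{style}}$$

where:
- $\alpha$: Weight for content preservation.
- $\beta$: Weight for style transfer.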
- Choice of Layers:
  - Content loss typically uses higher layers to capture global structure.
  - Style loss uses multiple layers (low and high) to capture textures at various scales.
- Customization of Style Scales:
  - By adjusting $w_l$, specific spatial scales of style can be emphasized or suppressed.
- Optimization Process:
  - The generated image is initialized (e.g., as noise or the content image).
  - Gradient descent is applied to iteratively minimize $\mathcal{L}_{\text{total}}$.
- Summary:
  - Preserve content by aligning high-level activations of the generated and content images.
  - Preserve style by aligning feature correlations (Gram matrices) of the generated and style images.
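The optimization process outlined above can be illustrated with a deliberately simplified toy. The sketch below skips the CNN entirely and runs gradient descent directly on a feature matrix of shape `(N, M)`, using hand-derived gradients of a content term and a Gram-matrix style term. This is an assumption made to keep the example self-contained: a real implementation would backpropagate through a pretrained network to the image pixels using an autodiff framework. The function name `loss_and_grad`, the random targets, and the values of `alpha`, `beta`, and the step size are all illustrative choices.

```python
import numpy as np

def loss_and_grad(x, content_target, style_gram, alpha, beta):
    """Total loss and its gradient w.r.t. the feature matrix x of shape (N, M)."""
    n, m = x.shape
    content_diff = x - content_target   # drives x toward the content features
    gram_diff = x @ x.T - style_gram    # drives the Gram matrix toward the style's
    content = 0.5 * np.sum(content_diff ** 2)
    style = np.sum(gram_diff ** 2) / (4.0 * n**2 * m**2)
    # Gradient of the style term is (G - A) X / (N^2 M^2) for symmetric G - A.
    grad = alpha * content_diff + beta * (gram_diff @ x) / (n**2 * m**2)
    return float(alpha * content + beta * style), grad

rng = np.random.default_rng(0)
content_target = rng.standard_normal((4, 64))  # stand-in for content activations
style_target = rng.standard_normal((4, 64))    # stand-in for style activations
style_gram = style_target @ style_target.T

x = rng.standard_normal((4, 64))               # initialize from noise
history = []
for step in range(200):
    loss, grad = loss_and_grad(x, content_target, style_gram, alpha=1.0, beta=1e3)
    history.append(loss)
    x = x - 0.05 * grad                        # plain gradient-descent update

print(f"loss: {history[0]:.2f} -> {history[-1]:.2f}")
```

The loop has the same shape as the real procedure: initialize, evaluate the combined loss, and step against its gradient. Only the source of the features and gradients differs.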
[1] Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.
[2] Chollet, F. (2021). Deep learning with Python. Simon and Schuster.
This project provides a simple interface to experiment with Neural Style Transfer, including style transfer, content visualization, and style visualization.
Use this to apply the style of an image to a content image.
main(content_path='img/content/dancing.jpg', style_path='img/style/picasso.jpg', mode='style_transfer')
Use this to observe how the content image emerges from random noise during optimization.
main(content_path='noise', style_path='img/content/dancing.jpg', mode='content')
Use this to visualize how the network interprets the style of the image.
main(content_path='noise', style_path='img/style/picasso.jpg', mode='style')
- content_path: Path to the content image or 'noise' for random noise initialization.
- style_path: Path to the style image.
- mode: One of the following:
  - 'style_transfer': Performs full style transfer.
  - 'content': Reconstructs the content image from noise.
  - 'style': Visualizes the style as seen by the network.
img/
├── content/
│ └── dancing.jpg
│
├── results/
│
├── style/
│ └── picasso.jpg
Happy experimenting!