Some basic losses defined in PyTorch. The parallel loss definitions in torch.nn
(nn) and torch.nn.functional (F) can sometimes be confusing.
Notes:
- regression: predict a real-valued quantity
- classification: predict a probability (requires a non-linear activation function)
Negative Log Likelihood Loss
[classification] It is useful to train a classification problem with C classes.
nn.NLLLoss(input, target)
== F.nll_loss()
- input: (N, C), or $(N, C, d_{1}, d_{2}, …, d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
- target: (N), or $(N, d_{1}, d_{2}, …, d_{K})$ with $K \geq 1$ in the case of K-dimensional loss.
- output: scalar. If reduction is none, then the same size as the target: (N), or $(N, d_{1}, d_{2}, …, d_{K})$ with $K \geq 1$ in the case of K-dimensional loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

N, C = 3, 5
m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(N, C, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.empty(N, dtype=torch.long).random_(0, C)
output = loss(m(input), target) # scalar
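A quick manual check (a sketch reusing the tensors above): with the default mean reduction, NLLLoss just picks the log-probability of the target class for each sample and averages the negated values.
log_probs = m(input)  # (N, C) log-probabilities
manual = -log_probs[torch.arange(N), target].mean()
# torch.allclose(manual, output) should be True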
# 2D loss example (used, for example, with image inputs)
N, C = 5, 4
loss = nn.NLLLoss()
data = torch.randn(N, 16, 10, 10) # (N, C, H, W)
conv = nn.Conv2d(16, C, (3, 3))
m = nn.LogSoftmax(dim=1) # softmax over the channel (class) dimension
# each element in target has to have 0 <= value < C
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C) # (N, 8, 8)
output = m(conv(data)) # (N, C, 8, 8)
l = loss(output, target) # scalar
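In the 2D case the result is just the per-pixel NLL averaged over all N x 8 x 8 positions; a quick check (a sketch reusing the tensors above):
manual = -output.gather(1, target.unsqueeze(1)).mean()  # pick the log-prob of the target class per pixel
# torch.allclose(manual, l) should be True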
# loss example (used, for example, with point-cloud inputs)
N, L, C = 4, 10, 6
loss = nn.NLLLoss()
m = nn.LogSoftmax(dim=-1) # F.log_softmax(x, dim=-1) is the functional equivalent
pcl = torch.randn(N, L, 3) # (N, L, 3), 3: x, y, z
fc1 = nn.Linear(3, 64)
fc2 = nn.Linear(64, C)
output = m(fc2(fc1(pcl))) # (N, L, C), m dim: -1 or 2
target = torch.empty(N, L, dtype=torch.long).random_(0, C) # (N, L)
target = target.view(-1) # (N*L)
output = output.view(-1, C) # (N*L, C)
l = loss(output, target) # scalar
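Alternatively, instead of flattening, the K-dimensional form of NLLLoss can be used directly by moving the class dimension to position 1 (a sketch reusing the tensors above):
out_ncl = output.view(N, L, C).permute(0, 2, 1)  # (N, C, L)
l_kd = loss(out_ncl, target.view(N, L))  # same scalar as l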
Cross Entropy Loss
[classification] Cross Entropy Loss, also called logarithmic loss, log loss or logistic loss:
\[L_{CE} = - \sum_{i=1}^{n} t_{i}\log(p_{i}),\]
where $t_{i}$ is the truth label and $p_{i}$ is the softmax probability for the $i^{th}$ class.
nn.CrossEntropyLoss(input, target)
== F.cross_entropy()
== F.nll_loss(F.log_softmax(input, 1), target)
In PyTorch, this criterion combines log_softmax and nll_loss in a single function.
- input: (N, C), or (N, C, H, W) for image-like inputs
- target: (N) where each value is $0 \leq \text{targets}[i] \leq C-1$, or $(N, d_1, d_2, …, d_K)$ where $K \geq 1$ for K-dimensional loss.
- output: scalar. If reduction is none, then the same size as the target (N), or $(N, d_1, d_2, …, d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
N, C = 16, 5
# Example of target with class indices
loss = nn.CrossEntropyLoss()
input = torch.randn(N, C, requires_grad=True)
target = torch.empty(N, dtype=torch.long).random_(C)
output = loss(input, target) # scalar
# Example of target with class probabilities (soft labels; requires PyTorch 1.10+)
input = torch.randn(N, C, requires_grad=True)
target = torch.randn(N, C).softmax(dim=1)
output = loss(input, target)
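A small check of the equivalence stated above (a sketch using fresh class-index targets):
target_idx = torch.empty(N, dtype=torch.long).random_(C)
ce = F.cross_entropy(input, target_idx)
nll = F.nll_loss(F.log_softmax(input, dim=1), target_idx)
# torch.allclose(ce, nll) should be True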
Binary Cross Entropy
[classification] Binary Cross Entropy Loss, also referred to as logarithmic loss.
The problem is framed as predicting the likelihood of an example belonging to class one, i.e. the class assigned the integer value 1, whereas the other class is assigned the value 0.
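For reference, with the same notation as above, the binary cross entropy over $n$ examples (default mean reduction) can be written as:
\[L_{BCE} = -\frac{1}{n}\sum_{i=1}^{n}\left [ y_{i}\log(p_{i}) + (1 - y_{i})\log(1 - p_{i}) \right ]\]
where $y_{i}$ is the target and $p_{i}$ is the predicted probability (the sigmoid output) for the $i^{th}$ example.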
nn.BCELoss(input, target)
== F.binary_cross_entropy()
- input: Tensor of arbitrary shape
- target: Tensor of the same shape as input
input = torch.randn((3, 2), requires_grad=True)
target = torch.rand((3, 2), requires_grad=False)
loss = F.binary_cross_entropy(torch.sigmoid(input), target) # F.sigmoid is deprecated
Function that measures Binary Cross Entropy between target and output logits. This loss combines a Sigmoid layer and the BCELoss in a single class, which is more numerically stable than using a plain Sigmoid followed by a BCELoss.
nn.BCEWithLogitsLoss(input, target)
== F.binary_cross_entropy_with_logits()
input = torch.randn((3, 2), requires_grad=True)
target = torch.rand((3, 2), requires_grad=False)
loss = nn.BCEWithLogitsLoss()(input, target) # instantiate the module first, then call it
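A quick check of the combination claimed above (a sketch reusing the tensors from this example): the logits version should match sigmoid followed by BCELoss.
loss_logits = F.binary_cross_entropy_with_logits(input, target)
loss_manual = F.binary_cross_entropy(torch.sigmoid(input), target)
# torch.allclose(loss_logits, loss_manual) should be True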
Smooth L1 Loss
[regression]
Function that uses a squared term if the absolute element-wise error falls below $\beta$ (1 by default) and an L1 term otherwise. It is less sensitive to outliers than the MSELoss and in some cases prevents exploding gradients (e.g. see Fast R-CNN paper by Ross Girshick).
\[L_{Smooth\ L1} = \frac{1}{n} \sum_{i} z_{i},\]
where $z_{i}$ is given by:
\[z_{i} = \begin{cases} 0.5 (x_i - y_i)^2 / \beta, & \text{if } |x_i - y_i| < \beta \\ |x_i - y_i| - 0.5\beta, & \text{otherwise} \end{cases}\]
$x$ and $y$ can be of arbitrary shapes with a total of $n$ elements each; the sum operation still operates over all the elements and divides by $n$.
nn.SmoothL1Loss(input, target)
== F.smooth_l1_loss()
- input: (*)
- target: (*), same shape as the input.
- output: scalar. If reduction is ‘none’, then (*), same shape as the input.
N, C = 16, 5
loss = nn.SmoothL1Loss() # default reduction='mean'
input = torch.randn(N, C, requires_grad=True)
target = torch.randn(N, C)
output = loss(input, target) # scalar
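A manual check of the piecewise definition above (a sketch reusing the tensors from this example; beta defaults to 1.0):
beta = 1.0
diff = (input - target).abs()
manual = torch.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).mean()
# torch.allclose(manual, output) should be True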
MSE Loss
[regression] Mean Squared Error (MSE) Loss, also called L2 loss:
\[L_{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left ( Y_{i} - \hat{Y}_{i} \right )^{2}\]
nn.MSELoss(input, target)
== F.mse_loss()
- input: (N, *)
- target: (N, *)
- output: scalar
N, C = 16, 5
loss = nn.MSELoss() # default reduction='mean'
input = torch.randn(N, C, requires_grad=True)
target = torch.randn(N, C)
output = loss(input, target) # scalar
# torch.square(input-target).mean()
# loss1 = nn.MSELoss(reduction='sum')
# output1 = loss1(input, target) # scalar
# torch.square(input-target).sum()
L2 Loss
L2 loss is another name for Mean Squared Error (MSE) Loss.
MAE Loss
[regression] Mean Absolute Error (MAE) Loss, also called L1 loss:
\[L_{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left | Y_{i} - \hat{Y}_{i} \right |\]
nn.L1Loss(input, target)
== F.l1_loss()
- input: (N, *)
- target: (N, *)
- output: scalar
N, C = 16, 5
loss = nn.L1Loss()
input = torch.randn(N, C, requires_grad=True)
target = torch.randn(N, C)
output = loss(input, target)
# torch.abs(input-target).mean()
# loss1 = nn.L1Loss(reduction='sum')
# output1 = loss1(input, target) # scalar
# torch.abs(input-target).sum()
L1 Loss
L1 loss is another name for Mean Absolute Error (MAE) Loss.