Why d_loss = 0.5 * np.add(d_loss_real, d_loss_fake) ? #250

Open · mrgloom opened this issue Feb 14, 2021 · 7 comments

mrgloom commented Feb 14, 2021

I wonder why it's d_loss = 0.5 * np.add(d_loss_real, d_loss_fake) and not d_loss = np.add(d_loss_real, d_loss_fake)?

https://github.com/eriklindernoren/Keras-GAN/blob/master/gan/gan.py#L123
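For context, the discriminator training step around that line looks roughly like this (paraphrased from the linked file; imgs are real samples, gen_imgs are generator outputs, valid/fake are the 1/0 label arrays):

    # two separate updates, one on real and one on generated images
    d_loss_real = self.discriminator.train_on_batch(imgs, valid)
    d_loss_fake = self.discriminator.train_on_batch(gen_imgs, fake)
    # the line in question: average the two returned [loss, accuracy] pairs
    d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)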

@aniketmaurya

The losses from the real and fake images are averaged. Correct me if I'm wrong, but I believe this is how the loss is calculated in the paper; if we just summed the losses instead, we would probably get much the same result.

mrgloom commented Feb 14, 2021

The discriminator uses BCE loss:
BCE = - y * log(y_pred) - (1 - y) * log(1 - y_pred)
As I understand it, we could rewrite this code https://github.com/eriklindernoren/Keras-GAN/blob/master/gan/gan.py#L121-L122 as d_loss = self.discriminator.train_on_batch([gen_imgs, imgs], [fake, valid]), where by [gen_imgs, imgs] I mean concatenation.
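For illustration, a minimal sketch of what I mean (assuming plain NumPy concatenation of the batches and labels; variable names as in the linked file):

    import numpy as np

    # one combined batch instead of two separate train_on_batch calls
    X = np.concatenate([gen_imgs, imgs], axis=0)   # fake samples, then real ones
    y = np.concatenate([fake, valid], axis=0)      # matching 0/1 labels
    d_loss = self.discriminator.train_on_batch(X, y)

Note that this is not exactly equivalent: the original code performs two optimizer updates (one per half-batch), while the concatenated version performs a single update on the combined batch.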

I guess this is how loss is calculated in the paper

Can you point out where it's specified in the paper?

mrgloom commented Feb 14, 2021

I found this comment in the pix2pix paper: "In addition, we divide the objective by 2 while optimizing D, which slows down the rate at which D learns relative to G." And in the CycleGAN paper: "In practice, we divide the objective by 2 while optimizing D, which slows down the rate at which D learns, relative to the rate of G."

Actually, I have tested it without the 0.5 on a simple dataset (a parabola, from here) and it still works.

mrgloom commented Feb 14, 2021

Actually, I was not able to break it even with d_loss = 100000.0 * np.add(d_loss_real, d_loss_fake). As I understand it, in this Keras code the multiplication does not affect the training procedure; it just averages the metrics for display.
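To make that concrete, my reading of the step in question (comments are mine):

    # train_on_batch has already applied its gradient updates by this point
    # and returns plain numbers ([loss, accuracy]), so scaling the result is
    # only logging arithmetic and cannot influence the weights:
    d_loss = 100000.0 * np.add(d_loss_real, d_loss_fake)  # training unchanged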

@aniketmaurya

Multiplying the loss by a constant has the same effect as scaling the learning rate. While training a GAN we try to ensure that the generator and the discriminator learn at the same pace.
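For plain SGD the equivalence is exact, since scaling the loss by a constant c scales its gradient by c; a tiny sketch with a made-up one-weight squared-error example:

    import numpy as np

    w, x, y, lr, c = 2.0, 3.0, 1.0, 0.01, 0.5
    grad = 2 * (w * x - y) * x                 # dL/dw for L = (w*x - y)**2

    w_scaled_loss = w - lr * (c * grad)        # loss multiplied by c
    w_scaled_lr   = w - (c * lr) * grad        # learning rate multiplied by c
    assert np.isclose(w_scaled_loss, w_scaled_lr)

(With adaptive optimizers such as Adam the effect is not exactly the same, since the gradient scale is partly normalized away.)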

mrgloom commented Feb 14, 2021

What I mean is: as I understand it, in Keras if you want to apply a weight to the loss you should use loss_weights in compile, see https://github.com/eriklindernoren/Keras-GAN/blob/master/gan/gan.py#L29-L31

            loss_weights: Optional list or dictionary specifying scalar
                coefficients (Python floats) to weight the loss contributions
                of different model outputs.
                The loss value that will be minimized by the model
                will then be the *weighted sum* of all individual losses,
                weighted by the `loss_weights` coefficients.
                If a list, it is expected to have a 1:1 mapping
                to the model's outputs. If a dict, it is expected to map
                output names (strings) to scalar coefficients.

But here https://github.com/eriklindernoren/Keras-GAN/blob/master/gan/gan.py#L123 you have already got the metrics back and are just multiplying them by a constant, so it does not affect the training process; this multiplication is not "part of the graph" as it would be, for example, in TensorFlow.
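In other words, if the 0.5 were meant to affect optimization, I would expect it in compile(), e.g. something like this hypothetical variant:

    # hypothetical: push the 0.5 factor into the graph via loss_weights,
    # so it actually scales the loss that is minimized
    self.discriminator.compile(loss='binary_crossentropy',
                               optimizer=optimizer,
                               metrics=['accuracy'],
                               loss_weights=[0.5])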

@Nevermetyou65

I came here with the same question.
