
strange learning curves - training own model #309

Open
tinalegre opened this issue Nov 4, 2015 · 3 comments
@tinalegre commented Nov 4, 2015

Dear all,

I'm training my first CNN on my own data :-) Basically, I have ~80K training patches assigned to 9 different classes.
The architecture of my CNN is based on the 'oxford-vgg-m' model, with the difference that I introduced dropout regularisation for the FC layers with the ratio set to 0.5. In addition, the second-to-last layer was replaced with random weights, and the number of target classes was changed from 1000 (as in ImageNet) to 9 (from my DB). The last layer is set to softmaxloss, and the labels are numbered from 1 to 9. I also needed to re-initialize the weights of the filters of the first layer, because my input images have 12 channels. The rest of the weights in the network are the same as in 'oxford-vgg-m'. During training I initially used batches of 256 images (no data augmentation). For normalization, I subtract the ImageNet average training image (net.normalization.averageImage) from the input images.
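For concreteness, this kind of adaptation in MatConvNet's simplenn format looks roughly like the sketch below (not my exact code; field names such as `weights` vs. `filters`/`biases` differ between releases, so treat it as illustrative only):

```matlab
% Rough sketch (illustrative, not exact code): adapting a pretrained simplenn model.
net = load('imagenet-vgg-m.mat');

% Re-initialize the first conv filters for 12-channel input patches.
f1 = net.layers{1}.weights{1};                    % e.g. 7x7x3xK in vgg-m
sz = size(f1);
net.layers{1}.weights{1} = 0.01 * randn(sz(1), sz(2), 12, sz(4), 'single');

% Replace the last FC layer so it predicts 9 classes instead of 1000.
net.layers{end-1}.weights{1} = 0.01 * randn(1, 1, 4096, 9, 'single');
net.layers{end-1}.weights{2} = zeros(1, 9, 'single');

% Final layer: softmax log-loss for training.
net.layers{end} = struct('type', 'softmaxloss');
```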
After the first few training iterations, I noticed that my model was suffering from overfitting (see the figure below, red curves). In addition, the values of the training objective (energy) also looked strange to me.
[Figure: learning curves from the first training run]

To further reduce overfitting, I first tried a simple data augmentation by rotating the images by 90, 180, and 270 degrees. Along with that, I reduced the batch size from 256 to 64 images.
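The 90-degree rotations can be done without the Image Processing Toolbox; a small sketch (with a made-up helper name) is:

```matlab
% Hypothetical helper: rotate an HxWxC patch counter-clockwise by k*90 degrees.
% Works for any number of channels (e.g. 12), unlike rot90 on older MATLAB,
% which only accepts 2-D matrices.
function out = rotate90k(im, k)
out = im;
for i = 1:mod(k, 4)
    out = flipdim(permute(out, [2 1 3]), 1);   % transpose + flip = one 90-degree turn
end
end

% Usage: stack the original and the three rotations into one 4-D batch, e.g.
% batch = cat(4, im, rotate90k(im,1), rotate90k(im,2), rotate90k(im,3));
```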
Now, after the first 3 training iterations, I got the following curves:
[Figure: learning curves from the second training run]

It seems to me that I'm doing something wrong in the training, or that I haven't configured the network properly. First, in the second figure both the training and validation top-1 errors are above 1, but they should lie in the range [0,1]! I don't really understand what I did wrong, as it's the same training dataset as in my first try (first figure); the only differences are the data augmentation (through the rotations) and the batch size.
I'm using MatConvNet version 16.0 with CUDA 6.5 and MATLAB 2014a.

Any ideas about what could have gone wrong are very welcome :-)

@iiwindii commented Nov 7, 2015

Judging by the curves, I think this is more likely a problem with the training than overfitting. I have some suggestions you can try, though I don't know if they will solve your problem:
(1) Remove the dropout layers.
(2) Check your learning rate. If you replaced an original layer, its learning rate should be somewhat larger; conversely, keep it smaller for the transferred layers.
(3) Reduce the batch size.
(4) Subtract a mean computed from your own data rather than the mean provided by the pretrained model (see the sketch below).
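For (4), a minimal sketch of computing the mean from your own patches could look like this (assuming the patches sit in an H x W x 12 x N single array called `data`, which is just an illustrative layout):

```matlab
% Illustrative only: compute the per-pixel mean over your own training patches.
% 'data' is assumed to be an H x W x 12 x N single array of training patches.
averageImage = mean(data, 4);                      % H x W x 12 mean image
net.normalization.averageImage = averageImage;     % replaces the ImageNet mean

% At batch time, subtract it from each patch:
% im = bsxfun(@minus, im, averageImage);
```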

I also have a doubt: your input images have 12 channels?? I can't understand that — if they are RGB images, there should be 3 channels. Where are your images from?

If you solve this problem, please tell me!

@tinalegre (Author)

@fengYunXiaoZi thanks for your suggestions! Regarding your points: (2) what learning rate do you suggest, and how did you obtain that value? (3) my current batch size is 64 — do you think I still need to reduce it? What is the influence of reducing it? From what I understand, one should prefer larger batches so as to reduce the chances of overfitting, no? (4) good idea; I tried that once, but it didn't help much. I was also wondering: does it make sense to reuse the weights of only some layers of a pre-trained model and initialize the rest randomly (e.g. the first layer of vgg-m randomly initialized)? BTW, I'm using multi-spectral images. After subtracting the average training image, the image values should be between [-1,1], right?

@iiwindii commented Nov 7, 2015

(2) I mean both the overall learning rate and the learning rate of each conv layer. You can refer to fine-tuning models with Caffe: https://github.com/BVLC/caffe/tree/master/examples/finetune_flickr_style — in the Explanation section you can find information about learning rates.
(3) Ignore this... I don't think the batch size has much to do with overfitting; I just meant that it should not be too big.
(4) That does make sense. But keep in mind that the first several layers are hard to train, so it is wise to keep the original weights and use them as the initialization when fine-tuning.
Multi-spectral images? Are they remote sensing images? Sorry, I am not sure whether MatConvNet can deal with images with that many channels; in practice the inputs are usually RGB images with three channels or grayscale images with a single channel.
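For (2), MatConvNet also lets you set per-layer multipliers on top of the global schedule. Depending on the release the per-layer field is `learningRate` (for layers stored with `weights`) or `filtersLearningRate`/`biasesLearningRate` in older formats, so the following is only a rough example with arbitrary values:

```matlab
% Example only: transferred layers at the base rate, re-initialized layers faster.
for l = 1:numel(net.layers)
  if isfield(net.layers{l}, 'weights')
    net.layers{l}.learningRate = [1 2];        % [filters biases] multipliers
  end
end
net.layers{1}.learningRate     = [10 20];      % re-initialized first conv
net.layers{end-1}.learningRate = [10 20];      % re-initialized 9-way classifier

% The global schedule is then passed to the trainer, e.g.:
% opts.learningRate = logspace(-2, -4, numEpochs);
```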
