strange learning curves - training own model #309
According to the curves, I think this is more likely a case of wrong training rather than overfitting. I have some suggestions you can try, though I don't know whether they will solve your problem. I also have some doubts: (1) your input images have 12 channels? I cannot understand this; if they are RGB images, there should be 3 channels. Where are your images from? If you solve this problem, please tell me!
@fengYunXiaoZi thanks for your suggestions! Regarding your points: (2) what learning rate do you suggest, and how did you obtain that value? (3) My current batch size is 64; do you think I still need to reduce it? What is the influence of reducing it? From what I understand, one should prefer larger batches so as to reduce the chance of overfitting, no? (4) Good idea; I tried that once, but it didn't help much. Does it make sense to re-use the weights of only some layers of a pre-trained model and initialize the rest randomly (e.g. the first layer of vgg-m randomly initialized)? By the way, I'm using multi-spectral images. After subtracting the average training image, the image values should be between [-1, 1], right?
(2) I mean both the overall learning rate and the learning rate of each conv layer. You can refer to the Caffe fine-tuning example: https://github.com/BVLC/caffe/tree/master/examples/finetune_flickr_style. In the Explanation section you can find information about learning rates. (3) Ignore this; I don't think batch size has anything to do with overfitting. I just meant the batch size should not be too big. (4) That does make sense. But you should know that the first several layers are hard to train, so it is wise to keep the original weights and use them as the initialization when fine-tuning. Multi-spectral images? Are they remote sensing images? Sorry, I am not sure whether MatConvNet can deal with images with so many channels. In practice, the images are RGB with three channels or grayscale with a single channel.
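The per-layer learning rates mentioned in (2) can be pictured as per-layer multipliers on a global base rate: pre-trained layers get a small multiplier, while the re-initialized last layer trains faster. A minimal numpy sketch of one SGD step, with purely illustrative layer names and multiplier values (not MatConvNet or Caffe API):

```python
import numpy as np

# Hypothetical per-layer learning-rate multipliers: keep the pre-trained
# conv layers nearly frozen and train the re-initialized last layer faster.
base_lr = 0.001
lr_mult = {"conv1": 0.1, "conv2": 0.1, "fc_new": 10.0}  # illustrative values

weights = {name: np.zeros(4) for name in lr_mult}
grads = {name: np.ones(4) for name in lr_mult}   # pretend gradients

# One SGD step: each layer moves at its own effective rate.
for name in weights:
    weights[name] -= base_lr * lr_mult[name] * grads[name]
```

After this step, `fc_new` has moved 100x further than `conv1` for the same gradient, which is the intent when fine-tuning a replaced layer.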
Dear all,
I'm training my first CNN on my own data :-) Basically, I have ~80K training patches assigned to 9 different classes.

The architecture of my CNN is based on the 'oxford-vgg-m' model, with the difference that I introduced dropout regularisation for the FC layers with the ratio set to 0.5. In addition, the second-to-last layer was replaced with random weights, and the number of target classes was updated accordingly from 1000 (as in ImageNet) to 9 (from my DB). The last layer is set to softmaxloss. The labels are numbered from 1 to 9. I also needed to re-initialize the filter weights of the first layer (my input images have 12 channels). The rest of the weights in the network are the same as in 'oxford-vgg-m'. During training, I initially used batches of 256 images (no data augmentation). Normalization is also applied by subtracting the ImageNet average training image (net.normalization.averageImage) from the input images.
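The first-layer re-initialization described above can be sketched in numpy. This is not MatConvNet code; it only illustrates why the pre-trained weights cannot be kept: vgg-m's conv1 bank is 7x7x3x96, so a 12-channel input needs a freshly drawn 7x7x12x96 bank (the 0.01 init scale is an assumption for illustration):

```python
import numpy as np

# Re-initialize the first conv filter bank for 12-channel input.
# vgg-m's conv1 expects 7x7 filters over 3 channels; with 12 input
# channels the pre-trained weights no longer fit, so draw new ones.
rng = np.random.default_rng(0)
n_filters, k, in_channels = 96, 7, 12        # 96 filters, as in vgg-m conv1
std = 0.01                                   # illustrative init scale
filters = (std * rng.standard_normal((k, k, in_channels, n_filters))).astype(np.float32)
biases = np.zeros(n_filters, dtype=np.float32)

print(filters.shape)  # (7, 7, 12, 96)
```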
After the first few training iterations, I noticed that my model was suffering from overfitting (see the figure below, the red curves). I also found the energy values of the objective function strange.
To further reduce overfitting, I first tried a simple data augmentation by rotating the images by 90, 180, and 270 degrees. Along with that, I reduced the batch size from 256 to 64 images.
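The rotation augmentation above can be sketched with `np.rot90`, which rotates in the (height, width) plane and leaves the channel axis untouched, so it works for 12-channel patches as well (the patch size and class label here are toy values):

```python
import numpy as np

# Toy 12-channel patch; real patches would be e.g. 224x224x12.
patch = np.arange(2 * 2 * 12).reshape(2, 2, 12)

# k = 0, 1, 2, 3 gives 0, 90, 180, and 270 degree rotations.
augmented = [np.rot90(patch, k=k, axes=(0, 1)) for k in range(4)]

# Rotation does not change the class, so labels are simply repeated.
labels = [5] * len(augmented)  # hypothetical class label
```

Note that this quadruples the number of samples per epoch, which also changes how many iterations one "epoch" represents.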

Now, after the first 3 training iterations, I got the following curves:
To me, it seems that I'm doing something wrong in the training, or that I haven't configured the network properly. First, in the second figure, both the training and validation top-1 errors are above 1, but they should be in the range [0, 1]! I don't really understand what I did wrong, as it's the same training dataset as in my first try (first figure). The only differences are the data augmentation (through the rotations) and the batch size.
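As a sanity check on the [0, 1] claim: the top-1 error is the fraction of misclassified samples, so it cannot exceed 1 by construction. A value above 1 usually means the error count is being normalized by the wrong sample count, for instance counting every rotated copy as a possible error while dividing by the number of original images only. A minimal sketch (hypothetical scores and labels, 1-based labels as in the setup above):

```python
import numpy as np

def top1_error(scores, labels):
    """Fraction of samples whose highest-scoring class is wrong.

    scores: (n_samples, n_classes); labels: 1..n_classes (1-based).
    """
    pred = np.argmax(scores, axis=1) + 1  # back to 1-based labels
    return np.mean(pred != labels)

scores = np.array([[0.1, 0.9],
                   [0.8, 0.2],
                   [0.3, 0.7]])
labels = np.array([2, 1, 1])
print(top1_error(scores, labels))  # 1 of 3 wrong -> 0.333...
```

If the plotted quantity were instead `errors / n_original_images` while the batch actually contains four rotated copies per image, the curve could reach values up to 4, which would match errors above 1.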
I'm using MatConvNet version 16.0 with CUDA 6.5 and MATLAB 2014a.
Any ideas about what could have gone wrong are really welcomed :-)