Hi, if `validation_split` (of `model.fit`) is assigned a percentage, does Keras shuffle the entire training data before splitting it? Sorry if this is a repeated question; I saw some discussion from years ago saying that it does not shuffle, and the Keras doc pages do not explicitly mention shuffling in the `validation_split` section. But ChatGPT says that in more recent versions of Keras (later than 2.2.3) `validation_split` shuffles the entire data before splitting. Thanks for the help.
Comment From: sonali-kumari1
Hi @cuneyt76 -
If you are passing NumPy arrays and the `shuffle` argument in `model.fit()` is set to `True` (which is the default), then the training data will be shuffled before splitting it and `validation_split` will take the last x% of the shuffled data. Please refer to this documentation for more details. Thanks!
Comment From: cuneyt76
Hello @sonali-kumari1 , Thank you for your time and reply. But I'm still confused; the doc you linked says this: "Note that the data isn't shuffled before extracting the validation split, so the validation is literally just the last x% of samples in the input you passed." But you stated otherwise in your reply. Will you please help me clarify?
Comment From: sonali-kumari1
Hi @cuneyt76 -
Thanks for pointing that out. The note "Note that the data isn't shuffled before extracting the validation split, so the validation is literally just the last x% of samples in the input you passed." means that the validation data is taken from the last x% of samples in the original data, before any shuffling. Even if the `shuffle` argument in `model.fit()` is set to `True`, it only shuffles the training data at each epoch, not the validation data. Please refer to the description of `validation_split` in the `fit()` method here.
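To make the ordering concrete, here is a minimal sketch in plain Python that emulates the behaviour described in the docs; it is not Keras code. The helper names `tail_split` and `shuffled_indices` are hypothetical, and the split index `floor(n * (1 - validation_split))` is an assumption about how "the last x%" is computed. It also shows one way to get a random validation set: shuffle `x` and `y` yourself, with the same permutation, before calling `fit()`.

```python
import math
import random


def tail_split(samples, validation_split):
    """Hold out the LAST fraction of samples, with no shuffling (hypothetical helper)."""
    # Assumption: the split index is floor(n * (1 - validation_split)).
    split_at = math.floor(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


def shuffled_indices(n, seed=0):
    """Return a reproducible random permutation of range(n) (hypothetical helper)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return indices


x = list(range(8))        # stand-in for ordered training samples
y = [v * 10 for v in x]   # stand-in labels, aligned with x

# Without manual shuffling, the validation set is simply the tail of the input.
train_x, val_x = tail_split(x, validation_split=0.25)
print(val_x)  # [6, 7]

# To validate on a random subset, shuffle x and y together BEFORE splitting.
perm = shuffled_indices(len(x), seed=42)
x_shuffled = [x[i] for i in perm]
y_shuffled = [y[i] for i in perm]
train_xs, val_xs = tail_split(x_shuffled, validation_split=0.25)
```

With NumPy arrays, the same idea would be to index both arrays with one `np.random.permutation(len(x))` result and then pass the shuffled arrays to `model.fit(..., validation_split=0.25)`.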
Comment From: cuneyt76
Hello @sonali-kumari1 , Yes, it seems that data isn't shuffled before extracting the validation split. I just wanted to make sure since your first reply stated that the training data will be shuffled before splitting. I'm aware that validation data is never shuffled. Thank you for your support & the links you provided.
Comment From: sonali-kumari1
@cuneyt76 -
I am glad the links were helpful. Please feel free to close this issue if everything is resolved. Thanks!