Hi, if validation_split (of model.fit) is set to a fraction, does Keras shuffle the entire training data before splitting it? Sorry if this is a repeated question; I saw some discussion from years ago saying that it does not shuffle, and the Keras doc pages do not explicitly mention shuffling in the validation_split section. But ChatGPT says that in more recent versions of Keras (later than 2.2.3) validation_split shuffles the entire data before splitting. Thanks for the help.

Comment From: sonali-kumari1

Hi @cuneyt76 -

If you are passing NumPy arrays and the shuffle argument in model.fit() is set to True (which is the default), then the training data will be shuffled before splitting, and validation_split will take the last x% of the shuffled data. Please refer to this documentation for more details. Thanks!

Comment From: cuneyt76

Hello @sonali-kumari1, thank you for your time and reply. But I'm still confused; the doc you linked says this: "Note that the data isn't shuffled before extracting the validation split, so the validation is literally just the last x% of samples in the input you passed." You stated otherwise in your reply. Could you please help me clarify?

Comment From: sonali-kumari1

Hi @cuneyt76 -

Thanks for pointing that out. The note "Note that the data isn't shuffled before extracting the validation split, so the validation is literally just the last x% of samples in the input you passed." means that the validation data is taken from the last x% of samples in the original data, before any shuffling. Even if the shuffle argument in model.fit() is set to True, it only shuffles the training data at each epoch, not the validation data. Please refer to the description of validation_split in the fit() method here.
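
To make the order of operations concrete, here is a minimal NumPy sketch of the behaviour described in the docs. It is not the Keras implementation: the helper name `split_then_shuffle`, the seed, and the floor-based split index are assumptions made for illustration only.

```python
import math

import numpy as np


def split_then_shuffle(x, validation_split, epochs=2, seed=0):
    """Hypothetical helper mimicking the documented behaviour (not Keras code)."""
    n = len(x)
    # Assumed split rule: hold out the LAST fraction of the unshuffled input.
    split_at = int(math.floor(n * (1.0 - validation_split)))
    x_train, x_val = x[:split_at], x[split_at:]

    # Only the training portion is reshuffled, once per epoch.
    rng = np.random.default_rng(seed)
    epoch_orders = [x_train[rng.permutation(len(x_train))] for _ in range(epochs)]
    return x_val, epoch_orders


x = np.arange(10)  # samples in their original order: 0..9
x_val, epoch_orders = split_then_shuffle(x, validation_split=0.2)

print("validation:", x_val)  # validation: [8 9]
for i, order in enumerate(epoch_orders):
    print(f"epoch {i} training order:", order)
```

The validation samples are always the last two of the original array, and they never appear in any epoch's training order. If your input is sorted (for example by class label), you may want to shuffle it yourself before calling fit() so that the held-out tail is representative.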

Comment From: cuneyt76

Hello @sonali-kumari1, yes, it seems that the data isn't shuffled before extracting the validation split. I just wanted to make sure, since your first reply stated that the training data would be shuffled before splitting. I'm aware that validation data is never shuffled. Thank you for your support and the links you provided.

Comment From: sonali-kumari1

@cuneyt76 -

I am glad the links were helpful. Please feel free to close this issue if everything is resolved. Thanks!
