The keras.utils.split_dataset function requires tensorflow to be installed even though the docstring states that the dataset argument can be a torch.utils.data.Dataset (and I assumed that means it would just work on the torch dataset with no issues).
Comment From: TaiXeflar
Fellow PyTorch user here.
Keras 3 uses TensorFlow as the default backend if KERAS_BACKEND is not set.
You can export the environment variable or just add this at the top of your Python script:
import os
os.environ["KERAS_BACKEND"] = "torch"
I think this will help?
Comment From: oluwandabira
Thanks, but that isn't the issue. I've set the KERAS_BACKEND environment variable to torch.
I can fit and evaluate my PyTorch model with PyTorch dataloaders just fine, but splitting my PyTorch dataset with keras.utils.split_dataset fails because I don't have tensorflow installed.
Comment From: sonali-kumari1
Hi @oluwandabira -
Could you please provide standalone code and the PyTorch dataset you are trying to split with keras.utils.split_dataset so we can replicate this issue? Thanks!
Comment From: oluwandabira
import os
os.environ["KERAS_BACKEND"] = "torch"

import torch
import keras

class Dataset(torch.utils.data.Dataset):
    def __init__(self, len : int):
        super().__init__()
        self._len = len

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        return torch.rand((224, 224, 3)), torch.rand(())

dataset = Dataset(10)
train, test = keras.utils.split_dataset(dataset, left_size=0.7)
pip freeze
absl-py==2.1.0
contourpy==1.3.1
cycler==0.12.1
filelock==3.17.0
fire==0.7.0
fonttools==4.56.0
fsspec==2025.2.0
h5py==3.13.0
Jinja2==3.1.5
joblib==1.4.2
keras==3.8.0
kiwisolver==1.4.8
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.10.0
mdurl==0.1.2
ml_dtypes==0.5.1
mpmath==1.3.0
namex==0.0.8
networkx==3.4.2
numpy==2.2.3
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-cusparselt-cu12==0.6.2
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
optree==0.14.0
packaging==24.2
pillow==11.1.0
Pygments==2.19.1
pyparsing==3.2.1
python-dateutil==2.9.0.post0
rich==13.9.4
scikit-learn==1.6.1
scipy==1.15.2
six==1.17.0
sympy==1.13.1
termcolor==2.5.0
threadpoolctl==3.5.0
torch==2.6.0
torchvision==0.21.0
triton==3.2.0
typing_extensions==4.12.2
Running the above code in the above environment results in an ImportError: This requires the tensorflow module. You can install it via 'pip install tensorflow'
Comment From: sonali-kumari1
Hi @oluwandabira -
Thanks for sharing the reproducible code. We will look into it and update you soon.
Comment From: sonali-kumari1
Hi @oluwandabira -
The error ImportError: This requires the tensorflow module. You can install it via 'pip install tensorflow' occurs because the split_dataset function applies tensorflow-specific operations like tf.data.Dataset.from_tensor_slices, and after splitting it returns a tuple of two tf.data.Dataset objects even if you pass a torch.utils.data.Dataset. To resolve this error, you can either install tensorflow or use PyTorch's random_split function. Thanks!
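For reference, here is a minimal sketch of the random_split workaround, assuming a map-style dataset like the one in the reproduction above. The class name RandomImages and the seed value are illustrative only and are not part of Keras or of the original report.

```python
import torch
from torch.utils.data import Dataset, random_split


class RandomImages(Dataset):
    """Stand-in map-style dataset mirroring the reproduction (illustrative name)."""

    def __init__(self, length: int):
        super().__init__()
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        return torch.rand((224, 224, 3)), torch.rand(())


dataset = RandomImages(10)

# Compute explicit integer lengths from the desired left fraction (0.7 here).
left_size = round(0.7 * len(dataset))
right_size = len(dataset) - left_size

# A seeded generator makes the random assignment repeatable (seed is arbitrary).
generator = torch.Generator().manual_seed(0)
train, test = random_split(dataset, [left_size, right_size], generator=generator)
```

Note that random_split always shuffles the indices, whereas keras.utils.split_dataset does not shuffle by default. If an ordered split is needed, torch.utils.data.Subset(dataset, range(left_size)) and Subset(dataset, range(left_size, len(dataset))) should give the same effect without shuffling.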
Comment From: innat
@sonali-kumari1 IMO, this should be backend agnostic. The split method should not use tf ops with the torch backend; instead it should have a backend-specific splitting method.
https://github.com/keras-team/keras-hub/issues/2128
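To illustrate the idea, here is a rough sketch of what an index-based split could look like for map-style datasets. It is not how Keras implements split_dataset; IndexSubset and split_map_style are hypothetical names, and the sketch only assumes the dataset exposes __len__ and __getitem__.

```python
import random
from typing import Optional, Sequence, Tuple


class IndexSubset:
    """View over a map-style dataset restricted to given indices (hypothetical helper)."""

    def __init__(self, dataset, indices: Sequence[int]):
        self.dataset = dataset
        self.indices = list(indices)

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, i: int):
        return self.dataset[self.indices[i]]


def split_map_style(
    dataset,
    left_size: float,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> Tuple[IndexSubset, IndexSubset]:
    """Split by index only, so no backend-specific tensor ops are involved."""
    if not 0.0 < left_size < 1.0:
        raise ValueError("left_size must be a fraction strictly between 0 and 1")
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    cut = round(left_size * len(indices))
    return IndexSubset(dataset, indices[:cut]), IndexSubset(dataset, indices[cut:])


# A plain list stands in for any object with __len__ and __getitem__.
train, test = split_map_style(list(range(10)), left_size=0.7)
```

Because the split only touches indices, samples are never converted to tensors of any particular framework, so the same approach would work with a torch.utils.data.Dataset as input.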
Comment From: sibyjackgrove
I feel this issue is connected to #21009. It seems that even when we set the backend to something other than TensorFlow, some Keras features still depend on TensorFlow.