Keras torch backend: keras.utils.split_dataset requires tensorflow

The keras.utils.split_dataset function requires tensorflow to be installed, even though the docstring states that the dataset argument can be a torch.utils.data.Dataset (which I took to mean it would work on a torch dataset without any extra dependencies).

Comment From: TaiXeflar

I'm a PyTorch user too. Keras 3 uses TensorFlow as the default backend if KERAS_BACKEND is not set. You can export it as an environment variable or add this at the top of your Python script, before importing keras:

import os
os.environ["KERAS_BACKEND"] = "torch"

I think this will help.

Comment From: oluwandabira

Thanks, but that isn't the issue. I've set the KERAS_BACKEND environment variable to torch, and I can fit and evaluate my PyTorch model with PyTorch dataloaders just fine. However, splitting my PyTorch dataset with keras.utils.split_dataset fails because I don't have tensorflow installed.

Comment From: sonali-kumari1

Hi @oluwandabira -

Could you please provide standalone code and the PyTorch dataset you are trying to split with keras.utils.split_dataset so we can reproduce this issue? Thanks!

Comment From: oluwandabira

import os
os.environ["KERAS_BACKEND"] = "torch"
import torch
import keras

class Dataset(torch.utils.data.Dataset):
    def __init__(self, length: int):
        super().__init__()
        # Store the dataset size; `length` avoids shadowing the built-in len().
        self._len = length

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        return torch.rand((224, 224, 3)), torch.rand(())


dataset = Dataset(10)

train, test = keras.utils.split_dataset(dataset, left_size=0.7)

pip freeze output:

absl-py==2.1.0
contourpy==1.3.1
cycler==0.12.1
filelock==3.17.0
fire==0.7.0
fonttools==4.56.0
fsspec==2025.2.0
h5py==3.13.0
Jinja2==3.1.5
joblib==1.4.2
keras==3.8.0
kiwisolver==1.4.8
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.10.0
mdurl==0.1.2
ml_dtypes==0.5.1
mpmath==1.3.0
namex==0.0.8
networkx==3.4.2
numpy==2.2.3
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-cusparselt-cu12==0.6.2
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
optree==0.14.0
packaging==24.2
pillow==11.1.0
Pygments==2.19.1
pyparsing==3.2.1
python-dateutil==2.9.0.post0
rich==13.9.4
scikit-learn==1.6.1
scipy==1.15.2
six==1.17.0
sympy==1.13.1
termcolor==2.5.0
threadpoolctl==3.5.0
torch==2.6.0
torchvision==0.21.0
triton==3.2.0
typing_extensions==4.12.2

Running the above code in the above environment results in an ImportError: This requires the tensorflow module. You can install it via 'pip install tensorflow'

Comment From: sonali-kumari1

Hi @oluwandabira -

Thanks for sharing the reproducible code. We will look into it and update you soon.

Comment From: sonali-kumari1

Hi @oluwandabira -

The error ImportError: This requires the tensorflow module. You can install it via 'pip install tensorflow' occurs because the split_dataset function applies tensorflow-specific operations such as tf.data.Dataset.from_tensor_slices. After splitting, it returns a tuple of two tf.data.Dataset objects, even if you pass in a torch.utils.data.Dataset. To resolve this error, you can either install tensorflow or use PyTorch's random_split function. Thanks!
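
For reference, a minimal sketch of the random_split workaround, assuming a dataset like the one in the report above. The class name RandomImageDataset and the seed value are illustrative choices, not part of the original report. Note that random_split always shuffles the indices, whereas split_dataset defaults to shuffle=False, so the two are not exact equivalents.

```python
import torch
from torch.utils.data import Dataset, random_split


class RandomImageDataset(Dataset):
    """Stand-in for the dataset from the report (name is illustrative)."""

    def __init__(self, length: int):
        super().__init__()
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        # Random image-shaped tensor plus a scalar target, as in the report.
        return torch.rand((224, 224, 3)), torch.rand(())


dataset = RandomImageDataset(10)

# Mirror left_size=0.7 by computing explicit integer lengths.
n_left = int(len(dataset) * 0.7)
n_right = len(dataset) - n_left

# A seeded generator makes the shuffled split reproducible.
generator = torch.Generator().manual_seed(0)
train, test = random_split(dataset, [n_left, n_right], generator=generator)

print(len(train), len(test))
```

Both results are torch.utils.data.Subset objects, so they can be passed to a DataLoader without tensorflow being installed.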

Comment From: innat

@sonali-kumari1 IMO, this should be backend-agnostic. The split method should not use tf ops with the torch backend; instead, it should have a backend-specific splitting method.

https://github.com/keras-team/keras-hub/issues/2128
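
To illustrate what a split without tf ops could look like, here is a rough sketch that relies only on the dataset's __len__ and __getitem__, which map-style torch datasets provide. The names IndexView and split_by_index are hypothetical and this is not how Keras implements split_dataset; it only shows that an index-based split needs no backend at all.

```python
import random


class IndexView:
    """Read-only view over selected indices of a map-style dataset (hypothetical helper)."""

    def __init__(self, dataset, indices):
        self._dataset = dataset
        self._indices = list(indices)

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, i):
        # Translate the view-local index into an index of the wrapped dataset.
        return self._dataset[self._indices[i]]


def split_by_index(dataset, left_size, shuffle=False, seed=None):
    """Split any object with __len__ and __getitem__ into two views (hypothetical helper)."""
    if not 0.0 < left_size < 1.0:
        raise ValueError("left_size must be a fraction strictly between 0 and 1")
    indices = list(range(len(dataset)))
    if shuffle:
        # A dedicated Random instance keeps the global RNG state untouched.
        random.Random(seed).shuffle(indices)
    n_left = int(len(dataset) * left_size)
    return IndexView(dataset, indices[:n_left]), IndexView(dataset, indices[n_left:])


# A plain list of (sample, label) pairs stands in for a torch dataset here.
samples = [(i, i % 2) for i in range(10)]
left, right = split_by_index(samples, left_size=0.7)
print(len(left), len(right))
```

The views load samples lazily from the original dataset, so nothing is copied or converted to tensors of any particular backend.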

Comment From: sibyjackgrove

I feel this issue is connected to #21009. It seems that even when we set the backend to something other than TensorFlow, some Keras features still depend on TensorFlow.