Keras BlackDuck reports arbitrary file write vulnerability in "get_file" function

Duplicate of #20795 but the original issue has already been closed without resolution. The issue is still present in our most recent BlackDuck scans.

According to the GitHub security advisory the issue does not originate from Palo Alto Firewalls, but from the "get_file" functionality in Keras. Here is the vulnerable code (according to GH): https://github.com/keras-team/keras/blob/8f5592bcb61ff48c96560c8923e482db1076b54a/keras/src/utils/file_utils.py#L115

Here is the CVE: https://www.cve.org/CVERecord?id=CVE-2024-55459

It's hard to verify the vulnerability since the original write-up is not accessible any more. If you require further information I will give my best to provide it. Any feedback would be appreciated (acknowledgement of the issue or confirmation of a false positive).

Thank you very much in advance!

Comment From: zdfowler

While true that "thou shalt not" use untrusted upstream data sources, without some more digging I'm not sure why the call to filter_safe_paths() isn't already a mitigation here, for the CVE-2024-55459's Path Traversal / Write Arbitrary Write issue

Might be useful to add the data filter for extractall() for more defense in depth. Available 3.12+. Starting in 3.14 the 'data' filter will be used by default.

See https://docs.python.org/3/library/tarfile.html#tarfile-extraction-filter, using the data option See also https://docs.python.org/3/library/tarfile.html#hints-for-further-verification

Sample in reference:

if hasattr(tarfile, 'data_filter'):
    my_tarfile.extractall(filter='data')
else:
    # remove this when no longer needed
    warn_the_user('Extracting may be unsafe; consider updating Python')
    my_tarfile.extractall()

Without more information or proof of exploit it's hard to say if this was a false positive or just someone trolling for extractall references.

I'm not a maintainer, but I am also dealing with my SCA tool throwing an alert that needed review. So call me an interested party :)

Comment From: mattdangerw

@hertschuh you were already looking at this right?

Comment From: mattdangerw

Took a look, I agree this looks like it may not be an issue? Or at least a naive exploit will not work. Made a tarfile with some ".." paths to escape the dir, these paths will successfully be ignored during extraction ->

https://colab.research.google.com/gist/mattdangerw/6b86708329245670de5bfbcfd5bd52f8/unsafe-tar.ipynb

If there is some other way to craft a tarfile that exploits our extraction code we'd need some form of reproduction to fix it (and much appreciated if so).

Thanks @zdfowler very much for the help!

Comment From: mattdangerw

@grimsi take a look at the colab link above, if there's another form of exploit let us know, but this looks to be a false positive to me.

Comment From: mattdangerw

I also tried this on python 3.9 without triggering an exploit, but it's possible I am not constructing the malicious tarfile correctly.

Comment From: grimsi

Thanks for the investigation @mattdangerw !

You can mark it as false positive in this case.

Comment From: zdfowler

I also tried creating tarfiles with absolute references, tested subdir/../../../../etc/passwd style references, etc. Aside from creating a tar-bomb (which would be a DoS vuln not arbitrary file write vuln), all of the same protections apply.

An attacker could get malicious files propagated to disk using this function, but those files all get pushed to local working folder and not to arbitrary places on the system. I think the warning in code in file_utils.py:201 speaks to this issue already.

https://github.com/keras-team/keras/blob/8f5592bcb61ff48c96560c8923e482db1076b54a/keras/src/utils/file_utils.py#L201

FWIW - I was not able to see the warnings or errors when attempting to re-run the notebook with a new reference to /etc/passwd in file_structure. See https://colab.research.google.com/gist/zdfowler/249d8ba110c136ac1fe1e41f0446011c/unsafe-tar.ipynb. In my re-run, those files were not skipped. (No warnings.) They were extracted to local subdirs. Again - path traversal risk was not present.

Thanks all!