Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_feather.html
Documentation problem
Since #35408, the method docs say:
path str, path object, file-like object
String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function.
And indeed it does work with a BytesIO buffer or an open file:
with open('/tmp/foo', 'wb') as handle:
    df.to_feather(handle)
But not with other file-like objects, such as an AsyncWriter from hdfs.InsecureClient.write():
with self.client.write(self._path(name)) as writer:
    df.to_feather(writer)
Traceback (most recent call last):
File "/home/chris/ram-system/.venv/lib/python3.10/site-packages/pyarrow/feather.py", line 186, in write_feather
_feather.write_feather(table, dest, compression=compression,
AttributeError: 'AsyncWriter' object has no attribute 'closed'
ValueError: I/O operation on closed file
I note that it's not actually supposed to work: pyarrow.feather.write_feather says:
dest str
Local destination path.
Which says nothing about file-like objects being acceptable. It does seem to have some special cases for handling buffers specifically, but this is undocumented and could change at any time.
I think that write_feather insists on checking the closed attribute of the passed handle, which this one doesn't have. It seems to work if I poke such an attribute onto the object, but it could easily stop working.
Also, I know about hdfs.ext.dataframe.write_dataframe for this particular use case, but it only supports Avro, which is not a great file format for DataFrames, and there are likely to be other file-like objects that people might try to pass to to_feather().
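For anyone hitting the same problem on the write side, one possible workaround is to go through a BytesIO buffer, which is reported above to work, and then copy the bytes to the real destination. This is only a sketch: write_via_buffer and MinimalWriter are hypothetical names invented for illustration, and it assumes the whole serialized frame fits in memory and that the destination's write() accepts bytes.

```python
import io
from typing import BinaryIO, Callable


def write_via_buffer(write_fn: Callable[[BinaryIO], None], sink) -> int:
    """Serialize into an in-memory buffer, then copy the bytes to `sink`.

    `write_fn` is any callable that writes to a binary file object
    (for example a bound `df.to_feather`). `sink` only needs a
    `write(bytes)` method; no `closed` attribute is required.
    Returns the number of bytes handed to `sink`.
    """
    buffer = io.BytesIO()
    write_fn(buffer)            # the writer only ever sees the BytesIO
    data = buffer.getvalue()    # full contents, independent of position
    sink.write(data)
    return len(data)


class MinimalWriter:
    """Hypothetical stand-in for a writer that has write() but no `closed`."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))


# Assumed usage with pandas and hdfs (not executed here):
#     with client.write(path) as writer:
#         write_via_buffer(df.to_feather, writer)
```

The trade-off is that the data is held in memory once before being sent, so this does not stream.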
Similarly, read_feather claims to accept:
path str, path object, or file-like object
String, path object (implementing os.PathLike[str]), or file-like object implementing a binary read() function.
But read() is not enough:
File "pyarrow/_feather.pyx", line 79, in pyarrow._feather.FeatherReader.__cinit__
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
io.UnsupportedOperation: seek
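On the read side, a similar workaround might be to drain the non-seekable stream into a BytesIO first, since the traceback above suggests the reader needs seek(). Again this is a sketch under assumptions: to_seekable and ReadOnlyStream are hypothetical names, the stream's read(size) is assumed to return bytes and an empty value at end of stream, and the whole file has to fit in memory.

```python
import io


def to_seekable(stream, chunk_size: int = 1 << 20) -> io.BytesIO:
    """Read a binary stream to exhaustion and return a seekable BytesIO.

    `stream` only needs a `read(size)` method. The returned buffer is
    positioned at offset 0, ready to be handed to a reader.
    """
    buffer = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:           # empty bytes means end of stream
            break
        buffer.write(chunk)
    buffer.seek(0)
    return buffer


class ReadOnlyStream:
    """Hypothetical stand-in for a reader exposing read() but not seek()."""

    def __init__(self, payload: bytes):
        self._inner = io.BytesIO(payload)

    def read(self, size=-1):
        return self._inner.read(size)


# Assumed usage with pandas and hdfs (not executed here):
#     with client.read(path) as reader:
#         df = pd.read_feather(to_seekable(reader))
```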
Suggested fix for documentation
I think it's better to describe these functions as officially taking only strings (URLs and paths) and mmap objects. File-like objects currently work but this is not guaranteed.