DOC: DataFrame.to_feather() does not accept *all* file-like objects #62057

@qris

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_feather.html

Documentation problem

Since #35408, the method docs say:

path str, path object, file-like object
String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function.

And indeed it does work with a BytesIO buffer or an open file:

with open('/tmp/foo', 'wb') as handle:
    df.to_feather(handle)

But not with other file-like objects, such as an AsyncWriter from hdfs.InsecureClient.write():

with self.client.write(self._path(name)) as writer:
    df.to_feather(writer)

Traceback (most recent call last):
  File "/home/chris/ram-system/.venv/lib/python3.10/site-packages/pyarrow/feather.py", line 186, in write_feather
    _feather.write_feather(table, dest, compression=compression,
AttributeError: 'AsyncWriter' object has no attribute 'closed'
ValueError: I/O operation on closed file

I note that it's not actually supposed to work: pyarrow.feather.write_feather says:

dest str
Local destination path.

That says nothing about file-like objects being acceptable. The function does seem to special-case buffers, but this is undocumented and could change at any time.

I think that write_feather insists on checking the closed attribute of the handle it is passed, which AsyncWriter doesn't have. The call seems to work if I poke such an attribute onto the object, but that relies on undocumented behaviour and could easily stop working.
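A workaround that avoids patching the writer is to serialize into an in-memory buffer first (BytesIO is reported above to work) and then copy the bytes to the real destination. The sketch below is only an illustration: write_via_buffer is a hypothetical helper name, not part of pandas or pyarrow, and it assumes the serializer leaves the buffer open and that the whole file fits in memory.

```python
import io


def write_via_buffer(serialize, sink):
    """Serialize into memory, then hand the bytes to a write-only sink.

    `serialize` is any callable that takes a binary file-like object,
    for example `df.to_feather` (assumption: it does not close the buffer).
    `sink` only needs a `write(bytes)` method; no `closed` or `seek`.
    Hypothetical helper, not a pandas or pyarrow API.
    """
    buffer = io.BytesIO()
    serialize(buffer)          # the serializer sees a full-featured buffer
    data = buffer.getvalue()   # complete serialized payload
    sink.write(data)           # the only method required of the sink
    return len(data)
```

With the HDFS example above, the call would look like write_via_buffer(df.to_feather, writer) inside the with block. The cost is holding the complete serialized file in memory before anything is sent.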

I also know about hdfs.ext.dataframe.write_dataframe for this particular use case, but it only supports Avro, which is not a great file format for DataFrames, and there are likely to be other file-like objects that people might try to pass to to_feather().

Similarly, read_feather claims to accept:

path str, path object, or file-like object
String, path object (implementing os.PathLike[str]), or file-like object implementing a binary read() function.

But read() is not enough:

  File "pyarrow/_feather.pyx", line 79, in pyarrow._feather.FeatherReader.__cinit__
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
io.UnsupportedOperation: seek
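The same buffering idea applies on the read side: since the traceback shows the reader needs seek(), a source that only implements read() can be drained into a BytesIO first. Again this is a sketch under assumptions: read_via_buffer is a hypothetical helper name, and it assumes source.read() with no argument returns the remaining bytes and that they fit in memory.

```python
import io


def read_via_buffer(source, parse):
    """Drain a read-only source into memory, then parse a seekable copy.

    `source` only needs a `read()` method returning bytes.
    `parse` is any callable that takes a binary file-like object,
    for example `pd.read_feather`.
    Hypothetical helper, not a pandas or pyarrow API.
    """
    buffer = io.BytesIO(source.read())  # BytesIO supports seek() and tell()
    return parse(buffer)
```

For a non-seekable stream named reader, the call would be read_via_buffer(reader, pd.read_feather).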

Suggested fix for documentation

I think it's better to describe these functions as officially taking only strings (URLs and paths) and mmap objects. Some file-like objects currently work, but this is not guaranteed.
