When an ExtensionArray column is passed into the formatting code, we always cast it with numpy.asarray(…) and then re-enter the if-clauses in format_array. This lets us reuse the formatting code that already exists for the built-in column types, but it is problematic for columns that differ slightly from the built-in ones yet can be coerced to them.
In my current example I have a FletcherContinuousDtype(timestamp[us]) with values that are not representable in the nanosecond range (e.g. year 3000 or year 9999, typical dates used in traditional DB setups for the very far future). It is passed into ExtensionArrayFormatter, then transformed into an np.ndarray[datetime64[us]], and then passed again into format_array, where it steps into the Datetime64Formatter. There, a forced cast to nanoseconds is done in https://github.com/pandas-dev/pandas/blob/01f73100d5f7b942a796ffd000962dee28b43f9c/pandas/io/formats/format.py#L1438-L1439
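To make the range problem concrete, here is a minimal sketch using only the Python standard library. It does not touch pandas or fletcher; the helper names micros_since_epoch and fits_in_int64_nanos are made up for illustration. It checks whether a timestamp stored as int64 microseconds since the Unix epoch would still fit into int64 after being scaled to nanoseconds.

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold
EPOCH = datetime(1970, 1, 1)


def micros_since_epoch(ts):
    """Exact integer number of microseconds between the Unix epoch and ts."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def fits_in_int64_nanos(ts):
    """True if the nanosecond count for ts still fits into a signed int64."""
    # Python integers do not overflow, so the comparison itself is exact.
    return abs(micros_since_epoch(ts) * 1000) <= INT64_MAX


far_future = [datetime(3000, 1, 1), datetime(9999, 12, 31)]

# Maps year -> (fits as int64 microseconds, fits as int64 nanoseconds)
results = {
    ts.year: (micros_since_epoch(ts) <= INT64_MAX, fits_in_int64_nanos(ts))
    for ts in far_future
}

for year, (fits_us, fits_ns) in results.items():
    print(year, "fits in us:", fits_us, "fits in ns:", fits_ns)
```

Both dates fit comfortably at microsecond resolution but not at nanosecond resolution, which is why a forced cast to nanoseconds cannot represent them.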
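We could fix this in one of two ways:
- The specific approach: Somehow allow non-nanosecond timestamps in the Datetime64Formatter.
- The general approach: For an ExtensionArray, the formatting should be done element-wise with the array's own formatter, i.e. fmt_values = [formatter(x) for x in self.values] with formatter = self.values._formatter(boxed=True), since _formatter returns a callable rather than a formatted string.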
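The general approach could be sketched roughly as follows. This is an illustration under stated assumptions, not the pandas implementation: ToyTimestampArray is a hypothetical stand-in for an ExtensionArray, and format_elementwise is a made-up helper. The only thing borrowed from pandas is the shape of _formatter, which takes a boxed flag and returns a callable that turns one scalar into a string.

```python
from datetime import datetime


class ToyTimestampArray:
    """Hypothetical stand-in for an ExtensionArray of microsecond timestamps."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        # Iteration yields the array's own scalars, untouched by numpy.
        return iter(self._values)

    def _formatter(self, boxed=False):
        # Same shape as ExtensionArray._formatter: return a callable
        # that formats a single scalar. The boxed flag is ignored here.
        return lambda ts: ts.isoformat(sep=" ")


def format_elementwise(values):
    """Format each scalar with the array's own formatter, without np.asarray."""
    formatter = values._formatter(boxed=True)
    return [formatter(x) for x in values]


arr = ToyTimestampArray([datetime(3000, 1, 1), datetime(9999, 12, 31, 23, 59, 59)])
fmt_values = format_elementwise(arr)
print(fmt_values)
```

Because the values never pass through a numpy array, no built-in formatter gets a chance to cast them to nanoseconds.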
Comment From: jorisvandenbossche
I think the second approach is the best way (or a variant of it).
First, we should not need to do np.asarray(EA) for formatting, because we have no guarantee that it does not alter the scalar values you get from that numpy array compared to the scalar values you get from iterating through the EA directly. Second, not creating a numpy array also avoids reusing one of the other built-in formatters that might not be suitable.
Comment From: jbrockmendel
@xhochy we've added non-nano support since the OP. Any chance that fixed the problem?