Should #49737 be reverted?
This was brought up in the dev call yesterday. It's great that non-nanosecond support is coming along, but it's still not complete and there are still some bugs. Hence, it probably shouldn't happen by default yet, but rather should be opt-in.
Arguably:
In [2]: Timestamp(np.datetime64('2000', 's')).unit
Out[2]: 's'
counts as opt-in, because the user had to go through a numpy object which already had a set resolution. But perhaps that's not the case for
In [7]: Timestamp('2000').unit
Out[7]: 's'
where users would have been used to receiving 'ns'.
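(Illustrative sketch, not pandas internals: the point is that a numpy datetime64 carries its resolution in its dtype, so the unit is explicit in the input, whereas a plain string carries no resolution and any unit would have to be inferred.)

```python
import numpy as np

# A np.datetime64 value carries its resolution as part of its dtype,
# so constructing a Timestamp from it arguably counts as opting in
# to that resolution.
dt_s = np.datetime64("2000", "s")
dt_ns = np.datetime64("2000", "ns")
print(dt_s.dtype)   # datetime64[s]
print(dt_ns.dtype)  # datetime64[ns]

# A plain string like '2000' has no attached resolution, so whatever
# unit pandas picks for Timestamp('2000') is inference, not user choice.
```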
This would be fine once non-nano support is complete, but currently it would introduce bugs - e.g. one involving a fairly common operation that I came across this week: #50656. I think @jorisvandenbossche also had an example of this breaking the arrow CI (though from a quick look I couldn't find it - could you please share it?)
Comment From: phofl
cc @jbrockmendel
Comment From: jbrockmendel
Is it the Timestamp object or the Series etc having non-nano dtype causing troubles?
If we revert inference on strings, do we also revert support for strings that are out of bounds for nanos? Or do inference only in those cases? IIRC Joris and I agreed a while back that the latter was undesirable.
xref #49076 which might alleviate some of the changiness for the Timestamp object. (not a First Issue; I have a branch that stalled out)
Comment From: jorisvandenbossche
The example from the pyarrow CI where I noticed this is the change in the return value of pd.Timestamp("2022-01-01").value, which is now a number of seconds instead of a number of nanoseconds, i.e. the https://github.com/pandas-dev/pandas/issues/49076 issue you linked.
(This was only in a test checking the expected result, so that is easy to fix on our side (although we also use .value in actual conversion code; I have to check whether that is affected), but in general this is a breaking change.)
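(For context, a stdlib-only sketch, assuming UTC, of the historical contract: .value has meant the integer number of nanoseconds since the Unix epoch, so downstream code that relies on that would silently be off by a factor of a billion if the unit changed to seconds.)

```python
from datetime import datetime, timezone

# .value historically meant nanoseconds since the Unix epoch; conversion
# code that treats it that way breaks if it silently becomes seconds.
ts = datetime(2022, 1, 1, tzinfo=timezone.utc)
expected_s = int(ts.timestamp())
expected_ns = expected_s * 10**9
print(expected_s)   # 1640995200
print(expected_ns)  # 1640995200000000000

# The two differ by a factor of 10**9, which is why a silent change of
# unit in .value is breaking for downstream consumers like pyarrow.
```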
Is it the Timestamp object or the Series etc having non-nano dtype causing troubles?
I think both. Although I suppose the only aspect of the Timestamp object that might affect users is the .value attribute? (https://github.com/pandas-dev/pandas/issues/49076)
But, if this causes your Series to get a non-nano dtype, that does have a wider impact.
Comment From: MarcoGorelli
I only noticed issues when it's part of a DatetimeIndex - and that wouldn't be affected; its reso would still be 'ns':
In [2]: ts = Timestamp('2000')
In [3]: ts.unit
Out[3]: 's'
In [4]: to_datetime([ts])
Out[4]: DatetimeIndex(['2000-01-01'], dtype='datetime64[ns]', freq=None)
If the only issue with Timestamp is .value, then I'm not sure that this is a strong enough reason to revert - the suggestion in https://github.com/pandas-dev/pandas/issues/49076#issuecomment-1279265403 looks fine
Comment From: jorisvandenbossche
But, if this causes your Series to get a non-nano dtype, that does have a wider impact.
Ah, I was assuming that we would use the unit of the Timestamp object to determine the dtype when creating a Series or column. But if the above is not the case, then it's indeed only .value that has user-facing impact at the moment.
Comment From: jbrockmendel
Ah, I was assuming that we would use the unit of the Timestamp object to determine the dtype when creating a Series or column
In the relevant cases (object dtype or sequence-without-dtype) we go through array_to_datetime, which is the last big place to implement non-nano support. Not coincidentally, it is the most difficult. It is straightforward to add a reso keyword to array_to_datetime, but doing inference there is tough because inferred resos might not be homogeneous.
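(A minimal sketch of the heterogeneity problem, using numpy's own string-precision inference rather than pandas internals; the unit-ordering list and casting strategy here are illustrative assumptions, not what array_to_datetime would actually do.)

```python
import numpy as np

# Two strings with different implied precision: second vs millisecond.
values = ["2000-01-01 00:00:00", "2000-01-01 00:00:00.123"]

# np.datetime64 infers a unit from each string's precision, so the
# per-element inferred resolutions are not homogeneous.
inferred = [np.datetime64(v) for v in values]
units = [np.datetime_data(v.dtype)[0] for v in inferred]
print(units)  # ['s', 'ms']

# One possible strategy: cast everything to the finest unit present
# so the resulting array has a single, homogeneous resolution.
order = ["s", "ms", "us", "ns"]
finest = max(units, key=order.index)
homogeneous = np.array(values, dtype=f"datetime64[{finest}]")
print(homogeneous.dtype)  # datetime64[ms]
```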
Comment From: MarcoGorelli
resolved in https://github.com/pandas-dev/pandas/issues/49076