### Pandas version checks

- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
### Reproducible Example
import pandas as pd

example_df = pd.DataFrame(
    {'event_name': ['a', 'b'],
     'event_time': [pd.Timestamp("2020-01-01T03:00:00-07:00"), pd.Timestamp("2020-01-01T04:00-07:00")]})
example_df.head()
event_name event_time
0 a 2020-01-01 03:00:00-07:00
1 b 2020-01-01 04:00:00-07:00
example_df[['event_time']].values # fine when DataFrame preserved
array([[Timestamp('2020-01-01 03:00:00-0700', tz='pytz.FixedOffset(-420)')],
[Timestamp('2020-01-01 04:00:00-0700', tz='pytz.FixedOffset(-420)')]],
dtype=object)
example_df['event_time'].values # strips timezone when accessed as a Series
array(['2020-01-01T10:00:00.000000000', '2020-01-01T11:00:00.000000000'],
dtype='datetime64[ns]')
### Issue Description
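When a column of timezone-aware timestamps is accessed as a Series (i.e. pulled out with single brackets), `.values` strips the timezone information and returns naive `datetime64[ns]` values converted to UTC. This doesn't happen when the column is accessed as a DataFrame (i.e. pulled out with double brackets), where `.values` returns an object array of timezone-aware `Timestamp` objects.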
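A minimal sketch of the same observation, checked programmatically rather than by eye. It assumes only `pandas` and `numpy`; the variable names `series` and `raw` are illustrative. The Series itself still carries a timezone-aware dtype, and it is the `.values` call that returns naive UTC instants.

```python
import numpy as np
import pandas as pd

example_df = pd.DataFrame({
    "event_name": ["a", "b"],
    "event_time": [
        pd.Timestamp("2020-01-01T03:00:00-07:00"),
        pd.Timestamp("2020-01-01T04:00:00-07:00"),
    ],
})

series = example_df["event_time"]

# The Series dtype is still timezone-aware at this point.
print(series.dtype)

# .values hands back a plain NumPy datetime64 array, which cannot store a
# timezone, so the instants are expressed in UTC.
raw = series.values
print(raw.dtype)
print(raw[0])

# 03:00 at UTC-07:00 is the same instant as 10:00 UTC.
print(raw[0] == np.datetime64("2020-01-01T10:00:00"))
```

So the instant is preserved, but the offset is lost once the data leaves pandas as a `datetime64` array.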
### Expected Behavior
`example_df['event_time'].values`
`array([[Timestamp('2020-01-01 03:00:00-0700', tz='pytz.FixedOffset(-420)')],
[Timestamp('2020-01-01 04:00:00-0700', tz='pytz.FixedOffset(-420)')]],
dtype=object)`
### Installed Versions
<details>
INSTALLED VERSIONS
------------------
commit : f2c8480af2f25efdbd803218b9d87980f416563e
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-1022-aws
Version : #26~20.04.1-Ubuntu SMP Sat Oct 15 03:22:07 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.3
numpy : 1.19.2
pytz : 2022.2
dateutil : 2.8.1
pip : 21.0.1
setuptools : 52.0.0.post20211006
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 3.0.3
IPython : 8.3.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None
</details>
**Comment From: jreback**
you have a really old version
try with 1.5.2
**Comment From: MarcoGorelli**
thanks for the report - if you use `to_numpy()` you'll get
```python
In [3]: example_df['event_time'].to_numpy()
Out[3]:
array([Timestamp('2020-01-01 03:00:00-0700', tz='pytz.FixedOffset(-420)'),
       Timestamp('2020-01-01 04:00:00-0700', tz='pytz.FixedOffset(-420)')],
      dtype=object)
```
closing for now then
`.values` is discouraged now, check https://pandas.pydata.org/docs/reference/api/pandas.Series.values.html
**Comment From: ericbarnhill**
Problem replicated with pandas 1.5.2. However, the best resolution, as Marco indicates, is to switch to `.array` rather than `.values`. I will update my code to start using this attribute. Thank you!!
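For anyone landing here later, a rough sketch of the two alternatives mentioned in this thread, `.array` and `.to_numpy()`. It assumes a Series built the same way as the column in the report; the names `tz_array` and `obj_array` are only illustrative.

```python
import pandas as pd

series = pd.Series(
    [
        pd.Timestamp("2020-01-01T03:00:00-07:00"),
        pd.Timestamp("2020-01-01T04:00:00-07:00"),
    ],
    name="event_time",
)

# .array returns the pandas extension array backing the Series,
# which keeps the timezone.
tz_array = series.array
print(type(tz_array).__name__)
print(tz_array.tz)

# .to_numpy() on timezone-aware data returns an object ndarray of
# timezone-aware Timestamp objects.
obj_array = series.to_numpy()
print(obj_array.dtype)
print(obj_array[0].utcoffset())
```

Which one fits depends on the consumer: `.array` stays within pandas types, while `.to_numpy()` gives a NumPy array at the cost of `object` dtype.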