Pandas STYLE use TypeAlias and upgrade ruff

The latest autoupdate job removes some useful type alias definitions

As noted by @twoertwein, this is because we haven't annotated them as TypeAlias

So, the task here is:

1. pre-commit autoupdate
2. add

     # Useless statement
     "B018",

   under Additional checks that don't pass yet in pyproject.toml
3. check which files were updated in https://github.com/pandas-dev/pandas/pull/52345/files . For any place where the type alias definition was replaced with ..., you should annotate the variable with TypeAlias

Here's an example of the kind of change you'd be making:

diff --git a/pandas/_libs/interval.pyi b/pandas/_libs/interval.pyi
index 4c36246e04..f0412b8397 100644
--- a/pandas/_libs/interval.pyi
+++ b/pandas/_libs/interval.pyi
@@ -3,6 +3,7 @@ from typing import (
     Generic,
     TypeVar,
     overload,
+    TypeAlias,
 )

 import numpy as np
@@ -16,9 +17,9 @@ from pandas._typing import (

 VALID_CLOSED: frozenset[str]

-_OrderableScalarT = TypeVar("_OrderableScalarT", int, float)
-_OrderableTimesT = TypeVar("_OrderableTimesT", Timestamp, Timedelta)
-_OrderableT = TypeVar("_OrderableT", int, float, Timestamp, Timedelta)
+_OrderableScalarT: TypeAlias = TypeVar("_OrderableScalarT", int, float)
+_OrderableTimesT: TypeAlias = TypeVar("_OrderableTimesT", Timestamp, Timedelta)
+_OrderableT: TypeAlias = TypeVar("_OrderableT", int, float, Timestamp, Timedelta)

 class _LengthDescriptor:
     @overload

4. run pre-commit run ruff --all-files, and check that no type alias definitions are overwritten by ...
5. run pre-commit run --all-files, check everything passes
6. stage, commit, open pull request
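
The check for definitions overwritten by ... can also be scripted. The following is a rough sketch, not part of the pandas tooling: the helper name find_ellipsis_assignments is hypothetical, and it only looks at unannotated module-level assignments, so annotated ones such as no_default: Final = ... are deliberately ignored.

```python
import ast


def find_ellipsis_assignments(source: str) -> list[str]:
    """Return module-level names assigned to a bare `...` with no annotation."""
    names: list[str] = []
    for node in ast.parse(source).body:
        # ast.Assign covers `X = ...`; annotated assignments are ast.AnnAssign
        # and are skipped on purpose.
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Constant)
            and node.value.value is Ellipsis
        ):
            names.extend(t.id for t in node.targets if isinstance(t, ast.Name))
    return names


stub = "A = ...\nB: TypeAlias = int\nC: Final = ...\nD = int\n"
print(find_ellipsis_assignments(stub))
```

Running it over each modified .pyi file after the ruff hook would list the names that still need a closer look.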

See the contributing guide for how to get started https://pandas.pydata.org/docs/dev/development/contributing.html

Comment From: sjdex

Hi @MarcoGorelli, I was working on this and came across no_default: Final = ... . Do I need to replace Final with TypeAlias in that case? I'm not sure we can combine it with Final.

Comment From: MarcoGorelli

hey @sjdex - I think that one's fine to become ..., as it's already defined here

https://github.com/pandas-dev/pandas/blob/64eed4c30d73fe6e5e43f3d6e1e16e05bd1a38f3/pandas/_libs/lib.pyx#L2758

likewise cache_readonly

it's only the ones which look like type aliases (and which aren't defined in other files) that we want to annotate with TypeAlias

Comment From: twoertwein

> Hi @MarcoGorelli , I was working on this. Came across this no_default: Final = ... . Do I need to replace Final with TypeAlias in such case, not sure we can append it with Final.

You can test it by running pre-commit run -av ruff :) (after upgrading ruff to the latest version).

I think we could have something like Final[TypeAlias], but I'm not sure whether that is valid/correct. I believe the value of this enum cannot be replaced with .... If you have a pandas development environment, it might be worth running mypy and pyright after you make changes to no_default: https://pandas.pydata.org/pandas-docs/stable/development/contributing_codebase.html#validating-type-hints

Comment From: yusharth

@sjdex Hey, is there any scope for improvement or help I can provide you to close this issue altogether? I am new to open source and seek your guidance. Let me know if I can help in any way.

Comment From: aswinj18

Hi @MarcoGorelli, is this issue still open? If so, can you assign it to me?

Comment From: snorfyang

@MarcoGorelli, is anyone still working on this issue? If not, I can work on it.

Comment From: MarcoGorelli

there's an open PR

Comment From: MarcoGorelli

Hi @snorfyang - looks like the original PR has gone stale, so if you're interested in working on this, please feel free to open a new PR! thanks 🙏

Comment From: snorfyang

take

Comment From: yusharth

@snorfyang Can I help you with this issue?

Comment From: snorfyang

I noticed that ruff has been updated, and after I ran pre-commit run ruff --all-files, only one place was replaced by ..., even though I did not add any TypeAlias annotations. But the E501 problem still exists.

https://github.com/pandas-dev/pandas/blob/3790452d17da4e18d86f12426753f3151c26c94b/pandas/_libs/lib.pyi#L33

So do we really need to add TypeAlias annotations?

There are also some other problems with pre-commit run --all-files:

1. the codespell hook failed:

```
pandas/tests/groupby/test_counting.py:260: delt ==> dealt
pandas/tests/groupby/test_counting.py:262: delt ==> dealt
pandas/tests/groupby/test_counting.py:263: delt ==> dealt
pandas/core/frame.py:5153: lama ==> llama
pandas/core/frame.py:5169: lama ==> llama
pandas/core/frame.py:5180: lama ==> llama
pandas/core/frame.py:5189: lama ==> llama
pandas/core/series.py:2228: lama ==> llama
pandas/core/series.py:2230: lama ==> llama
pandas/core/series.py:2232: lama ==> llama
pandas/core/series.py:2241: lama ==> llama
pandas/core/series.py:2253: lama ==> llama
pandas/core/series.py:4869: lama ==> llama
pandas/core/series.py:4881: lama ==> llama
```

2. the check-test-naming hook failed:

```
Traceback (most recent call last):
  File "E:\anaconda3\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "E:\anaconda3\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\Documents\repo\pandas\scripts\check_test_naming.py", line 150, in <module>
    ret |= main(content, file)
  File "E:\Documents\repo\pandas\scripts\check_test_naming.py", line 122, in main
    _content = fd.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 2358: illegal multibyte sequence
Traceback (most recent call last):
  File "E:\anaconda3\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "E:\anaconda3\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\Documents\repo\pandas\scripts\check_test_naming.py", line 150, in <module>
    ret |= main(content, file)
  File "E:\Documents\repo\pandas\scripts\check_test_naming.py", line 122, in main
    _content = fd.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 2358: illegal multibyte sequence
Traceback (most recent call last):
  File "E:\anaconda3\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "E:\anaconda3\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 2358: illegal multibyte sequence
Traceback (most recent call last):
  File "E:\anaconda3\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "E:\anaconda3\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\Documents\repo\pandas\scripts\check_test_naming.py", line 150, in <module>
    ret |= main(content, file)
  File "E:\Documents\repo\pandas\scripts\check_test_naming.py", line 122, in main
    _content = fd.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 2358: illegal multibyte sequence
```

3. the ruff hook failed (E501 as mentioned above):

```
Found 1 error (1 fixed, 0 remaining).
pandas\tests\io\excel\test_readers.py:846:80: E501 Line too long (96 > 88 characters)
pandas\tests\io\formats\test_format.py:632:73: E501 Line too long (101 > 88 characters)
pandas\tests\io\formats\test_format.py:644:73: E501 Line too long (92 > 88 characters)
pandas\tests\io\formats\test_format.py:656:73: E501 Line too long (101 > 88 characters)
pandas\tests\io\formats\test_format.py:668:73: E501 Line too long (101 > 88 characters)
pandas\tests\io\formats\test_format.py:683:68: E501 Line too long (107 > 88 characters)
pandas\tests\io\formats\test_format.py:701:73: E501 Line too long (101 > 88 characters)
pandas\tests\io\formats\test_format.py:769:74: E501 Line too long (105 > 88 characters)
pandas\tests\io\formats\test_format.py:783:73: E501 Line too long (96 > 88 characters)
pandas\tests\io\formats\test_format.py:797:74: E501 Line too long (105 > 88 characters)
pandas\tests\io\formats\test_format.py:811:74: E501 Line too long (105 > 88 characters)
pandas\tests\io\formats\test_format.py:826:70: E501 Line too long (111 > 88 characters)
pandas\tests\io\formats\test_format.py:844:74: E501 Line too long (105 > 88 characters)
pandas\tests\io\formats\test_format.py:889:74: E501 Line too long (96 > 88 characters)
pandas\tests\io\formats\test_format.py:2208:79: E501 Line too long (96 > 88 characters)
pandas\tests\io\formats\test_format.py:2213:79: E501 Line too long (96 > 88 characters)
pandas\tests\io\formats\test_format.py:2217:72: E501 Line too long (100 > 88 characters)
pandas\tests\io\formats\test_format.py:2219:69: E501 Line too long (103 > 88 characters)
pandas\tests\io\formats\test_format.py:2225:70: E501 Line too long (115 > 88 characters)
pandas\tests\io\formats\test_format.py:2287:79: E501 Line too long (91 > 88 characters)
pandas\tests\io\formats\test_format.py:2295:79: E501 Line too long (91 > 88 characters)
pandas\tests\io\formats\test_format.py:2303:73: E501 Line too long (104 > 88 characters)
pandas\tests\io\formats\test_format.py:2383:77: E501 Line too long (96 > 88 characters)
pandas\tests\io\parser\test_encoding.py:230:88: E501 Line too long (89 > 88 characters)
pandas\tests\io\xml\test_xml.py:457:58: E501 Line too long (105 > 88 characters)
pandas\tests\io\xml\test_xml.py:461:62: E501 Line too long (93 > 88 characters)
typings\numba.pyi:1:1: PYI033 Don't use type comments in stub file
Found 27 errors.
```

cc @MarcoGorelli

Comment From: snorfyang

> @snorfyang Can I help you with this issue?

Sure. But I think this issue needs to be discussed.

Comment From: MarcoGorelli

you can just split the lines https://github.com/pandas-dev/pandas/pull/52452#issuecomment-1499281176
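
One common way to split an over-long string literal so that it fits the 88-character limit is implicit concatenation inside parentheses. The sketch below is generic; the string content is made up and is not taken from the failing pandas tests.

```python
# Adjacent string literals inside parentheses are joined at compile time,
# so the value is unchanged while every source line stays short.
expected = (
    "a fairly long expected message that would not fit on one line, "
    "continued here on a second line"
)

print(expected)
```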

for the test-naming hook, I think we just need to specify encoding='utf8' in

https://github.com/pandas-dev/pandas/blob/1b653b1c23f7a5f9fbaede993cf9370dba79b1fc/scripts/check_test_naming.py#L121
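
As a sketch of the suggested fix: without an explicit encoding, open() falls back to the platform's preferred encoding (gbk in the traceback above), which can fail on UTF-8 source files. The helper name read_text_utf8 is hypothetical and this is not the actual code in check_test_naming.py.

```python
def read_text_utf8(path: str) -> str:
    """Read a text file as UTF-8 regardless of the platform default encoding."""
    # Passing encoding explicitly avoids depending on the locale setting.
    with open(path, encoding="utf-8") as fd:
        return fd.read()
```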

for codespell, do you have a different version? it's green in ci

Comment From: snorfyang

> you can just split the lines #52452 (comment)
>
> for the test-naming hook, I think we just need to specify encoding='utf8' in
>
> https://github.com/pandas-dev/pandas/blob/1b653b1c23f7a5f9fbaede993cf9370dba79b1fc/scripts/check_test_naming.py#L121
>
> for codespell, do you have a different version? it's green in ci

I think in CI pre-commit only checks modified files. But pre-commit run --all-files will check all files so it failed on my device. If I just run pre-commit then it passes. Maybe a new issue?

Comment From: MarcoGorelli

it runs on all files in CI

Comment From: MarcoGorelli

anyway, feel free to open a PR, can always take a look there

Comment From: snorfyang

Fine. I used a container, but those hooks still failed. I don't understand why they pass in CI. Maybe it's a version problem, as I updated some hooks with pre-commit autoupdate. I'll make a PR later to handle those.