Research

  • [X] I have searched the [pandas] tag on StackOverflow for similar questions.

  • [X] I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/57442034/how-to-set-the-default-styler-for-a-pandas-dataframes-repr-html-method

Question about pandas

This is a question/feature request:

I have a class with a DataFrame attribute that contains long strings (descriptions) and URLs. Hence, I want to specify how its HTML representation is formatted by default for use in Jupyter Notebooks.

All I could find was this old StackOverflow thread from 2019, which says it is not possible.

Has this changed by now?

I tried the following, but it raised AttributeError: can't set attribute 'style'. Is there any particular reason why df.style cannot simply be replaced by another Styler object?

import pandas as pd

d = [
    (
        0,
        "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor "
        "incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis "
        "nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. "
        "Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore "
        "eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, "
        "sunt in culpa qui officia deserunt mollit anim id est laborum.",
        "https://en.wikipedia.org/wiki/Lorem_ipsum",
    )
]
df = pd.DataFrame(d, columns=["paragraph", "text", "url"]).set_index("paragraph")
style = df.style.set_properties(
    **{
        "inline-size": "10cm",
        "overflow-wrap": "break-word",
    },
    subset=["text"],
).format({"url": lambda x: f'<a href="{x}">{x}</a>'})

df.style = style  # ✘ AttributeError: can't set attribute 'style'

Comment From: jbrockmendel

Cc @attack68

Comment From: attack68

df.style is really a method that takes the DataFrame and constructs a basic Styler object from it. You should not create a Styler object and then try to re-associate it with df.style; this serves no purpose once the object has been created. Every call to df.style returns a different Styler object, which is not linked back to the DataFrame. That is why Styler is documented as a display-purpose tool, to be used after the DataFrame has been pre-processed and constructed.

>>> s1 = df.style
>>> s2 = df.style
>>> print(id(s1))
140594825868960
>>> print(id(s2))
140594825868144
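The AttributeError from the question follows from the same design: a property defined without a setter rejects assignment. The following is a minimal sketch in plain Python, using a hypothetical Frame class as a stand-in (it is not pandas code), that shows both behaviours.

```python
class Frame:
    """Hypothetical stand-in for DataFrame; not pandas code."""

    @property
    def style(self):
        # A fresh object is built on every access, like df.style.
        return object()


frame = Frame()

error = None
try:
    # No setter is defined for the property, so assignment fails.
    frame.style = "something else"
except AttributeError as exc:
    error = exc

print(type(error).__name__)
```

The exact wording of the error message depends on the Python version.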

To redefine the basic df.style operation you must redefine the DataFrame class property.

from pandas import DataFrame

def custom_style(self):
    from pandas.io.formats.style import Styler
    return Styler(self).applymap(lambda x: "color: green; font-weight: bold;")

DataFrame.style = property(custom_style)

df = DataFrame([[1,2],[3,4]])
df.style


Comment From: randolf-scholz

@attack68 Thanks for the reply. Here's what I still don't get: why doesn't style implement a setter method? It seems like a thing that you'd really like to have (as per my use-case, for example).

Your proposed solution changes the behaviour of the whole class, but I only want to change it for an instance. I had a look at the source code, and essentially we have:

    @property
    def style(self) -> Styler:
        return Styler(self)

What I wonder is couldn't we have instead:

    @property
    def style(self) -> Styler:
        if self._styler is None:
            return Styler(self)   # or possibly initialize self._styler = Styler(self) during __init__
        return self._styler

    @style.setter
    def style(self, styler: Styler) -> None:
        self._styler = styler

Comment From: attack68

That would be slightly confusing because the default is reactive to changes in the DataFrame:

df = DataFrame([[1,2],[3,4]])
df.style
df.iloc[0,0] = 100
df.style # <- is different to above

whilst your application using a setter

df = DataFrame([[1,2],[3,4]])
df.style = df.style
df.iloc[0,0] = 100
df.style # <- this styler has been set and no longer reacts to the DataFrame update.

When indexing DataFrames, i.e. to get different cross-sectional views, you create different DataFrames, which require different Styler objects, so 'setting' it won't help there either. I don't really understand what you can't achieve or what you think will be easier.

You can also create a special instance that can be repeatedly used throughout the Notebook:

from pandas import DataFrame

class MyDataFrame(DataFrame):

    @property
    def style(self):
        from pandas.io.formats.style import Styler
        return Styler(self).applymap(lambda x: "color: green; font-weight: bold;")

df = MyDataFrame([[1,2],[3,4]])
df.style

Comment From: randolf-scholz

Oh, so the Styler classes do not lazily evaluate the data? They parse everything during Styler.__init__? Ok, I get it that would be a big blocker. I falsely assumed that the actual evaluation would happen during Styler.__repr__ or Styler.__str__, etc.

> I don't really understand what you can't achieve or what you think will be easier.

Again, as per my example I have a class with an attached DataFrame.

class Model:
    VARIANTS: DataFrame

And I want a nice representation when the user accesses the attribute in a live session (Jupyter):

Model.VARIANTS  # <-- should produce nicely formatted html

At the same time, the DataFrame holds data like URLs that get used by the class. So the attribute needs to be a DataFrame, whereas for nice printing it seems I need the Styler object. This seems cumbersome and confusing to me. Again, I was under the apparently wrong impression that DataFrame.style would override the DataFrame.__repr__ and DataFrame._repr_*_ methods.

This kind of feels like having to carry around the data and style for an excel table in two separate files.

Comment From: attack68

> Oh, so the Styler classes do not lazily evaluate the data? They parse everything during Styler.__init__? Ok, I get it that would be a big blocker. I falsely assumed that the actual evaluation would happen during Styler.__repr__ or Styler.__str__, etc.

That is correct, and with good reason: some of Styler's functionality depends on knowing the data when its methods are run. Not all, but some, and that currently creates this requirement.