Research
- [X] I have searched the [pandas] tag on StackOverflow for similar questions.
- [X] I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/questions/57442034/how-to-set-the-default-styler-for-a-pandas-dataframes-repr-html-method
Question about pandas
This is a question/feature request:
I have a class with a DataFrame attribute that contains long strings (descriptions) and URLs. Hence, I want to specify how its HTML representation is formatted by default for use in Jupyter Notebooks.
All I could find was this old StackOverflow thread from 2019, which says it is not possible.
Has this changed by now?
I tried the following, but it raised AttributeError: can't set attribute 'style'. Is there any particular reason why df.style cannot simply be replaced by another Styler object?
import pandas as pd
d = [
    (
        0,
        "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor "
        "incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis "
        "nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. "
        "Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore "
        "eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, "
        "sunt in culpa qui officia deserunt mollit anim id est laborum.",
        "https://en.wikipedia.org/wiki/Lorem_ipsum",
    )
]
df = pd.DataFrame(d, columns=["paragraph", "text", "url"]).set_index("paragraph")
style = df.style.set_properties(
    **{
        "inline-size": "10cm",
        "overflow-wrap": "break-word",
    },
    subset=["text"],
).format({"url": lambda x: f'<a href="{x}">{x}</a>'})
df.style = style  # ✘ AttributeError: can't set attribute 'style'
Comment From: jbrockmendel
Cc @attack68
Comment From: attack68
df.style is really a method that takes the DataFrame and constructs a basic Styler object from it. You should not create a Styler object and then try to re-associate it with df.style; this serves no purpose once the object has been created. Every call to df.style returns a different Styler object, which is not linked to the DataFrame. That is why Styler is documented as a display tool, to be used after the DataFrame has been pre-processed and constructed.
>>> s1 = df.style
>>> s2 = df.style
>>> print(id(s1))
140594825868960
>>> print(id(s2))
140594825868144
To redefine the basic df.style operation you must redefine the DataFrame class property.
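from pandas import DataFrame
def custom_style(self):
    from pandas.io.formats.style import Styler
    # On pandas >= 2.1, Styler.map is the non-deprecated name for applymap.
    return Styler(self).applymap(lambda x: "color: green; font-weight: bold;")
# Replace the property on the class itself, so every DataFrame is affected.
DataFrame.style = property(custom_style)
df = DataFrame([[1,2],[3,4]])
df.style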
Comment From: randolf-scholz
@attack68 Thanks for the reply. Here's what I still don't get: why doesn't style implement a setter method? It seems like a thing that you'd really like to have (as per my use-case, for example).
Your proposed solution changes the behaviour of the whole class, but I only want to change it for an instance. I had a look at the source code, and essentially we have:
@property
def style(self) -> Styler:
    return Styler(self)
What I wonder is whether we could instead have:
@property
def style(self) -> Styler:
    if self._styler is None:
        return Styler(self)  # or possibly initialize self._styler = Styler(self) during __init__
    return self._styler
@style.setter
def style(self, styler: Styler) -> None:
    self._styler = styler
Comment From: attack68
That would be slightly confusing because the default is reactive to changes in the DataFrame:
df = DataFrame([[1,2],[3,4]])
df.style
df.iloc[0,0] = 100
df.style # <- is different to above
whilst your application using a setter would behave like this:
df = DataFrame([[1,2],[3,4]])
df.style = df.style
df.iloc[0,0] = 100
df.style # <- this styler has been set and no longer reacts to the DataFrame update.
For indexing DataFrames, i.e. to get different cross-sectional views, you need to create different DataFrames, which require different Styler objects, so 'setting' it won't help there either. I don't really understand what you can't achieve or what you think will be easier.
You can also create a special subclass whose instances can be used repeatedly throughout the Notebook:
from pandas import DataFrame
class MyDataFrame(DataFrame):
    @property
    def style(self):
        from pandas.io.formats.style import Styler
        return Styler(self).applymap(lambda x: "color: green; font-weight: bold;")
df = MyDataFrame([[1,2],[3,4]])
df.style
Comment From: randolf-scholz
Oh, so the Styler classes do not lazily evaluate the data? They parse everything during Styler.__init__? Ok, I get it that would be a big blocker. I falsely assumed that the actual evaluation would happen during Styler.__repr__ or Styler.__str__, etc.
> I don't really understand what you can't achieve or what you think will be easier.
Again, as per my example I have a class with an attached DataFrame.
class Model:
    VARIANTS: DataFrame
And I want a nice representation when the user accesses the attribute in a live session (Jupyter):
Model.VARIANTS # <-- should produce nicely formatted html
At the same time the DataFrame holds data like URLs that get used by the class. So the attribute needs to be a DataFrame, whereas for nice printing it seems I need the Styler object. This seems cumbersome and confusing to me. Again, I was under the apparently wrong impression that DataFrame.style would overwrite the DataFrame.__repr__ and DataFrame._repr_*_ methods.
This kind of feels like having to carry around the data and style for an Excel table in two separate files.
Comment From: attack68
> Oh, so the Styler classes do not lazily evaluate the data? They parse everything during Styler.__init__? Ok, I get it that would be a big blocker. I falsely assumed that the actual evaluation would happen during Styler.__repr__ or Styler.__str__, etc.
That is correct, and with good reason: some of Styler's functionality depends on knowing the data when its methods are run. Not all of it, but some, and that is what currently creates this requirement.