Is your feature request related to a problem?
Currently `class_column` is a required argument for `pandas.plotting.parallel_coordinates`, which means a hack/workaround is needed to create a parallel coordinates plot for a single class (i.e. with a single colour).
Describe the solution you'd like
`class_column` should default to `None`, in which case a parallel coordinates plot with a single colour will be created.
I've looked at the source code for the relevant function and I think it would be a fairly straightforward modification.
API breaking implications
I don't think this would cause any backwards-incompatible changes: the arguments to the function would stay the same and keep the same order (only with `class_column` now being optional).
It may be preferable to allow the `color` argument to take a single colour value when `class_column` is `None` (currently `color` is optional but, if given, is expected to be a list of colours).
Describe alternatives you've considered
It's currently possible to hack a single-coloured parallel coordinates plot by creating a dummy `class_column` that has a constant value, e.g.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.uniform(size=(10, 3)), columns=["a", "b", "c"])
df["label"] = -1
pd.plotting.parallel_coordinates(df, "label")
plt.gca().get_legend().remove()
plt.show()
```
What I'm proposing is for this to be possible instead:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.uniform(size=(10, 3)), columns=["a", "b", "c"])
pd.plotting.parallel_coordinates(df)
plt.show()
```
Additional context
I found someone else asking a question about this in this issue: https://github.com/pandas-dev/pandas/issues/12341#issuecomment-299911662. The response was along the lines of `class_column` being required because the general use case for parallel coordinates plots is multivariate data. That may be true, but it's valid to want to create one for a single class, and I don't think the API should force the plot to have a colour scale. Cases where creating a single-class plot may be useful:
- Constructing a plot from multiple data sources rather than a single data frame (i.e. multiple parallel coordinates plots on the same axis).
- Exploratory analysis in clustering problems (where the classes are unknown).
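For the first case, the dummy-column workaround can already be applied once per source, drawing each onto the same axis with its own colour. A sketch, assuming two hypothetical data frames with identical columns; each gets a constant label column named after its source, so here the legend is actually meaningful.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Two hypothetical data sources sharing the same columns.
sources = {
    "source A": pd.DataFrame(rng.uniform(size=(10, 3)), columns=["a", "b", "c"]),
    "source B": pd.DataFrame(rng.uniform(0.5, 1.5, size=(5, 3)), columns=["a", "b", "c"]),
}

fig, ax = plt.subplots()
for i, ((name, frame), colour) in enumerate(zip(sources.items(), ["C0", "C1"])):
    # Constant class column per source, so each source is drawn in one colour.
    pd.plotting.parallel_coordinates(
        frame.assign(label=name),
        "label",
        ax=ax,
        color=[colour],
        axvlines=(i == 0),  # draw the vertical axis lines only once
    )

# Each pandas call also touches the grid, so set its state once at the end.
ax.grid(True)
```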
Comment From: jack89roberts
These functions also have a required `class_column` argument:
- pandas.plotting.andrews_curves
- pandas.plotting.radviz
I'm not as familiar with those styles of plot, but the same argument may apply (and making `class_column` optional for those should be similarly straightforward and not a breaking change).
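The same constant-label workaround appears to carry over to `andrews_curves` and `radviz`, since they take `class_column` and `color` in the same way. A sketch with a hypothetical helper, `plot_single_class` (not part of pandas), which assumes the frame has no column already named `_single_class`:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_single_class(plot_func, frame, color="C0", ax=None):
    """Hypothetical helper: single-colour plot via a constant class column.

    Assumes `frame` has no existing column called "_single_class".
    """
    ax = plot_func(frame.assign(_single_class=0), "_single_class", ax=ax, color=[color])
    # Remove the legend, which would only list the dummy class.
    legend = ax.get_legend()
    if legend is not None:
        legend.remove()
    return ax


df = pd.DataFrame(np.random.uniform(size=(10, 3)), columns=["a", "b", "c"])
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
plot_funcs = [
    pd.plotting.parallel_coordinates,
    pd.plotting.andrews_curves,
    pd.plotting.radviz,
]
for func, subplot_ax in zip(plot_funcs, axes):
    plot_single_class(func, df, ax=subplot_ax)
```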
Comment From: cbbcbail
There is no reason why parallel coordinate plots must be multi-class.
Comment From: brurosa
I would like to take this
Comment From: peacheym
Has this issue been resolved?