Description

Consider the 3D array

a = array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

which i wish to convert into a pandas DataFrame, so that each dimension gets mapped onto a column name. Viz,

    A  B  C  val
0   0  0  0    0
1   0  0  1    1
2   0  0  2    2
3   0  1  0    3
4   0  1  1    4
5   0  1  2    5
6   1  0  0    6
7   1  0  1    7
8   1  0  2    8
9   1  1  0    9
10  1  1  1   10
11  1  1  2   11

Note that in practice the array a could have be very large. What's the "pandas" way to do this ? At the moment this is what I'm doing:

def ndarray2dataframe(arr, dim_names=None, dim_values=None,
                      target_dim_name="val", keep_dtypes=False):
    """Converts an ndarray of shape (d_1,...,d_n) into a dataframe with n + 1
    columns containing thesame number of rows as there are elements in the
    input array, and for all index tuples (i_1, ..., i_n) in
    [0...d_1] x ... x [0...d_n], we have

        arr[i_1,...,i_n] == df.loc[mask].iloc[0]["val"],

    where msk := (df[df.columns[0]] == i_1) * ... * (df.columns[n] == i_n)

    Examples
    --------

    In [50]: utils.ndarray2dataframe(arange(2 * 2 * 3).reshape((2, 2, 3)))
    Out[50]:
        0  1  2  val
    0   0  0  0    0
    1   0  0  1    1
    2   0  0  2    2
    3   0  1  0    3
    4   0  1  1    4
    5   0  1  2    5
    6   1  0  0    6
    7   1  0  1    7
    8   1  0  2    8
    9   1  1  0    9
    10  1  1  1   10
    11  1  1  2   11
    """
    # Misc
    arr = np.array(arr)
    if dim_names is None:
        dim_names = range(arr.ndim)
    else:
        assert len(dim_names) == arr.ndim
    columns = list(dim_names) + [target_dim_name]
    if dim_values is None:
        dim_values = map(range, arr.shape)
    else:
        assert map(len, dim_values) == list(arr.shape)

    # Form meshgrid with matricial indexing="ij" (N.B.: "xy" indexing screws
    # everything up)
    grid = np.meshgrid(*dim_values, indexing="ij", copy=False)

    # Explode everything
    data = np.reshape(arr, grid[0].shape).ravel().tolist()
    grid = list(map(np.ravel, grid))
    data = grid + [data]
    df = pd.DataFrame(np.transpose(data), columns=columns)

    # Pandas thinks it's smarter than us :)
    df[target_dim_name] = df[target_dim_name].astype(arr.dtype)
    if keep_dtypes:
        for column, these_dim_values in zip(dim_names, dim_values):
            df[column] = df[column].astype(type(these_dim_values[0]))
    return df

I just wanted to be sure I wasn't reinventing the wheel.

Comment From: jreback

In [6]: Series(a.ravel(), pd.MultiIndex.from_product([ range(n) for n in a.shape ]))
Out[6]: 
0  0  0     0
      1     1
      2     2
   1  0     3
      1     4
      2     5
1  0  0     6
      1     7
      2     8
   1  0     9
      1    10
      2    11
dtype: int64

though not sure if this actually generalizes to your problem. SO might be a better forum for things like this.