open (not in any particular order)

  1. add support for other dtypes in table columns (datetime,date,unicode)
  2. Implement variable length strings in a parallel VLArray (and synchronize): https://github.com/PyTables/PyTables/issues/198
  3. revisit Term syntax - can we make it more readable?
     3a. implement OR in Terms (maybe use a pyparsing-like syntax)
  4. implement WORMTable
  5. test whether data columns really are slower; if they are not, it may make sense to make data_columns=True the default (but not necessarily index them). See https://groups.google.com/forum/m/?fromgroups#!topic/pydata/cmw1F3OFJSc - the perf tests at the end of that post suggest this is prob not a good idea after all
  6. add export function, to export to different PyTables formats (an easy to read table for R (partially done), and output a GenericTable)
  7. provide better access to columns that are data_columns (as we can directly select them) - see read_column, expand this to the entire table (if possible), allows one to avoid selecting all columns in a table (and then reindexing), this works if columns argument is provided to select or inferred from the where.
  8. add out-of-core computation support (see my comment about 1/2 down in #622), this is partially supported now that we have an iterator (#3078)
  9. add a method to create a table structure (create_table)?, w/o actually appending, so don't have to add parms in each call to append.
  10. Support a better mechanism for table splitting Splitter? that a user can specify how to split (rather than a dict); then store this object, so can automatically recreate the resulting table (enable for both Storer and Table objects)
  11. Optimize table appending, I think we can do better! (GH #3537) makes some improvements
  12. allow itemsize='truncate' to allow subsequent appends to proceed with string truncation (on specific columns)
  13. allow where in select_column, return a properly indexed Series, add option to include the index (use_index=True?)
  14. Better deal with a very long list as input to a Term, by running multiple queries or sub-queries
  15. Add support for column oriented tables, dep is carray, http://carray.pytables.org/docs/manual/
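
For item 14, one possible approach is to split the long list into batches, run one sub-query per batch, and concatenate the results. The sketch below is only an illustration in plain Python: `select_in_batches`, `run_query` and `batch_size` are hypothetical names, and `fake_query` stands in for whatever call actually selects rows from a table.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(values: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `values`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(values), size):
        yield values[start:start + size]


def select_in_batches(run_query: Callable[[Sequence], List],
                      values: Sequence,
                      batch_size: int = 30) -> List:
    """Run one sub-query per batch of values and concatenate the results.

    `run_query` is a hypothetical callable that takes a short list of
    values and returns the matching rows.
    """
    results: List = []
    for batch in chunked(list(values), batch_size):
        results.extend(run_query(batch))
    return results


# Stand-in "table" and query, used only to exercise the batching logic.
rows = [{"id": i} for i in range(100)]


def fake_query(batch):
    wanted_ids = set(batch)
    return [row for row in rows if row["id"] in wanted_ids]


wanted = list(range(0, 100, 3))
selected = select_in_batches(fake_query, wanted, batch_size=10)
```

Results come back in batch order rather than table order, and a value repeated across batches would return its rows more than once, so a real implementation would probably need to re-sort and de-duplicate.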

done

  1. DONE (GH #2401): access store paths via path notation / dot notation (GH #2755)
  2. DONE (GH #2497): add to docs (GH #2397) - issues about reading/writing concurrently in threads/processes http://sourceforge.net/mailarchive/message.php?msg_id=30190886
  3. DONE (GH #2497): support panelnd (GH #2242)
  4. DONE (GH #2561): Should DataFrames be automagically indexed on 'index' (prob yes), but then should have a flag in append/put, and enable passing of the indexing options
  5. DONE (GH #2497): Check if create_table_index changes the current index if different options are passed
  6. DONE (GH #2561): for writing add chunk keyword to select to provide generator like behavior - each call to return the next chunk of data
  7. DONE (GH #2561): support multi indexes on tables
     5a. DONE real dtype integration is coming on PR #2708 (eg even though 0.10.1 will actually read/write float32 columns u can't really do much with them w/o having them upcasted) - in any event I think HDFStore will accommodate this already. but more testing needed
  8. DONE iterator support in select, http://stackoverflow.com/questions/14614512/merging-two-tables-with-millions-of-rows-in-python (GH #3078)
  9. DONE (GH #3531) support timezones in datelike columns (index should be ok already) (scott?), (GH #2852)

Comment From: gerigk

what about allowing creation/access of groups by using "/" in the key.

i.e.,

store.put('some/path/to/df', df)

would create/access the groups some, path, to and finally df.

Right now I can only save the data on one level within an hdf5 file although HDF5/PyTables supports access by file system like paths. It would not break anything since the occurrence of a '/' raises an exception right now.
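
A minimal sketch of the key handling this would need, in plain Python. `split_key` is a hypothetical helper, not part of pandas or PyTables; it only shows how a slash-separated key could be broken into the groups to create and the leaf to store.

```python
from typing import List, Tuple


def split_key(key: str) -> Tuple[List[str], str]:
    """Split a slash-separated key into (group names, leaf name).

    Leading, trailing and repeated slashes are ignored.
    """
    parts = [part for part in key.split("/") if part]
    if not parts:
        raise ValueError("key must name at least one node")
    return parts[:-1], parts[-1]


groups, leaf = split_key("some/path/to/df")
```

Here `groups` holds the intermediate group names in order and `leaf` is the node that would hold the DataFrame.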

On Thu, Nov 29, 2012 at 6:20 PM, jreback notifications@github.com wrote:

  1. add support for other dtypes in table columns (datetime64,datetime,date,unicode)
  2. support min_itemsize for table columns (currently supported only in indexers) also might be a better way of doing this (e.g. have the info attached to a dataframe, or support a global pandas option to provide a minimum)
  3. revisit Term syntax - can we do better / more readability?
  4. implement WORMTable


Comment From: jreback

good idea...shouldn't be too hard to implement

Comment From: scottkidder

Here are things that are most interesting/beneficial to my current workload:

  1. Full Float32 support & full pandas dtype support
  2. WORMTable (unsure of implementation or performance gains)
  3. data_columns is very useful and I can do more testing to determine how fast/slow they are.
  4. read_column would also be very useful in many instances.

I like the way Terms work. Is there support for ORing Terms or other logical operations in the Selection?

I can pick up work on any of these issues, but I would absolutely like to discuss some of the details first.

Comment From: jreback

Scott, send me an email at jeff@reback.net so we can correspond offline

Comment From: alvorithm

Term language: perhaps it makes sense to piggyback on existing syntax. SQL comes to mind, but also XESAM (the whole of http://xesam.org is down at the moment, but one can get the gist of it here: http://banshee.fm/support/guide/searching/).

Comment From: alvorithm

It would be nice if attribute access (e.g. store.df) could be enabled for all the leaves that have suitable names. This might require a big API overhaul, though (store.df.append ...).
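
A rough sketch of the idea in plain Python, not the HDFStore implementation: `DottedStore` is a hypothetical stand-in that keeps its leaves in a dict and falls back to that dict when ordinary attribute lookup fails.

```python
class DottedStore:
    """Toy key/value store with attribute access to its leaves."""

    def __init__(self):
        self._nodes = {}

    def put(self, key, value):
        self._nodes[key] = value

    def get(self, key):
        return self._nodes[key]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so real methods
        # such as put/get always win over a leaf with the same name.
        # Going through __dict__ avoids recursing if _nodes is not set yet.
        try:
            return self.__dict__["_nodes"][name]
        except KeyError:
            raise AttributeError(name) from None


store = DottedStore()
store.put("df", [1, 2, 3])
value = store.df
```

Only leaves whose names are valid Python identifiers and do not clash with existing methods can be reached this way; everything else still needs `store.get(...)`.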

Comment From: jreback

see #2485, this is actually somewhat easy in HDFStore, the problem is that pandas in general doesn't propagate these attributes; you can easily store/retrieve attributes if you want on the nodes themselves

something like:

s = store.get_storer('df')
s.attrs['my_attribute'] = 1

Comment From: jreback

sorry...misunderstood your comment....(thought you meant saving attributes)

attribute access on the store is not a big deal, will add to the list

Comment From: alvorithm

Thank you for considering this, dotted access will save my pinky a lot of strain [''] (dead keys b/c need accents...).

Regarding attributes on DFs, this would actually preempt a number of cases for specialization of DataFrame (see recent MetaDataFrame PR #2695) and in particular perhaps support the addition of metadata that would facilitate automated merges (foreign keys...).

EDIT: there was a discussion about this topic in the mailing list

Comment From: jreback

see #2755 , was pretty easy to add dotted access, so i did!

Comment From: jreback

@scottkidder did you get a chance to look at issue 13. #2852

Comment From: jreback

dated