open (not in any particular order)
- add support for other dtypes in table columns (datetime,date,unicode)
- Implement variable length strings in a parallel VLArray (and synchronize): https://github.com/PyTables/PyTables/issues/198
- revisit Term syntax - can we do better / more readability?
  - implement `or` in Terms (maybe use a pyparsing-like syntax)
- implement WORMTable
- one big area is to test whether data columns really are slower; it may then make sense to make `data_columns=True` the default (but not necessarily index them), see https://groups.google.com/forum/m/?fromgroups#!topic/pydata/cmw1F3OFJSc - see the end of that thread for some perf tests, so this is probably not a good idea after all (see the sketch after this list for how data columns are used)
- add an `export` function, to export to different PyTables formats (an easy-to-read table for R (partially done), and output a GenericTable)
- provide better access to columns that are data_columns (as we can directly select them) - see `read_column`, and expand this to the entire table (if possible); allows one to avoid selecting all columns in a table (and then reindexing), this works if a `columns` argument is provided to select or inferred from the where
- add out-of-core computation support (see my comment about 1/2 down in #622), this is partially supported now that we have an iterator (#3078)
- add a method to create a table structure (`create_table`?) w/o actually appending, so you don't have to add parms in each call to append
- Support a better mechanism for table splitting - a `Splitter`? so that a user can specify how to split (rather than a dict); then store this object, so the resulting table can be automatically recreated (enable for both Storer and Table objects)
- Optimize table appending, I think we can do better! (GH #3537) makes some improvements
- allow `itemsize='truncate'` to allow subsequent appends to proceed with string truncation (on specific columns)
- allow where in `select_column`, return a properly indexed Series, add an option to include the index (`use_index=True`?)
- Better deal with a very long list as input to a `Term`, by running multiple `or` sub-queries
- Add support for column-oriented tables, dep is `carray`, http://carray.pytables.org/docs/manual/
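A minimal sketch of what using data columns looks like, for reference on the perf question in the list above. The store file name, the columns A/B/C, and the numbers are just examples, and the exact `where` string syntax depends on the pandas version:

```python
import numpy as np
import pandas as pd

# a small frame with a datetime index and three example columns
df = pd.DataFrame(np.random.randn(100, 3), columns=list('ABC'),
                  index=pd.date_range('2013-01-01', periods=100))

store = pd.HDFStore('store.h5')

# data_columns=True makes every column individually queryable;
# the open question above is whether this should become the default
store.append('df', df, data_columns=True)

# with data columns we can filter on column values, not just the index,
# and restrict the returned columns so nothing is read and thrown away
result = store.select('df', where='B > 0', columns=['A', 'B'])

store.close()
```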
done
- DONE (GH #2401): access store paths via path notation / dot notation (GH #2755)
- DONE (GH #2497): add to docs (GH #2397) the issues about reading/writing concurrently in threads/processes, http://sourceforge.net/mailarchive/message.php?msg_id=30190886
- DONE (GH #2497): support panelnd (GH #2242)
- DONE (GH #2561): Should DataFrames be automagically indexed on 'index' (prob yes), but then should have a flag in append/put, and enable passing of the indexing options
- DONE (GH #2497): Check if create_table_index changes the current index if different options are passed
- DONE (GH #2561): for writing add chunk keyword to select to provide generator like behavior - each call to return the next chunk of data
- DONE (GH #2561): support multi-indexes on tables
  - DONE: real dtype integration is coming in PR #2708 (e.g. even though 0.10.1 will actually read/write float32 columns, you can't really do much with them w/o having them upcasted) - in any event I think HDFStore will accommodate this already, but more testing is needed
- DONE: iterator support in `select`, http://stackoverflow.com/questions/14614512/merging-two-tables-with-millions-of-rows-in-python (GH #3078) - see the sketch after this list
- DONE (GH #3531): support timezones in datelike columns (index should be ok already) (scott?), (GH #2852)
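The iterator support referenced above (GH #3078) roughly allows chunked reads like this; a sketch, reusing the hypothetical 'df' table from the previous example:

```python
import pandas as pd

store = pd.HDFStore('store.h5')

# chunksize makes select return an iterator of DataFrames rather than one
# big frame, so a large table can be processed without loading it all at once
total = 0
for chunk in store.select('df', chunksize=25):
    total += chunk['B'].sum()

store.close()
```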
Comment From: gerigk
What about allowing creation/access of groups by using "/" in the key?
I.e., `store.put('some/path/to/df', df)` would create/access the groups some, path, to and finally df.
Right now I can only save the data on one level within an HDF5 file, although HDF5/PyTables supports access by filesystem-like paths. It would not break anything, since the occurrence of a '/' raises an exception right now.
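A sketch of the behavior being requested, roughly as path-style keys ended up working in later pandas versions (the file name 'nested.h5' and the frame are just examples):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), columns=['x', 'y'])

store = pd.HDFStore('nested.h5')

# each '/'-separated component becomes a PyTables group, created on demand
store.put('some/path/to/df', df)

# the same path-style key retrieves the frame
same = store.get('some/path/to/df')

# keys() reports the full paths of everything in the store
print(store.keys())   # ['/some/path/to/df']

store.close()
```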
On Thu, Nov 29, 2012 at 6:20 PM, jreback notifications@github.com wrote:
- add support for other dtypes in table columns (datetime64,datetime,date,unicode)
- support min_itemsize for table columns (currently supported only in indexers) also might be a better way of doing this (e.g. have the info attached to a dataframe, or support a global pandas option to provide a minimum)
- revisit Term syntax - can we do better / more readability?
- implement WORMTable
Comment From: jreback
good idea...shouldn't be too hard to implement
Comment From: scottkidder
Here are things that are most interesting/beneficial to my current workload:
- Full Float32 support & full pandas dtype support
- WORMTable (unsure of implementation or performance gains)
- data_columns is very useful, and I can do more testing to determine how fast/slow they are
- read_column would also be very useful in many instances
I like the way Terms work. Is there support for ORing Terms or other logical operations in the Selection?
I can pick up work on any of these issues, but I would absolutely like to discuss some of the details first.
Comment From: jreback
Scott, send me an email and we can correspond offline: jeff@reback.net
Comment From: alvorithm
Term language: perhaps it makes sense to piggyback on existing syntax. SQL comes to mind, but also XESAM (the whole http://xesam.org site is down at the time of writing, but one can get the gist of it here: http://banshee.fm/support/guide/searching/).
Comment From: alvorithm
It would be nice if attribute access (e.g. `store.df`) could be enabled for all the leaves that have suitable names. This might require a big API overhaul, though (`store.df.append`...).
Comment From: jreback
see #2485, this is actually somewhat easy in HDFStore; the problem is that pandas in general doesn't propagate these attributes. You can easily store/retrieve attributes if you want on the nodes themselves,
something like:
# grab the storer for the 'df' node and stash a custom attribute on it
s = store.get_storer('df')
s.attrs['my_attribute'] = 1
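Reading it back is then just the symmetric lookup, e.g. `store.get_storer('df').attrs['my_attribute']` (the attribute name here is only the example from above).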
Comment From: jreback
sorry...misunderstood your comment (I thought you meant saving attributes)
attribute access on the store is not a big deal, will add to the list
Comment From: alvorithm
Thank you for considering this; dotted access will save my pinky a lot of strain typing `['']` (dead keys b/c I need accents...).
Regarding attributes on DFs: actually this would preempt a number of cases for specialization of DataFrame (see the recent MetaDataFrame PR #2695), and in particular perhaps support the addition of metadata that would facilitate automated merges (foreign keys...).
EDIT: there was a discussion about this topic on the mailing list
Comment From: jreback
see #2755, it was pretty easy to add dotted access, so I did!
Comment From: jreback
@scottkidder did you get a chance to look at issue 13 (#2852)?
Comment From: jreback
dated