It is very common to read a random subset of rows from a large CSV file, typically to test with a small dataset or to stay within memory limits. The parameter `nrows` reads the first n lines, but I didn't find any feature for reading random lines. Such a parameter might be named `keeprows` (the opposite of `skiprows`) and could support:
- int, e.g. `keeprows=100` means keep 100 random lines (sampled uniformly)
- float in (0, 1), e.g. `keeprows=0.05` means keep 5% of the total lines
- list of int (or any iterable), e.g. `keeprows=[1, 3, 8]` means keep lines 1, 3, and 8
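
To make the three cases concrete, here is a minimal sketch of how a `keeprows` value might be resolved into row indices. `keeprows` is only the parameter proposed above, not an existing pandas option. The helper name `resolve_keeprows`, the `seed` argument, 0-based indexing of data rows, and rounding the fraction to a fixed row count are all assumptions made for illustration.

```python
import random


def resolve_keeprows(keeprows, n_rows, seed=None):
    """Return the sorted 0-based data-row indices selected by `keeprows`.

    Hypothetical helper: `keeprows` mirrors the proposal in this issue and
    is not a real pandas parameter. `n_rows` is the number of data rows
    (header excluded), which the caller must already know.
    """
    rng = random.Random(seed)

    # bool is a subclass of int, so reject it explicitly.
    if isinstance(keeprows, bool):
        raise TypeError("keeprows must be an int, a float, or an iterable of ints")

    if isinstance(keeprows, int):
        # Keep exactly `keeprows` rows, sampled uniformly without replacement.
        if not 0 <= keeprows <= n_rows:
            raise ValueError("keeprows must be between 0 and n_rows")
        return sorted(rng.sample(range(n_rows), keeprows))

    if isinstance(keeprows, float):
        # Keep a fraction of the rows. Assumption: a fixed count, rounded,
        # rather than an independent coin flip per row.
        if not 0 < keeprows < 1:
            raise ValueError("a float keeprows must lie in (0, 1)")
        return sorted(rng.sample(range(n_rows), round(n_rows * keeprows)))

    # Otherwise treat it as an iterable of explicit row indices.
    wanted = sorted(set(keeprows))
    if wanted and (wanted[0] < 0 or wanted[-1] >= n_rows):
        raise ValueError("row index out of range")
    return wanted
```

With the indices in hand, one way to apply them in current pandas might be a callable `skiprows`, for example `skiprows=lambda i: i > 0 and (i - 1) not in keep` with `keep` converted to a set, assuming a single header line at file line 0. This still requires knowing the total row count up front, which is part of why a built-in option would be convenient.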
Comment From: jreback
duplicate of #3132
yes, this is a good idea.