Not a night-and-day improvement, since all we're doing is removing some Python overhead, but there does seem to be a 2x+ speedup to be picked up. We could possibly use some of the template machinery to make these easy to write.

I wouldn't consider this high priority given the long-term plans to replace the string dtype, but it could be worth doing.

```python
import numpy as np
import pandas as pd

import cython
%load_ext cython

s = pd.Series(np.random.choice(['aaaaaaaaaa', 'bbbbbbbb', 'ccccc',
                                'dddd'], size=20000).astype('O'))
```

```python
%%cython
from numpy cimport ndarray
import numpy as np

def fast_upper(ndarray values):
    cdef:
        Py_ssize_t i, n = values.shape[0]
        ndarray output = np.empty_like(values)
        str val
    for i in range(n):
        val = values[i]
        output[i] = val.upper()
    return output
```

```
%timeit s.str.upper()
100 loops, best of 3: 4.94 ms per loop

%timeit pd.Series(fast_upper(s.values), index=s.index)
100 loops, best of 3: 2.02 ms per loop
```


**Comment From: jreback**

You can actually get even better perf by using C functions, and maybe even releasing the GIL (though the code for this is a bit trickier).
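
A minimal sketch of that idea, assuming ASCII-only data (the `fast_upper_nogil` name and the bytes round-trip are inventions of this sketch, not anything pandas ships): copy into a fixed-width bytes array so the hot loop can call C's `toupper()` on raw bytes with the GIL released.

```python
%%cython
# Sketch only: assumes ASCII data; non-ASCII input raises
# UnicodeEncodeError at the bytes conversion below.
cimport cython
import numpy as np

cdef extern from "ctype.h":
    int toupper(int c) nogil

@cython.boundscheck(False)
@cython.wraparound(False)
def fast_upper_nogil(values):
    # Fixed-width 'S' copy of the object array: one contiguous buffer.
    arr = np.asarray(values, dtype=bytes)
    # View that buffer as raw bytes so the loop below is pure C.
    cdef unsigned char[::1] buf = arr.view(np.uint8)
    cdef Py_ssize_t i, n = buf.shape[0]
    with nogil:
        for i in range(n):
            buf[i] = <unsigned char>toupper(buf[i])
    # Decode back to a fixed-width unicode array.
    return arr.astype(str)
```

The conversion to and from fixed-width bytes is itself a copy, so this only pays off because the loop in between is pure C and GIL-free; handling non-ASCII data would need a different approach.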

**Comment From: jreback**

xref to #4694 

**Comment From: chris-b1**

Yeah, it looks like the Cythonization isn't really what's helping in my example; it's the avoidance of NA checks.

```
In [27]: %timeit pd.Series([x.upper() for x in s], index=s.index)
100 loops, best of 3: 2.74 ms per loop
```
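
For comparison, the per-element NA handling that `s.str.upper()` pays for and the plain comprehension skips looks roughly like this (an illustrative sketch, not the actual pandas internals; `upper_with_na_check` is a made-up name):

```python
import numpy as np
import pandas as pd

# Uppercase, but skip missing values the way the .str accessor must.
def upper_with_na_check(values):
    return [x.upper() if isinstance(x, str) else np.nan for x in values]

# Roughly what s.str.upper() computes, NA bookkeeping included.
result = pd.Series(upper_with_na_check(s.values), index=s.index)
```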

**Comment From: jorisvandenbossche**

Now that users have the option to use the Arrow-backed string dtype if they want better performance, it might not be necessary to keep this issue open?
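
For example, opting in looks like this (requires pyarrow to be installed; pandas 1.3+):

```python
import pandas as pd

# Arrow-backed strings: .str.upper() runs in pyarrow's compute
# kernels rather than a per-element Python loop.
s = pd.Series(['aaaaaaaaaa', 'bbbbbbbb', 'ccccc', 'dddd'],
              dtype='string[pyarrow]')
s.str.upper()
```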

**Comment From: jbrockmendel**

I agree with Joris; closing as "supported via pyarrow".