Spring Divide RAM usage by 5 for ResultSets from ColumnMapRowMapper

In our application, we heavily use procedures returning cursors/ResultSets. We found a big unjustified RAM usage causing problems in production.

It is caused by ColumnMapRowMapper: mapOfColumnValues is a LinkedCaseInsensitiveMap. The LinkedCaseInsensitiveMap of each row has a caseInsensitiveKeys map whose keys are key.toLowerCase(getLocale()). Here lies the problem: for each data row, all the column names are duplicated (lower-cased) in memory.

This is huge: we often go OutOfMemoryError in exports, or when a lot of concurrent users are reading ResultSets at the same time.

We found we could divide RAM by a factor of about 2,2 in our case, by just avoiding all these duplicate Strings.

And to go further, each mapped row has an HashMap. Out of curiosity, we replaced the HashMap with a simple Object[], using the LinkedCaseInsensitiveMap as a Map. Here again, we gained an additional factor of about 2,2 in RAM urage.

By using both optimizations, we have divided by about 5 the RAM usage of ResultSet mapped by Spring. This is a huge benefit for our users, experiencing no OutOfMemory anymore, and able to do more work in parallel. We think this solution is optimal: there is no more optimization that could be done do reduce memory further (while keeping the code readable and decoupled).

Here is the code we used. I let you inspect it. Do you like the solution? If yes, you can refactor it to your liking/coding-standards if needed before incorporating it:

...
    // Only for us not to use our own Spring Jdbc version
    @Bean
    public JdbcTemplate jdbcTemplate(DataSource dataSource) {
        return new RamEfficientJdbcTemplate(dataSource);
    }
}

public class RamEfficientJdbcTemplate extends JdbcTemplate {
    public RamEfficientJdbcTemplate(DataSource dataSource) {
        super(dataSource);
    }
    // Only for us not to use our own Spring Jdbc version: could be integrated into the main Spring Jdbc codebase:
    protected RowMapper<Map<String, Object>> getColumnMapRowMapper() {
        return new RamEfficientColumnMapRowMapper();
    }
}

/**
 * {@link RowMapper} implementation similar to {@link ColumnMapRowMapper} but in a memory-efficient way.
 * There is no String duplication of all lower-cased column names in each row (halving the RAM usage) by sharing the
 * keyIndices among all rows.
 * And it avoids unnecessary Maps (also halving the remaining RAM usage) by using simple arrays for each row.
 */
public class RamEfficientColumnMapRowMapper implements RowMapper<Map<String, Object>> {

    private Map<String, Integer> keyIndices;

    @Override
    public Map<String, Object> mapRow(ResultSet resultSet, int rowNum) throws SQLException {
        ResultSetMetaData metaData = resultSet.getMetaData();
        int columnCount = metaData.getColumnCount();

        buildKeyIndices(metaData, columnCount);
        return toValuesMap(resultSet, columnCount);
    }

    private void buildKeyIndices(ResultSetMetaData metaData, int columnCount) throws SQLException {
        if (keyIndices == null) {
            keyIndices = new LinkedCaseInsensitiveMap<>(columnCount);
            for (int index = 0; index < columnCount; index++) {
                int position = index + 1;
                String column = JdbcUtils.lookupColumnName(metaData, position);
                keyIndices.put(getColumnKey(column), index);
            }
        }
    }

    private Map<String, Object> toValuesMap(ResultSet resultSet, int columnCount) throws SQLException {
        Object[] values = new Object[columnCount];
        for (int index = 0; index < columnCount; index++) {
            int position = index + 1;
            values[index] = getColumnValue(resultSet, position);
        }

        return new RamEfficientMap<>(keyIndices, values);
    }

    protected String getColumnKey(String columnName) {
        return columnName;
    }

    @Nullable
    protected Object getColumnValue(ResultSet rs, int index) throws SQLException {
        return JdbcUtils.getResultSetValue(rs, index);
    }

}

public class RamEfficientMap<K, V> implements Map<K, V> {

    private final Map<K, Integer> keyIndices;
    private final V[] values;

    public RamEfficientMap(Map<K, Integer> keyIndices, V[] values) {
        if (keyIndices.size() != values.length) {
            throw new IllegalArgumentException("There are " + keyIndices.size() + " keys but " + values.length + " values");
        }

        this.keyIndices = keyIndices;
        this.values = values;
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public boolean isEmpty() {
        return false;
    }

    @Override
    public boolean containsKey(Object key) {
        return keyIndices.containsKey(key);
    }

    @Override
    public boolean containsValue(Object searchedValue) {
        for (V value : values) {
            boolean areBothNull = searchedValue == null && value == null;
            boolean areBothEqual = searchedValue != null && searchedValue.equals(value);
            if (areBothNull || areBothEqual) {
                return true;
            }
        }
        return false;
    }

    @Override
    public V get(Object key) {
        Integer index = keyIndices.get(key);
        return index == null ? null : values[index];
    }

    @Override
    public V put(K key, V value) {
        return throwImmutableMap();
    }

    @Override
    public V remove(Object key) {
        return throwImmutableMap();
    }

    @Override
    public void putAll(Map<? extends K, ? extends V> map) {
        throwImmutableMap();
    }

    @Override
    public void clear() {
        throwImmutableMap();
    }

    @Override
    public Set<K> keySet() {
        return unmodifiableSet(keyIndices.keySet());
    }

    @Override
    public Collection<V> values() {
        return unmodifiableList(asList(values));
    }

    @Override
    public Set<Entry<K, V>> entrySet() {
        return keyIndices.keySet().stream()
                .map(this::toEntry)
                .collect(toSet());
    }

    private Entry<K, V> toEntry(K key) {
        return new SimpleImmutableEntry<>(key, get(key));
    }

    private V throwImmutableMap() {
        throw new UnsupportedOperationException("The map is immutable");
    }

}

Comment From: slaout

Note: we ran into a problem where a ResultSet contained twice the same column name.

Here is a fixed version of the RamEfficientColumnMapRowMapper that keeps the same behavior than before (do not crash, do not even prevent the user he/she did a mistake and a column will be silently ignored). There is one (potentially breaking) change: we take the first column of each name, and not the last, as before: this is to behave more consistently with ResultSet's documented behaviour.

public class RamEfficientColumnMapRowMapper implements RowMapper<Map<String, Object>> {

    /**
     * columnKey => index in the values array (starting from 0).
     * Eg. if ResultSet returns columns COL_1, COL1, COL_3 => keyIndices = { COL_1: 0, COL_3: 2 }
     */
    private Map<String, Integer> keyIndices;

    /**
     * position of a key in the ResultSet (starting from 1) => index in the values array (starting from 0).
     * Eg. if ResultSet returns columns COL_1, COL1, COL_3 (positions 1, 2, 3) => positionsToIndices = { 1: 0, 3: 2 }
     */
    private Map<Integer, Integer> positionsToIndices;

    @Override
    public Map<String, Object> mapRow(ResultSet resultSet, int rowNum) throws SQLException {
        parseMetaDataOnce(resultSet.getMetaData());
        return toValuesMap(resultSet);
    }

    private void parseMetaDataOnce(ResultSetMetaData metaData) throws SQLException {
        if (keyIndices == null) {
            parseMetaData(metaData);
        }
    }

    private void parseMetaData(ResultSetMetaData metaData) throws SQLException {
        int columnCount = metaData.getColumnCount();

        keyIndices = new LinkedCaseInsensitiveMap<>(columnCount);
        positionsToIndices = new HashMap<>(columnCount);

        int index = 0;
        for (int position = 1; position <= columnCount; position++) {
            String columnName = JdbcUtils.lookupColumnName(metaData, position);
            String columnKey = getColumnKey(columnName);

            // Match the ResultSet behaviour:
            // "When a getter method is called with a column name and several columns have the same name,
            // the value of the first matching column will be returned."
            // https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSet.html
            if (keyIndices.containsKey(columnKey)) {
                continue;
            }

            keyIndices.put(columnKey, index);
            positionsToIndices.put(position, index);

            index++;
        }
    }

    private Map<String, Object> toValuesMap(ResultSet resultSet) throws SQLException {
        int keyCount = positionsToIndices.size();
        Object[] values = new Object[keyCount];
        for (Map.Entry<Integer, Integer> positionAndIndex : positionsToIndices.entrySet()) {
            int position = positionAndIndex.getKey();
            int index = positionAndIndex.getValue();
            values[index] = getColumnValue(resultSet, position);
        }
        return new RamEfficientMap<>(keyIndices, values);
    }

    protected String getColumnKey(String columnName) {
        return columnName;
    }

    @Nullable
    protected Object getColumnValue(ResultSet rs, int index) throws SQLException {
        return JdbcUtils.getResultSetValue(rs, index);
    }

}

Comment From: slaout

Note: sorry for all this code in the issue: the goal is for me to know your thoughts about such changes before investing time into creating a proper pull request. If you are OK with the proposed solution and you prefer, I can create a pull request instead.

Another "breaking" change is that the current ColumnMapRowMapper is returning a mutable Map: my solution implies an immutable one. This should generally be OK, unless developers using the returned map add some new fake columns or cache values in them, compute derived virtual columns, etc.

Then, we can either pull the change in a major version, or support this use-case by having an additional regular HashMap instantiated only when developers call the put() method (and associated ones). The RAM penalty would then only occurs in such unusual cases (but I'm sure, vital for people doing such stuff).

Comment From: slaout

Hello, I know you are busy, but do you have any thought about this topic?

Do you think it's important to reduce memory usage because you should support any use case of your users, however big their ResultSet is?

Or do you consider big ResultSets to be niche use cases not worth considering?

Either way, it's fine, I'm just curious.

Or if I can do something else to help, let me know.

Comment From: jhoeller

My take here is that our ColumnMapRowMapper is intentionally somewhat simplistic, a simple adapter for exposing each row as a LinkedCaseInsensitiveMap by design.

For maximum efficiency, I'd rather recommend a RowMapper implementation that exposes application-specific data objects, possibly even record instances (as of JDK 17). Our DataClassRowMapper can serve for such purposes, or even more efficiently a custom RowMapper implementation that builds such instances programmatically and also calls the ResultSet without any consultation of metadata.

From that perspective, it's not a high priority to make ColumnMapRowMapper more efficient on our end, in particular not at the cost of higher maintenance and possible breakage of existing scenarios. Using a more sophisticated implementation with JdbcTemplate the way you did looks like a fine way out if you'd like to hang on to exposing Map instances per row. For such application-specific scenarios, there's also no need to worry about backwards compatibility in terms of immutability etc.

We can leave this issue open for feedback from further stakeholders, or possibly for simple optimizations within ColumnMapRowMapper and/or LinkedCaseInsensitiveMap that we can do without backwards compatibility concerns (even in the 5.3.x line).

Comment From: jhoeller

Since there are no immediate plans for ColumnMapRowMapper on our side, I'll rather close this issue. As per my advice above, I recommend a custom RowMapper implementation for specific target classes for optimal efficiency.