Bug description
When using TikaDocumentReader, I often encountered the following error:

java
Caused by: java.lang.StackOverflowError: null  
    at java.base/java.util.regex.Pattern$Caret.match(Pattern.java:3896)  
    at java.base/java.util.regex.Pattern$Curly.match1(Pattern.java:4597)  
    at java.base/java.util.regex.Pattern$Curly.match(Pattern.java:4546)  
    at java.base/java.util.regex.Pattern$Dollar.match(Pattern.java:3996)  
    at java.base/java.util.regex.Pattern$Caret.match(Pattern.java:3906)  
    at java.base/java.util.regex.Pattern$GroupHead.match(Pattern.java:4969)

After debugging, I found that the issue lies in the non-optimized regular expression used by the trimAdjacentBlankLines() method of the ExtractedTextFormatter class. In my case, the problem occasionally occurred even with files containing a small number of empty lines (~150), using the default thread stack size and -Xmx8192m (which sets the heap size, not the stack size).
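For illustration, here is a minimal sketch of how runs of blank lines could be collapsed without a backtracking regular expression, so that stack depth does not grow with the number of empty lines. The class BlankLineCollapser and the method collapseBlankLines are hypothetical names of mine, not part of Spring AI, and the sketch assumes that "collapse adjacent blank lines into one" is the intended behavior; it may not match the existing method's output in every edge case.

```java
import java.util.ArrayList;
import java.util.List;

class BlankLineCollapser {

    // Hypothetical helper (not Spring AI API): keeps the first blank line of
    // each run of blank lines and drops the rest. It iterates over the lines
    // instead of using a regex, so it cannot recurse deeply on long inputs.
    static String collapseBlankLines(String text) {
        List<String> kept = new ArrayList<>();
        boolean previousBlank = false;
        // Limit -1 keeps trailing empty strings, so a final newline survives.
        for (String line : text.split("\n", -1)) {
            // trim() also treats lines holding only spaces or '\r' as blank.
            boolean blank = line.trim().isEmpty();
            if (blank && previousBlank) {
                continue; // skip the second and later blank lines of a run
            }
            kept.add(line);
            previousBlank = blank;
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        // Build an input with many consecutive empty lines, similar in spirit
        // to the text extracted from a sheet with many empty rows.
        StringBuilder sb = new StringBuilder("first");
        for (int i = 0; i < 100_000; i++) {
            sb.append('\n');
        }
        sb.append("last");
        System.out.println(collapseBlankLines(sb.toString()));
    }
}
```

Running the time linearly over the input and keeping the call stack flat is the point of this design; a regex with possessive quantifiers might achieve the same, but I have not verified that against the current implementation.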

Steps to reproduce
To reproduce the error, I created an XLSX file with a large number of empty rows and attached it to this issue: Stack_overflow_exception_test.xlsx