Two problems remain: I still need a threshold for the run length (I arbitrarily chose 10), and my implementation is slow. But when I coded up the same algorithm in Python it ran ~instantly, and it turns out my heuristic is very effective: most steps give a value ≤4, while the correct step gives 79.
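
With a gap that wide (≤4 versus 79), the exact threshold probably doesn't matter much, and it may be possible to drop the fixed threshold altogether by looking at how far the best step stands out from the runner-up. Here is a minimal sketch of both options, assuming the per-step run lengths have already been computed and stored in a dict. The function names, the `ratio` parameter, and the sample values are hypothetical, chosen only to mirror the numbers above; this is not the actual implementation.

```python
def pick_step_by_threshold(run_lengths, threshold=10):
    """Return the single step whose run length exceeds `threshold`.

    Returns None if no step, or more than one step, exceeds it.
    """
    candidates = [step for step, run in run_lengths.items() if run > threshold]
    return candidates[0] if len(candidates) == 1 else None


def pick_step_by_margin(run_lengths, ratio=3.0):
    """Return the step with the longest run, but only if it beats the
    runner-up by at least a factor of `ratio` (an assumed parameter).

    Returns None if the input is empty or the winner is not a clear outlier.
    """
    if not run_lengths:
        return None
    ranked = sorted(run_lengths.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]
    (best_step, best_run), (_, second_run) = ranked[0], ranked[1]
    # max(..., 1) avoids a trivial pass when the runner-up is 0.
    return best_step if best_run >= ratio * max(second_run, 1) else None


# Illustrative values only: most steps <= 4, one step at 79.
example_runs = {1: 3, 2: 4, 3: 2, 4: 79, 5: 1}

print(pick_step_by_threshold(example_runs))  # step chosen by fixed threshold
print(pick_step_by_margin(example_runs))     # step chosen by relative margin
```

The margin version trades one arbitrary number (the threshold of 10) for another (the ratio), but a ratio has the advantage of not depending on the absolute scale of the run lengths.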