A Heisenbug caused by Elastic's refresh mechanism
Recently I ran into an interesting heisenbug while writing an integration test that uses Elasticsearch. The test spun up an Elasticsearch container, created a test index, and indexed some data. Then my tests would run aggregations against the index and, after some post-processing of Elasticsearch's response, assert on the results. The setup was very simple.
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.testcontainers.elasticsearch.ElasticsearchContainer;

public abstract class AbstractAggregationTest {

    public static final String IMAGE =
            "docker.elastic.co/elasticsearch/elasticsearch:7.9.3";

    private static final RestHighLevelClient restHighLevelClient;
    protected static final ElasticsearchContainer ELASTICSEARCH_CONTAINER;

    static {
        // Start the container once for all subclasses
        ELASTICSEARCH_CONTAINER = new ElasticsearchContainer(IMAGE);
        ELASTICSEARCH_CONTAINER.withReuse(true);
        ELASTICSEARCH_CONTAINER.start();

        final String elasticEndpoint = ELASTICSEARCH_CONTAINER
                .getHttpHostAddress();
        restHighLevelClient = new RestHighLevelClient(RestClient
                .builder(HttpHost.create(elasticEndpoint)));

        setupIndex();
        indexTestData();
    }

    private static void setupIndex() {
        // Creates the index
    }

    private static void indexTestData() {
        // Builds a BulkRequest and indexes the data
    }
}
The problem was that every time I ran the test, the post-processor reported different numbers, so the assertions failed. Interestingly, adding a breakpoint right after the test data got indexed made the problem go away! I checked Elasticsearch's response, and that is where the problem lay: Elasticsearch returned a different response every time I ran the test. Not only that, but a count of the documents in the index also reported different numbers from run to run. Adding a small wait after indexing solved the problem as well.
static {
    // ... testcontainer setup
    setupIndex();
    indexTestData();
    countDocsInIndex(); // <-- changes every time
    Thread.sleep(1000);
    countDocsInIndex(); // <-- correct number
}
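That's when I remembered Elasticsearch's refresh mechanism. Elasticsearch is near real-time: newly indexed documents are first held in an in-memory buffer and only become searchable once a refresh writes them into a new Lucene segment, which by default happens about once a second. That explains why the counts varied between runs and why a one-second wait (or a breakpoint) made the problem disappear. You can read more about refresh in Elastic's guide, Near real-time search. So the solution was simply to force a refresh after indexing the data and before proceeding with the tests.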
static {
    // ... testcontainer setup
    setupIndex();
    indexTestData();
    // Make sure the indexed documents are readily available
    restHighLevelClient
            .indices()
            .refresh(new RefreshRequest(TEST_INDEX), DEFAULT);
}
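To make the behavior concrete, here is a toy model of what was going on. It is only a sketch and not how Elasticsearch is actually implemented: the ToyIndex and RefreshDemo classes and their methods are hypothetical names made up for this illustration. The only idea it borrows from the real thing is that indexed documents stay invisible to searches and counts until a refresh happens.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of near real-time search (hypothetical, not Elasticsearch code). */
final class ToyIndex {
    // Documents accepted by the index but not yet visible to searches.
    private final List<String> buffer = new ArrayList<>();
    // Documents that searches and counts can see.
    private final List<String> searchable = new ArrayList<>();

    /** Accepts a document; it stays invisible until the next refresh. */
    void index(String doc) {
        buffer.add(doc);
    }

    /** Makes all buffered documents visible, like an explicit refresh. */
    void refresh() {
        searchable.addAll(buffer);
        buffer.clear();
    }

    /** Counts only the documents that have already been refreshed. */
    int count() {
        return searchable.size();
    }
}

class RefreshDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.index("doc-1");
        index.index("doc-2");
        System.out.println("before refresh: " + index.count()); // prints 0
        index.refresh();
        System.out.println("after refresh: " + index.count());  // prints 2
    }
}
```

In the real cluster the refresh also runs in the background on a timer, so a count taken right after indexing races with it. That race is what made my numbers change from run to run, and forcing the refresh myself removes it.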