Apostolof's Blog


A Heisenbug caused by Elastic's refresh mechanism

Recently I ran into an interesting heisenbug while writing an integration test that uses Elasticsearch. The test spun up an Elasticsearch container, created a test index, and indexed some data. Then my tests would run aggregations against the index and, after some post-processing of Elastic's response, assert on the results. The setup was very simple.

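import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.testcontainers.elasticsearch.ElasticsearchContainer;

public abstract class AbstractAggregationTest {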

    public static final String IMAGE =
            "docker.elastic.co/elasticsearch/elasticsearch:7.9.3";
    private static final RestHighLevelClient restHighLevelClient;
    protected static final ElasticsearchContainer ELASTICSEARCH_CONTAINER;

    static {
        ELASTICSEARCH_CONTAINER = new ElasticsearchContainer(IMAGE);
        ELASTICSEARCH_CONTAINER.withReuse(true);
        ELASTICSEARCH_CONTAINER.start();

        final String elasticEndpoint = ELASTICSEARCH_CONTAINER
                .getHttpHostAddress();
        restHighLevelClient = new RestHighLevelClient(RestClient
                .builder(HttpHost.create(elasticEndpoint)));

        setupIndex();
        indexTestData();
    }

    private static void setupIndex() {
        // Creates the index
    }

    private static void indexTestData() {
        // Builds a BulkRequest and indexes the data
    }
}

The problem was that every time I ran the test, the post-processor reported different numbers, so the assertions failed. Interestingly, adding a breakpoint right after the test data got indexed made the problem disappear! I checked Elastic's response, and therein lay the problem: Elastic sent back a different response every time I ran the test. Not only that, but a count of the documents in the index also reported different numbers. Adding a small wait after indexing made the problem go away as well.

static {
    // ... testcontainer setup

    setupIndex();
    indexTestData();
    countDocsInIndex(); // <-- changes every time

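    try {
        // A static initializer cannot throw the checked InterruptedException
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }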
    countDocsInIndex(); // <-- correct number
}
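
A fixed sleep only helps when the data happens to become visible within the chosen delay. As a rough sketch of a less timing-sensitive stopgap, the helper below polls a condition until it holds or a timeout expires. The class `Await` and its `until` method are hypothetical names made up for this illustration; they are not part of Elasticsearch or Testcontainers, and the code uses only the JDK.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of any library.
final class Await {

    private Await() {
        // static utility, no instances
    }

    /**
     * Polls the condition until it returns true or the timeout elapses.
     *
     * @return true if the condition held in time, false on timeout or interruption
     */
    static boolean until(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

In the static block above, `Thread.sleep(1000)` could then become something like `Await.until(() -> countDocsInIndex() == EXPECTED_DOCS, Duration.ofSeconds(5), Duration.ofMillis(50))`. This assumes that `countDocsInIndex()` returns the count and that `EXPECTED_DOCS` is a constant you define; both are assumptions about code this post does not show.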

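That's when I remembered Elasticsearch's refresh mechanism. Newly indexed documents are not searchable immediately; they become visible only after a refresh, which by default runs periodically (every second) and is built on top of Lucene's per-segment search. That explains both the varying counts and why a breakpoint or a one-second sleep hid the problem. You can read more about refresh in Elastic's guide, Near real-time search. So the solution was simply to force a refresh after indexing the data and before proceeding with the tests.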

static {
    // ... testcontainer setup

    setupIndex();
    indexTestData();

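    // Make sure the indexed documents are searchable before the tests run
    try {
        restHighLevelClient
                .indices()
                .refresh(new RefreshRequest(TEST_INDEX), RequestOptions.DEFAULT);
    } catch (IOException e) {
        // A static initializer cannot throw checked exceptions
        throw new UncheckedIOException(e);
    }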
}
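
To make the effect of the forced refresh easier to picture, here is a deliberately simplified toy model in plain Java. It is not how Elasticsearch or Lucene are implemented; the class `ToyIndex` and its methods are hypothetical and only mimic the one property that mattered here: documents are accepted first and become countable only after a refresh.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; not Elasticsearch's actual implementation.
final class ToyIndex {

    // Documents that were accepted but are not yet visible to searches.
    private final List<String> buffer = new ArrayList<>();

    // Documents that searches and counts can see.
    private final List<String> searchable = new ArrayList<>();

    // Accepts a document; it stays invisible until the next refresh.
    void index(String doc) {
        buffer.add(doc);
    }

    // Makes every buffered document visible and empties the buffer.
    void refresh() {
        searchable.addAll(buffer);
        buffer.clear();
    }

    // Counts only the documents that are already visible.
    int count() {
        return searchable.size();
    }
}
```

In this model, indexing two documents and calling `count()` right away returns 0, and the same call returns 2 once `refresh()` has run. In the real system the refresh normally fires on a timer, so a count taken right after a bulk request depends on when that timer happens to run, which matches the flaky numbers described above.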