A novel approach to visualize clinical benefit of therapies for chronic graft versus host disease (cGvHD): the probability of being in response (PBR) applied to the REACH3 study – Bone Marrow Transplantation

Although the probability of being in response has been discussed in the statistical literature for decades as a useful method to assess response over time, very few applications can be found in clinical research. In one recent paper, Huang et al. used PBR (referred to as PBIR by Huang et al.) to compare different treatments for renal cell carcinoma [7].

In this post-hoc analysis, we applied PBR to the REACH3 study data to show the benefit of this method when assessing the efficacy of cGvHD treatments. PBR provides easily interpretable curves that simultaneously present the time from treatment start to first response and to subsequent failure, based on all randomized patients. The results obtained in REACH3 clearly illustrate the superiority of ruxolitinib versus BAT, further confirming the results reported in the original study publication [2]. PBR offers a visual comparison of efficacy over time between treatment arms based on all patients and the entire study period. In contrast, DOR measures time from first response and visualizes duration of response only for the subgroup of patients who responded to treatment, which can result in a biased assessment of treatment benefit (for example, if a higher percentage of patients reaches the response state in the experimental arm than in the control arm).

The multistate model and the resulting estimated PBR function considered in this paper were defined in alignment with the efficacy endpoints pre-specified in the REACH3 study protocol. The design of this multistate model (PBR function), which allows transitions from state 0 = not in response to state 1 = in response but not from state 1 back to state 0, and the definition of events for the end of response are based on exactly the same criteria as the REACH3 efficacy endpoints.
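To make the structure of such a model concrete, the following Python sketch estimates PBR for a progressive model (state 0 to state 1 to absorbing state 2, or state 0 directly to state 2) as the difference of two Kaplan-Meier curves. This is an illustration only and not the R code used for the analysis: the difference-of-curves estimator is assumed here because it is a common nonparametric choice for this model, and all function names, variable names and example data are hypothetical.

```python
from bisect import bisect_right


def kaplan_meier(times, events):
    """Kaplan-Meier estimate for right-censored data.

    times  : follow-up time per patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns (event_times, survival_values) describing a step function.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    out_t, out_s = [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # observed events at time t
        c = 0  # patients leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            out_t.append(t)
            out_s.append(surv)
        n_at_risk -= c
    return out_t, out_s


def step_value(curve, t):
    """Evaluate a right-continuous survival step function at time t."""
    times, surv = curve
    k = bisect_right(times, t)
    return 1.0 if k == 0 else surv[k - 1]


def pbr(t, leave0_time, leave0_event, enter2_time, enter2_event):
    """PBR(t) = P(state 2 not yet reached at t) - P(still in state 0 at t)."""
    s_absorb = kaplan_meier(enter2_time, enter2_event)
    s_state0 = kaplan_meier(leave0_time, leave0_event)
    return step_value(s_absorb, t) - step_value(s_state0, t)


# Hypothetical data for four patients (times in weeks).
# leave0: time of leaving state 0 (first response, or failure without response)
# enter2: time of entering absorbing state 2 (0 = censored at cut-off)
# Assumption: under a week-24 rule as described in the text, a patient without
# response by week 24 would get both times set to 24 with event = 1.
leave0_time, leave0_event = [4, 8, 12, 4], [1, 1, 1, 1]
enter2_time, enter2_event = [20, 30, 12, 10], [1, 0, 1, 1]
```

In this toy data set three of the four patients are in response at week 9, and `pbr(9, leave0_time, leave0_event, enter2_time, enter2_event)` returns 0.75 accordingly. Because every patient contributes to both curves, the estimate is based on all patients rather than on responders only.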

In future work or for other studies, one could extend the model or define events differently than was done here for REACH3. In particular, as specified in the study protocol, patients were counted as responders only if the first response occurred up to week 24. However, first response to treatment may occur after week 24, i.e., patients who did not respond up to week 24 do not necessarily need to enter the absorbing state 2 at week 24. Furthermore, loss of response was aligned with the definition of DOR, i.e., once a patient enters state 1 = in response, the patient can either stay in that state until the analysis cut-off date or lose the “in response” status by entering the absorbing state 2, thus ending the duration of response. However, in diseases such as cGvHD, it may also be meaningful to extend the model by allowing transitions from “in response” (state 1) back to “not in response” (state 0) before entering the absorbing state 2. If, for instance, a patient achieves an overall response of PR through improvement of cGvHD symptoms in several organs at week 8, but one organ subsequently worsens at week 12 (with response maintained in the other organs) and improves again at week 16 without a change in systemic cGvHD treatment, it would be reasonable to assign state 0 = not in response, state 1 = in response, state 0 = not in response and state 1 = in response at study start, week 8, week 12 and week 16, respectively. PBR could be applied to such model extensions.
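The state assignment in this example can be sketched in a few lines of Python. The representation of a patient history as a list of (week, state) pairs and the function names are assumptions made for illustration. The simple proportion shown here ignores censoring, so with incomplete follow-up a proper multistate estimator would be needed instead.

```python
from bisect import bisect_right


def state_at(history, week):
    """Return the state occupied at `week`.

    history: list of (week, state) pairs sorted by week, starting at week 0,
             where each pair marks the week at which the state is entered.
    """
    weeks = [w for w, _ in history]
    k = bisect_right(weeks, week)
    if k == 0:
        raise ValueError("week lies before the start of the history")
    return history[k - 1][1]


def proportion_in_response(histories, week):
    """Share of patients in state 1 at `week` (valid only without censoring)."""
    in_response = sum(1 for h in histories if state_at(h, week) == 1)
    return in_response / len(histories)


# The patient from the text: PR at week 8, worsening at week 12,
# renewed improvement at week 16.
example_patient = [(0, 0), (8, 1), (12, 0), (16, 1)]

# A second, hypothetical patient who responds at week 4 and enters the
# absorbing state 2 at week 20.
other_patient = [(0, 0), (4, 1), (20, 2)]
```

For the patient described above, `state_at(example_patient, 10)` gives state 1 and `state_at(example_patient, 12)` gives state 0, so the extended model would show this patient leaving and re-entering the response state rather than ending the response at week 12.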

One reason why applications of PBR can hardly be found in the clinical literature may be the lack of statistical software to perform these analyses. Recently, Xiadong et al. provided the R package PBIR, which can easily be used within the open-source software R [8]. Because first response to cGvHD treatment in REACH3 was counted only up to week 24, we wrote our own R code and used the PBIR R package for validation purposes only (PBIR would have cut the curves at week 24).

It would also be useful to have a formal statistical test for comparing the treatments with respect to PBR. We do not elaborate on this topic here for two reasons. Firstly, the difference of the PBR curves, including pointwise 95% confidence intervals (Fig. 4), allows a good visual comparison between treatment arms and provides sufficient evidence that the difference between the curves is statistically significant. Secondly, to our knowledge, a statistical test providing a direct generalization of the usual log-rank test is not yet available for the situation described in this paper; its development is the subject of forthcoming work. Considering a parametric estimation of PBR curves using the exponential distribution, Ellis et al. proposed a statistical test that compares the area under the PBR curves (referred to as expected duration of response, EDoR) between treatment arms [5]. Other alternatives (such as a log-rank test of time in response from entering the response state until leaving it for the absorbing state, potentially setting time in response to 0 for those patients who went from state 0 straight to state 2) are conceivable, but a thorough discussion of their properties, interpretational restrictions and precise relation to the PBR curve is beyond the scope of this paper.
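The relation between the PBR curve and its area can be written out as follows. The notation is an assumption introduced here for illustration: X(t) denotes the state occupied at time t, and τ is a truncation time that one might choose when the curve is estimated nonparametrically over a limited follow-up period. It is not taken from Ellis et al.

```latex
\mathrm{PBR}(t) = P\{X(t) = 1\}, \qquad
\mathrm{EDoR} = \int_0^{\infty} \mathrm{PBR}(t)\,dt, \qquad
\mathrm{EDoR}(\tau) = \int_0^{\tau} \mathrm{PBR}(t)\,dt .
```

Under this notation, the area between the PBR curves of two treatment arms up to τ equals the difference in mean time spent in response up to τ, averaged over all patients including those who never respond.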

As illustrated with the REACH3 data, we strongly believe that PBR can serve as a meaningful efficacy endpoint for the assessment of cGvHD treatments, in addition to ORR/BOR and failure-free survival (FFS), which are recommended as endpoints for clinical trials by the NIH clinical design working group [9]. Compared with these established endpoints, PBR provides a more comprehensive summary of treatment efficacy because it integrates several aspects of treatment benefit (time to response, duration of response) into a single measure. ORR/BOR does not take time into account, whereas FFS describes only the time to treatment failure and assesses neither whether patients respond to treatment nor the time to response. Further clinical input would be required to include PBR in an updated cGvHD response guideline, and to ensure that consistent criteria are applied across future clinical trials. For example, a clear definition of response duration and end of response would be required. The current guidelines call on investigators to ‘document durability of response and to determine whether continued treatment is needed to maintain response’ and state that ‘Efforts to document the durability of response are strongly encouraged’, but do not provide clear definitions of response durability [1, 9]. Finally, PBR is a useful endpoint that could be applied to any disease or indication in which clinical benefit is assessed by response to treatment over time, which gives it utility beyond the cGvHD treatment landscape.