Design
We assembled a cohort of all new small-molecule drugs (i.e., New Molecular Entities [NMEs]) approved by the FDA between July 1997 and May 2020 (23.4 years) and tracked the new indication exclusivities granted to them. For the subset of drugs that experienced generic entry during the observation period, we analyzed the relationship between the number of years to or since generic entry and the number of new indications granted.
Data sources
Our chief data source was the electronic archives of the FDA’s Approved Drug Products with Therapeutic Equivalence Evaluations (the “Orange Book”) covering July 1997 through May 2020 [23]. Every new indication added on the basis of a new clinical investigation by the drug’s manufacturer receives an exclusivity. These datasets included the timing and type of each new indication exclusivity (i.e., a 3-year New Indication exclusivity or a 7-year Orphan Drug Exclusivity) as well as the products that each exclusivity protected. The Orange Book also provides each small-molecule drug’s active ingredients, trade name, manufacturer, and FDA approval date, and indicates whether the product is brand name or generic. To distinguish which brand-name drugs were NMEs, we linked the Orange Book records to the Drugs@FDA Data Files [24] so that all subsequently approved brand-name drugs (i.e., new formulations) could be grouped with their parent products.
New variables
To study the association between the timing of generic entry and the frequency of new indications, we created a variable (“Age”) for each NME’s age in each observation year by calculating the number of years since its FDA approval date. We also generated variables with Age raised to the powers 2, 3, and 4 to allow for a non-linear relationship between age and the frequency of new indications. For NMEs with an equivalent generic drug approved during the observation period, we calculated the number of years to or since the first FDA approval of a generic equivalent. We then constructed dummy variables categorizing time to generic entry as more than 10 years away, 5–10 years away, 0–5 years away, or post-generic entry. Using these variables, each outcome (i.e., the number of new indications added) could be observed both in years since the new drug’s FDA approval date and in time period relative to the first generic approval. Since our objective was to focus on new indications added during the post-approval period and to isolate the effect of generic entry, we excluded any new indications added at the time of FDA approval (i.e., Age = 0) as well as drugs that had no generic equivalents by May 2020. Finally, for each year observed since NME approval and relative to generic entry, we recorded whether there was zero or at least one new indication. There was one observation for each year following the approval of each parent drug until 2020, for a total of 3154 observations.
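The variable construction described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `build_drug_years`, the dictionary keys, and the use of calendar years are assumptions, as is the handling of the category boundaries (here, exactly 5 years before entry is assigned to the 5–10 category and the year of generic entry itself to the 0–5 category).

```python
def build_drug_years(approval_year, generic_year, last_year=2020):
    """Return one record per observation year for a single parent drug.

    Hypothetical helper: names and boundary conventions are assumptions.
    """
    rows = []
    # Start at approval_year + 1 because observations with Age = 0 are excluded.
    for year in range(approval_year + 1, last_year + 1):
        age = year - approval_year
        # Positive before generic entry, negative after it.
        years_to_generic = generic_year - year
        rows.append({
            "year": year,
            "age": age,
            "age2": age ** 2,
            "age3": age ** 3,
            "age4": age ** 4,
            # All three dummies are 0 when entry is more than 10 years away
            # (the omitted reference category).
            "g_neg": int(years_to_generic < 0),
            "g_0_5": int(0 <= years_to_generic < 5),
            "g_5_10": int(5 <= years_to_generic <= 10),
        })
    return rows


# Illustrative drug (not from the data): approved in 2000, first generic in 2012.
rows = build_drug_years(2000, 2012)
for row in rows[:3]:
    print(row)
```

Each record would then be paired with a binary outcome indicating whether at least one new indication was added in that year.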
We also categorized drugs by the disease category in which each drug was first introduced and calculated the proportion of drugs within each disease category. (We applied the MeSH disease categories from the National Library of Medicine.)
Analysis
We first report basic descriptive statistics of interest, including the proportion of drugs with one or more post-approval indications added during the observation period, the proportions of new indications added for more common versus rare diseases, and the proportion of new indication exclusivities that expired before versus after generic entry.
Given that generic entry occurred at different times for different drugs (Additional file 1: Fig. 1), the variation in a new drug’s period on the market (“age”) at generic entry can be used to disentangle the effects of age and generic entry on the probability of new indication development. We used the following logistic regression model to study the effect of the timing of first generic entry on the probability of a new indication, controlling for the drug’s age:
$$\ln \left( {\frac{p}{1 - p}} \right) = \alpha_{1} {\text{Age}} + \alpha_{2} {\text{Age}}^{2} + \alpha_{3} {\text{Age}}^{3} + \alpha_{4} {\text{Age}}^{4} + \beta_{1} G_{{{\text{neg}}}} + \beta_{2} G_{0,5} + \beta_{3} G_{5,10} + \varepsilon ,$$
where \(p\) is the probability of having a new indication; Age is the observation year minus the year of the new drug’s FDA approval; and the G variables indicate the time until generic entry (with \(G_{\text{neg}}=1\) for observations following generic entry, \(G_{0,5}=1\) during the 5 years before generic entry, and \(G_{5,10}=1\) from 5 to 10 years before generic entry). The omitted category is all years more than 10 years before generic entry. Thus, the regression coefficients for the time-to-generic variables (G) represent the predicted change in the log odds of a new indication for each time-to-generic-entry category, compared to when generic entry is more than 10 years in the future; exponentiating a coefficient gives the corresponding odds ratio. We include polynomial terms of Age up to \({\text{Age}}^{4}\) to flexibly capture non-linear variation associated with years since the drug’s first approval.
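To make the interpretation of the coefficients concrete, the following sketch evaluates the systematic part of the model (without the \(\varepsilon\) term) and converts it to a probability. The function names and the coefficient values are illustrative assumptions, not estimates from this study.

```python
import math


def linear_predictor(age, g_neg, g_0_5, g_5_10, alpha, beta):
    """Log odds from the model's systematic part (no intercept, as written)."""
    # alpha = (alpha_1, ..., alpha_4) multiplies Age, Age^2, Age^3, Age^4.
    poly = sum(a * age ** (k + 1) for k, a in enumerate(alpha))
    # beta = (beta_1, beta_2, beta_3) multiplies G_neg, G_{0,5}, G_{5,10}.
    return poly + beta[0] * g_neg + beta[1] * g_0_5 + beta[2] * g_5_10


def probability(log_odds):
    """Invert the logit: p = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio(coefficient):
    """Odds ratio relative to the omitted category (> 10 years to entry)."""
    return math.exp(coefficient)


# Hypothetical coefficients chosen only for illustration.
alpha = (-0.2, 0.0, 0.0, 0.0)
beta = (-1.0, -0.5, -0.2)

p_reference = probability(linear_predictor(6, 0, 0, 0, alpha, beta))
p_post_generic = probability(linear_predictor(6, 1, 0, 0, alpha, beta))
print(p_reference, p_post_generic, odds_ratio(beta[0]))
```

Holding Age fixed, the ratio of the odds in the post-generic category to the odds in the reference category equals \(e^{\beta_1}\), which is what `odds_ratio(beta[0])` returns.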
Finally, while controlling for age, we estimated the counterfactual number of new indications under the assumption that the influence of generic entry, both after entry has occurred and when it is 0–5 years in the future, were the same as when entry is 5–10 years in the future. To calculate this counterfactual number of new indications, we used the estimated model and replaced \({\beta }_{1}\) and \({\beta }_{2}\) with the estimated value of \({\beta }_{3}\) (i.e., as if generic entry were not to occur for another 5–10 years), adjusting appropriately for the frequency of observations with multiple new indications.
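The counterfactual substitution can be sketched as below. This is a simplified illustration under stated assumptions: the coefficients and the three drug-year records are hypothetical, the function names are our own, and the adjustment for observations with multiple new indications is omitted, so the sums are expected counts of drug-years with at least one new indication.

```python
import math


def predicted_probability(row, alpha, beta):
    """Predicted probability of a new indication for one drug-year record."""
    log_odds = sum(a * row["age"] ** (k + 1) for k, a in enumerate(alpha))
    log_odds += (beta[0] * row["g_neg"]
                 + beta[1] * row["g_0_5"]
                 + beta[2] * row["g_5_10"])
    return 1.0 / (1.0 + math.exp(-log_odds))


def expected_indication_years(rows, alpha, beta):
    """Sum of predicted probabilities across all drug-year records."""
    return sum(predicted_probability(r, alpha, beta) for r in rows)


# Hypothetical coefficients and records, for illustration only.
alpha = (-0.3, 0.0, 0.0, 0.0)
beta = (-1.5, -0.8, -0.2)  # (beta_1, beta_2, beta_3)
rows = [
    {"age": 3, "g_neg": 0, "g_0_5": 0, "g_5_10": 1},
    {"age": 8, "g_neg": 0, "g_0_5": 1, "g_5_10": 0},
    {"age": 14, "g_neg": 1, "g_0_5": 0, "g_5_10": 0},
]

# Counterfactual: beta_1 and beta_2 are replaced by beta_3.
counterfactual_beta = (beta[2], beta[2], beta[2])

baseline = expected_indication_years(rows, alpha, beta)
counterfactual = expected_indication_years(rows, alpha, counterfactual_beta)
print(baseline, counterfactual, counterfactual - baseline)
```

The difference between the two sums is the estimated number of additional indication-years under the counterfactual. Records already in the 5–10 category are unaffected because their coefficient does not change.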