Nontuberculous mycobacterial lung disease (NTMLD) is a rare lung disease that is often missed because of a low index of suspicion and a nonspecific clinical presentation. This retrospective study was designed to characterise the pre-diagnosis features of NTMLD patients in primary care and to assess the feasibility of using machine learning (ML) to identify undiagnosed NTMLD patients. IQVIA Medical Research Data (IMRD; incorporating THIN, a Cegedim Database), a UK primary care electronic medical records database, was used. NTMLD patients were identified between 2003 and 2017 by a diagnosis in primary or secondary care or by a record of an NTMLD treatment regimen. Risk factors and treatments were extracted from the pre-diagnosis period, guided by the literature and expert clinical opinion. The control population was enriched so that every control patient had at least one of these features. A total of 741 NTMLD and 112 784 control patients were selected. The annual prevalence of NTMLD increased from 2.7 to 5.1 per 100 000 between 2006 and 2016. The most common pre-existing diagnoses among NTMLD patients were chronic obstructive pulmonary disease and asthma, and the most common prior treatments were penicillin, macrolides and inhaled corticosteroids. Compared to random testing, ML improved detection of patients with NTMLD by almost a thousand-fold, with an AUC of 0.94. The total prevalence of diagnosed and undiagnosed cases of NTMLD in 2016 was estimated to range between 9 and 16 per 100 000. This study supports the feasibility of applying ML to primary care data to screen for undiagnosed NTMLD patients, and the results indicate that there may be a substantial number of undiagnosed cases of NTMLD in the UK.
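
The three kinds of figures quoted above (prevalence per 100 000, AUC, and improvement over random testing) can be illustrated with a minimal Python sketch. This is not the study's pipeline: the function names, the toy labels and scores, and the definition of "improvement over random testing" as the precision among the top-ranked patients divided by the overall case rate are all assumptions made for illustration, and none of the numbers are study data.

```python
def prevalence_per_100k(cases, population):
    """Prevalence expressed per 100 000 people."""
    return 100_000 * cases / population


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen case is scored
    higher than a randomly chosen control (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_over_random(labels, scores, top_k):
    """Assumed definition: precision among the top_k highest-scored patients
    divided by the case rate expected when testing patients at random."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:top_k])
    precision = hits / top_k
    base_rate = sum(labels) / len(labels)
    return precision / base_rate


# Hypothetical toy data: 1 = case, 0 = control, with made-up model scores.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.05, 0.4, 0.15, 0.25]

print(prevalence_per_100k(3, 60_000))        # 5.0 per 100 000 (hypothetical counts)
print(auc(labels, scores))                   # 0.9375
print(lift_over_random(labels, scores, 2))   # approximately 2.5
```

In a rare disease the case rate under random testing is very small, so even a modest precision among the highest-ranked patients corresponds to a large fold improvement; the AUC, by contrast, does not depend on how rare the disease is.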