Liver biopsies are a critical component of pivotal studies in nonalcoholic steatohepatitis (NASH), serving as the main inclusion criteria, risk stratification factors, and endpoints. We evaluated the reliability of NASH Clinical Research Network (NASH CRN) scoring of liver biopsies in a NASH clinical trial. Digitized slides from 678 biopsies for 339 patients with paired biopsies randomized into the EMMINENCE study, which examined a novel insulin sensitizer (MSDC-0602K) in NASH, were read independently by three hepatopathologists blinded to treatment code and scored using the NASH CRN Histological Scoring System. Various endpoints were computed from these scores. Inter-reader linearly weighted kappas were 0.609, 0.484, 0.328, and 0.517 for steatosis, fibrosis, lobular inflammation, and ballooning, respectively. Inter-reader unweighted kappas were 0.400 for the diagnosis of NASH, 0.396 for NASH resolution without worsening fibrosis, and 0.366 for fibrosis improvement without worsening NASH. Of the patients included in the study based on one hepatopathologist's qualifying reading, 46.3% were deemed by at least one of the three hepatopathologists not to meet the study's histologic inclusion criteria. The MSDC-0602K treatment effect was lowest for those histologic features with lower inter-reader reliability. Simulations show that the lack of reliability of endpoints and inclusion criteria can drastically reduce study power, from >90% in a well-powered study to as low as 40%. Reliability of hepatopathologists' liver biopsy evaluation using currently accepted criteria is suboptimal. This lack of reliability may affect NASH pivotal studies by introducing patients who do not meet NASH study entry criteria, misclassifying fibrosis subgroups, and attenuating apparent treatment effects.
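
To illustrate the agreement statistics reported above, the following is a minimal sketch of Cohen's kappa for two readers, in both its linearly weighted form (used for ordinal scores such as steatosis grade) and its unweighted form (used for binary endpoints such as the diagnosis of NASH). It is not the study's analysis code. The function name `cohen_kappa`, the two-reader setup, and the example scores are assumptions made for illustration; the abstract does not state how agreement among the three readers was summarized into a single inter-reader kappa.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b, n_categories, linear_weights=True):
    """Cohen's kappa for two readers scoring categories 0..n_categories-1.

    With linear_weights=True, a disagreement between scores i and j is
    penalized by |i - j| / (n_categories - 1); otherwise every
    disagreement counts fully (the unweighted kappa).
    """
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must score the same non-empty set of biopsies")
    if n_categories < 2:
        raise ValueError("need at least two categories")

    n = len(reader_a)
    k = n_categories

    def disagreement(i, j):
        if linear_weights:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    joint = Counter(zip(reader_a, reader_b))  # observed score pairs
    marg_a = Counter(reader_a)                # reader A's score distribution
    marg_b = Counter(reader_b)                # reader B's score distribution

    # Mean disagreement actually observed across biopsies.
    observed = sum(disagreement(i, j) * c for (i, j), c in joint.items()) / n

    # Mean disagreement expected if the readers scored independently
    # while keeping their own marginal distributions.
    expected = sum(
        disagreement(i, j) * marg_a[i] * marg_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)

    if expected == 0:
        raise ValueError("kappa is undefined when both readers use a single category")
    return 1.0 - observed / expected


# Made-up scores on a 0-3 scale for eight biopsies (not study data).
scores_a = [0, 1, 2, 3, 0, 1, 2, 3]
scores_b = [0, 1, 2, 3, 1, 0, 3, 2]

weighted = cohen_kappa(scores_a, scores_b, n_categories=4)
unweighted = cohen_kappa(scores_a, scores_b, n_categories=4, linear_weights=False)
print(f"linearly weighted kappa: {weighted:.3f}")
print(f"unweighted kappa:        {unweighted:.3f}")
```

In this toy example every disagreement is off by a single grade, so the linearly weighted kappa is higher than the unweighted one: near-miss disagreements on an ordinal scale are penalized less than they would be if all disagreements counted equally.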