Journal of Medical Internet Research

Preeclampsia is a pregnancy-related hypertensive condition marked by the onset of hypertension and protein in the urine after 20 weeks of gestation. Owing to its multiple etiologies and complex pathogenesis, it poses significant risks to both maternal and perinatal health []. This condition negatively affects maternal health and can lead to serious complications for the fetus, including placental abruption and restricted fetal growth. According to global statistics, the incidence of preeclampsia ranges from 3% to 9%, with even higher rates observed in certain high-risk populations []. Moreover, preeclampsia is one of the leading causes of maternal mortality worldwide, particularly in low- and middle-income countries. The prevalence of preeclampsia in China increased from 5.79% in 2005 to 9.5% in 2019 [], further underscoring the urgent need for early screening and management. To date, the etiology and pathogenesis of preeclampsia remain incompletely understood, and effective treatment measures are lacking. Consequently, early detection and enhanced management are essential clinical strategies.

Understanding the epidemiological characteristics of preeclampsia is essential for developing effective public health strategies. In the study of preeclampsia, traditional statistical methods primarily emphasize linear models and hypothesis testing, which are effective in uncovering single relationships between variables. However, the pathological mechanisms underlying preeclampsia are highly complex and involve multiple interacting factors, so traditional methods may face limitations when addressing nonlinear and high-dimensional data. In contrast, machine learning (ML) technology has shown considerable promise in this field.

A subset of artificial intelligence (AI), ML is a technology that enables computers to learn independently from data and to make decisions or predictions using algorithms and models. Its application in clinical settings can help prevent and manage diseases. Currently, the use of ML to develop predictive models for preeclampsia is becoming increasingly prevalent. For instance, Sylvain et al [] noted that the implementation of ML techniques has significantly improved the prediction accuracy of high-risk pregnancies, offering a novel perspective for the early identification of preeclampsia. Furthermore, Ranjbar et al [] indicated that ML-based models surpass traditional regression models in predicting the occurrence of preeclampsia. The multidimensional optimization capabilities of these models allow them to account for interactions among various clinical features and biomarkers, thereby enhancing diagnostic accuracy.

By leveraging ML, researchers can explore both linear and nonlinear relationships, as well as uncover deep-seated features and patterns within the data. This approach establishes a scientific foundation for the prompt recognition of, and intervention in, preeclampsia.

Compared with prior systematic reviews and protocols on pregnancy outcomes or preeclampsia, the incremental contributions of this study are as follows: (1) we prespecified and implemented subgroup analyses by outcome definition, gestational window, data source, and validation type to avoid indiscriminate pooling across highly heterogeneous models and populations; (2) we treated area under the curve (AUC) as the primary summary measure and applied robust univariate random-effects models (Hartung-Knapp-Sidik-Jonkman method) to pool sensitivity and specificity separately, accompanied by 95% prediction intervals (PIs) to estimate future performance; and (3) we clearly separated performance in internal vs external validation and documented whether decision-curve analysis was conducted. Taken together, these methodological improvements aim to provide more interpretable evidence about where deployment may be appropriate and where it remains premature.

Research Design

This research was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 standards [] ( []). Specific details regarding the search keywords can be found in Textbox S1 of the . Before the study began, the protocol received approval and was registered with PROSPERO under the reference number CRD420251005830.

Literature Search Strategy

Comprehensive searches were conducted in several major databases, including PubMed, Web of Science, IEEE Xplore, and CNKI (China National Knowledge Infrastructure). These searches focused on locating scholarly papers published in either English or Chinese. The search covered works published until February 2025, ensuring that the most recent and relevant literature was included. The search strategy was developed based on the PICO (Population, Intervention, Comparison, and Outcome) framework. In this study, “P” denotes the population with preeclampsia (PE), “I” refers to ML methods as the intervention, “C” indicates the gold standard for comparison, and “O” encompasses outcomes, such as sensitivity, specificity, and accuracy for prediction and diagnosis (Table S1 in ). Additionally, the reference lists of each identified study were reviewed manually to uncover further relevant research. Zotero (Center for History and New Media at George Mason University) was used to organize the studies and remove duplicates.

The study’s inclusion criteria were formulated to ensure the rigor and relevance of the research. The criteria were (1) research papers published in English or Chinese; (2) investigations involving pregnant women from the general population that explicitly defined the diagnosis of preeclampsia; (3) studies that used ML models for predicting preeclampsia and explained these models thoroughly; and (4) investigations that reported the performance of the ML models with sufficient data to determine both sensitivity and specificity. These criteria aimed to strengthen the validity of the results and ensure a thorough assessment of the existing literature.

The exclusion criteria for this study were as follows: (1) studies that only investigated risk factors without developing a predictive model; (2) papers published in languages other than English or Chinese, or of types other than original research, such as reports and reviews; (3) duplicate publications; (4) studies that included 2 or fewer predictors in the constructed model; and (5) studies for which the full text was not accessible.

Literature Screening and Data Extraction

Five researchers (LL, QZ, YZ, XC, and WZ) followed the established inclusion and exclusion criteria to screen the titles and abstracts of the literature. Studies that met these criteria advanced to the full-text reading phase, where all relevant studies were reviewed. Each article underwent a minimum of 2 rounds of screening. Both the title and abstract screening and the full-text reading were conducted independently by 2 researchers (LL and QZ). In cases of disagreement between them, another researcher (JW) made the final decision.

In total, 26 studies [-] were selected for analysis. Data extraction was performed independently by 2 researchers (LL and QZ) following the standardized protocol established by TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis), as outlined in the existing literature []. Data collected from each study included the following: (1) demographic details, such as the country of data collection, the study setting, the source of the data, the study design, and the definition of outcomes; (2) methods for data partitioning, feature selection algorithms, types of ML prediction models, model validation, and applications; (3) prediction outcomes, namely accuracy, sensitivity, specificity, and the AUC; and (4) sources of funding and ethics approval. Sensitivity and specificity data were extracted from each report as given at the “optimal threshold” set in the respective original study. This research did not standardize or adjust for the differences in thresholds among the studies.

Bias and Applicability Assessment

Overview

We used PROBAST (Prediction Model Risk of Bias Assessment Tool) as the primary tool to preserve comparability with prior preeclampsia meta-analyses (for detailed information, see ). Because many included studies predate PROBAST-AI and lack AI-specific reporting (eg, leakage safeguards, hyperparameter tuning, calibration, and thresholds), a full PROBAST-AI assessment would be dominated by underreporting rather than demonstrated bias. PROBAST [] was used to assess the risk of bias in the included studies across 4 domains, namely participants, predictors, outcomes, and analysis. Additionally, applicability assessments were conducted for the domains of population, predictors, and outcomes. Two researchers (LL and QZ) independently reviewed the studies after undergoing consistency training based on a preprepared and piloted scoring manual. Discrepancies were resolved through discussion and, if necessary, a third researcher (JW) acted as adjudicator.

Bias Assessment

If the answers to all questions within a domain are “yes” or “probably yes,” the domain is rated as low risk. Conversely, if any answer is “no” or “probably no,” the domain is rated as high risk. Where there is insufficient information, the domain is rated as unclear. The overall risk of bias of a study is determined in accordance with the PROBAST guidelines: (1) if all 4 domains are rated as low risk, the overall risk of the study is low; (2) if one or more domains are rated as high risk, the overall risk of the study is high; and (3) if one or more domains are rated as unclear (and there are no high-risk domains), the overall risk of the study is unclear.

Applicability Assessment

The assessment covers 3 categories: study population, predictors, and outcomes. Each category is rated at one of 3 levels of applicability, namely good, poor, or unclear. If all 3 categories are rated as good, the overall applicability is good. Conversely, if any one category is rated as poor, the overall applicability is poor. Where one category is unclear while the other two are good, the overall applicability is rated as unclear.
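The rating rules in the two subsections above share one pattern: the worst rating decides. The following is a minimal Python sketch of that logic, not the reviewers' actual tooling. The function names `rate_domain` and `overall_rating` are hypothetical, the answer labels follow the usual PROBAST wording, and treating any mix of "unclear" ratings without a worst-case rating as "unclear" is an assumption that goes slightly beyond the single case stated in the text.

```python
def rate_domain(answers):
    """Rate one risk-of-bias domain from its signalling-question answers."""
    answers = [a.strip().lower() for a in answers]
    # Any negative answer makes the whole domain high risk.
    if any(a in ("no", "probably no") for a in answers):
        return "high"
    # Only affirmative answers make the domain low risk.
    if all(a in ("yes", "probably yes") for a in answers):
        return "low"
    # Otherwise at least one answer lacks sufficient information.
    return "unclear"


def overall_rating(ratings, worst="high", best="low"):
    """Combine per-domain ratings into an overall judgement.

    Defaults match risk of bias ("high"/"low"); for applicability,
    pass worst="poor" and best="good".
    """
    ratings = list(ratings)
    if worst in ratings:
        return worst
    if all(r == best for r in ratings):
        return best
    # Assumption: any remaining combination counts as unclear.
    return "unclear"


if __name__ == "__main__":
    analysis = rate_domain(["yes", "probably yes", "no information"])
    print(overall_rating(["low", "low", "low", analysis]))
    print(overall_rating(["good", "good", "unclear"], worst="poor", best="good"))
```

Keeping the two steps separate mirrors the text: answers are first reduced to a per-domain rating, and the per-domain ratings are then reduced to one overall judgement.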

Statistical Analysis

The methods described in the guidelines for conducting systematic reviews and meta-analyses of prediction model performance, together with earlier meta-analyses of such models, indicate that the concordance index of a model is equivalent to the AUC []. This index reflects the diagnostic or prognostic discrimination ability, categorized as none (AUC≤0.6), poor (0.6

DOR=PLR/NLR

In this study, we use the positive likelihood ratio (PLR) and the negative likelihood ratio (NLR) to evaluate the predictive performance of the models for preeclampsia. The equations used to calculate PLR and NLR express the frequency of preeclampsia in individuals who are predicted by the model to have preeclampsia compared with those who are predicted not to have preeclampsia:

PLR=Sensitivity/(1-Specificity)

NLR=(1-Sensitivity)/Specificity
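As a worked illustration of the three formulas above, the short Python sketch below computes PLR, NLR, and the diagnostic odds ratio (DOR) from one sensitivity and specificity pair. The function name `likelihood_ratios` and the input values (sensitivity 0.87, specificity 0.91) are chosen purely for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (PLR, NLR, DOR) for one sensitivity/specificity pair."""
    # The ratios are undefined at the boundaries (division by zero).
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    plr = sensitivity / (1 - specificity)   # PLR = Sensitivity/(1-Specificity)
    nlr = (1 - sensitivity) / specificity   # NLR = (1-Sensitivity)/Specificity
    dor = plr / nlr                         # DOR = PLR/NLR
    return plr, nlr, dor


if __name__ == "__main__":
    plr, nlr, dor = likelihood_ratios(0.87, 0.91)
    print(f"PLR={plr:.2f}, NLR={nlr:.3f}, DOR={dor:.1f}")
```

With these inputs, PLR is 0.87/0.09 (about 9.67), NLR is 0.13/0.91 (about 0.143), and DOR is their quotient (about 67.7).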

Considering the diversity in populations, predictors, and algorithms across the included ML models, our objective was to generalize findings to broader clinical contexts. Therefore, following the recommendation of Borenstein et al [], we selected the random-effects model a priori for all meta-analyses, regardless of the magnitude of statistical heterogeneity (I²). Specifically, we used the more robust Hartung-Knapp-Sidik-Jonkman (HKSJ) method for final pooled estimates and interval calculations to ensure the robustness of statistical inferences []. The ML models included in this study exhibited substantial differences in sample size and population characteristics, and the I² statistic often approaches 100% in larger samples, which limits its ability to distinguish the actual clinical impact of heterogeneity. Therefore, in addition to reporting the 95% CI for pooled effect sizes, this study also calculated the 95% PI. Unlike CIs, which only reflect the precision of the average effect, PIs estimate the expected range of performance when the model is applied in a new, similar clinical setting in the future. This approach provides a more intuitive assessment of the model’s clinical applicability and transferability []. Because the Meta-DiSC software (developed by the clinical biostatistics team at Ramón y Cajal Hospital) cannot calculate PIs, we used the meta package (version 7.0) [] in R software (R Foundation for Statistical Computing; version 4.4.2) with the HKSJ method to compute 95% PIs for area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. For AUROC values without reported SEs, we estimated them based on sample size using the Hanley & McNeil [] method.

External validation is regarded as the “gold standard” for assessing the transportability of models. Therefore, the performance of models that used external validation was evaluated separately. Subsequently, the 4 predictive models with the highest and lowest values were excluded in a sensitivity analysis aimed at evaluating the impact of outliers on the summary sensitivity and specificity.

To reduce conceptual heterogeneity and enhance the interpretability of results, stratification was performed along the following dimensions: sample size (less than 2000 and greater than or equal to 2000); data source (electronic medical records; laboratory biomarkers; omics or imaging; mixed); gestational age window (early pregnancy; midpregnancy and late pregnancy or specific gestational weeks); validation method (internal validation and external validation); ML model (logistic regression [LR] and nonlogistic regression), followed by a more detailed subgroup analysis (LR, extreme gradient boosting [XGBoost], random forest [RF], and support vector machine [SVM]) based on nonlogistic regression; type of predictive variables (demographic information; biological genetic markers; laboratory tests; demographic information and laboratory tests); and number of predictive variables (less than 10 and greater than or equal to 10).

Handling of missing data (extraction and synthesis). For each study, we recorded how missing data were handled and classified the methods into 5 categories, namely listwise deletion, single-value imputation (eg, mean and median), multiple imputation, other (eg, random subset iterations), and not reported. When multiple approaches were mentioned, we coded the method used for the primary model. We summarize the overall distribution in the results section “Inclusion of Study Characteristics in the Paper” and discuss implications for comparability and generalizability.

Subgroup analyses were conducted on the included studies to evaluate the performance of ML methods in predicting preeclampsia across different clinical scenarios. The Subgroup Analysis section discusses the capabilities of different ML algorithms in predicting preeclampsia. Moreover, meta-regression was used to investigate the sources of heterogeneity. Given the extreme heterogeneity (I²>99%) observed across studies and the lack of standardized threshold reporting (eg, fixed false-positive rates), hierarchical or bivariate models often fail to converge or yield unstable estimates. Therefore, we prioritized univariate random-effects models using the HKSJ adjustment for pooling sensitivity and specificity separately. This method has been shown to provide more robust coverage probabilities for CIs in the presence of substantial heterogeneity than standard DerSimonian-Laird [] methods.
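The pooling steps described above can be sketched in plain Python. This is an illustrative approximation and does not reproduce the R meta package used in the study. It assumes a DerSimonian-Laird estimate of the between-study variance (the package may use a different estimator), pools on whatever scale the caller supplies, and builds the prediction interval from a t distribution with k-2 degrees of freedom. The function names and the study values in the usage example are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Hanley & McNeil standard error of an AUC from case/non-case counts."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def t_quantile(p, df):
    """Upper-tail Student t quantile (p > 0.5), standard library only."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    def cdf(x, n=2000):
        # Simpson's rule on [0, x]; the density is symmetric about 0.
        h = x / n
        s = pdf(0.0) + pdf(x)
        s += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, n))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pool_hksj(effects, variances, level=0.95):
    """Random-effects pooled estimate with HKSJ CI and prediction interval."""
    k = len(effects)
    if k < 3:
        raise ValueError("at least 3 studies are needed for a prediction interval")
    # Assumption: DerSimonian-Laird estimate of tau^2.
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights and pooled mean.
    ws = [1 / (v + tau2) for v in variances]
    mu = sum(wi * y for wi, y in zip(ws, effects)) / sum(ws)
    # HKSJ variance of the pooled mean.
    var_hk = sum(wi * (y - mu) ** 2 for wi, y in zip(ws, effects)) / ((k - 1) * sum(ws))
    p = 1 - (1 - level) / 2
    half_ci = t_quantile(p, k - 1) * math.sqrt(var_hk)
    half_pi = t_quantile(p, k - 2) * math.sqrt(tau2 + var_hk)
    return {"pooled": mu, "tau2": tau2,
            "ci": (mu - half_ci, mu + half_ci),
            "pi": (mu - half_pi, mu + half_pi)}


if __name__ == "__main__":
    # Hypothetical AUCs with hypothetical (cases, non-cases) per study.
    aucs = [0.82, 0.90, 0.96, 0.77]
    counts = [(120, 900), (60, 400), (45, 350), (300, 5000)]
    ses = [hanley_mcneil_se(a, n1, n0) for a, (n1, n0) in zip(aucs, counts)]
    print(pool_hksj(aucs, [s ** 2 for s in ses]))
```

The prediction interval is never narrower than the CI here, because it adds the between-study variance to the variance of the pooled mean and uses one fewer degree of freedom. In practice, proportions such as sensitivity and specificity are often pooled on a transformed scale (for example, the logit) and back-transformed afterward; the sketch leaves that choice to the caller.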

Literature Screening

After duplicate entries were removed, 284 papers remained. All 284 papers were assessed through abstract screening, followed by a full-text evaluation of 88 papers. This process culminated in the identification of 26 papers [-] that satisfied the overall inclusion criteria. The literature screening procedure and its results are depicted in Figure 1.

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram for study selection. CNKI: China National Knowledge Infrastructure; PE: preeclampsia.

Inclusion of Study Characteristics in the Paper

The literature included in this study spans from 2019 to 2025 and consists of 23 English papers [-,-,-,,] and 3 Chinese papers [,,]. When a study presented more than 2 models, the top 2 models demonstrating the best performance were selected based on a comprehensive evaluation of metrics, such as AUC, sensitivity, and specificity, culminating in the inclusion of 31 models from 26 papers [-]. The data sources for ML predominantly consisted of clinical electronic health records, community research cohorts, and self-administered questionnaires. The overall sample sizes in the studies examined varied considerably, ranging from 53 to 62,562 cases, while the number of predictors in the final models ranged from 3 to 50. Among all the studies, 20 [,,,,-,-] conducted internal validation, while 6 [,,,,,] performed external validation. To assess model performance, the AUC, sensitivity, and specificity were the most frequently used metrics. Among the 26 studies [-] reviewed, 5 (19.2%) studies [,,,,] were prospective cohort studies, 17 (65.4%) studies [,,-,,,-,-,,] were retrospective cohort studies, 2 (7.7%) studies [,] were case-control studies, 1 (3.8%) study [] was a retrospective case-control study, and 1 (3.8%) study [] was a multicenter study. Regarding model approaches, of the 31 models included, 3 were LR. Among the remaining 28 models, there were 5 RF, 4 XGBoost, 4 Elastic-net, 3 neural network (NN), 3 SVM, 2 light gradient boosting, 2 AdaBoost, 1 k-nearest neighbor, 1 Naive Bayes, 1 stochastic gradient boosting, 1 CatBoost, and 1 voting classifier.

In terms of handling missing data, 8 studies [,,,-,] deleted cases with missing data, 7 studies [,,-,,] used mean imputation to address the missing values, 3 studies [,,] used multiple imputation methods, 1 study [] implemented random selection of data subsets for multiple iterative analyses, and the remaining 7 studies [,,,,,,] did not explicitly report the presence of missing values. Such variation limits the comparability and external transportability of performance metrics and increases uncertainty around calibration and threshold transfer. The specific details of the models are presented in Table 1.

Table 1. Development of the risk prediction model for preeclampsia.
Columns, in order: study and modeling method; model performance (AUCa, sensitivity, specificity); sample size (modeling/internal validation/external validation); missing data (quantity [PCSb], handling method); predictors.
Ansbacher et al []
FfNNc 0.816 0.533 0.9 30437/10000/20352 d 10 predictors: maternal age, maternal weight, maternal height, interpregnancy interval, ethnicity, medical history (such as chronic hypertension, diabetes, etc), uterine artery pulsatility index, mean arterial pressure, placental growth factor, and pregnancy-associated plasma protein-A.
Araújo et al []
LGBe 0.9 0.95 0.79 132/—/— Mean imputation 3 predictors: neutrophil count, mean corpuscular hemoglobin, and aggregate index of systemic inflammation.
Chen et al []
SVMf 0.88 0.87 0.76 166/—/— 7 predictors: IL-17, IL-21, IL-22, IL-10, transforming growth factor-β, placental alkaline phosphatase, and lysosome-associated membrane protein 3.
Chen et al []
CBg 0.983 0.8881 0.9848 1325/398/— Delete 18 predictors: BMI, systolic blood pressure, diastolic blood pressure, number of pregnancies, mean corpuscular hemoglobin concentration, bacteria (urinalysis), glycocholic acid, high-density lipoprotein, potassium, sodium, phosphorus, uric acid, urine protein, creatinine, direct bilirubin, low-density lipoprotein, gestational age≥34 weeks, and family history of hypertension.
Giménez et al [] 597/—/— Mean imputation 6 predictors: gestational age, history of chronic hypertension, soluble fms-like tyrosine kinase-1, placental growth factor, N-terminal pro-brain natriuretic peptide, and uric acid.
PTB-RFh 0.901 0.796 0.91
RFi 0.941 0.775 0.949
Jhee et al []
SGBj 0.924 0.603 0.991 7704/3302/— 25 Multiple imputation 14 predictors: systolic blood pressure, serum urea nitrogen, serum creatinine, platelet count, serum potassium level, white blood cell count, serum calcium level, and urinary protein.
Kaya et al []
XGBoostk 0.767 0.6 0.833 53/20/— Mean imputation 8 predictors: maternal age, BMI, smoking status, history of diabetes, history of gestational diabetes, mean arterial pressure, and history of previous preeclampsia.
Kovacheva et al [] 1125/—/— Mean imputation 7 predictors: maternal age, BMI, systolic blood pressure, diastolic blood pressure, uric acid, history of kidney disease, and SBP PRSm.
LRl 0.83 0.85 0.66
XGBoost 0.91 0.96 0.44
Li et al []
XGBoost 0.955 0.789 0.93 3759/191/— Mean imputation 38 predictors: maternal age, BMI, mean blood pressure, abdominal circumference, gravidity, parity, history of preeclampsia, history of previous cesarean section, interpregnancy interval, primipara, multiple gestation, assisted reproductive technology, heart disease, pregestational diabetes, thyroid disease, kidney disease, autoimmune disease, mental illness, uterine fibroids, adenomyosis, uterine malformation, history of epilepsy, family history of hypertension, hemoglobin, white blood cell count, platelet count, creatinine, fasting blood glucose, total cholesterol, high-density lipoprotein, low-density lipoprotein, total protein, albumin, bile acids, uric acid, total bilirubin, direct bilirubin, and gamma-glutamyl transferase.
Li et al []
VCn 0.831 0.77 0.769 3715/929/— Multiple imputation 16 predictors: maternal age, height, prepregnancy weight, primiparity, mode of conception, family history, smoking status, history of preeclampsia, history of chronic hypertension, history of chronic kidney disease, history of diabetes, history of systemic lupus erythematosus/antiphospholipid syndrome, mean arterial pressure, uterine artery pulsatility index, pregnancy-associated placental protein a, and placental growth factor.
Lv et al []
XGBoost 0.963 0.917 0.894 832/208/— Delete 6 predictors: prepregnancy BMI, gravidity, mean arterial pressure, smoking, alpha-fetoprotein, and conception method.
Marić et al []
ENo 0.79 0.452 0.919 5245/—/— Mean imputation 55 predictors: maternal age, height, weight, ethnicity, number of fetuses, mean systolic blood pressure, mean diastolic blood pressure, maximum systolic blood pressure, maximum diastolic blood pressure, history of preeclampsia, chronic hypertension, type 1 and type 1 diabetes, gestational diabetes, obesity, assisted reproductive technology, diagnosis of autoimmune diseases, kidney disease, anemia, antiphospholipid syndrome, sexually transmitted diseases, hyperemesis gravidarum, headache, migraine, poor obstetric history, high-risk pregnancy, protein and glucose in urine, platelet count, red blood cells, white blood cells, creatinine, hemoglobin, hematocrit, monocytes, lymphocytes, eosinophils, neutrophils, basophils, Rh blood type, gastric acid, rubella, chickenpox, hepatitis B virus, syphilis, gonorrhea, aspirin, nifedipine, aldomet, labetalol, insulin, glyburide, prednisone, azathioprine, Plaquenil, heparin, levothyroxine, doxylamine, and acyclovir.
Melinte-Popescu et al []
NBp 0.98 0.963 0.964 163/70/— 14 predictors: age, BMI, smoking status, interpregnancy interval, use of assisted reproductive technology, pregestational diabetes, chronic hypertension, history of kidney disease, personal or family history of preeclampsia, placental growth factor, pregnancy-associated plasma protein A, placental protein 13, uterine artery pulsatility index, and mean arterial pressure.
Munchel et al []
ABq 0.964 0.88 0.92 113/11/448 Randomly select a subset of data for multiple iterative analyses. 49 predictors, circulating transcripts in blood: immunomodulatory, fetal development, angiogenesis, and extracellular matrix remodeling.
Roque et al []
LR 0.976 0.9 0.951 35706/8927/— Delete 11 predictors: platelet count, white blood cell count, lymphocyte percentage, monocyte percentage, red blood cell count, red cell distribution width, platelet distribution width, band neutrophil percentage, red cell distribution width, hematocrit, and maternal age.
Sandström et al []
LR 0.67 0.282 0.9 62562/6256/— Mean imputation 36 predictors: gestational age at first visit, maternal age, BMI, mean arterial pressure, capillary blood glucose level, urine protein, hemoglobin level, history of miscarriage, history of ectopic pregnancy, history of infertility treatment, family status, country of birth, smoking history, smoking status at registration, use of snuff in the first trimester of pregnancy, use of snuff during pregnancy, alcohol consumption in the 3 months before registration, alcohol consumption habits at the time of pregnancy registration, family history of preeclampsia, infertility, family history of hypertension, previous diabetes, chronic hypertension, chronic kidney disease, cardiovascular disease, endocrine disease, history of thrombosis, history of mental illness, history of epilepsy, Crohn/ulcerative colitis, lung disease or asthma, hepatitis, gynecological disease or surgery, recurrent urinary tract infections, and blood type.
Sufriyana et al []
RF 0.86 0.7 0.89 23201/20975/GEVr:1322, TEVs: 90 301 Delete 13 predictors: age, family role, parity, type of labor, infectious diseases, endocrine, nutritional and metabolic diseases, circulatory system diseases, immune-related diseases, ophthalmic diseases, urogenital diseases, skin and subcutaneous tissue–related diseases, breast-related diseases, digestive system–related diseases, and skin-related diseases.
Tiruneh et al []
RF 0.84 0.76 0.79 33767/14475/— 66 Delete 13 predictors: maternal age, ethnicity, prepregnancy/early pregnancy BMI, history of preeclampsia in previous pregnancies, primiparity, history of gestational diabetes, pre-existing hypertension, diabetes, family history of hypertension and diabetes, family history of preeclampsia, renal disease, smoking history, and polycystic ovary syndrome.
Torres et al [] 1068/914/— 78 Delete 13 predictors: placental growth factor, mean arterial pressure, uterine artery pulsatility index, BMI, antiphospholipid syndrome, previous preeclampsia, previous diabetes, smoking status, natural conception, other drug use (such as cocaine and heroin), systemic lupus erythematosus, chronic hypertension, and maternal age.
all-EN 0.778 0.501 0.9
EPE-ENt 0.963 0.882 0.9
PPE-ENu 0.897 0.765 0.9
Wang et al []
KNNv 0.9 0.7142 0.926 516/172/— Delete 7 predictors: urine protein, urine conductivity, alkaline phosphatase, serum uric acid, lactate dehydrogenase, mean corpuscular hemoglobin concentration, and amylase.
Wang et al []
AB 0.8775 0.7271 0.9 25709/77713/1760 20 predictors: maternal age, maternal BMI, regularity of maternal menstrual cycle, vomiting and nausea during pregnancy, previous miscarriages, preterm births, history of hypertension during pregnancy, hypertension, diabetes, chronic hypertension, history of drug allergies, maternal smoking history, previous delivery history, nutritional status during pregnancy, maternal ethnic background, history of hypertension, history of diabetes, glycated hemoglobin, and albumin.
Xue et al []
SVM 0.93 0.67 0.999 800/160/— Delete 50 predictors: diabetes mellitus, thrombotic diseases, systemic lupus erythematosus, antiphospholipid syndrome, renal diseases, assisted reproductive technology, obstructive sleep apnea syndrome, prepregnancy BMI>30 kg/m², age>35 years, multiple pregnancy, primipara, history of eclampsia or preeclampsia, albumin, alanine aminotransferase, aspartate aminotransferase, alkaline phosphatase, complement C1q, calcium, creatinine, C-reactive protein, cystatin C, gamma-glutamyl transferase, globulin, triglycerides, total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, lipoprotein(a), apolipoprotein A1, apolipoprotein B, small dense low-density lipoprotein, total protein, total bile acid, total bilirubin, direct bilirubin, uric acid, urea, phosphorus, absolute lymphocyte count, absolute neutrophil count, platelet count, NEU/LYM ratio, PLT/LYM ratio, prothrombin time, prothrombin activity, activated partial thromboplastin time, fibrinogen, D-dimer, fibrin degradation products, and thrombin time.
Yu et al []
RF 0.96 0.87 0.91 404/1384/899 12 predictors: maternal age, BMI, parity, medical history (chronic hypertension, preeclampsia, systemic lupus erythematosus, antiphospholipid syndrome), mode of conception; cfDNA profile indicators: Fos-related antigen 2 (FOSL2), calcium/calmodulin-dependent protein kinase kinase 2 (CAMKK2), G1/S-specific cyclin-D1 (CCND1), inositol 1,4,5-trisphosphate receptor type 1 (ITPR1), protein kinase A catalytic subunit beta (PRKACB), protein Wnt-7b (WNT7B), voltage-dependent L-type calcium channel subunit beta-2 (CACNB2), nuclear respiratory factor 1 (NRF1), Fms-related tyrosine kinase 3 ligand (FLT3LG), and epidermal growth factor (EGF).
Zheng et al []
LGB 0.964 0.849 0.927 1609/483/— Multiple imputation 12 predictors: urine specific gravity, uric acid, mean corpuscular hemoglobin concentration, globulin, platelet distribution width, potassium ion, age, family history of hypertension, systolic blood pressure, diastolic blood pressure, pulse, and gestational age≥34 weeks.
Zhou et al [] 432/197/288 19 predictors: mRNA markers: albumin, fibrinogen alpha chain, leptin, insulin-like growth factor binding protein 5, alpha-1 antitrypsin, S100 calcium binding protein A9, apolipoprotein A1, thyroid stimulating hormone beta subunit; miRNA markers: MIR130A, MIR144, MIR19B1, MIR215, MIR376C, MIR27A, MIR106A, MIR33A; lncRNA markers: macrophage migration inhibitory factor; assisted reproductive technology; mean arterial pressure.
AvNNw 0.91 0.63 0.93
SVM 0.93 0.47 0.99
Zhou et al []
CNNx 0.883 0.722 0.934 1138/—/— 8 predictors: retinal fundus image score, prepregnancy BMI, maternal age, chronic hypertension, diabetes, history of gestational hypertension or preeclampsia, assisted reproductive technology, and autoimmune diseases.

aAUC: area under the curve.

bPCS: pieces.

cFfNN: feed-forward neural network.

dNot reported.

eLGB: light gradient boosting.

fSVM: support vector machine.

gCB: CatBoost.

hPTB-RF: premature birth random forest.

iRF: random forest.

fKNN: k-nearest neighbor.

jSGB: stochastic gradient boosting.

kXGBoost: extreme gradient boosting.

lLR: logistic regression.

mSBP PRS: systolic blood pressure polygenic risk score.

nVC: voting classifier.

oEN: elastic net.

pNB: naive Bayes.

qAB: AdaBoost.

rGEV: geographic external validation.

sTEV: temporal external validation.

tEPE-EN: early onset of preeclampsia elastic net.

uPPE-EN: premature birth of preeclampsia elastic net.

vKNN: k-nearest neighbor.

wAvNN: average neural network.

xCNN: convolutional neural network.

Study Quality

We evaluated the risk of bias and the applicability of the prediction models using the PROBAST checklist, assessing a total of 26 [-] studies. Among these, 3 (12%) studies [,,] had an unclear risk of bias in the participant domain, primarily because of their case-control design, which is inherently associated with a higher risk of selection bias. In the predictor domain, 1 (4%) study [] had an unclear risk of bias because it used C-RNA transcriptome assays that depend on transcriptome enrichment and high-throughput sequencing, methods that are not typically used in routine clinical testing. In the analysis domain, 8 (31%) studies [,,,,,,,] had an unclear risk of bias, primarily because of insufficient sample sizes, unclear methods for handling missing data, and uncertainty about how the risk of overfitting was managed. Additionally, 1 (4%) study [] was rated as having a high risk of bias because all data were sourced from a single hospital, regardless of the volume of data, and therefore did not represent a multicenter or stratified analysis. Overall, the risk of bias was judged to be unclear for 9 (35%) studies [-,,,,,,]. The applicability ratings were moderate for 4 (15%) studies [,,,], high for 1 (4%) study [], and low for the remaining studies [,,-,-], as detailed in Table 2. For the remaining details, see Table S2 in the supplementary material.

Table 2. Risk of bias and applicability assessment using PROBAST (Prediction Model Risk of Bias Assessment Tool).
Study and year; ROBa by domain (Participants, Predictors, Outcome, Analysis); Overall bias rating; Overall applicability rating; External validation
Ansbacher et al [], 2022 Low Low Low Low Low Low Yes
Araújo et al [], 2024 Unclear Low Low Unclear Unclear Low No
Chen et al [], 2022 Low Low Low Unclear Unclear Unclear No
Chen et al [], 2023 Unclear Low Low Low Unclear Unclear No
Garrido-Giménez et al [], 2023 Low Low Low Low Low Low No
Jhee et al [], 2019 Low Low Low Low Low Low No
Kaya et al [], 2024 Low Low Low Unclear Unclear Low No
Kovacheva et al [], 2023 Low Low Low Low Low Low No
Li et al [], 2021 Unclear Low Low Unclear Unclear Low No
Li et al [], 2024 Low Low Low Low Low Low No
Lv et al [], 2025 Low Low Low Low Low Low No
Marić et al [], 2020 Low Low Low Low Low Low No
Melinte-Popescu et al [], 2023 Low Low Low Low Low Low No
Munchel et al [], 2020 Low Unclear Low Unclear Unclear Unclear Yes
Roque et al [], 2024 Low Low Low High Low High No
Sandström et al [], 2019 Low Low Low Low Low Low No
Sufriyana et al [], 2020 Low Low Low Low Low Low Yes
Tiruneh et al [], 2024 Low Low Low Low Low Low No
Torres et al [], 2024 Low Low Low Low Low Low No
Wang et al [], 2022 Low Low Low Low Low Low No
Wang et al [], 2024 Low Low Low Low Low Low Yes
Xue et al [], 2023 Low Low Low Unclear Unclear Low No
Yu et al [], 2024 Low Low Low Low Low Low Yes
Zheng et al [], 2021 Low Low Low Low Low Low No
Zhou et al [], 2024 Low Low Low Unclear Unclear Low Yes
Zhou et al [], 2023 Low Low Low Unclear Unclear Unclear No

aROB: risk of bias.

The Performance of ML Models in Preeclampsia Prediction

A total of 26 studies (31 models) [-] were included. Although the pooled estimates demonstrated high average discriminative ability of the ML models, substantial between-study heterogeneity was observed, indicating that model performance depends strongly on context. The overall pooled AUROC was 0.91 (95% CI 0.87-0.92; Figure 2). However, its 95% PI ranged from 0.75 to 1.00, suggesting that the AUC could fall to 0.75 in some external validation settings. The pooled sensitivity was 0.81 (95% CI 0.70-0.83; P<.001; I2=99.6%). However, this represents only an average level; the wide 95% PI (0.32-0.96) reveals potential clinical risks. In certain studies or future applications, the sensitivity may be as low as 32%, indicating a substantial risk of missed diagnoses. In the forest plot (Figure 3 [-]), the first author of each study is listed along the y-axis; the circles represent the point estimates of sensitivity for each model, with circle size proportional to the weight of the study, and the horizontal lines indicate the 95% CIs. The diamonds represent the pooled sensitivity estimates of the models, with their width corresponding to the 95% CI of the pooled values. The vertical red dashed line represents the 95% CI of the pooled sensitivity. In the SROC plot, the letter Q marks the intersection of the SROC curve with the antidiagonal line where "Sensitivity = Specificity." Similarly, although the pooled specificity was 0.88 (95% CI 0.84-0.94; P<.001; I2=99.7%; Figure 4 [-]), its PI across different contexts was 0.49-0.99, demonstrating a similar lack of consistency. The other summary metrics were as follows: DOR was 37.67 (95% CI 23.46-60.48); PLR was 8.52 (95% CI 6.43-11.29); NLR was 0.24 (95% CI 0.18-0.34).
Furthermore, we calculated the Spearman correlation coefficient between the log of sensitivity and the log of (1-specificity), which yielded a value of 0.254 (P=.17), indicating no significant threshold effect in the included studies. This suggests that the observed high heterogeneity (as well as the wide PIs mentioned above) stems primarily from nonthreshold factors (such as differences in predictor selection or population characteristics), rather than merely from differences in cutoff value selection.
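The threshold-effect check described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' code (their analysis was run in R); the function names are hypothetical, and the six sensitivity and specificity pairs are taken from rows of Table 1 purely as sample input, so the printed coefficient is not expected to reproduce the reported 0.254.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def threshold_effect_rho(sensitivities, specificities):
    """Spearman rho between log(sensitivity) and log(1 - specificity)."""
    log_sens = [math.log(s) for s in sensitivities]
    log_fpr = [math.log(1 - s) for s in specificities]
    return pearson(average_ranks(log_sens), average_ranks(log_fpr))

# Sample input: six models from Table 1 (illustrative subset only).
sens = [0.67, 0.87, 0.849, 0.63, 0.47, 0.722]
spec = [0.999, 0.91, 0.927, 0.93, 0.99, 0.934]
print(round(threshold_effect_rho(sens, spec), 3))
```

A strongly positive coefficient would suggest that studies trade sensitivity against specificity by moving their cutoffs. Because Spearman's coefficient depends only on ranks, the log transform does not change its value.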

Figure 2. Summary receiver operating characteristic (SROC) plot illustrating the dispersion of study results. AUC: area under the curve; SROC: summary receiver operating characteristic.
Figure 3. Overall sensitivity of machine learning models for the prediction of preeclampsia [-, -].
Figure 4. Overall summary specificity of machine learning for the prediction of preeclampsia [-, -].

Performance Analysis of External Validation Models

A total of 6 studies (comprising 7 models) [,,,,,] underwent external validation. The analysis revealed that, when applied to independent external populations, the models showed a decline in performance with persistently high heterogeneity. Specifically, the pooled AUC was 0.91 (95% CI 0.85-0.95; Figure 5). However, its 95% PI was 0.76-1.00, indicating that the models' discriminative ability may be suboptimal in certain external settings. The pooled sensitivity decreased significantly to 0.68 (95% CI 0.54-0.83; P<.001; I2=99.6%; Figure 6 [,,,,,]), with a 95% PI of 0.25-0.94. The lower limit of 0.25 means that, in the worst-case external validation scenario, the model may miss 75% (23/31) of patients, posing an extremely high risk of missed diagnosis. The pooled specificity was 0.90 (95% CI 0.86-0.96; P<.001; I2=99.7%; Figure 7 [,,,,,]), with a 95% PI of 0.62-0.99. Other indicators included a DOR of 28.21 (95% CI 18.10-43.98; I2=97.6%), a PLR of 7.51, and an NLR of 0.32. The decrease in sensitivity (from 0.81 in the primary analysis to 0.68) and the extremely low lower limit of the PI (0.25) strongly confirmed the limited transportability of the models across populations, indicating that direct clinical application requires extreme caution.
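For readers less familiar with the summary indicators above, the following sketch shows how the likelihood ratios and the diagnostic odds ratio are defined for a single sensitivity and specificity pair, and what a given sensitivity implies for missed cases. The function names are hypothetical. The pooled PLR, NLR, and DOR reported in the text are pooled across studies, so they are not expected to equal the values obtained by plugging the pooled sensitivity and specificity into these formulas.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (PLR, NLR, DOR) for one sensitivity/specificity pair."""
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                         # diagnostic odds ratio
    return plr, nlr, dor

def expected_missed_cases(sensitivity, n_cases):
    """Expected number of true cases a model fails to flag."""
    return (1 - sensitivity) * n_cases

# Pooled external-validation sensitivity and specificity from the text.
plr, nlr, dor = likelihood_ratios(0.68, 0.90)
print(round(plr, 2), round(nlr, 2), round(dor, 2))

# At the lower PI limit for sensitivity (0.25), per 100 true cases:
print(expected_missed_cases(0.25, 100))
```

The second call makes the worst-case statement concrete: with a sensitivity of 0.25, three of every four patients who develop preeclampsia would not be flagged.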

Figure 5. Summary receiver operating characteristic (SROC) plot for external validation models. AUC: area under the curve; SROC: summary receiver operating characteristic.
Figure 6. Summary sensitivity of machine learning models for predicting preeclampsia based on external validation [,,,,,].
Figure 7. Overall summary specificity of machine learning for predicting preeclampsia [,,,,,].

Sensitivity Analysis

In a leave-one-domain-out sensitivity analysis that excluded the case-control studies (4 models, 15%), the overall summary AUROC was 0.9109 (95% CI 0.8642-0.9390). The summary sensitivity from the random-effects meta-analysis was 0.81 (95% CI 0.70-0.83; P<.001; I2=99.7%), and the summary specificity was 0.88 (95% CI 0.84-0.94; P<.001; I2=99.7%), as detailed in Figure 8 [-]. We therefore concluded that the pooled estimates were unaffected by the exclusion of outlier values. With an AUC>0.8, the models showed good discriminative ability, but an I2>75% indicated substantial heterogeneity within most subgroups. To address this and gain deeper insight, we undertook a subgroup analysis to investigate the potential sources of heterogeneity across the included studies. Accordingly, we do not interpret a single pooled estimate as "universal clinical performance" and instead prioritize the subgroup results. In addition, to eliminate the effect of multiple models (derived from the same population) within a single study on statistical independence (unit-of-analysis error), we conducted a further sensitivity analysis that retained only the model with the highest AUROC from each study (N=26). After this deduplication, the pooled sensitivity was 0.81 (95% CI 0.73-0.87), specificity was 0.88 (95% CI 0.83-0.91), and AUROC was 0.90 (95% CI 0.87-0.93). These results were highly consistent with the primary analysis (N=31), with no meaningful differences in the CIs, indicating that including different models from the same study did not inflate the results or underestimate the variance. Therefore, we retained all models in the primary analysis to show the performance differences among the various predictor combinations.
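The deduplication step, keeping only the highest-AUROC model from each study, can be sketched as follows. This is an illustrative helper with a hypothetical name, not the authors' code; the three example rows reuse AUC values listed in Table 1.

```python
def best_model_per_study(models):
    """Keep the highest-AUROC model for each study.

    `models` is an iterable of (study, model_name, auroc) tuples.
    Returns a dict mapping study -> (model_name, auroc).
    """
    best = {}
    for study, name, auroc in models:
        if study not in best or auroc > best[study][1]:
            best[study] = (name, auroc)
    return best

# Example rows with AUC values as listed in Table 1.
models = [
    ("Zhou et al", "AvNN", 0.91),
    ("Zhou et al", "SVM", 0.93),
    ("Zheng et al", "LGB", 0.964),
]
for study, (name, auroc) in best_model_per_study(models).items():
    print(study, name, auroc)
```

Selecting the best model per study removes the dependence between models trained on the same population, at the cost of a mild optimistic selection, which is one reason to report it as a sensitivity analysis rather than the primary result.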

Figure 8. Forest plots of diagnostic performance [-, -].

Subgroup Analysis

The comparative results of the subgroup analysis of preeclampsia prediction performance are presented in Table 3; for the types of ML models, forest plots are shown in Figures S1-S22 in the supplementary material. Subgroups were compared by examining whether the 95% CIs of the AUC overlapped: nonoverlapping intervals indicated statistical significance, whereas overlapping intervals indicated no statistical significance. Data were derived from electronic health records, high-throughput omics, and mixed sources. Subgroup analysis indicated that models based on mixed data performed best, followed by those using electronic health records and high-throughput omics. However, considerable heterogeneity was observed, and the 95% CIs overlapped widely across the 3 data types, suggesting no statistically significant differences among them. The "pregnancy window" refers to the index timing window during which predictors were collected or model discrimination was assessed. Models built using third-trimester data showed better performance with lower heterogeneity. However, overlapping 95% CIs indicated no statistically significant differences among the pregnancy window subgroups. Regarding validation strategies, internally validated models outperformed externally validated ones, albeit with high heterogeneity; the 95% CIs of the 2 validation types overlapped, implying that the difference was not statistically significant. Regarding sample size, models with smaller sample sizes outperformed those with larger sample sizes and showed lower heterogeneity. However, because the 95% CIs overlapped, the differences between sample size subgroups were not statistically significant.
Regarding the model used, nonlogistic regression prediction models outperformed logistic regression prediction models. Further analysis of nonlogistic regression model categories with 3 or more instances showed that neural networks had the best predictive performance, with an AUC of 0.9966 (95% CI 0.9772-1.0000) and the lowest heterogeneity. The difference in performance was statistically significant compared with elastic net models, but not compared with the other models. Regarding the type of predictor variables, prediction models built solely on laboratory test indicators achieved the highest predictive performance, with an AUC of 0.9463 (95% CI 0.9097-0.9820) and the lowest heterogeneity. However, the difference in performance compared with models built on other indicators was not statistically significant. For the number of predictor variables used in model building, models with 10 or more variables showed higher predictive performance, with an AUC of 0.9204 (95% CI 0.8671-0.9737), but the difference was not statistically significant compared with models with fewer than 10 variables.
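The comparison rule used throughout this subgroup analysis (nonoverlapping 95% CIs are treated as a significant difference) is simple enough to state in code. The sketch below uses a hypothetical helper name and the CI bounds listed in Table 3. Note that CI overlap is a conservative screen rather than a formal test: two estimates can differ significantly even when their intervals overlap slightly.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two closed intervals (low, high) share any point."""
    (lo_a, hi_a), (lo_b, hi_b) = ci_a, ci_b
    return lo_a <= hi_b and lo_b <= hi_a

# 95% CIs of the AUC as listed in Table 3.
nn = (0.9772, 1.0000)        # neural networks
en = (0.9125, 0.9713)        # elastic net
xgboost = (0.8500, 0.9854)   # extreme gradient boosting

print(intervals_overlap(nn, en))       # False -> treated as significant
print(intervals_overlap(nn, xgboost))  # True  -> not significant
```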

Table 3. Subgroup analysis results.
Grouping Number of prediction models (PCSa) AUCb (95% CI) I2 (%) P value
Total studies 31 0.9168 (0.891-0.950) 99.6 <.001
Sample size
<2000 16 0.9361 (0.9079-0.9643) 90.9 <.001
≥2000 15 0.9109 (0.8501-0.9717) 99.8 <.001
Data source
Combined 14 0.9154 (0.8713-0.9595) 99.6 <.001
EHRc 12 0.9126 (0.8430-0.982) 99.4 <.001
Omics 4 0.9406 (0.8898-0.9914) 95.3 <.001
Pregnancy window
Early 10 0.9406 (0.7853-1.0000) 95.2 <.001
Mid 4 0.9304 (0.8965-0.9643) 77.2 .004
Late 3 0.9665 (0.9314-1.0000) 71.4 .03
Particular 14 0.9138 (0.8805-0.9471) 99.1 <.001
Machine learning model
Logistic regression 3 0.9044 (0.6857-1.0000) 100.0 <.001
Nonlogistic regression 28 0.9171 (0.8871-0.9471) 97.6 <.001
RFd 5 0.8917 (0.7950-0.9884) 95.8 <.001
SVMe 3 0.9068 (0.7623-1.0000) 88.2 <.001
XGBoostf 4 0.9177 (0.8500-0.9854) 89.9 <.001
ENg 4 0.9419 (0.9125-0.9713) 93.9 <.001
NNh 3 0.9966 (0.9772-1.0000) 84.7 .001
Predictor variable type
Demographic information 10 0.8754 (0.8315-0.9193) 99.4 <.001
Biological genetic marker 3 0.9300 (0.8375-1.0000) 96.8 <.001
Demographic information and laboratory tests 13 0.9275 (0.8665-0.9885) 98.4 <.001
Laboratory tests 5 0.9463 (0.9097-0.9820) 95.8 <.001
Number of predictor variables
<10 10 0.9124 (0.8855-0.9393) 86.6 <.001
≥10 21 0.9196 (0.8665-0.9727) 99.8 <.001

aPCS: pieces.

bAUC: area under the curve.

cEHR: electronic health record.

dRF: random forest.

eSVM: support vector machine.

fXGBoost: extreme gradient boosting.

gEN: elastic net.

hNN: neural network.

Meta-Regression Analysis

Because of the significant heterogeneity observed among the studies, a meta-regression analysis was conducted. The meta-regression examined several factors, including sample size, country of publication, type of ML model, year of publication, study design, study quality, and predictors, as detailed in Table 4. Variables were removed stepwise based on the magnitude of their P values, and separate meta-regression analyses were performed for each variable. The results indicated that the heterogeneity among the studies was primarily associated with study quality, as shown in Table 5.

Table 4. Meta-regression analysis.
Variable β coefficient (SE) P value RDORa (95% CI)
Constant 3.547 (1.3356) .01 b
Sample size –1.075 (0.5388) .06 0.34 (0.11-1.05)
Country –0.322 (0.4741) .50 0.72 (0.27-1.94)
MLc method 0.588 (0.7387) .43 1.80 (0.39-8.37)
Year 0.007 (0.4578) .99 1.01 (0.39-2.61)
Design –1.435 (0.8047) .09 0.24 (0.04-1.27)
Quality 0.672 (0.4076) .11 1.96 (0.84-4.57)
Predictive 0.773 (0.6075) .22 2.17 (0.61-7.67)
Validation type –0.318 (0.4797) .51 0.73 (0.27-1.97)

aRDOR: relative diagnostic odds ratio.

bNot applicable.

cML: machine learning.

Table 5. Meta-regression analysis after stepwise exclusion of variables in descending order of P value.
Variable β coefficient (SE) P value RDORa (95% CI)
Constant 2.398 (0.5879) <.001 b
Quality 0.800 (0.3951) .05 2.23 (0.99-5.00)

aRDOR: relative diagnostic odds ratio.

bNot applicable.
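The RDOR values in the meta-regression tables are the exponentiated β coefficients. The sketch below recomputes the "Quality" row of Table 5 from its β and SE. The helper name is hypothetical, and the critical value of 2.045 (the 0.975 quantile of a t distribution with 29 degrees of freedom) is an assumption chosen because it approximately reproduces the reported interval; the source does not state which critical value was used.

```python
import math

def rdor_with_ci(beta, se, critical_value):
    """Return (RDOR, lower, upper) by exponentiating beta +/- crit * SE."""
    rdor = math.exp(beta)
    lower = math.exp(beta - critical_value * se)
    upper = math.exp(beta + critical_value * se)
    return rdor, lower, upper

# "Quality" row of Table 5: beta = 0.800, SE = 0.3951.
# critical_value = 2.045 is an assumed t quantile (df = 29), not from the source.
rdor, lower, upper = rdor_with_ci(0.800, 0.3951, 2.045)
print(round(rdor, 2), round(lower, 2), round(upper, 2))
```

Because the interval only just includes 1, the association with study quality is borderline (P=.05), which is worth keeping in mind when reading it as the main source of heterogeneity.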

Principal Findings

This systematic review identified 31 ML models for preeclampsia prediction. Our primary finding highlights a critical paradox. Although the models show high average discriminative ability (pooled AUROC 0.91), they exhibit extreme heterogeneity (I2>99%) and limited transportability. The wide 95% PI for sensitivity (0.32-0.96) warns that a model performing perfectly in development may miss nearly 70% of cases when applied to a new population. This "context dependence" is further confirmed by the drop in performance in external validation studies (pooled sensitivity of 0.68), suggesting that the currently high AUROCs largely reflect internal fit rather than universal clinical effectiveness.
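To show why a prediction interval can be so much wider than a confidence interval, the following sketch applies the standard random-effects prediction interval formula, estimate ± t × sqrt(τ² + SE²), on the logit scale. It is an illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the logit scale is an assumption, and the SE, τ², and critical value in the example are made-up inputs rather than values from this review.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(x):
    return 1 / (1 + math.exp(-x))

def prediction_interval(pooled, se_logit, tau2, t_crit):
    """Approximate PI for a pooled proportion, computed on the logit scale.

    pooled   : pooled estimate on the probability scale
    se_logit : standard error of the pooled logit estimate
    tau2     : between-study variance on the logit scale
    t_crit   : t quantile (conventionally with k - 2 degrees of freedom)
    """
    mu = logit(pooled)
    half_width = t_crit * math.sqrt(tau2 + se_logit ** 2)
    return expit(mu - half_width), expit(mu + half_width)

# Hypothetical inputs for illustration only.
with_heterogeneity = prediction_interval(0.81, 0.2, 1.0, 2.045)
without_heterogeneity = prediction_interval(0.81, 0.2, 0.0, 2.045)
print([round(v, 2) for v in with_heterogeneity])
print([round(v, 2) for v in without_heterogeneity])
```

With τ² set to 0 the interval collapses to roughly the confidence interval; the between-study variance is what stretches it, which is why a high I2 and a wide PI appear together in the results above.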

To investigate the sources of this heterogeneity (as well as the wide PIs), our subgroup analysis examined several key factors. In the subgroup analysis of all 31 models, predictive performance was better when the sample size was small (fewer than 2000 cases), which contradicts the conventional understanding that "larger sample sizes lead to better predictive performance" []. This result may be substantially influenced by confounding factors, such as study design (eg, case-control studies) and study type, especially considering the very high AUC of the elastic net models (AUC=0.963 for Torres et al []; AUC=0.96 for Yu et al []). Therefore, careful judgment is required, and this result should not be hastily interpreted as showing that models with smaller sample sizes have superior predictive performance. Regarding predictor types, laboratory test indicators showed superior predictive performance, because the core pathological mechanisms of preeclampsia include placental perfusion disorders, endothelial dysfunction, oxidative stress, and inflammatory responses []. Laboratory indicators can directly reflect these pathological states, whereas demographic information provides only an indirect risk assessment.

Among the ML models analyzed in this study, including RF, SVM, NN, and elastic net, the NN model showed the highest predictive performance (AUC=0.99, 95% CI 0.98-1.00), surpassing traditional ML methods such as LR, RF, and extreme gradient boosting. This finding may be attributed to the complex etiology of preeclampsia, a pregnancy complication characterized by multiple pathological processes. The intricate, multidimensional interactions inherent in preeclampsia are difficult to capture comprehensively with linear models. In contrast, NN models are well suited to modeling nonlinear relationships and higher-order variable interactions, which more accurately reflect the pathological characteristics of preeclampsia []. Compared with traditional methods, NNs can automatically extract features and assign weights to input variables without extensive manual variable screening, which is a particular advantage when handling high-dimensional data []. Moreover, NN models can integrate multisource heterogeneous data, such as demographic information, laboratory indicators, and biological genetic markers, thereby adapting to the increasingly complex trends in clinical data.

Higher predictive performance was observed when the number of predictors was 10 or more, although the difference was not statistically significant. This suggests that using a greater number of predictors may help reflect disease status more comprehensively and improve the model's predictive performance. This is especially true for nonlinear algorithms, which are better equipped to capture interaction effects and underlying patterns.

Nonstandardized handling of missing data means that the AUC (concordance index) and calibration may not be directly comparable across studies; in particular, listwise deletion or simple imputation combined with restricted case mix and threshold tuning can inflate discrimination and understate uncertainty. We therefore recommend, at minimum, (1) transparent reporting of missingness (overall and by variable) and of the primary imputation method; (2) preferential use of multiple imputation or model-based methods, with minimal recalibration (slope and Brier score) and decision-curve analysis during external validation; and (3) reporting of confusion matrices under fixed thresholds and top-N% triage, plus subgroup robustness (GA window, outcome definitions, and sites), to improve interpretability for clinical and digital health use.
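Two of the reporting items recommended above, a confusion matrix at a fixed, prespecified threshold and the Brier score, can be computed directly from predicted probabilities and observed outcomes. The sketch below is a minimal illustration with hypothetical function names and made-up example data; it is not drawn from any included study.

```python
def confusion_at_threshold(probabilities, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive at prob >= threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(probabilities, labels):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            counts["tp"] += 1
        elif predicted_positive and y == 0:
            counts["fp"] += 1
        elif not predicted_positive and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

def brier_score(probabilities, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)

# Made-up example: 4 patients, threshold fixed in advance at 0.5.
probs = [0.9, 0.2, 0.6, 0.4]
outcomes = [1, 0, 0, 1]
print(confusion_at_threshold(probs, outcomes, 0.5))
print(brier_score(probs, outcomes))
```

Reporting the raw counts at a threshold fixed before validation lets readers recompute sensitivity and specificity themselves, instead of relying on values tuned to each study's own optimal cutoff.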

Strengths and Limitations

First, regarding methodological rigor and transparency, we strictly adhered to the PRISMA guidelines for reporting, and the study protocol was preregistered in the international prospective register of systematic reviews, PROSPERO (CRD420251005830). This ensures that the research objectives and methods were predetermined, thereby minimizing reporting bias. Second, regarding the comprehensiveness of the literature search, our search strategy was notably interdisciplinary. We searched not only mainstream medical databases such as PubMed and CNKI but also IEEE Xplore and Web of Science, to capture ML models published in the fields of engineering and computer science. This is critical for a topic that bridges clinical medicine and artificial intelligence, because it avoids the omissions that might occur if only medical databases were searched. Third, regarding the reliability of data processing, the entire process of literature screening and data extraction was conducted independently by 2 researchers, with any discrepancies resolved through discussion or by involving a third researcher as an adjudicator. This "dual review" process is considered the gold standard for systematic reviews and supports the accuracy of data extraction. Fourth, regarding the rigor of quality assessment, we used the PROBAST tool, which is currently recommended by international authorities and is designed specifically for the evaluation of prediction models, rather than traditional tools for evaluating diagnostic tests, such as QUADAS-2 (Whiting and colleagues []). PROBAST enabled us to thoroughly assess the risk of bias and applicability of the models across 4 key domains (participants, predictors, outcomes, and analysis), which is more in-depth and relevant than in previous reviews.
Finally, regarding the prudence of the analysis, this study acknowledges the common pitfall of "performance overestimation" in meta-analyses of prediction models. Therefore, we clearly identified models lacking external validation and conducted a separate meta-analysis of studies that reported external validation. This approach allowed us to assess the transportability of the models in real-world applications more accurately, leading to the conclusion that they are "highly context-dependent," which is a more cautious and clinically realistic interpretation that avoids overinterpreting the pooled AUROC.

Our study has several limitations that should be considered when interpreting the findings. First, and most critically, there is the issue of threshold heterogeneity and optimistic bias. As detailed in the "Methods" section, the performance metrics were synthesized from study-specific "optimal thresholds." This precluded the use of threshold-independent summary measures from a bivariate model and means that our pooled sensitivity and specificity are likely inflated compared with what would be achieved with a prespecified, clinically relevant cutoff. The wide PIs we report are, in part, a quantification of this inflation risk. Future primary studies should report performance at multiple, clinically justified thresholds to facilitate more meaningful meta-analysis. Second, and related to the above, our statistical synthesis approach was dictated by the characteristics of the data. The extreme heterogeneity and lack of threshold standardization made the preferred bivariate modeling approach infeasible. Although our use of univariate HKSJ models with PIs is a robust alternative that honestly communicates uncertainty, it does not model the correlation between sensitivity and specificity. Our subgroup and meta-regression analyses help explore sources of heterogeneity, but residual confounding is likely. Third, our search, although comprehensive, may have missed studies in other languages or in nonindexed repositories. Furthermore, we did not formally assess publication bias using funnel plots or statistical tests, because these methods are less established and less interpretable for diagnostic accuracy data with high heterogeneity. Therefore, our results may be influenced by the preferential publication of studies with positive or high-performance results.

Clinical Significance

The methodological choices in this meta-analysis directly inform its central message. The decision to extract data at study-specific "optimal thresholds" inherently captures the optimistic bias prevalent in ML model development. The strikingly wide 95% PI for sensitivity (0.32-0.96), calculated from these potentially inflated estimates, therefore represents a conservative and realistic warning. The true performance in a new setting, after the necessary recalibration to a local threshold, could fall to clinically unacceptable levels. This finding strongly reinforces the principle that external validation is not a mere formality but a fundamental requirement for bridging the gap between algorithmic promise and clinical utility.

Clinical implementation of these models requires a shift from "universal application" to "local adaptation." Given the wide PIs, hospitals should not adopt published models directly. Instead, we recommend a workflow of local validation and recalibration. Future research should prioritize multicenter external validation over the development of new models. Where data sharing is restricted, federated learning offers a promising way to train robust models across diverse populations without compromising privacy.

Conclusions

In summary, ML models show promising potential for predicting preeclampsia but are not ready-made universal solutions. Although pooled analyses indicate high discriminative performance, the substantial heterogeneity (I²>99%) and wide 95% PIs (sensitivity 0.32-0.96) reveal significant instability in model performance across different clinical contexts. This "context dependency" was further corroborated in the external validation analyses. When the models were applied to independent populations, not only did the pooled sensitivity decrease, but the lower bound of its PI also dropped to 0.25, quantifying the substantial transportability risk encountered in cross-center applications. Current evidence therefore supports considering ML as a potential screening adjunct but does not yet justify its use as a universal clinical diagnostic tool. Future research should shift its focus from solely pursuing new models with high AUC values to conducting rigorous multicenter external validation and recalibration of existing models, in order to establish their boundaries of applicability within real-world clinical pathways.

During the preparation of this work, the authors used Gemini (Google) to assist in refining the English language and structure of the manuscript, as well as to generate R code for the statistical analysis (specifically for the Hartung-Knapp-Sidik-Jonkman method and prediction intervals). After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

This work was supported by grants from the Liaoning Provincial Science and Technology Program Joint Initiative (Key Research and Development Program Project) and the General Project of the Department of Education of Liaoning Province (JYTMS20230103).

None declared.

Edited by J Sarvestan; submitted 08.Jun.2025; peer-reviewed by T Shi, P Wu; comments to author 01.Sep.2025; revised version received 20.Dec.2025; accepted 20.Dec.2025; published 19.Jan.2026.

©Lu Liu, Qixuan Zhu, Yichi Zong, Xueyuan Chen, Wei Zhang, Jun Wang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 19.Jan.2026.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
