2023
A hierarchical strategy to minimize privacy risk when linking “De-identified” data in biomedical research consortia
Ohno-Machado L, Jiang X, Kuo T, Tao S, Chen L, Ram P, Zhang G, Xu H. A hierarchical strategy to minimize privacy risk when linking “De-identified” data in biomedical research consortia. Journal Of Biomedical Informatics 2023, 139: 104322. PMID: 36806328, PMCID: PMC10975485, DOI: 10.1016/j.jbi.2023.104322.
Peer-Reviewed Original Research
Concepts: Privacy of individuals; Appropriate privacy protection; Data-driven models; Privacy protection; Privacy risks; Data Coordination Center; Data hub; Data repository; Hierarchical strategy; Privacy; Biomedical discovery; Data sets; Record linkage; Data Coordinating Center; Repository; Complex strategies; Coordination center; Technology; Technique; Data; Parties; Set; Hierarchy
2017
Finding useful data across multiple biomedical data repositories using DataMed
Ohno-Machado L, Sansone S, Alter G, Fore I, Grethe J, Xu H, Gonzalez-Beltran A, Rocca-Serra P, Gururaj A, Bell E, Soysal E, Zong N, Kim H. Finding useful data across multiple biomedical data repositories using DataMed. Nature Genetics 2017, 49: 816-819. PMID: 28546571, PMCID: PMC6460922, DOI: 10.1038/ng.3864.
Peer-Reviewed Original Research
Concepts: Biomedical data repositories; Health big data; Data sets; Knowledge discovery; Big data; Multiple repositories; Search engines; Data index; FAIR principles; DataMed; Data repository; Service providers; Knowledge initiatives; Knowledge experts; Biomedical research community; Research community; Repository; Science landscape; Useful data; Interoperability; Metadata; Findability; Set; Engine; Data
2015
VERTIcal Grid lOgistic regression (VERTIGO)
Li Y, Jiang X, Wang S, Xiong H, Ohno-Machado L. VERTIcal Grid lOgistic regression (VERTIGO). Journal Of The American Medical Informatics Association 2015, 23: 570-579. PMID: 26554428, PMCID: PMC4901373, DOI: 10.1093/jamia/ocv146.
Peer-Reviewed Original Research
Concepts: Federated data analysis; Real-world medical classification problems; Medical classification problems; Logistic regression algorithm; Accurate global model; Data sets; Real data sets; Classification problem; Exchange of information; LR problem; Time complexity; Computational complexity; Expensive operation; Regression algorithm; Computational cost; Data analysis; Algorithm; Dual optimization; Technical challenges; Large amount; Complexity; Patient records; LR model; Novel technique; Hessian matrix
Grid multi-category response logistic models
Wu Y, Jiang X, Wang S, Jiang W, Li P, Ohno-Machado L. Grid multi-category response logistic models. BMC Medical Informatics And Decision Making 2015, 15: 10. PMID: 25886151, PMCID: PMC4342889, DOI: 10.1186/s12911-015-0133-y.
Peer-Reviewed Original Research
Concepts: Grid model; Likelihood estimation problem; Classification performance evaluation; Real data sets; Grid computing; Estimation problem; Types of models; Grid computation; Grid method; Privacy; Response model; Centralized model; Multi-center data; Such decompositions; Fit assessment; Fitting method; Linear model; Performance evaluation; Model construction; Data sets; Model assumptions; Individual observations; Practical solution; Computation; Results; Simulation results
2014
Differentially private distributed logistic regression using private and public data
Ji Z, Jiang X, Wang S, Xiong L, Ohno-Machado L. Differentially private distributed logistic regression using private and public data. BMC Medical Genomics 2014, 7: s14. PMID: 25079786, PMCID: PMC4101668, DOI: 10.1186/1755-8794-7-s1-s14.
Peer-Reviewed Original Research
Concepts: Private data; Differential privacy; Public datasets; Public data; Rigorous privacy guarantee; Data privacy research; Private data sets; Data mining models; Data sets; Provable privacy; Privacy guarantees; Mining model; Privacy research; Different data sets; Art frameworks; Medical informatics; Privacy; Amount of noise; Private methods; Auxiliary information; Better utility; New algorithm; Update step; Available public data; Algorithm
2012
Grid Binary LOgistic REgression (GLORE): building shared models without sharing data
Wu Y, Jiang X, Kim J, Ohno-Machado L. Grid Binary LOgistic REgression (GLORE): building shared models without sharing data. Journal Of The American Medical Informatics Association 2012, 19: 758-764. PMID: 22511014, PMCID: PMC3422844, DOI: 10.1136/amiajnl-2012-000862.
Peer-Reviewed Original Research
Concepts: Integrity of communication; Centralized data sources; Traditional LR model; Central repository; Computational cost; Data sources; Data sets; Same format; Patient data; Computation; Genomic data; Rare pattern; Relevant data; LR model; Prediction value; Set; Repository; Partial elements; Format; Classification; Communication; Model; Data; Patient set; Perform
2011
Using statistical and machine learning to help institutions detect suspicious access to electronic health records
Boxwala A, Kim J, Grillo J, Ohno-Machado L. Using statistical and machine learning to help institutions detect suspicious access to electronic health records. Journal Of The American Medical Informatics Association 2011, 18: 498-505. PMID: 21672912, PMCID: PMC3128412, DOI: 10.1136/amiajnl-2011-000217.
Peer-Reviewed Original Research
Concepts: Suspicious access; Machine-learning methods; Privacy officers; Machine learning techniques; Vector machine model; Access logs; Electronic health records; Baseline methods; Access data; Cross-validation set; Gold standard set; SVM model; Whole data set; Machine model; Baseline model; Organizational data; Health records; Data sets; SVM
Smooth isotonic regression: a new method to calibrate predictive models.
Jiang X, Osl M, Kim J, Ohno-Machado L. Smooth isotonic regression: a new method to calibrate predictive models. AMIA Joint Summits On Translational Science Proceedings 2011, 2011: 16-20. PMID: 22211175, PMCID: PMC3248752.
Peer-Reviewed Original Research
Concepts: Biomedical data sets; Supervised learning model; Good generalization ability; Machine learning; Predictive model; Generalization ability; Probabilistic outputs; Learning model; Data sets; Isotonic regression method; Novel method; Non-parametric approach; Reliability diagrams; Probability estimates; Regression method; New method; Learning
2007
MODELING CANCER: INTEGRATION OF "OMICS" INFORMATION IN DYNAMIC SYSTEMS
STRANSKY B, BARRERA J, OHNO-MACHADO L, DE SOUZA S. MODELING CANCER: INTEGRATION OF "OMICS" INFORMATION IN DYNAMIC SYSTEMS. Journal Of Bioinformatics And Computational Biology 2007, 5: 977-986. PMID: 17787066, DOI: 10.1142/s0219720007002990.
Peer-Reviewed Original Research
2005
Representation in stochastic search for phylogenetic tree reconstruction
Weber G, Ohno-Machado L, Shieber S. Representation in stochastic search for phylogenetic tree reconstruction. Journal Of Biomedical Informatics 2005, 39: 43-50. PMID: 16359929, DOI: 10.1016/j.jbi.2005.11.001.
Peer-Reviewed Original Research
2004
A primer on gene expression and microarrays for machine learning researchers
Kuo W, Kim E, Trimarchi J, Jenssen T, Vinterbo S, Ohno-Machado L. A primer on gene expression and microarrays for machine learning researchers. Journal Of Biomedical Informatics 2004, 37: 293-303. PMID: 15465482, DOI: 10.1016/j.jbi.2004.07.002.
Peer-Reviewed Reviews, Practice Guidelines, Standards, and Consensus Statements
Concepts: New algorithm; Supervised learning model; UCI machine; Learning model; Microarray data analysis; Algorithmic developments; Types of data; Machine; Data sets; Main challenges; Gene expression data; Main motivation; Algorithm; Data analysis; Biomedical experiments; Large number; Expression data; Microarray data; Researchers; Repository; Web; Microarray experiments; New wave; Data; Set
Multivariate selection of genetic markers in diagnostic classification
Weber G, Vinterbo S, Ohno-Machado L. Multivariate selection of genetic markers in diagnostic classification. Artificial Intelligence In Medicine 2004, 31: 155-167. PMID: 15219292, DOI: 10.1016/j.artmed.2004.01.011.
Peer-Reviewed Original Research
Concepts: Classification performance; Better classification performance; Logistic regression algorithm; User-friendly implementation; Different data sets; Sophisticated algorithms; Regression algorithm; Algorithm; New algorithm; Particular classification; Univariate algorithms; Data sets; Gene expression data; Classification; Number of variables; Gene selection; Set; Internet; Expression data; New set; Viable choice; Machine; Performance; Implementation; Selection
Prediction of mortality in an Indian intensive care unit
Nimgaonkar A, Karnad D, Sudarshan S, Ohno-Machado L, Kohane I. Prediction of mortality in an Indian intensive care unit. Intensive Care Medicine 2004, 30: 248-253. PMID: 14727015, DOI: 10.1007/s00134-003-2105-4.
Peer-Reviewed Original Research
Concepts: Neural network; Indian data set; APACHE II; Artificial neural network model; Back-propagation algorithm; Neural network model; Analysis of information; Day 1 APACHE II score; Indian Intensive Care Units; Network model; APACHE II equation; APACHE II system; APACHE II score; Intensive care unit; Risk of death; Prediction of mortality; Network; Hosmer-Lemeshow statistic; Data sets; Logistic regression models; Hospital outcomes; II score; Care unit; University Hospital; Consecutive admissions
2003
Stochastic Algorithms for Gene Expression Analysis
Ohno-Machado L, Kuo W. Stochastic Algorithms for Gene Expression Analysis. Lecture Notes In Computer Science 2003, 2827: 39-49. DOI: 10.1007/978-3-540-39816-5_4.
Peer-Reviewed Original Research
2002
Disambiguation Data: Extracting Information from Anonymized Sources
Dreiseitl S, Vinterbo S, Ohno-Machado L. Disambiguation Data: Extracting Information from Anonymized Sources. Journal Of The American Medical Informatics Association 2002, 9: s110-s114. PMCID: PMC419432, DOI: 10.1197/jamia.m1240.
Peer-Reviewed Original Research
Visualization and evaluation of clusters for exploratory analysis of gene expression data
Kim J, Kohane I, Ohno-Machado L. Visualization and evaluation of clusters for exploratory analysis of gene expression data. Journal Of Biomedical Informatics 2002, 35: 25-36. PMID: 12415724, DOI: 10.1016/s1532-0464(02)00001-1.
Peer-Reviewed Original Research
Concepts: Clustering algorithm; Different clustering algorithms; Popular clustering algorithm; New clustering algorithm; Comprehensive data visualization; Gene expression data analysis; Data visualization strategies; Expression data analysis; Evaluation of clusters; Data visualization; Software tools; Cluster quality; Cluster consistency; Algorithm; Actual implementation; Data sets; Gene expression data; Quality measures; Visualization; Promising results; Framework; Data analysis; Objective evaluation; Users; Expression data
2001
Disambiguation data: extracting information from anonymized sources.
Dreiseitl S, Vinterbo S, Ohno-Machado L. Disambiguation data: extracting information from anonymized sources. AMIA Annual Symposium Proceedings 2001, 144-8. PMID: 11825171, PMCID: PMC2243291.
Peer-Reviewed Original Research
2000
Unsupervised learning from complex data: the matrix incision tree algorithm.
Kim J, Ohno-Machado L, Kohane I. Unsupervised learning from complex data: the matrix incision tree algorithm. Biocomputing 2000, 30-41. PMID: 11262950, DOI: 10.1142/9789814447362_0004.
Peer-Reviewed Original Research
Concepts: High-dimensional space; Tree algorithm; Complex high-dimensional spaces; Predictive model building; Data sets; Large-scale gene expression data; Low-dimensional space; Knowledge discovery; Unsupervised learning; Data structure; Complex data; Novel method; Meaningful structures; Microarray data sets; DNA microarray data sets; Algorithm
1999
A genetic algorithm to select variables in logistic regression: example in the domain of myocardial infarction.
Vinterbo S, Ohno-Machado L. A genetic algorithm to select variables in logistic regression: example in the domain of myocardial infarction. AMIA Annual Symposium Proceedings 1999, 984-8. PMID: 10566508, PMCID: PMC2232877.
Peer-Reviewed Original Research
Concepts: Genetic algorithm; Number of variables; Variable selection methods; Genetic algorithm variable selection method; Selection method; Data sets; Algorithm; Variable selection; Best variable combination; Model's discriminatory performance; Model simplicity; Actual use; Validation set; External validation set; Set; Particular selection; Model
1998
Improving machine learning performance by removing redundant cases in medical data sets.
Ohno-Machado L, Fraser H, Ohrn A. Improving machine learning performance by removing redundant cases in medical data sets. AMIA Annual Symposium Proceedings 1998, 523-7. PMID: 9929274, PMCID: PMC2232167.
Peer-Reviewed Original Research