• SIFR project

    Semantic Indexing of French Biomedical Data Resources

     

    The SIFR project investigates the scientific and technical challenges in building ontology-based services to leverage biomedical ontologies and terminologies in indexing, mining and retrieval of French biomedical data.

  • Scientific CONTEXT

    The volume of data in biomedicine is constantly increasing. Despite the wide adoption of English in science, a significant quantity of these data is in French. Biomedical data integration and semantic interoperability are necessary to enable new scientific discoveries that could be made by merging different available data. A key way to address these issues is to use terminologies and ontologies as a common denominator to structure biomedical data and make them interoperable.

    Building ontology-based services to leverage biomedical ontologies and terminologies in indexing, mining and retrieval of French biomedical data

    The community has turned toward ontologies to design semantic indexes of data that leverage medical knowledge for better information mining and retrieval. However, while various tools exist for English, there are considerably fewer ontologies available in French and there is a strong lack of related tools and services to exploit them. This lack does not match the huge amount of biomedical data produced in French, especially in the clinical world (e.g., electronic health records).

     

    The Semantic Indexing of French Biomedical Data Resources (SIFR) project proposes to investigate the scientific and technical challenges in building ontology-based services to leverage biomedical ontologies and terminologies in indexing, mining and retrieval of French biomedical data. Our main goal is to enable straightforward use of ontologies, freeing health researchers from knowledge engineering issues so that they can concentrate on the biological and medical challenges.

     

    The SIFR project brings together several young researchers at LIRMM to achieve this objective. Dr. Clement Jonquet, assistant professor at the University of Montpellier since 2010, coordinates the project and capitalizes on strong experience in the field acquired through a three-year postdoc at Stanford. He is accompanied by two young researchers (HDR), Dr. Sandra Bringay and Dr. Mathieu Roche, both experts in biomedical data/text mining. In addition, highly qualified and experienced partners are associated with the project: (i) Stanford BMIR, a worldwide leader providing (English-)ontology-based services to assist health professionals and researchers in the use of ontologies to design biomedical knowledge-based systems; (ii) the TETIS group, a joint applied research unit (AgroParisTech, Irstea, Cirad) specialized in geographic information, environment and agriculture; (iii) the Computational Biology Institute (IBC) of Montpellier.

  • Ontology-based indexing workflow (i.e., French Annotator) similar to what exists for English resources but dedicated and specialized for French

    http://bioportal.lirmm.fr/annotator

    This service is available within a portal of ~25 French biomedical ontologies/terminologies that reuses the BioPortal technology developed at Stanford University. Ontologies have been provided by the CISMEF group from Rouen University Hospital, taken from the UMLS, or directly uploaded by users. The SIFR BioPortal was released in June 2015.

    Need for automated annotation methods

    http://bioportal.lirmm.fr/annotator

    Researchers have called for automated annotation methods and for leveraging natural language processing tools in the curation process. Although the issue is currently being addressed for English, French is not in the same situation: there is little readily available (i.e., “off-the-shelf”) technology that allows ontologies to be used uniformly in various annotation and curation pipelines with minimal effort.

  • Within the project, we work on several research questions in semantic indexing, text mining, terminology extraction, ontology enrichment, disambiguation, multilingualism in ontologies and semantic annotation, in order to offer the community services and applications capable of leveraging biomedical ontologies in their data workflows. For instance, to extract specialized terminology from free text in French, our approaches rely on new ranking functions that combine statistical and linguistic methods to highlight relevant terms. We then offer a complete methodology to identify (non-)polysemic terms and choose the appropriate attachment point in an existing ontology. As another example, we are developing a new agent-centered, graph-based knowledge representation approach that makes it possible to merge formal data representations (e.g., from the semantic Web) with informal user contributions (from the social Web) and reveals relevant semantic paths between resources.

     

    We plan to capitalize upon the work already accomplished in the last 16 years in France. At the same time, SIFR enables the emergence of new research domains and applications at LIRMM and materializes an important international collaboration with Stanford BMIR. SIFR will offer the French biomedical community (e.g., clinicians, health professionals, researchers) highly valuable ontology-based indexing services that will enhance their data production and consumption workflows. In addition, the results of the project are not limited to French (they also cover English and Spanish), and we are transferring our results to the agronomic domain in the context of the new AgroPortal project (http://agroportal.lirmm.fr). The project will put France in a key position to lead future European projects related to multilingual data issues in biomedicine and other domains.

  • PARTNERS

  • Publications

    64 scientific communications

    All project communications are uploaded to HAL.

    7 international journal articles, 2 national ones, 29 international conference or workshop papers (at venues such as ISWC, IDEAS, MIE, KEOD, MEDINFO)

  • Featured publications

    • Andon Tchechmedjiev, Amine Abdaoui, Vincent Emonet & Clement Jonquet. ICD10 Coding of Death Certificates with the NCBO and SIFR Annotator(s) at CLEF eHealth 2017 Task 1, In Working Notes of CLEF eHealth Evaluation Lab. Dublin, Ireland, September 2017. CEUR Workshop Proceedings, Vol. 1866 pp. 16. [PDF] [RelatedLink]
    • Clement Jonquet, Amina Annane, Khedidja Bouarech, Vincent Emonet & Soumia Melzi. SIFR BioPortal : Un portail ouvert et générique d’ontologies et de terminologies biomédicales françaises au service de l’annotation sémantique, In 16th Journées Francophones d'Informatique Médicale, JFIM'16. Genève, Suisse, July 2016. pp. 16. [PDF] [RelatedLink]
    • Amina Annane, Zohra Bellahsene, Faical Azouaou & Clement Jonquet. Selection and Combination of Heterogeneous BK to Enhance Biomedical Ontology Matching, In 20th International Conference on Knowledge Engineering and Knowledge Management, EKAW'16. Bologna, Italy, November 2016. Lecture Notes in Artificial Intelligence, Vol. 10024 pp. 19-33. Springer. [DOI] [PDF] [RelatedLink]

    • Amina Annane, Vincent Emonet, Faical Azouaou & Clement Jonquet. Multilingual Mapping Reconciliation between English-French Biomedical Ontologies, In 6th International Conference on Web Intelligence, Mining and Semantics, WIMS'16. Nimes, France, June 2016. (13), pp. 12. ACM. [DOI] [PDF] [RelatedLink]

    • Juan Antonio Lossio-Ventura, Clement Jonquet, Mathieu Roche & Maguelonne Teisseire. Automatic Biomedical Term Polysemy Detection, In 10th International Conference on Language Resources and Evaluation, LREC'16. Portoroz, Slovenia, May 2016. pp. 23-28. European Language Resources Association. [PDF] [RelatedLink]

    • Juan-Antonio Lossio-Ventura, Clement Jonquet, Mathieu Roche & Maguelonne Teisseire. Biomedical term extraction: overview and a new methodology, Information Retrieval, Special issue on Medical Information Retrieval. August 2015. Vol. 19 (1), pp. 59-99. Springer. [DOI] [PDF] [RelatedLink]
    • Mike Donald Tapi-Nzali, Sandra Bringay, Christian Lavergne, Thomas Opitz, Jérôme Azé & Caroline Mollevi. Construction d’un vocabulaire patient/médecin dédié au cancer du sein à partir des médias sociaux, In 26èmes Journées Francophones d’Ingénierie des Connaissances, IC'15. Rennes, France, June 2015. pp. 12.
    • Clement Jonquet, Esther Dzalé-Yeumo, Elizabeth Arnaud & Pierre Larmande. AgroPortal: a proposition for ontology-based services in the agronomic domain, In 3ème atelier INtégration de sources/masses de données hétérogènes et Ontologies, dans le domaine des sciences du VIVant et de l’Environnement, IN-OVIVE'15. Rennes, France, June 2015. pp. 5. [PDF] [RelatedLink]
    • Guillaume Surroca, Philippe Lemoisson, Clement Jonquet & Stefano A. Cerri. Preference Dissemination by Sharing Viewpoints : Simulating Serendipity, In 7th International Conference on Knowledge Engineering and Ontology Development KEOD'15. Lisbon, Portugal, November 2015. Vol. 2 (2), pp. 402-409. [PDF] [RelatedLink]
    • Juan Antonio Lossio-Ventura, Clement Jonquet, Mathieu Roche & Maguelonne Teisseire. BIOTEX: A system for Biomedical Terminology Extraction, Ranking, and Validation, In 13th International Semantic Web Conference, Demonstration, ISWC'14. Riva del Garda, Italy, October 2014. CEUR Workshop Proceedings, Vol. 1272 pp. 157-160. [PDF] [RelatedLink]
  • Project Outcomes

    SIFR Annotator

    Design, development and deployment of the French Annotator

    A publicly accessible ontology-based annotation tool to process French biomedical text data. For a given piece of text, the service returns the biomedical ontology concepts that are either directly mentioned in the text or obtained by semantic expansion.
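
    The idea of returning concepts that are "directly mentioned" or "semantically expanded" can be illustrated with a toy sketch. This is not the SIFR Annotator's implementation or API: the `CONCEPTS` table, its concept ids and labels, and the `annotate` function are all hypothetical, and semantic expansion is reduced here to walking up an is-a hierarchy.

```python
import re

# Hypothetical mini-terminology (invented for illustration only):
# concept id -> (French label, parent concept id or None).
CONCEPTS = {
    "C1": ("maladie", None),
    "C2": ("cancer", "C1"),
    "C3": ("cancer du sein", "C2"),
}


def annotate(text, expand=True):
    """Return (concept_id, start, end, kind) tuples for labels found in text.

    kind is "direct" when the label itself occurs in the text, and
    "ancestor" when the concept was added by following parent links.
    """
    annotations = []
    lowered = text.lower()
    for concept_id, (label, _parent) in CONCEPTS.items():
        # Whole-word, case-insensitive match of the label.
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, lowered):
            annotations.append((concept_id, match.start(), match.end(), "direct"))
            if expand:
                # Toy semantic expansion: add every ancestor of the matched concept.
                parent = CONCEPTS[concept_id][1]
                while parent is not None:
                    annotations.append((parent, match.start(), match.end(), "ancestor"))
                    parent = CONCEPTS[parent][1]
    return annotations


if __name__ == "__main__":
    for annotation in annotate("Un cancer du sein"):
        print(annotation)
```

    A production annotator would rely on a full ontology repository and more robust matching (lemmatization, accents, synonyms); the sketch only shows how direct matches and expanded concepts can be reported side by side.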

    OTHER RESEARCH

    Obtain new research results to exploit and enhance ontology-based indexing services

    • We carried out an exhaustive comparison of CISMeF HMTP and NCBO BioPortal, including a comparison of their annotation workflows, and made CISMEF terminologies exportable.
    • We work on multilingual mappings reconciliation and creation between French and English biomedical ontologies/terminologies.
    • We work on automatic detection of emotion on public health forums using text mining techniques, and we are building a patient vocabulary out of public patient-written resources.
    • We work on improving biomedical ontology alignment using background knowledge.
    • We work on formalizing consumer health vocabularies (cf. MuEVO).

    DEVELOPMENT

    24+ repositories on GitHub, 18 contributors

    We actively reuse the technology of the US National Center for Biomedical Ontology and develop new features and software for our research.

  • SIFR BioPortal

    An open platform to host French biomedical ontologies and terminologies based on the technology developed by the National Center for Biomedical Ontology

  • BioTex

    Methodology and tool for automatic extraction of biomedical terms from plain text

    BioTex uses existing extraction methods (e.g., C-Value) as well as keyword-based indexing methods (e.g., Okapi, TF-IDF) usually employed in information retrieval.

    NCBO Annotator+

    Enhanced functionalities for annotating and indexing clinical text with the NCBO Annotator+

    A web service which incorporates new functionalities within the NCBO Annotator

    AgroPortal

    A vocabulary and ontology repository for agronomy

    We kicked off the AgroPortal project and platform, whose goal is to offer a reference ontology repository for agronomy, plant sciences, nutrition and biodiversity

    ViewpointS

    A formalism for subjective knowledge

    We work on semantic indexing and knowledge representation with the goal of capturing formal data and informal contributions into an evolutionary knowledge graph

  • Some figures

    49 scientific publications, 2 PhD theses, 1 cotutelle, 2 postdocs, 6 master interns, 2 years of developer time, 5 conferences, 1 mobility project

  • Prizes & distinctions

    • MSCA fellowship for C. Jonquet's mobility project
    • Eiffel fellowship for A. Annane's cotutelle
    • France-Stanford fellowship for A. Annane's mobility project
    • 1st Prize at the 2nd BD2K 4th Network of BioThings Hackathon 2015 for the NCBO team (with C. Jonquet)
    • Shared best paper award at JFO 2016 for our work on ontology metadata (A. Toulet & C. Jonquet)
    • Young researcher best paper award at IC 2015 for our work on patient vocabulary (M. Tapi-Nzali & S. Bringay)
    • News in "My Little Santé" (M. Tapi-Nzali & S. Bringay)
    • Region LR Young researcher prize for M. Roche
  • Team

    Over the 6 years of the project (2013-2019), the team included:

    Clement Jonquet

    Principal Investigator

    Assistant Professor (LIRMM). Leader of the project. Selected as "Young Researcher" by French ANR and Marie Curie Fellow by H2020

    Mathieu Roche

    Co-PI (young researcher)

    Researcher (CIRAD)

    Sandra Bringay

    Co-PI (young researcher)

    Professor (LIRMM)

    Vincent Emonet

    Developer

    (funded by ANR SIFR)

    Juan Antonio Lossio Ventura

    PhD student

    (funded by Univ. of Montpellier)

    Guillaume Surroca

    PhD student

    (funded by ANR SIFR)

    Amina Annane

    PhD student

    Collaboration with ESI (Algiers)

    Amine Abdaoui

    Postdoc

    (funded by ANR PractikPharma)

    Andon Tchechmedjiev

    Postdoc

    (funded by ANR PractikPharma)

  • Advisors

    Stefano A. Cerri
    Maguelonne Teisseire
    Pascal Poncelet
    Zohra Bellahsene

    Collaborators

    Anne Toulet (LIRMM)

    Philippe Lemoisson (TETIS)
    Pierre Larmande (IRD / IBC)
    Mark Musen (BMIR / NCBO)
    John Graybeal (NCBO)
    Stefan Darmoni (CISMEF)
    Sebastien Harispe (LGI2P)

     

     

    Students

    Mike Tapi-Nzali

    Stella Zevio

    Soumia Melzi

    Kevin Cauchois

    Khadidja Bouarech

    Solène Eholié

    Alexandre Lerbet

    Chafik EL Ghandour

    Mohamed Serhani

    Olivier Duplouy

    Pierre Burc

     

    Other helpers

    Julien Diener

    Sebastien Harispe

     

     

  • Dissemination

    Events

    Conferences, hackathons, meetups, tutorials, etc.

    Presentations

    Multiple presentations uploaded on Slideshare

    Working group activities

    • RDA Wheat Interoperability Working Group
    • RDA AgriSemantics Working Group
    • RDA Vocabulary and Semantic Services Working Group
    • French GDR SemanDiv
    • AgBioDatabases Working Group
  • Positions

    Closed positions

    PhD fellowship

    • Semantic portals interoperability and multilingual data integration in biomedicine [PDF]

    Master Intern

    • Development of a web application for extracting and predicting ontology metadata [PDF]
    • Development of a Java web service for parsing structured documents [PDF]
    • Design of a prototype French-language biomedical semantic annotator [PDF] [PDF]
    • Multilingualism in an ontology repository: the case of BioPortal [PDF]
    • BioTex in BioPortal - Extension of a Java/Ruby on Rails semantic web application [PDF]
    • Extraction and reconciliation of multilingual mappings in biomedical ontologies [PDF]
    • Design and implementation of a web application and a web service for semantic distances [PDF]
    • Design and realization of a biomedical-ontology-based semantic distances Web service [PDF]
    • VWA (Viewpoints Web App): development of a 3-tier web application for a graph of viewpoints [PDF]
    • Construction of a web application for navigating a community knowledge graph [PDF]
    • Experimenting with heterogeneous computing in the ViewpointS approach [PDF]
    • Formalization and extraction of viewpoints from multiple Web resources [PDF]
    • Modeling and alignment of biological data models with semantic web technologies [PDF]
    • Comparison and convergence of biomedical ontology repositories platforms: BioPortal vs. CISMEF [PDF]
  • CONTACT

    LIRMM
    161 Rue Ada 34095 Montpellier
    France