What is the Data Science Ontology?

The Data Science Ontology (DSO) is a knowledge base about data science with a focus on computer programming. The concepts of the ontology are drawn from statistics, machine learning, and the practice of software engineering for data science. Besides cataloging and organizing data science concepts, the ontology provides semantic annotations of commonly used software libraries for data science, such as pandas, scikit-learn, and statsmodels. Annotations map the libraries' types and functions onto the ontology's universal concepts.
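To make the idea of an annotation concrete, here is a minimal Python sketch of how library identifiers might be mapped onto shared concepts. It is an illustration only and does not reproduce the ontology's actual data format: the `Annotation` class, the `concept_for` helper, and the concept names ("table", "read-table", "k-means") are all hypothetical. Only the library identifiers are real names from pandas and scikit-learn.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record linking a library identifier to a concept."""
    library: str     # package that provides the type or function
    identifier: str  # fully qualified name inside that package
    kind: str        # "type" or "function"
    concept: str     # illustrative concept name, not taken from the ontology


# Illustrative entries; the concept names are assumptions made for this sketch.
ANNOTATIONS = [
    Annotation("pandas", "pandas.DataFrame", "type", "table"),
    Annotation("pandas", "pandas.read_csv", "function", "read-table"),
    Annotation("scikit-learn", "sklearn.cluster.KMeans", "type", "k-means"),
]


def concept_for(identifier: str) -> Optional[str]:
    """Return the concept annotated for a library identifier, or None."""
    for annotation in ANNOTATIONS:
        if annotation.identifier == identifier:
            return annotation.concept
    return None


print(concept_for("pandas.read_csv"))
print(concept_for("numpy.ndarray"))
```

The point of the mapping is that two analyses written against different libraries can be compared at the level of concepts rather than library-specific names.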

Mission

The purpose of the Data Science Ontology is to enable artificial intelligence (AI) capabilities for data science, such as:

semantic queries on data analyses
comparison of semantic similarity between data analyses
automated statistical meta-analysis
meta-learning for machine learning

Several of these capabilities are currently under development by the Data Science Ontology team.

Learning more

If you'd like to learn more about the Data Science Ontology, keep reading along one of these tracks:

Introduction to the Data Science Ontology
After reading this informal introduction, you will understand the basics of the ontology language and know how to interpret the concept and annotation entries found on this website.
Contributing to the Data Science Ontology
This document is a guide to the internal data format for concepts and annotations. After reading it, you will know how to contribute new concepts and annotations to the ontology.
Mathematics of the Data Science Ontology
For the mathematically inclined or merely curious, this document explains the category-theoretic underpinnings of the ontology language and its connection to the typed lambda calculus. References for further reading are provided. Users and contributors need not understand this material, but it is recommended for researchers who wish to extend the ontology language or the program analysis tools.

You can also read answers to frequently asked questions.