Visibility of knowledge

Project Title: Visibility of knowledge

Project Investigators:

Partner:

Project Funder:

Description:

What problem are we solving?

Bringing together a team of humanists and computer scientists, we are interested in understanding how visual techniques, such as footnotes, diagrams, tables, and mimetic representations of objects, were used to engage the public and make new ideas accessible. Rather than focus exclusively on the language of scientific writing, we want to know more about the different cultures of scientific representation that accompanied it.

Large-scale analysis of data can help answer questions such as:

How general and universal was the appearance of scientific knowledge during past centuries?

What is the relation between the different practices of scientific illustration and different fields of science?

Was there a mainstream practice, and if so, why?

The impact will be not only a new understanding of the past but also a new approach, and new resources, for others to use.

Datasets

The aim of this project is to use pattern recognition and machine learning techniques to study the appearance of scientific knowledge over time. Two datasets were used in the study.

Eighteenth Century Collections Online (ECCO) consists of approximately 150,000 documents and around 32 million page images published between 1700 and 1800. The second dataset consists of periodicals from the National Academy of Sciences (NAS), comprising about 800 documents and 0.6 million images.

For ECCO, three separate labeled datasets were created: 27,000 page images labeled for whether they contain a footnote, around 22,000 labeled for whether they contain a table, and 10,000 labeled for containing a footnote, table, and/or illustration.

Approximately 20,000 images from the NAS dataset were labeled for containing, or being, a footnote, table, and/or illustration, since these visual features can co-occur on a page.
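Because the ECCO footnote and table sets are single-feature (present or not) while the NAS labels allow several features on one page, the two kinds of labels need different encodings. The sketch below shows one common way to store them: a single 0/1 flag for the binary sets and a multi-hot vector for the co-occurring case. The label names, the vocabulary order, and the functions `to_multi_hot` and `to_binary` are hypothetical illustrations, not the project's actual data format.

```python
from typing import Iterable, List, Set

# Hypothetical label vocabulary; the project's actual label names may differ.
LABELS: List[str] = ["footnote", "table", "illustration"]


def to_multi_hot(page_labels: Iterable[str], vocab: List[str] = LABELS) -> List[int]:
    """Encode the set of visual features on one page as a 0/1 vector.

    Several entries can be 1 at once, because a page may contain
    a footnote, a table and an illustration together.
    """
    present: Set[str] = set(page_labels)
    unknown = present - set(vocab)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in present else 0 for label in vocab]


def to_binary(page_labels: Iterable[str], target: str) -> int:
    """Single-feature label, as in a footnote-or-not or table-or-not set."""
    return int(target in set(page_labels))


if __name__ == "__main__":
    print(to_multi_hot(["table", "footnote"]))      # [1, 1, 0]
    print(to_multi_hot([]))                         # [0, 0, 0]
    print(to_binary(["illustration"], "footnote"))  # 0
```

With multi-hot targets, a classifier would typically predict each feature independently (for example, one sigmoid output per label) rather than choosing a single class, which matches the fact that the features can co-occur.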

These labeled datasets were used with pattern extraction and machine learning methods to detect visual features accurately and automatically. The two main challenges of the project were the variable appearance of these features and the large scale of the data; we tackled them with different approaches for footnote, table, and illustration detection.
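To make the "variable nature" challenge concrete, here is a minimal rule-based sketch of footnote detection, assuming OCR output that gives each text line a vertical position and a height (both as fractions of page height). It encodes one typographic intuition: footnotes sit near the foot of the page in smaller type than the body. The `TextLine` format, the function `has_footnote`, and the thresholds `bottom_zone` and `size_ratio` are all hypothetical; this is not the method used in the project.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class TextLine:
    """One OCR text line; coordinates are fractions of page height (hypothetical format)."""
    top: float     # 0.0 = top of page, 1.0 = bottom
    height: float  # line height, a rough proxy for font size


def has_footnote(lines: List[TextLine],
                 bottom_zone: float = 0.75,
                 size_ratio: float = 0.85,
                 min_lines: int = 1) -> bool:
    """Heuristic footnote test based on position and relative type size.

    Returns True if the lines starting at or below `bottom_zone` have a median
    height at most `size_ratio` times the median height of the lines above it.
    """
    body = [ln.height for ln in lines if ln.top < bottom_zone]
    foot = [ln.height for ln in lines if ln.top >= bottom_zone]
    if not body or len(foot) < min_lines:
        return False
    return median(foot) <= size_ratio * median(body)
```

Fixed thresholds like these break down quickly across centuries of varying layouts, fonts, and print quality, which is one reason learned models trained on labeled pages are attractive for this kind of data.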

For ECCO, the footnote detection results, the data, and the training data are published along with the paper, so that the results can be reproduced, reused, and analyzed together with metadata for further investigation from a historical perspective.
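As an illustration of analyzing detection results together with metadata, the sketch below joins per-page footnote predictions with publication years and reports the share of pages with a footnote in each decade. It assumes, hypothetically, that predictions and metadata are both keyed by a page identifier; the function name `footnote_share_by_decade` and the data layout are illustrative, not the format of the published files.

```python
from collections import defaultdict
from typing import Dict, List


def footnote_share_by_decade(
    predictions: Dict[str, bool],
    metadata: Dict[str, int],
) -> Dict[int, float]:
    """Share of pages predicted to contain a footnote, grouped by decade.

    predictions: page_id -> predicted footnote flag (hypothetical layout)
    metadata:    page_id -> publication year (hypothetical layout)
    Pages with no year in `metadata` are skipped.
    """
    # decade -> [pages with a footnote, total pages]
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for page_id, has_fn in predictions.items():
        year = metadata.get(page_id)
        if year is None:
            continue
        decade = (year // 10) * 10
        counts[decade][0] += int(has_fn)
        counts[decade][1] += 1
    return {decade: hits / total for decade, (hits, total) in sorted(counts.items())}
```

For example, predictions `{"a": True, "b": False, "c": True}` with years `{"a": 1702, "b": 1705, "c": 1791}` give `{1700: 0.5, 1790: 1.0}`. The same grouping could be done by subject, publisher, or any other metadata field to test historical hypotheses.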

The first impact of this work is the use of machine learning methods to investigate many hypotheses from a human-sciences perspective; the second, and more important, is bringing pattern recognition to large-scale datasets. Another main impact is building a bridge between humanistic perspectives and computer science techniques, so that hypotheses can be tested at scale.