CM Magazine: The Internet & the Future of Organized Knowledge

CM, Volume I, Number VI, July 21, 1995

The Internet & the Future of Organized Knowledge: Part II of III
Luciano Floridi

[Note: we thank Professor Floridi (floridi@vax.ox.ac.uk) for kind permission to reprint this material, which is a shortened version of a paper he gave at a UNESCO Conference in Paris, March 14-17, 1995. Part I was published in last week's Canadian Materials; the final portion will appear next week.]

Part Two: Ideometry -- A New Way of Knowing

In the previous part of this article, I argued that the Internet can be understood as a stage in the life cycle of the "Human Encyclopedia." As such, the Internet has already given rise to unprecedented innovations and to new fundamental problems, some of which are especially relevant to the future of scholarship and organized knowledge. In this part, we begin to examine these by developing the concept of ideometry.

The New Nature of Scholarship --

When considering the innovations that the Internet has brought to the field of the production and management of organized knowledge, one might think of the reduction of the time-lag between the production and the utilization of knowledge, the promotion of international cooperation and sharing of information among researchers and scholars, or the possibility of remote teaching online. Yet most such novelties are actually less radical than they seem, since they mainly make easier and quicker what we used to do anyway.

There are other possibilities, however, which do represent a more radical break with the past. For example, the global network is weakening the concept of specialization. The book era, providing a rigidly structured context, invited specialization. The humanities in particular became topic-oriented. The electronic Encyclopedia, on the other hand, promotes inter-disciplinary work, i.e. diatopic approaches. In fact, it is difficult to restrict oneself always to the same limited space when one can navigate so easily to and fro across disciplinary boundaries.

Now, the most substantial of the radical innovations concerns our ability to acquire, ever more easily, further knowledge about the Encyclopedia itself. Consider once again the intellectual space of organized knowledge. We can distinguish three dimensions:

Primary data.
This is what we usually perceive as the Encyclopedia per se, the principal information we can acquire when we have access to the encyclopedia, and it is also the information the encyclopedia is generally designed to convey to the user in the first place.
Metadata.
These are the secondary indications about the nature of the data sets constituting the first dimension. Here we can find information, for example, about copyright restrictions, about the location of our data sets in a physical library or in a virtual domain, about the subject covered by the data sets, about the quality of the information conveyed, and so forth. You can think of metadata as library records.
Derivative data.
These are data that can be extracted from primary data sets, when the latter are used as a source for comparative and quantitative analysis. This requires a lengthier explanation.

What Derivative Data Is --

In the book age, primary data sets were collected and organized in structures which were necessarily rigid and unalterable. The ordering principles behind this organization actually limited the range of primary questions which could meaningfully be asked. For example, if the ordering principle stated that the primary data should be all the poetic texts of any time written in English, the final edition in several volumes of all English poems provided the means to answer properly and easily only a limited range of primary questions, like "who wrote what when."

Information Technology has transformed all this. It is now possible to query the digital domain and shape it according to principles which are completely different from those whereby the primary data were initially collected and organized. The structure of our particular set of digital data can be modified to fit an infinite number of requirements, and hence provide answers to secondary questions which were not meant to be answered by the original structure. The new patterns that emerge from the application of quantitative and comparative queries may turn out to be meaningful and interesting for reasons that are completely extraneous to the initial ordering principle.
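
The re-shaping described above can be illustrated with a minimal Python sketch. It is only an assumption about how such a query might look, not a description of any particular system: the records in EDITION and the helper regroup are hypothetical, invented for illustration. The same records, stored in the order a printed edition might impose (by author), are regrouped under two ordering principles the edition was never designed for.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only.
# Each one is (author, year, form), listed in "printed edition" order: by author.
EDITION = [
    ("Author A", 1601, "sonnet"),
    ("Author A", 1610, "elegy"),
    ("Author B", 1598, "sonnet"),
    ("Author C", 1650, "ode"),
]


def regroup(records, key):
    """Group the same records under a new ordering principle given by `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)


# Secondary questions the original ordering was not meant to answer:
by_form = regroup(EDITION, key=lambda r: r[2])
by_century = regroup(EDITION, key=lambda r: (r[1] - 1) // 100 + 1)

for century, records in sorted(by_century.items()):
    print(century, [author for author, _, _ in records])
```

The primary data are untouched; only the key function changes, which is why new questions cost so little once the data are digital.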

What Ideometry Is --

Ideometry is the study of the significant patterns resulting from a comparative and quantitative analysis of the field of knowledge -- that is, of the clusters of primary data like data banks, textual corpora, or multimedia archives. Derivative data, the third dimension of the Encyclopedia, are the outcome of an ideometric analysis of whatever sector of organized knowledge has been subject to investigation.

An example will clarify the notions of ideometry and derivative data together. In 1994 Chadwyck-Healey published a database of English Poetry on CD-ROMs. The structure of this digital collection is thoroughly flexible, and we can reorganize it at will. As a simple example, we might wish to study the presence or absence of two popular figures -- Heraclitus, the weeping philosopher, and Democritus, the laughing philosopher -- through the entire set of documents.

A quick computer survey shows that the joint motif of compassion for human misfortune and derision of human ambitions was very popular between the second half of the sixteenth and the first half of the seventeenth century, as it is in this period that we find most of the poets using the philosophical couple as a literary device. This pattern becomes even more interesting once we notice that during the seventeenth century the two Greek philosophers were portrayed in many Dutch paintings. Through a quantitative and comparative analysis (an ideometric analysis) we have made the encyclopedia speak about itself (supply us with derivative data).
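
A survey of this kind can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the method actually used and not the interface of the English Poetry database: the POEMS list is a toy corpus invented for this example, and the function names are hypothetical. It counts, per half-century, the poems that mention both philosophers.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (year, text) pairs invented for illustration only.
# These are NOT lines taken from the English Poetry database.
POEMS = [
    (1580, "Heraclitus weeps while Democritus laughs at our folly"),
    (1605, "Let Democritus laugh and Heraclitus mourn"),
    (1640, "Democritus would laugh to see such pride"),
    (1720, "A shepherd sings beside the quiet stream"),
    (1790, "Heraclitus and Democritus alike are dust"),
]

NAMES = ("heraclitus", "democritus")


def half_century(year):
    """Return the first year of the 50-year period containing `year`."""
    return year - (year % 50)


def mentions_both(text):
    """True when every name in NAMES occurs as a whole word in `text`."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return all(name in words for name in NAMES)


def couple_by_period(poems):
    """Count, per half-century, the poems that mention both philosophers."""
    counts = Counter()
    for year, text in poems:
        if mentions_both(text):
            counts[half_century(year)] += 1
    return dict(sorted(counts.items()))


print(couple_by_period(POEMS))
```

The resulting table of counts by period is a small instance of derivative data: it is stated nowhere in the poems themselves and emerges only from the comparison.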

Ideometry and The Internet --

Now, to some extent this too is nothing so very new. Ideometry has been popular in many disciplines since the 1960s. Lexicography, stylometry, prosopography, citation analysis, bibliometric studies, econometrics, and quantitative history have all used forms of ideometric analysis for investigation. But scholars could perform ideometric analysis only on a limited scale and with enormous effort. The trouble was, quite simply, that Information Technology was not yet up to scholarly expectations and needs. It wasn't that the Humanities were not sufficiently "scientific" to allow the application of Information Technology tools, but rather that Information Technology was too primitive to be of any real service for the highly sophisticated tasks required by scholarly research.

The radical change brought about by the present age of Information Technology and the Internet is that an ideometric approach is becoming an increasingly easy option for any researcher. It is obvious that primary data need metadata in order to be manageable, so the second dimension of the encyclopedia can never be really separate from the first. Derivative data, however, are not so directly available, and the third dimension emerges only when large amounts of primary data are collected in digital form, are made easily accessible to the user, and can be rapidly queried and thus re-structured via electronic tools. Today all these conditions are being more and more adequately fulfilled by the Internet.

An Electronic Book Is Not A Book! --

Ideometry shows that digital texts, though they maintain some of the basic features of printed books and can therefore be used as surrogates, should not be understood as if they were meant to fulfil the same task. We do not convert printed texts into electronic databases in order to read them better or more comfortably. For this task the book is and will remain unsurpassed.

But we do not spend so much money only to create big electronic indexes either. Rather, we collect and digitize large corpora of texts in order to subject them to comparative and quantitative analysis and extract knowledge they contain only on a macroscopic level. What is revolutionary in an electronic bibliography, for example, is not that I can find a certain book in a few seconds, which is trivial, but that I can ask new questions: I can check when books on the history of Analytic Philosophy started to be written, for example, and discover how their number increased while the movement became more and more scholastic.
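
The bibliographic question above amounts to a count of titles per period under a subject filter. Here is a minimal Python sketch, assuming a catalogue held as a list of records; the RECORDS data, the field names, and the titles_per_decade helper are all hypothetical and stand in for whatever a real electronic bibliography would provide.

```python
from collections import Counter

# Hypothetical catalogue records, invented for illustration only.
RECORDS = [
    {"title": "Record A", "year": 1956, "subjects": {"analytic philosophy", "history"}},
    {"title": "Record B", "year": 1973, "subjects": {"analytic philosophy", "history"}},
    {"title": "Record C", "year": 1978, "subjects": {"analytic philosophy", "history"}},
    {"title": "Record D", "year": 1975, "subjects": {"analytic philosophy", "logic"}},
    {"title": "Record E", "year": 1991, "subjects": {"analytic philosophy", "history"}},
    {"title": "Record F", "year": 1994, "subjects": {"analytic philosophy", "history"}},
    {"title": "Record G", "year": 1998, "subjects": {"analytic philosophy", "history"}},
]


def titles_per_decade(records, required_subjects):
    """Count records per decade that carry every subject in `required_subjects`."""
    required = set(required_subjects)
    counts = Counter()
    for record in records:
        if required <= record["subjects"]:  # subset test: all subjects present
            counts[record["year"] // 10 * 10] += 1
    return dict(sorted(counts.items()))


trend = titles_per_decade(RECORDS, ["analytic philosophy", "history"])
print(trend)
```

Finding any single record here is trivial; the trend across decades is the new question the electronic form makes cheap to ask.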

Thus, corpora of electronic texts and multimedia sources are the laboratory for ideometric analysis. And (this is where the Internet comes in) the larger and more accessible the domain, the better it will be, for the ideometric value of an extensive corpus is given by the product, rather than by the simple arithmetical sum, of the ideometric values of its individual documents. Once simple and economical tools for studying visual and acoustic patterns also become available, ideometric analyses will be extended to the entire domain of the enlarged Encyclopedia.

Thus, electronic collections of data and the Internet have raised the level on which we can deal with our data. But the Internet has also raised severe problems for scholarship; I shall talk about these in the third part of this article.

Reprinted with permission from the electronic journal TidBITS, #282. Email info@tidbits.com for more information.

To comment on this title or this review, send mail to cm@umanitoba.ca.

Published by
The Manitoba Library Association
ISSN 1201-9364
