Part Two: Ideometry -- A New Way of Knowing
In the previous part of this article, I argued that
the Internet can be understood as a stage in the life cycle of the "Human
Encyclopedia." As such, the Internet has already given rise to
unprecedented innovations and to new fundamental problems, some of which
are especially relevant to the future of scholarship and organized
knowledge. In this part, we begin to examine these by developing the
concept of ideometry.
The New Nature of Scholarship --
When considering the innovations that the Internet has brought to the
production and management of organized knowledge, one might think of
the reduction of the time-lag between the production and the utilization of
knowledge, the promotion of international cooperation and sharing of
information among researchers and scholars, or the possibility of remote
teaching online. Yet most such novelties are actually less radical than
they seem, since they mainly make easier and quicker what we used to do
anyway.
There are other possibilities, however, which do represent a more
radical break with the past. For example, the global network is weakening
the concept of specialization. The book era, providing a rigidly structured
context, invited specialization. The humanities especially became
topic-oriented. The electronic Encyclopedia, on the other hand, promotes
inter-disciplinary work, i.e. diatopic approaches. In fact, it's difficult
to restrict oneself always to the same limited space when one can navigate
so easily to and fro across the disciplinary boundaries.
Now, the most substantial of the radical innovations concerns our
ability to acquire, ever more easily, further knowledge about the
Encyclopedia itself. Consider once again the intellectual space of
organized knowledge. We can distinguish between three different
dimensions:
- Primary data.
This is what we usually perceive as the Encyclopedia per se, the
principal information we can acquire when we have access to the
encyclopedia, and it is also the information the encyclopedia is generally
designed to convey to the user in the first place.
- Metadata.
These are the secondary indications about the nature of the data sets
constituting the first dimension. Here we can find information, for
example, about copyright restrictions, about the location of our data
sets in a physical library or in a virtual domain, about the subject
covered by the data sets, about the quality of the information conveyed,
and so forth. You can think of metadata as library records.
- Derivative data.
These are data that can be extracted from primary data sets, when the
latter are used as a source for
comparative and quantitative analysis. This requires a lengthier
explanation.
What Derivative Data Is --
In the book age, primary data sets were collected and organized in
structures which were necessarily rigid and unalterable. The ordering
principles behind this organization actually limited the range of primary
questions which could meaningfully be asked. For example, if the ordering
principle stated that the primary data should be all the poetic texts of
any time written in English, the final edition in several volumes of all
English poems provided the means to answer properly and easily only a
limited range of primary questions, like "who wrote what when."
Information Technology has transformed all this. It is now possible to
query the digital domain and shape it according to principles which are
completely different from those whereby the primary data were initially
collected and organized. The structure of our particular set of digital
data can be modified to fit a practically unlimited range of requirements,
and hence
provide answers to secondary questions which were not meant to be answered
by the original structure. The new patterns that emerge from the
application of quantitative and comparative queries may turn out to be
meaningful and interesting for reasons that are completely extraneous to
the initial ordering principle.
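A minimal sketch of such a restructuring, in Python, may help. The three records and their fields (`author`, `year`, `text`) are invented for illustration and are not taken from any real collection. The data are stored according to an original principle (ordered by author) and then re-shaped into a word index that the collection was never organized to provide.

```python
from collections import defaultdict
import re

# Hypothetical primary data, ordered by the original principle: author name.
POEMS = [
    {"author": "Author A", "year": 1590, "text": "The river weeps and the hill laughs"},
    {"author": "Author B", "year": 1640, "text": "A river of tears"},
    {"author": "Author C", "year": 1710, "text": "The hill stands silent"},
]

def word_index(poems):
    """Re-shape the collection: map each word to the years of the poems using it."""
    index = defaultdict(list)
    for poem in poems:
        # Each distinct lowercase word is recorded once per poem.
        for word in sorted(set(re.findall(r"[a-z]+", poem["text"].lower()))):
            index[word].append(poem["year"])
    return dict(index)

index = word_index(POEMS)
print(index["river"])  # years of the poems that mention "river"
```

The same records could just as well be regrouped by decade, by line length, or by rhyme word. The point is only that the ordering principle becomes a parameter of the query rather than a fixed property of the collection.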
What Ideometry Is --
Ideometry is the study of the significant patterns resulting from a
comparative and quantitative analysis of the field of knowledge -- that is,
of the clusters of primary data like data banks, textual corpora, or
multimedia archives. Derivative data, the third dimension of the
Encyclopedia, are the outcome of an ideometric analysis of whatever sector
of organized knowledge has been subject to investigation.
An example will clarify the notions of ideometry and derivative data
together. In 1994 Chadwyck-Healey published a database of English Poetry on
CD-ROMs. The structure of this digital collection is thoroughly flexible,
and we can reorganize it at will. As a simple example, we might wish to
study the presence or absence of two popular figures -- Heraclitus, the
weeping philosopher, and Democritus, the laughing philosopher -- throughout
the entire set of documents.
A quick computer survey shows that the joint motif of compassion for
human misfortune and derision of human ambitions was very popular between
the second half of the sixteenth and the first half of the seventeenth
century, as it is in this period that we find most of the poets using the
philosophical couple as a literary device. This pattern becomes even more
interesting once we notice that during the seventeenth century the two
Greek philosophers were portrayed in many Dutch paintings. Through a
quantitative and comparative analysis (an ideometric analysis) we have made
the encyclopedia speak about itself (supply us with derivative data).
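The survey described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the query actually run against the published database: the records are invented, the function names are hypothetical, and the matching is a naive substring test that ignores the spelling variants a real early modern corpus would contain.

```python
from collections import Counter

# Hypothetical records: (year, text). Invented for illustration only.
POEMS = [
    (1575, "Democritus would laugh and Heraclitus weep at this"),
    (1598, "Let Heraclitus weep while Democritus laughs"),
    (1632, "Heraclitus and Democritus alike would wonder"),
    (1701, "Democritus alone is named here"),
    (1760, "A pastoral with no philosophers"),
]

def half_century(year):
    """Return the first year of the 50-year period containing `year`."""
    return year - year % 50

def joint_motif_counts(poems, names=("heraclitus", "democritus")):
    """Count, per half-century, the poems that mention every name in `names`."""
    counts = Counter()
    for year, text in poems:
        lowered = text.lower()
        # Naive substring match; a real survey would normalize spellings.
        if all(name in lowered for name in names):
            counts[half_century(year)] += 1
    return dict(sorted(counts.items()))

print(joint_motif_counts(POEMS))
```

The resulting table of counts per period is a small piece of derivative data: it is stated in none of the poems, yet it can be extracted from the collection as a whole.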
Ideometry and The Internet --
Now, to some extent this too is nothing so very new. Ideometry has been
popular in many disciplines since the 1960s. Lexicography, stylometry,
prosopography, citation analysis, bibliometric studies, econometrics, and
quantitative history have all used forms of ideometric analysis for
investigation. But scholars could perform ideometric analysis only on a
limited scale and with enormous effort. The trouble was, quite simply,
that Information Technology was not yet up to scholarly expectations and
needs. It wasn't that the Humanities were not sufficiently "scientific" to
allow the application of Information Technology tools, but rather that
Information Technology was too primitive to be of any real service for the
highly sophisticated tasks required by scholarly research.
The radical change brought about by the present age of Information
Technology and the Internet is that an ideometric approach is becoming an
increasingly easy option for any researcher. It is obvious that primary
data need metadata in order to be manageable, so the second dimension of
the encyclopedia can never be really separate from the first. Derivative
data, however, are not so directly available, and the third dimension
emerges only when large amounts of primary data are collected in digital
form, are made easily accessible to the user, and can be rapidly queried
and thus re-structured via electronic tools. Today all these conditions are
being more and more adequately fulfilled by the Internet.
An Electronic Book Is Not A Book! --
Ideometry shows that digital texts, though they maintain some of the basic
features of printed books and can therefore be used as surrogates, should
not be understood as if they were meant to fulfil the same task. We do not
convert printed texts into electronic databases in order to read them
better or more comfortably. For this task the book is and will remain
unsurpassed.
But we do not spend so much money only to create big electronic
indexes either. Rather, we collect and digitize large corpora of texts in
order to subject them to comparative and quantitative analysis and extract
knowledge that they contain only at a macroscopic level. What is revolutionary
in an electronic bibliography, for example, is not that I can find a
certain book in a few seconds, which is trivial, but that I can ask new
questions: I can check when books on the history of Analytic Philosophy
started to be written, for example, and discover how their number increased
while the movement became more and more scholastic.
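As a rough illustration of this kind of question, the following Python sketch counts titles per decade under a given subject heading and keeps a running total. The records, the subject strings, and the function names are all assumptions made up for the example. A real bibliography would need a proper catalogue export and a controlled vocabulary of subject headings.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical bibliography records: (publication year, subject heading).
RECORDS = [
    (1956, "history of analytic philosophy"),
    (1973, "history of analytic philosophy"),
    (1979, "philosophy of mind"),
    (1985, "history of analytic philosophy"),
    (1988, "history of analytic philosophy"),
    (1990, "history of analytic philosophy"),
    (1992, "history of analytic philosophy"),
    (1994, "history of analytic philosophy"),
]

def titles_per_decade(records, subject):
    """Return sorted (decade, count) pairs for records with the given subject."""
    counts = Counter(year // 10 * 10 for year, s in records if s == subject)
    return sorted(counts.items())

def running_total(per_decade):
    """Turn (decade, count) pairs into (decade, cumulative count) pairs."""
    decades = [decade for decade, _ in per_decade]
    totals = accumulate(count for _, count in per_decade)
    return list(zip(decades, totals))

per_decade = titles_per_decade(RECORDS, "history of analytic philosophy")
print(per_decade)
print(running_total(per_decade))
```

Decades with no matching titles are simply absent from the output. Whether such gaps should be shown as explicit zeros is a presentation choice left open here.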
Thus, corpora of electronic texts and multimedia sources are the
laboratory for ideometric analysis. And (this is where the Internet comes
in) the larger and more accessible the domain, the better it will be, for
the ideometric value of an extensive corpus is given by the product rather
than by the simple arithmetical sum of the ideometric value of each single
document. Once simple and economical tools for studying visual and
acoustic patterns also become available, ideometric analyses will be
extended to the entire domain of the enlarged Encyclopedia.