Thursday, November 1, 2007

Semiotic Domain Models

Introduction to Semiotic Domain Models


Traditionally, when one wants to build a system to organize information and process it in a manner consistent with a given discipline, one must first develop a domain model. The domain model consists of a data model, a set of data structures used to organize the information in a meaningful way, and a set of behaviors associated with the data structures within that data model. Object-oriented programming provides a familiar paradigm for associating behaviors with structured data. For example, a circle object may consist of a Segment denoting its radius, and a Point denoting its center. Behaviors associated with the circle may include calculating circumference, area, or even changing the radius or point. Another way of thinking of these data structures and associated behaviors is as a set of terms and rules, where terms correspond with the properties of the object and rules with the behaviors describing how to operate on that object.
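
The circle example can be sketched in code. This is a minimal illustration in Python, not taken from any existing system; the class names follow the terms used above, while the method names and the choice of a Segment with two endpoints are assumptions made for the sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Segment:
    start: Point
    end: Point

    def length(self) -> float:
        return math.hypot(self.end.x - self.start.x, self.end.y - self.start.y)


@dataclass
class Circle:
    # Terms: the properties of the object.
    center: Point
    radius: Segment

    # Rules: the behaviors that operate on those terms.
    def circumference(self) -> float:
        return 2 * math.pi * self.radius.length()

    def area(self) -> float:
        return math.pi * self.radius.length() ** 2

    def change_radius(self, new_radius: Segment) -> None:
        self.radius = new_radius


origin = Point(0.0, 0.0)
unit_circle = Circle(center=origin, radius=Segment(origin, Point(1.0, 0.0)))
```

Nothing in this model says what a Circle means; that is left to whoever reads the class names, which is the gap the rest of this essay is concerned with.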

Each of the data structures (which may be an object in its own right) often corresponds to a term from a given discipline; in fact, a recommended best practice is to develop a glossary of such terms before creating the domain objects. The behaviors for processing and manipulating these objects often encode a set of business rules consistent with the operations performed upon these domain objects. What is interesting is that the semantic interpretation of these terms and behaviors is left to the people using the system. The meaning of objects like Segment and Point is encoded for the programmer in the API documentation, and for the manager in the glossary of terms, but has been underspecified for another type of user of these systems: machines. In such models, called 'formal models', each term has a fixed and unique meaning. Because that meaning is not encoded in a machine-actionable way, formal models are unable to automatically re-evaluate objects with respect to new, changing meanings that may arise during processing. However, depending upon the application, it may be desirable to have a machine re-interpret an object with respect to a new meaning. Semiotic models extend formal models by adding an additional 'semantic layer', mapping the terms and symbols used in formal models to their meaning. "Differently from the logic-linguistic models developed in the West, terms and rules were not just ungrounded symbols building purely syntactical systems. The formalization of SSC (semiotic situational control) took into account sophistications like the grounding of linguistic terms and rules (its semantics)." (Towards an Introduction to Computational Semiotics)

When trying to model phenomena in which the meaning of these domain objects changes and impacts the behaviors of those objects, semiotic systems become necessary. A very simple example occurs in geometric construction, a reasoning process in which geometric primitives are drawn upon the page and constantly reinterpreted with respect to newly inferred knowledge. For example, the first construction of the Planisphere logically references the presentational point e as a north pole, and the presentational circle abgd as a sphere. However, later on in the same construction Ptolemy associates the presentational point e with a Euclidean point, and abgd with a Euclidean circle, changing the logical reference for those presentational primitives at that point in the construction. The power of semiotic systems over formal systems is that they allow one to explore the consequences of interpreting a given symbol (or presentational primitive) as a certain logical concept (meaning). Furthermore, the user (including machines) can operate upon these symbols in a manner consistent with the logical constraints associated with the interpretation of that symbol. For example, if a point is logically referenced within a construction as the intersection of two lines, then whenever that construction is redrawn (by man or machine), that point must again result from the intersection of the same two lines. Without this logical constraint (arising from mapping meaning in the form of a logical model to the symbols on the page), there would not be enough information for the machine to automatically generate a different presentation of the same logical construction process. (Step 13, edition 1) (Step 13, edition 2) (Step 13, multiple editions) The ability to generate different presentations of the same process is seen in the transmission of diagrams such as those by Heiberg and Drecker for the second construction of the Planisphere. (Heiberg's Diagram) (Drecker's Diagram)
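
The intersection constraint can be made concrete with a small sketch. It assumes a construction in which four freely placed points determine two lines and one derived point; the labels a, b, g, d are borrowed from the circle abgd above, while the label z, the coordinates, and the function names are hypothetical and do not reproduce any step of the Planisphere.

```python
from typing import Dict, Tuple

Coord = Tuple[float, float]


def intersect(p1: Coord, p2: Coord, p3: Coord, p4: Coord) -> Coord:
    """Intersection of the line through p1, p2 with the line through p3, p4."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        raise ValueError("lines are parallel; no unique intersection")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / denom,
            (a * (y3 - y4) - (y1 - y2) * b) / denom)


def draw(free_points: Dict[str, Coord]) -> Dict[str, Coord]:
    """Redraw the construction: free points may move, but z is never placed by hand.

    The logical model says 'z is where line ag meets line bd', so z is
    recomputed from that constraint in every presentation.
    """
    drawing = dict(free_points)
    drawing["z"] = intersect(free_points["a"], free_points["g"],
                             free_points["b"], free_points["d"])
    return drawing


# Two presentations of the same logical construction.
edition_1 = draw({"a": (0.0, 1.0), "g": (0.0, -1.0), "b": (-1.0, 0.0), "d": (1.0, 0.0)})
edition_2 = draw({"a": (0.0, 2.0), "g": (2.0, 0.0), "b": (0.0, 0.0), "d": (2.0, 2.0)})
```

The two drawings place z at different coordinates, yet both satisfy the same logical statement about z. A model that stored only coordinates could reproduce one drawing but could not generate the other.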

Relevance of Semiotics for Intelligent Systems


Intelligent systems like humans associate meaning with signs; they create symbols. Any intelligent system modeled after humans should therefore have some mechanism for operating upon a unit of information with respect to a particular semantic 'interpretation'. When looking at a sign on the page, the meaning of that sign, and even the proper way to recognize it, is ambiguous without some logical context or information. An open contest problem at GREC 2007 required developing an algorithm to segment a binary image of arcs. Assuming that such a difficult problem can be reasonably solved, the utility of such a solution could be further increased (and perhaps its difficulty decreased) if the problem were augmented with a glossary of logical terms and the presentational conventions representing them within the diagram. In this manner, segmentations could be pruned according to their consistency with diagramming conventions. The resultant segmentations could also be understood by man and machine with respect to their underlying logical semantics. This is just one example in which logical context is needed to resolve presentational ambiguity. If one draws an arrangement of lines on a piece of paper, there is no way of knowing whether this arrangement represents a work of art or the construction for an astrolabe or both. There needs to be a way to associate domain objects from art or astronomy with the domain objects from the presentational geometric model seen on the page. In order for machines to produce, navigate, and alter these diagrams on a symbolic level, there needs to be a mechanism for encoding symbols. A symbol is an association between a logical domain object and a presentational domain object. More specifically, a symbol can be encoded as a mapping from a logical domain object to a presentational domain object, where the mapping encodes the act of interpreting the symbol.
This definition of symbol directly corresponds to its Greek root συμβάλλω meaning to throw together, to reckon, compute, to interpret, to agree upon. Through agreeing upon a mapping that throws together two domain objects, meaning is created. I would argue that this symbol-making process is central to reasoning in which new meaning is repeatedly inferred from previous information.
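
One way to picture this definition is as a small data structure. The sketch below is an assumption about how such a mapping might be encoded, not a description of an existing implementation; the class and method names are hypothetical, and the example values come from the Planisphere construction discussed earlier.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Symbol:
    logical: str          # the meaning, e.g. a concept from astronomy or geometry
    presentational: str   # the sign, e.g. the label of a primitive in the diagram


class Interpretation:
    """Tracks which logical object each presentational primitive currently stands for."""

    def __init__(self) -> None:
        self._by_sign: Dict[str, Symbol] = {}

    def interpret(self, presentational: str, logical: str) -> Symbol:
        symbol = Symbol(logical, presentational)
        # Interpreting a sign again replaces its previous meaning.
        self._by_sign[presentational] = symbol
        return symbol

    def meaning_of(self, presentational: str) -> str:
        return self._by_sign[presentational].logical


reading = Interpretation()
reading.interpret("e", "north pole")
reading.interpret("abgd", "sphere")
earlier_meaning = reading.meaning_of("e")

# Later in the same construction the same signs are thrown together with new meanings.
reading.interpret("e", "Euclidean point")
reading.interpret("abgd", "Euclidean circle")
later_meaning = reading.meaning_of("e")
```

The sign e never changes on the page; only the association does, which is why the association itself has to be an explicit, updatable object.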

It is worth noting that two different symbols can serve the same purpose. Using another example from Euclidean geometric diagrams, labels are often used to encode the association from logical concept to presentational primitive within the diagram. Examples of these can be seen in the diagrams by Heiberg and Drecker for the second construction of the Planisphere. However, other symbols such as colored shapes can serve the same purpose as these alphabetic labels. Byrne's edition of Euclid assigns meaning to such presentational primitives within the diagram by literally putting them within the logical context of the text. Although diagrammatic information like that seen in Ptolemy and Euclid has traditionally been visualized using a two-dimensional geometric model, there is no reason why this same information could not be visualized using a three-dimensional model, a haptic visualization model, or any type of visualization rendered into a continuous space with a notion of Euclidean distance. The properties of the presentation or representation space can be used to inform the construction of a logical argument if the representation space captures the logical properties one wishes to explore. One of the things which makes diagrams so valuable as a reasoning tool is their variation in presentation. Diagrams allow one to literally look at a geometric model with stated properties from a new perspective. With each variation spatial intuition is further developed. Seeing how the diagram's referenced primitives relate to each other spatially informs one's understanding of how they relate mathematically, a property intensified when studying geometry using Euclidean distance.

Diagrams allow one to recruit one's spatial specialists, parts of the brain that process spatial information, to inform the reasoning process. The notion of different specialists interpreting information in different ways and providing different outputs may be a useful presentational model for understanding knowledge creation during the reasoning process. In Beal's model, the brain consists of different 'specialists' that learn to communicate with each other by finding similarities between their 'interpretations' of a common signal (Learning By Learning to Communicate). The properties of a signal that survive translation between these specialists are seen as likely to represent something 'real' about the world, rather than a fluke of one specialist's processing. If one thinks of these specialists as domain specialists, each processing a signal relative to a specific domain model, then those concepts which can be mapped from the domain model of one specialist into the domain model of another specialist are more likely to represent real knowledge. Beal argues that this translation process, the struggle of specialists to communicate, teases out new information. This argument is consistent with translation in general. The process of mapping the logical concepts presented by the Ancient Greek text into English clarifies the nature of the logical model. A similar process is used when mapping the logical concepts of that same Greek text into a diagrammatic presentation. Both presentational mediums provide information about the relations between objects within the logical model.

The specialist model also accommodates the previous definition of symbol, in which objects from two different domain models are 'thrown together' through an association. Beal argues that when two specialists agree upon a signal, they may each interpret it differently, so that the signal captures a relationship between two concepts. In other words, the signal encodes the relation between concepts in two different specialist domains. Traditionally, the signals input into such specialist models consist of sensory information such as visual and audio cues. Specialists process each of these signals and, through communicating their outputs to each other, decide upon an encoding relating the processed versions of the original phenomenon. In other words, these specialists dynamically develop a mapping between two separate interpretations which is defined in terms of a common signal.

Intuitively, if the brain passes such signals between specialists, relating concepts and thereby inferring new knowledge to understand sensory information, could more of the same neurological circuitry pass other, higher-level signals between other specialists to infer other types of knowledge? For example, could such circuits pass signals encoding logical information obtained from text and diagram to glean more information about the nature of the subject being discussed? Does it hold that adding more circuits of this type to the brain increases one's ability to process symbolic information? Even more interesting, does serializing these higher-level signals into a language enable specialists in other people's brains to process information and thereby relate to each other? If the brain does pass signals encoding higher-level thoughts between specialists, and if the brain processes a common signal in two different ways and thereby generates a mapping between two specialist domain objects, then Beal's specialist model could be a mechanism for symbol-generation at the neurological level. A symbol is an association between a domain object representing meaning and another domain object, or sign, for representing that meaning.

What could one do with a system that could generate symbolic associations using a specialist model? The first task would be to decide what types of symbols one wants to reason upon, as the meaning and presentation of those symbols would determine the type of signal and specialist used. Let's say that one wants to reason upon symbols mapping logical astronomical and geometric entities to a presentational space of two-dimensional geometric objects. The signal used would have to encode the logical and geometric concepts and serve as input for a specialist that could generate an appropriate two-dimensional geometric representation of these concepts. Rather than encoding the signal for a logical entity in a format useful for neural networks, to start, this signal could be encoded in a string format that could be passed across another network, the internet. The string representation of the logical meaning of the symbol would then serve as the signal to a specialist that would process this signal and produce a two-dimensional visualization of the logical concept. To obtain one interpretation of this visualization, the same signal string can be passed to another specialist which outputs the text describing the logical entity being visualized. Through encoding the signal in a manner that can be parsed by two specialists, machine-actionable symbol resolution becomes possible.
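
A toy version of this arrangement might look as follows. Everything here is an assumption made for illustration: the `key=value;...` signal format, the two specialist functions, and their outputs are hypothetical stand-ins, and the diagram specialist emits a drawing instruction rather than an actual image to keep the sketch short.

```python
from typing import Dict


def parse(signal: str) -> Dict[str, str]:
    """Parse a signal such as 'kind=circle;label=abgd;center=e' into fields."""
    return dict(part.split("=", 1) for part in signal.split(";"))


def text_specialist(signal: str) -> str:
    """Interpret the signal as a description of a logical entity."""
    fields = parse(signal)
    return f"{fields['kind']} {fields['label']} about center {fields['center']}"


def diagram_specialist(signal: str) -> str:
    """Interpret the same signal as an instruction for a two-dimensional drawing."""
    fields = parse(signal)
    return f"DRAW {fields['kind'].upper()} label={fields['label']} center={fields['center']}"


def resolve(signal: str) -> Dict[str, str]:
    """Send one signal to both specialists; the paired outputs form the symbol."""
    return {
        "logical": text_specialist(signal),
        "presentational": diagram_specialist(signal),
    }


symbol = resolve("kind=circle;label=abgd;center=e")
```

Neither specialist knows about the other. The association exists only because both were handed the same string, which is the sense in which the signal, rather than either output, carries the symbol.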

Towards a Symbolic Reasoning Engine


The Planisphere Reader represents a first step towards a symbolic reasoning engine. The architecture of the application consists of two specialists, a CTS 2.0 implementation for retrieving text, and a suite of diagram services for generating diagrams. Both of these specialists take a CTS-URN encoding a step of the reasoning process represented by text and diagram in Ptolemy's Planisphere. These signals, encoded as URNs, are then sent off to both specialists, generating a diagram and retrieving the corresponding text. The act of retrieving diagram and text using the same signal associates logical information with presentational information, thereby creating a symbol, two objects from different domains that are 'thrown together' by an association. When reading the text associated with a given geometric presentation, the reader then relies upon specialists for vision and language to resolve the sensory input signals corresponding to the page. In other words, whether specialists operate within one's brain, the brain of a friend, or a computer, one can utilize all of them by deciding upon a communication protocol, be it neurological circuits, natural language, or HTTP. Central to this process is determining how best to encode information so that it can be used by all. Much can be learned from the history of textual transmission, which details the technologies developed to encode such information and translate it into forms useful for specialists using different encodings (languages) of the same underlying signal (concept).

One extremely useful logical structure that has been used throughout this paper is the reasoning process. Through the reasoning process, prior knowledge is used to infer new knowledge until a conclusion has been reached. What better way to explore the structure of the reasoning process than to look at how the Greeks, considered by many to be the originators of rational thought, encoded their mathematical proofs? These Euclidean-style proofs actually have a formal structure as discovered by Proclus. Each proof has an 'enunciation' in which the general problem is described, a 'setting-out' in which the logical elements of the problem are labeled, a 'construction' in which the machinery necessary for the proof is developed, a 'proof' which "draws the required inference by reasoning scientifically from acknowledged facts", and a 'conclusion' in which that which is to be shown has been demonstrated. Currently, the Planisphere Reader encodes the construction process, which may be thought of as an ordered sequence of 'steps'. Each step consists of prior knowledge which has been given, and an action which is to be performed, in this case drawing upon the piece of paper. Each step is justified by a property of the knowledge domain, in this case geometry. It may be possible to generalize the logical models for construction and proof to reasoning processes beyond Euclidean geometry. In general the rational thought process consists of a sequence of reasoning elements which are combined and semantically reinterpreted to infer new knowledge. In the construction, these reasoning elements are steps; in the proof, they are assertions.
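
The description of a construction as an ordered sequence of justified steps suggests a simple data model. The following is a sketch under assumptions, not the Planisphere Reader's actual encoding: the field names are hypothetical, and the opening of Euclid's Elements I.1 is used only as a familiar illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    given: List[str]      # prior knowledge the step relies on
    action: str           # what is drawn upon the page
    justification: str    # the property of the domain that licenses the action
    produces: str         # label of the primitive the action adds


@dataclass
class Construction:
    steps: List[Step] = field(default_factory=list)

    def add(self, step: Step) -> None:
        """Append a step, refusing any step that relies on something not yet established."""
        known = {s.produces for s in self.steps}
        missing = [g for g in step.given if g not in known]
        if missing:
            raise ValueError(f"step uses unknown primitives: {missing}")
        self.steps.append(step)


euclid_i_1 = Construction()
euclid_i_1.add(Step([], "set out segment AB", "given", "AB"))
euclid_i_1.add(Step(["AB"], "draw the circle with center A and radius AB",
                    "Postulate 3", "circle about A"))
euclid_i_1.add(Step(["AB"], "draw the circle with center B and radius BA",
                    "Postulate 3", "circle about B"))
```

Checking `given` against what earlier steps produced is the smallest possible version of the idea that prior knowledge is used to infer new knowledge: a step cannot appeal to a primitive the construction has not yet introduced.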

The Planisphere Reader allows for the navigation and production of geometric diagrams. Unlike traditional diagrams, however, the meaning underlying a particular geometric shape drawn can be resolved by man or machine. Rather than requiring a human to associate a diagram with its text, such an association, the symbolism of the diagram, is explicitly encoded in a machine-actionable format. The explicit encoding of semantic information is necessary for representing the construction process used to generate the image of the diagram. As a construction progresses, new logical meaning is gained from manipulating the presentational symbols on the page. In order for the reasoning to progress, this new meaning must be associated with the appropriate presentational primitives; that is, the meaning of the symbol must be changed. For example, the first construction of the Planisphere logically references the presentational point e as a north pole, and the presentational circle abgd as a sphere, but later these primitives are interpreted as a Euclidean point and circle respectively. A further requirement of the reasoning process used in Euclidean proof is that logical meaning must be assigned to emergent primitives: any primitive which results from manipulating one or more previously defined primitives and whose coordinates are completely determined by those primitives. Needless to say, encoding such a reasoning process depends upon the ability to encode and process the changing meanings of diagrammatic symbols.


Wednesday, March 14, 2007

Adding Value to Open Scholarly Content

One of the ways in which the Perseus Digital Library increases accessibility to and interest in the humanities is through making its content freely available. Perseus can give away its content and still keep its users coming back because it provides semantically precise associations between content. The Canonical Text Services (CTS) protocol and the CTS-URN syntax it defines exemplify the types of services Perseus uses to intra- and inter-connect content. However, text services are just one aspect of Perseus' service layer, a layer which represents just one-quarter of Perseus' overall logical architecture. Each of the layers of Perseus' logical architecture increases the value of the humanities and thereby the value of Perseus in the eyes of its users.

To understand how Perseus can give away its texts without losing users, it is helpful to distinguish between Perseus' static and dynamic content. Perseus' TEI-XML texts, artifact and image metadata, and named-entity and morphological datasets are static data that Perseus currently distributes or will distribute freely under the Creative Commons license. Perseus can afford to do this because its value lies in the associations it can create by making its data dynamically accessible. These semantically precise associations within Perseus' own content (intra-connecting) and between its own content and external data and/or services (inter-connecting) give users a wealth of context for understanding and interpreting Classical data.

The CTS protocol will make Perseus texts dynamically accessible, allowing them to be connected in ways that downloading a bunch of TEI-XML texts simply does not provide. The Canonical Text Services protocol uses URNs to reference texts in terms of their hierarchical structure. A cousin of the Functional Requirements for Bibliographic Records (FRBR), CTS URNs may identify the work of an author or an edition or translation of a work, but extend FRBR by referencing texts by their logical citation scheme. Rather than navigating by page number, CTS interprets the semantics of these URNs to retrieve logical sections of a text organized by chapter, section, or some other scheme. Furthermore, the protocol specifies extensions to the CTS URN for referencing an arbitrary sequence of characters within a passage, providing a syntax for textual alignment across editions without losing context.
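
To show what interpreting the hierarchy of such a URN might involve, here is a minimal sketch. It assumes a colon-delimited layout of the form `urn:cts:<namespace>:<work hierarchy>:<passage>`, with dots separating the levels inside the work and passage components; that layout, the placeholder URNs, and the function names are assumptions for illustration and should be checked against the CTS specification. Passage ranges and the character-level extensions mentioned above are deliberately not handled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CtsUrn:
    namespace: str
    work: Tuple[str, ...]      # e.g. text group, work, edition or translation
    passage: Tuple[str, ...]   # citation hierarchy, e.g. ("1", "1") for book 1, chapter 1


def parse_cts_urn(urn: str) -> CtsUrn:
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    passage = tuple(parts[4].split(".")) if len(parts) == 5 and parts[4] else ()
    return CtsUrn(namespace=parts[2], work=tuple(parts[3].split(".")), passage=passage)


def contains(outer: CtsUrn, inner: CtsUrn) -> bool:
    """True when inner names the same text as outer or something nested inside it."""
    return (outer.namespace == inner.namespace
            and inner.work[:len(outer.work)] == outer.work
            and inner.passage[:len(outer.passage)] == outer.passage)


# Placeholder identifiers, not real catalog entries.
chapter = parse_cts_urn("urn:cts:exampleNs:group1.work1.ed1:1.1")
whole_work = parse_cts_urn("urn:cts:exampleNs:group1.work1")
```

Because the reference is structured rather than opaque, questions such as "is this chapter part of that work?" reduce to comparing prefixes of the two hierarchies.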

CTS URNs enable Perseus to create associations which increase the value of its data. Just as Google's PageRank takes HTML links into consideration to measure the value of a page to a user, Perseus can increase the value of its content to its users with far greater semantic clarity and precision by using CTS URNs. Some services that will eventually be exposed via Index Services include Perseus' named-entity disambiguation, citations, and morphological information. For each of these examples, CTS-URNs and collection IDs will be combined with Index Services to make connections within Perseus' own data. However, URNs also allow Perseus to connect its highly-structured texts with external data and services; one such service is searching using Google Base.

Connecting highly-structured data with less structured data through search is not unfamiliar. When searching in Google Earth, the display window represents a range of geographic coordinates currently visible to the user. When a user performs a search, hits are displayed in terms of their geographic coordinates as markers. A CTS-aware search would be similar: results would be limited to the range of textual coordinates associated with the current passage and displayed in terms of their textual coordinates. Just as Google Earth parses KML documents and provides services consistent with the semantics of longitude and latitude, so could a CTS-enabled search parse CTS URNs, interpreting the semantics of this coordinate system for textual reference. Currently, Perseus is experimenting with using CTS-URNs within queries in Google Base. After creating an item for each logical section of text and specifying the corresponding URN in the page's metadata, this less-structured data is uploaded to Google Base. Each of these items links to Perseus' text page, which provides rich context for the item. Using this approach, one can search for a text from the author to passage level (it would be very difficult to generate an item for each character on a page) and get relevant hits that point directly to Perseus. Furthermore, a URN specified in combination with a term, such as 'horse', provides a mechanism for limiting search results to the text(s) represented by that URN. (Experimental Examples: Caesar's The Gallic War, a Perseus edition English translation of Caesar's The Gallic War, Book 1, Chapter 1 of the aforementioned edition, occurrences of 'horse' within the aforementioned edition)
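
The idea of limiting a term search to the text represented by a URN can be sketched without reference to any particular search service. The example below does not model Google Base; the items, their text, the URNs, and the function names are all invented placeholders, and scoping is approximated by matching the URN string on a component boundary.

```python
from typing import Dict, List

# One item per logical section of text, each tagged with its URN (placeholder data).
ITEMS: List[Dict[str, str]] = [
    {"urn": "urn:cts:exampleNs:group1.work1.ed1:1.1", "text": "the horse crossed the river"},
    {"urn": "urn:cts:exampleNs:group1.work1.ed1:1.2", "text": "the camp was fortified"},
    {"urn": "urn:cts:exampleNs:group1.work1.ed1:10.1", "text": "a horse was sent ahead"},
    {"urn": "urn:cts:exampleNs:group1.work2.ed1:1.1", "text": "horse and rider"},
]


def in_scope(urn: str, scope: str) -> bool:
    """True when urn equals scope or extends it at a '.' or ':' boundary.

    The boundary check keeps book 10 from matching a search scoped to book 1.
    """
    return urn == scope or urn.startswith(scope + ".") or urn.startswith(scope + ":")


def search(term: str, scope: str) -> List[str]:
    """Return the URNs of items inside scope whose text mentions term."""
    return [item["urn"] for item in ITEMS
            if in_scope(item["urn"], scope) and term.lower() in item["text"].lower()]


hits_in_book_1 = search("horse", "urn:cts:exampleNs:group1.work1.ed1:1")
hits_in_work = search("horse", "urn:cts:exampleNs:group1.work1")
```

As with the Google Earth viewport, the scope acts as a window onto a coordinate system: widening the URN from a book to a whole work widens the set of hits without changing the query term.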

CTS provides numerous benefits not inherent in the raw data, but which are emergent through the behaviors it defines. The CTS protocol defines a standard mechanism for referencing and retrieving texts. Since it is an open protocol with a functioning implementation exposed via an API, the semantics and behaviors of CTS are specified independently of any one implementation and are demonstrably possible to implement. The specification allows one to quantitatively measure how effective a given implementation is by unit testing its conditions. The implementation provides a well-defined API that encodes the domain knowledge resulting from working with texts and from a desire to reference them independently of representation, whether manuscript, book, PDF, or web page. Since all requests to CTS require a CTS-URN, tracking users in a semantically meaningful way becomes possible, as the URN will appear in HTTP access logs. Questions such as how users are navigating the text, what services are being invoked on a given reference, and what requests were performed on that reference can all be answered in terms of the underlying logical structure of the text. Furthermore, since CTS was designed to work with Ancient Greek and Latin texts, it can handle multi-lingual content, and provides a syntax for datasets of aligned texts. Logical referencing of text independent of physical representation, a specification with clear meaning, an API whose functionality can be quantitatively verified, and a notation for semantically precise associations illustrate how dynamic services add value to static content and keep users of Perseus coming back.
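
As a rough illustration of the access-log point, the sketch below tallies requests by URN. The log lines are fabricated samples in a common web-server format, and the `/cts` path, the `request` and `urn` query parameter names, and the request names are assumptions about how a CTS endpoint might be addressed rather than details confirmed here.

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import parse_qs, urlsplit

# Fabricated sample lines; addresses are from a documentation-only range.
LOG_LINES = [
    '203.0.113.5 - - [14/Mar/2007:10:00:00 +0000] "GET /cts?request=GetPassage&urn=urn:cts:exampleNs:group1.work1.ed1:1.1 HTTP/1.1" 200 512',
    '203.0.113.9 - - [14/Mar/2007:10:00:07 +0000] "GET /cts?request=GetPassage&urn=urn:cts:exampleNs:group1.work1.ed1:1.1 HTTP/1.1" 200 512',
    '203.0.113.5 - - [14/Mar/2007:10:01:12 +0000] "GET /cts?request=GetValidReff&urn=urn:cts:exampleNs:group1.work1.ed1:1 HTTP/1.1" 200 230',
    '203.0.113.7 - - [14/Mar/2007:10:02:00 +0000] "GET /cts?request=GetCapabilities HTTP/1.1" 200 1024',
]


def urn_requests(lines: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Count (request name, URN) pairs found in access-log lines."""
    counts: "Counter[Tuple[str, str]]" = Counter()
    for line in lines:
        try:
            request_line = line.split('"')[1]     # e.g. 'GET /cts?... HTTP/1.1'
            target = request_line.split(" ")[1]   # the path and query string
        except IndexError:
            continue                              # not a request line we understand
        query = parse_qs(urlsplit(target).query)
        for urn in query.get("urn", []):
            request = query.get("request", ["unknown"])[0]
            counts[(request, urn)] += 1
    return counts


counts = urn_requests(LOG_LINES)
```

Since the key is a logical reference rather than a page address, the resulting counts describe which passages were read and which services were invoked on them, independent of how the pages happened to be laid out.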

Although text services are central to Perseus' mission, they are just one of the services Perseus offers, and the entire service layer only accounts for one-quarter of Perseus' logical architecture. Perseus' logical architecture can be used to classify Perseus' other sources of value, which increase both the value of the humanities and the value of Perseus in the eyes of its users. First, Perseus' data layer, comprised of TEI-XML texts, databases, and other raw data, is freely distributed under the Creative Commons license. Not only does this establish Perseus as a data source to the community, but distributing multiple copies of each text increases its chances of surviving into the future. Rather than having to copy a text by hand, digitization provides the humanities with the ability to make and distribute an arbitrary number of copies, increasing the accessibility and survivability of each text. Furthermore, Perseus' expertise in digitizing these texts serves as a source of value for those who wish to create their own digital editions. Second, the domain layer, where behaviors are associated with the raw data, encodes the knowledge and experience gained while working with the content. Working in the domain of Classical texts provides Perseus with a unique perspective on the nature of text that others may find useful, and so gives Perseus the opportunity to help others make sense of their content. Third, Perseus' service layer provides a series of APIs implementing a set of protocols for each of the types of data Perseus serves. Many of these services rely upon the protocols specified in the TICI stack. Through the service layer, others are free to repurpose Perseus' content through an API that encodes domain knowledge, and since the API is freely available under an open source license, the community using the API becomes a source of information and value as well.
Finally, the display layer, whether widgets, HTML pages, or PDFs gives all users a convenient and easy way to access Perseus' data. The user interface reflects the knowledge gained when building the other layers, and so helps the general public visually see the relations between Perseus' data.

Perseus can give away its static data because it adds value through providing semantically rich associations, adding context to its content. The CTS protocol offers a new way to conceive of, reference, and deliver texts, and CTS-URNs provide a syntax for specifying relations among Perseus' and external content and services. These relations increase the value of Perseus in a way that is not inherent in the raw data, but comes from creating associations among the data. In giving away its raw data, Perseus encourages others to develop their own associations, increasing its value as a data provider and as a service developer while increasing access to and therefore innovation within the humanities.

Credits:
Thanks to John Blossom whose "Shoreviews. Content Industry Outlook 2007: Reality Checks" gave me criteria for evaluating CTS as a value-added service in the context of the publishing community. Thanks to Gregory Crane for his ideas on interconnecting primary and secondary sources within the Perseus Digital Library and his initial recommendation to look at Google Base. Thanks to Neel Smith for the comparison between searching within Google Earth and a CTS-aware search. Thanks to my brother, Michael Weaver, for his work on the logical layers of an application and their relation to business processes.

Notes:
Based upon slides presented on the panel Getting Search Right for Premium Content during the Spring 2007 ASIDIC meeting.
