Thursday, 29 May 2008

Is the telegraph of Hawthorne the Internet of today?

"Then there is electricity the demon, the angel, the mighty physical power, the all-pervading intelligence!" exclaimed Clifford. "Is that a humbug, too? Is it a fact or have I dreamed it that, by means of electricity, the world of matter has become a great nerve, vibrating thousands of miles in a breathless point of time? Rather, the round globe is a vast head, a brain, instinct with intelligence! Or, shall we say, it is itself a thought, nothing but thought, and no longer the substance which we deemed it!"

"If you mean the telegraph," said the old gentle man, glancing-his eye toward its wire, alongside the rail-track, "it is an excellent thing; that is, of course, if the speculators in cotton and politics don’t get possession of it. A great thing, indeed, sir; particularly as regards the detection of bank-robbers and murderers."

"I don’t quite like it, in that point of view," replied Clifford. "A bank-robber, and what you call a murderer, likewise, has his rights, which men of enlightened humanity and conscience should regard in so much the more liberal spirit, because the bulk of society is prone to controvert their existence. An almost spiritual medium, like the electric telegraph, should be consecrated to high, deep, joyful, and holy missions."

Hawthorne, Nathaniel, The House of the Seven Gables (pref. 1851), "The Flight of Two Owls", pp. 317-318, London, Collins' Clear-Type Press, 1851.

Thursday, 15 May 2008

Surrogates and the Semantic Web

The semantic Web can be summarized as a Web augmented with formalized knowledge that annotates it and that applications can exploit through inferences, either for their own tasks or to help users navigate, search, modify, etc. To interact with this semantic Web, we need interfaces that make it intelligible to end-users. The problem of intelligibility is different from that of interoperability: working at the semantic level does not ensure intelligibility. Pieces of knowledge are manipulated and combined, and nothing guarantees that the results of these transformations remain intelligible.
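
To make the intelligibility problem concrete, here is a minimal sketch, assuming annotations stored as (subject, predicate, object) triples with hypothetical `ex:` identifiers. The function `infer_types` (a name chosen for this example) applies a single RDFS rule, propagating `rdf:type` along `rdfs:subClassOf` until nothing new can be derived. The derived statements are valid for a machine, yet they are still made of identifiers that mean little to an end-user.

```python
# Hypothetical annotations: the "ex:" identifiers are invented for this sketch,
# while rdf:type and rdfs:subClassOf follow the RDF/RDFS vocabulary.
TRIPLES = {
    ("ex:page1", "rdf:type", "ex:Report"),
    ("ex:Report", "rdfs:subClassOf", "ex:Document"),
    ("ex:Document", "rdfs:subClassOf", "ex:Resource"),
}


def infer_types(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Propagate rdf:type along rdfs:subClassOf until a fixpoint is reached."""
    inferred = set(triples)
    subclass_of = {(s, o) for s, p, o in triples if p == "rdfs:subClassOf"}
    changed = True
    while changed:
        changed = False
        # Snapshot the type statements known at the start of this pass.
        typed = [(s, o) for s, p, o in inferred if p == "rdf:type"]
        for instance, cls in typed:
            for sub, sup in subclass_of:
                new = (instance, "rdf:type", sup)
                if sub == cls and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


# Print only what the inference added to the original annotations.
for triple in sorted(infer_types(TRIPLES) - TRIPLES):
    print(triple)
```

Running it prints two new statements typing `ex:page1` as `ex:Document` and `ex:Resource`: correct conclusions, but nothing a user could read without a further rendering step.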

Indexing resources consists of scanning the set of these resources and building a representative surrogate (e.g. a vector of terms) for each one of them. Surrogates for indexing may be as simple as a collection of some words of the document or as complex as a natural language analysis of its content resulting in semantic annotations. Surrogates are not limited to representing the content of the resources; they can also include metadata (e.g. editor, author, ISBN, etc.) and, more generally, external properties of the resource (e.g. the number of hyperlinks pointing toward it). The choice of surrogate greatly influences the whole information system and has motivated a lot of research in the field of information retrieval. Thus, in information systems, the first use of surrogates of information resources is to provide a highly synthetic and representative structure reduced to those features of the resource that are relevant for the intended processing.
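
The simplest indexing surrogate described above, a vector of terms plus some metadata, can be sketched as follows. This is a minimal illustration under stated assumptions: tokenization by a regular expression, a tiny hypothetical stop-word list, and raw term frequencies rather than a weighting scheme such as TF-IDF. `IndexSurrogate` and `build_index_surrogate` are hypothetical names.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Illustrative stop-word list; a real indexer would use a much larger one.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


@dataclass
class IndexSurrogate:
    """What the indexer keeps of a resource: an identifier, terms, metadata."""
    resource_id: str
    terms: Counter
    metadata: dict = field(default_factory=dict)


def build_index_surrogate(resource_id: str, text: str,
                          metadata: Optional[dict] = None) -> IndexSurrogate:
    # Lowercase, split on anything that is not an ASCII letter, drop stop words.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
    # Word order and layout are discarded: only term frequencies survive.
    return IndexSurrogate(resource_id, Counter(tokens), dict(metadata or {}))


doc = build_index_surrogate("doc-42", "The semantic Web and the Web of documents",
                            {"author": "A. Author"})
print(doc.terms.most_common(2))  # [('web', 2), ('semantic', 1)]
```

Everything the retrieval algorithm will ever know about the resource is in this structure, which is why the choice of surrogate shapes the whole system.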

Besides the efficiency of the retrieval algorithm, a major factor in users' acceptance of a system is the user-interface through which they interact with it. Search results in a classical Web search engine usually take the form of a list of pages, each result coming with some text extracted from the selected page that justifies its selection. In addition, other information may be given (e.g. URL, date of indexing, thumbnail), and the results are ranked according to their estimated relevance to the query. To represent this set of selected resources, the system uses a second kind of surrogate: the second use of information resource surrogates is to provide a highly synthetic and representative structure reduced to those features that help users identify the resources and their position in the set of results.
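
A minimal sketch of this second kind of surrogate follows: for each hit, the system keeps a title, an extract around the first matching query term, and a relevance score used for ranking. The scoring here (a plain count of query-term occurrences) is a naive stand-in chosen for brevity, not the ranking function of any actual search engine; the class and function names and the `example.org` URIs are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Resource:
    uri: str
    title: str
    text: str


@dataclass
class ResultSurrogate:
    """What the user sees for one hit: identifier, title, extract, score."""
    uri: str
    title: str
    extract: str
    score: int


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def make_extract(text: str, query_terms: set[str], width: int = 4) -> str:
    """Return a window of words around the first query term found."""
    words = text.split()
    for i, word in enumerate(words):
        if any(t in query_terms for t in tokenize(word)):
            start, end = max(0, i - width), i + width + 1
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    return " ".join(words[: 2 * width + 1])


def search(resources: list[Resource], query: str) -> list[ResultSurrogate]:
    query_terms = set(tokenize(query))
    hits = []
    for r in resources:
        # Naive relevance estimate: number of query-term occurrences.
        score = sum(1 for t in tokenize(r.text) if t in query_terms)
        if score > 0:
            hits.append(ResultSurrogate(r.uri, r.title,
                                        make_extract(r.text, query_terms), score))
    # Rank by estimated relevance, best first.
    return sorted(hits, key=lambda h: h.score, reverse=True)


docs = [
    Resource("http://example.org/a", "Surrogates", "surrogates for the semantic web"),
    Resource("http://example.org/b", "Web pages", "web web web pages"),
    Resource("http://example.org/c", "Other", "nothing relevant here"),
]
for hit in search(docs, "semantic web"):
    print(hit.score, hit.title, "|", hit.extract)
```

The extract plays the role of the text "justifying the selection", while the score only orders the list; neither is needed by the indexer, which is what separates this surrogate from the indexing one.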

An identifier is always present in both types of surrogate, but it is not enough for users, since it is usually a system identifier such as a database primary key or a URI, which barely provides any information about the resource and is usually only used in operations such as joins. Thus the second kind of surrogate requires information such as a title, a focused extract, previews, etc. The ability of the system to propose views that organize the results efficiently depends on the choice of surrogate.
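
The gap between identifier and human-readable label can be illustrated with a small fallback chain: prefer a property that names the resource, and only as a last resort derive something from the URI itself. This is a sketch under assumptions: the property names in `LABEL_PROPERTIES` and the functions `local_name` and `display_label` are hypothetical, and the URIs are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical property names assumed to carry a human-readable name,
# tried in this order.
LABEL_PROPERTIES = ("title", "label", "name")


def local_name(uri: str) -> str:
    """Last resort: the URI fragment, or else its last path segment."""
    parsed = urlparse(uri)
    if parsed.fragment:
        return parsed.fragment
    segments = [s for s in parsed.path.split("/") if s]
    return segments[-1] if segments else uri


def display_label(uri: str, properties: dict) -> str:
    """Prefer a human-readable property; fall back on the identifier."""
    for prop in LABEL_PROPERTIES:
        value = properties.get(prop)
        if value:
            return value
    return local_name(uri)


print(display_label("http://example.org/doc/123", {"title": "Surrogates"}))
print(display_label("http://example.org/doc/123", {}))
```

The second call falls back to `123`, which shows how little a system identifier tells the user once no descriptive property is available.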

In the case of a semantic search engine (a search engine exploiting formal knowledge representations), where queries and results are limited only by the available ontologies, finding a generic mechanism to build these surrogates remains an open problem. One of the difficulties is that the relevance of the features used to build a surrogate is domain-dependent.

Currently, there is a huge gap between the conceptual structures underlying the semantic web and the final rendering of a user-interface enabling an end-user to peruse or act on part of it. Most of the time, user-interface designers implement the transformation from their internal data structures to the interface representations in ad hoc ways. This is no longer feasible when the data structures, their schemata, transformations, etc. are changing and propagating through networks; in other words, it is not possible in the semantic Web. Interfaces will have to be, at least partly, dynamically generated and rendered for every structure that comes in contact with users. We need to automate part of the process of generating representations for the concepts mobilized in the semantic web.

Ontologies provide the semantic grounding for communication and, as such, are at the frontier between the conceptualization of the system and that of the users. Thus ontologies need to be understandable both to humans and to machines; otherwise they can no longer play this pivotal role and are no longer usable, maintainable, etc. Moreover, the whole internal conceptual structure should never be shown to end-users, not only because its logical face is abstruse, but also because, as humans, we do not mobilize our whole conceptualizations each time we communicate, think, act, etc.: we focus. Therefore, a system must not force users to handle a whole ontology each time they interact with it: we focus, and user-interfaces must focus with us.
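
One simple way to let an interface focus is to show only the neighborhood of the concept the user is dealing with instead of the whole ontology. The sketch below assumes the ontology is held as a hypothetical adjacency map from each concept to its outgoing (predicate, target) edges and uses a hop count as the focus criterion; both are illustrative choices, not a method taken from the text.

```python
from collections import deque

# Hypothetical toy ontology: concept -> list of (predicate, target) edges.
ONTOLOGY = {
    "Person": [("subClassOf", "Agent"), ("hasName", "Name")],
    "Agent": [("subClassOf", "Thing")],
    "Organization": [("subClassOf", "Agent")],
    "Thing": [],
}


def focus(graph: dict[str, list[tuple[str, str]]], center: str,
          max_depth: int = 1) -> list[tuple[str, str, str]]:
    """Breadth-first walk returning the edges within max_depth hops of center."""
    seen = {center}
    frontier = deque([(center, 0)])
    edges = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the focus radius
        for predicate, target in graph.get(node, []):
            edges.append((node, predicate, target))
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return edges


print(focus(ONTOLOGY, "Person"))
```

Only outgoing edges are followed here, so `Organization` never appears in the focus around `Person`; a real interface might also follow incoming edges, and the relevant radius would likely depend on the task.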

User-interfaces have the unenviable role of bridging the gap between explicit conceptualizations captured in ontologies and the day-to-day use of signs to denote concepts, with its unavoidable ambiguity and fuzziness. The very simple fact of choosing labels in an ontology brings it into the field of semiotics; user-interface and ontological problems must be tackled in parallel. This brings us back to the problem of choosing surrogates for visualization. This aspect has mostly been overlooked in the literature, although it is vital to support the mechanisms of interpretation associated with our models, our inferences and their results. For example, to represent an instance of a person, it makes sense to build a surrogate including the first name and the surname of the person, but the age, height and address may not be useful unless a scenario explicitly requires them. Distinguishing between key and non-key properties is a scenario-dependent task.
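
The person example can be sketched by making the choice of key properties an explicit, per-scenario table. The scenarios ("medical", "delivery"), the property names and the `build_surrogate` function are all hypothetical; the point is only that the same instance yields different surrogates depending on the scenario.

```python
# Hypothetical scenario profiles: for each class, the properties treated as
# "key" when building a surrogate. Scenarios and property names are invented.
KEY_PROPERTIES = {
    "default": {"Person": ["firstName", "surname"]},
    "medical": {"Person": ["firstName", "surname", "age", "height"]},
    "delivery": {"Person": ["firstName", "surname", "address"]},
}


def build_surrogate(instance: dict, cls: str, scenario: str = "default") -> dict:
    """Keep only the key properties of the instance for the given scenario."""
    profile = KEY_PROPERTIES.get(scenario, KEY_PROPERTIES["default"])
    keys = profile.get(cls, [])
    # Properties absent from the instance are simply skipped.
    return {k: instance[k] for k in keys if k in instance}


jane = {"firstName": "Jane", "surname": "Doe", "age": 42,
        "address": "1 Example Street"}
print(build_surrogate(jane, "Person"))
print(build_surrogate(jane, "Person", "delivery"))
```

Hard-coding the table is the ad hoc approach the text warns against; it is shown only to make the scenario dependence visible, not as a generic mechanism.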

To go even further, there is a need for a richer model of the links between the conceptual structures of the semantic web and the semiotic level of the classic web for humans. This is a call for semioticians to get involved in the modeling and inference mechanisms underlying the semantic web. This is becoming vital in a pervasive World Wide Web that relies on multi-modal, multimedia devices and grows more mobile every day. We need to be able to identify and differentiate between alternative signs and media channels, and to link these semiotic alternatives and their logics to the underlying semiotic structures and logics. The link between ontologies and the semiotic systems they interact with needs to be explored thoroughly. Looking even further, there is a need to model the combination of semantic web resources and pragmatic web resources to produce semiotic web resources.

The generation of the interfaces for the semantic web will be dynamic and will use: the users' profile, the context and history of interactions, semiotic modeling primitives added to our meta-model, signs linked to the primitives of our ontologies, logics of semiotics and surrogate generation, in addition to the conceptual structures to be communicated to the users.
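
Two of the inputs listed above, the user's profile and signs linked to ontology primitives, can be combined in a small sketch. It assumes each concept carries several alternative signs, each tagged with a language (as RDF labels can carry language tags) and with a hypothetical delivery channel, and picks one according to the user's language preferences and the device's modality. `Sign`, `SIGNS` and `choose_sign` are hypothetical; context, interaction history and the logics of semiotics are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    text: str
    lang: str      # language tag, "" for language-neutral signs
    modality: str  # hypothetical channels: "text", "speech", "icon"


# Hypothetical signs attached to ontology primitives; "ex:" names are invented.
SIGNS = {
    "ex:Person": [
        Sign("person", "en", "text"),
        Sign("personne", "fr", "text"),
        Sign("person.svg", "", "icon"),
    ],
}


def choose_sign(concept: str, languages: list[str], modality: str) -> str:
    """Pick a sign for a concept from the user's languages and the device channel."""
    candidates = [s for s in SIGNS.get(concept, []) if s.modality == modality]
    # Follow the user's language preferences in order, then accept neutral signs.
    for lang in languages + [""]:
        for sign in candidates:
            if sign.lang == lang:
                return sign.text
    if candidates:
        return candidates[0].text  # any sign on the right channel
    return concept  # last resort: the bare identifier


print(choose_sign("ex:Person", ["fr", "en"], "text"))  # personne
print(choose_sign("ex:Person", ["de"], "icon"))        # person.svg
```

The final fallback to the bare identifier is the situation the text describes as unintelligible, which is why richer sign models are needed rather than more fallbacks.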

Extracted from: Fabien Gandon, "Generating Surrogates to Make the Semantic Web Intelligible to End-Users", IEEE/WIC/ACM International Conference on Web Intelligence, Compiègne University of Technology, September 19-22, 2005.