LogicWeb: Using Logic Programming to Extend the World Wide Web

Seng Wai Loke
Department of Computer Science,
The University of Melbourne,
Parkville, Victoria 3052, Australia
Email: swloke@cs.mu.oz.au

Abstract

LogicWeb is an integration of logic programming ideas and the World Wide Web. It extends Web pages with knowledge-based behaviour and offers logic programs an interface to the Web. A logic program is incorporated within a page, allowing programmable behaviour and state to be tightly associated with the page. The page can then be queried using its knowledge base, which is represented as a logic program.

The program incorporated into a page can reason with other pages as part of its behaviour. These other Web pages are treated as logic programming modules, termed LogicWeb modules. Special operators are available to retrieve LogicWeb modules, to invoke goals within them, and to combine them.

The use of logic programming has a number of advantages, including the ease of specifying searching, the availability of knowledge-based reasoning, and the ability to define semantics for all the extensions. Logic programming also permits the meta-level manipulation of modules.

We outline three LogicWeb application areas: rule-based search tools, lightweight deductive databases, and distributed software engineering.

Keywords: logic programming, rule-based reasoning, World Wide Web

1 Introduction

Logic programming is based on mathematical logic, where computation is treated as deduction from a set of axioms or rules. It has a number of favourable properties including: (1) a declarative view of programs, allowing more focus on the problem and explicit statement of knowledge and goals, compared to imperative languages; (2) a uniform means of representing data and computations; and (3) a solid semantic basis. Implementations of logic programming, such as Prolog, have special features including automatic memory management, pattern matching via unification, automatic backtracking, structured database representation and manipulation, grammars for parsing and meta-programming capabilities.

These features make logic programming ideal for applications requiring search over possible solutions, symbol manipulation, and flexible manipulation of databases (Lazarev 1989). For example, it is used in AI problem-solving, knowledge representation and expert systems (Sterling & Shapiro 1994, Lazarev 1989). It is also used in the field of deductive databases, where rules extend relational databases with greater modelling capabilities. It is also favourable from a software engineering viewpoint, since it allows executable specifications to be written (Lazarev 1989). Recent work on modular extensions to logic programming (Bugliesi et al. 1994, Brogi 1993) provides mechanisms for structuring larger Prolog software.

The World Wide Web is growing rapidly, particularly as an information dissemination tool. There is already extensive work on using the Web to transport not only HTML (Hypertext Markup Language) text but also program code (termed 'mobile code'), and on using the Web infrastructure for distributed applications (The World Wide Web Consortium). For example, the object-oriented language Java (Sun Microsystems) is enormously popular for enhancing the Web with interactive applications. More recently, the integration of logic programming and Web technology has been explored more extensively in the form of Prolog libraries to access Web pages, expert systems on the Web, mobile Prolog code, and MOOs (Multi-User Domains, Object-Oriented) (Davison 1996).

In this paper, we explore the use of logic programming in Web-based applications in the context of current work on LogicWeb. In particular, we shall look at the following application areas: declarative formulation of information searches, deductive databases on the Web, and distributed software engineering using the Web as the communication medium.

LogicWeb (Loke & Davison 1996) is an application of logic programming ideas to the Web, which treats Web pages as logic programming modules. The Web page becomes a live information entity that uses its rules to respond to user queries. LogicWeb modules are also treated as first-class objects within a logic program, enabling the rules to reason with other pages and to define relationships between pages. Special operators are available to retrieve LogicWeb modules, to invoke goals within them, and to combine them.

The use of logic programming has a number of advantages, including the ease of specifying searching, the availability of knowledge-based reasoning, and the ability to define semantics for all the extensions. Logic programming also permits the meta-level manipulation of modules.

In the following sections, we describe LogicWeb and outline the above-mentioned applications. We then discuss LogicWeb as a mobile code system, and conclude with a brief description of the current implementation and directions for future work. We assume an acquaintance with Prolog.

2 LogicWeb

LogicWeb establishes a correspondence between Web pages and logic programs consisting of facts and rules, which we call LogicWeb modules. A Web page with the URL (Uniform Resource Locator) <URL> corresponds to a module whose name is the term m_id(<URL>). The module contains at least the following facts, my_id(<URL>) and h_text(<HTML source>), the first of which stores the URL of the page and the second, the HTML source of the page. This is sufficient to model ordinary Web pages. Ordinary Web pages can be parsed to extract facts, such as a collection of links used in the page. For example, the following link in HTML
 
<A HREF="http://www.cs.mu.oz.au/~swloke/logicweb.html">LogicWeb</A>
is represented as the following fact:
link("LogicWeb","http://www.cs.mu.oz.au/~swloke/logicweb.html").
This provides a layer of abstraction beyond the text of a page.
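
The extraction of link/2 facts from page text can be sketched in standard Prolog with a definite clause grammar. This is a minimal illustration and not the LogicWeb parser: the names links/2, link_list//1, anchor//2 and string_without//2 are our own for this sketch, atoms are used in place of strings, and only anchors written exactly in the form <A HREF="...">...</A> are recognised.

```prolog
% Treat "..." as lists of character codes inside this file.
:- set_prolog_flag(double_quotes, codes).

% links(Html, Links): Links is the list of link(Label, URL) terms
% for the anchors found in the atom Html, in order of appearance.
links(Html, Links) :-
    atom_codes(Html, Codes),
    phrase(link_list(Links), Codes).

% Scan the text: take an anchor if one starts here,
% otherwise skip a single character.
link_list([link(Label, URL)|Links]) -->
    anchor(Label, URL), !,
    link_list(Links).
link_list(Links) -->
    [_], !,
    link_list(Links).
link_list([]) --> [].

% An anchor in the exact form <A HREF="URL">Label</A>.
% 34 is the code of the double quote, 60 the code of '<'.
anchor(Label, URL) -->
    "<A HREF=\"", string_without(34, URLCodes), "\">",
    string_without(60, LabelCodes), "</A>",
    { atom_codes(URL, URLCodes), atom_codes(Label, LabelCodes) }.

% Longest run of codes that does not contain Stop.
string_without(Stop, [C|Cs]) -->
    [C], { C \== Stop }, !,
    string_without(Stop, Cs).
string_without(_, []) --> [].
```

With this sketch, the goal links('<P>See <A HREF="http://www.cs.mu.oz.au/~swloke/logicweb.html">LogicWeb</A>.</P>', Links) binds Links to a one-element list holding link('LogicWeb', 'http://www.cs.mu.oz.au/~swloke/logicweb.html'). A realistic parser would also need to handle lower-case tags, extra attributes and malformed HTML.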

A LogicWeb module can contain a logic program written in Prolog extended with new operators. There are currently four operators (Loke et al. 1996a), two of which are:

3 Applications

In this section, we outline three broad application areas demonstrating the utility of the LogicWeb features mentioned above.

3.1 Search Tools

The LogicWeb operators can be used to specify a search for specific information on the Web, which is viewed as a directed graph. For example, the following rule specifies a search in a page for URLs pointing to HTML documents on a topic defined by Keyword. Each link of the page is tested to see whether its URL has the '.html' extension and whether its label contains a keyword related to Keyword. If both tests succeed, the linked page is retrieved and examined for Keyword.
 
relevant_link(Keyword,URL,RelevantURL) :-
	m_id(URL)#>link(Label,RelevantURL), 
	contains(RelevantURL,".html"),
	related(Keyword,Keyword1), 
	contains(Label,Keyword1),
	m_id(RelevantURL)#>h_text(Source), 
	contains(Source,Keyword).  
The goal m_id(URL)#>link(Label,RelevantURL) retrieves link information from the link/2 facts of the page whose URL is URL. Subsequent links in the page are retrieved on backtracking. contains/2 determines whether its second argument is a substring of its first. related/2 is defined by facts relating keywords. A goal such as
relevant_link("logic", "http://www.cs.mu.oz.au/~swloke/resources.html",
RelevantURL).  
retrieves links concerning 'logic' from the resource page.
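
To make the behaviour of this rule concrete, the following is a minimal sketch in plain Prolog that can be run without network access. It is not LogicWeb code: the goal m_id(URL)#>Goal is replaced by a hypothetical local table page_fact/2, atoms are used in place of strings, and the example.org pages, their links and the related/2 facts are invented for illustration.

```prolog
% page_fact(URL, Fact): hypothetical stand-in for m_id(URL)#>Fact.
% Facts are written by hand here instead of being extracted from
% pages fetched over the network.
page_fact('http://example.org/resources.html',
          link('Prolog tutorials', 'http://example.org/prolog.html')).
page_fact('http://example.org/resources.html',
          link('Prolog archive', 'http://example.org/prolog.tar')).
page_fact('http://example.org/resources.html',
          link('Cooking', 'http://example.org/cooking.html')).
page_fact('http://example.org/prolog.html',
          h_text('<HTML>An introduction to logic programming</HTML>')).
page_fact('http://example.org/cooking.html',
          h_text('<HTML>Recipes</HTML>')).

% related(Keyword, Keyword1): Keyword1 may appear in the label of a
% link about Keyword (invented heuristic knowledge).
related(logic, 'Prolog').
related(logic, logic).

% contains(Text, Sub): the atom Sub occurs somewhere in the atom Text.
contains(Text, Sub) :-
    once(sub_atom(Text, _, _, _, Sub)).

% Same structure as the LogicWeb rule, with page_fact/2 in place of #>.
relevant_link(Keyword, URL, RelevantURL) :-
    page_fact(URL, link(Label, RelevantURL)),
    contains(RelevantURL, '.html'),
    related(Keyword, Keyword1),
    contains(Label, Keyword1),
    page_fact(RelevantURL, h_text(Source)),
    contains(Source, Keyword).
```

The goal relevant_link(logic, 'http://example.org/resources.html', R) has the single solution R = 'http://example.org/prolog.html'. The archive link is rejected because its URL does not contain '.html', and the cooking link is rejected because its label contains no keyword related to logic.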

This simple example illustrates how users can write rules to automatically search Web sites based on heuristic knowledge.

Based on the above ideas, we have implemented a rule-based tool (Loke et al. 1996b) that searches several Web sites for a given citation, given an author's name and title keywords. Search engines, such as Lycos, are used to provide starting points for searching.

Logic programming is ideal for specifying different kinds of graph searches and for encoding heuristic knowledge to guide the search. It is also useful for parsing and for rule-based analysis of documents. A simple example is a rule that states that a document is relevant if it contains a set of related keywords.
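
Such a relevance rule might be sketched in plain Prolog as follows. The predicate names and the keyword set are hypothetical and chosen only for illustration, and atoms stand in for page text; in LogicWeb, Source would be obtained from the h_text/1 fact of a retrieved module.

```prolog
% keyword_set(Topic, Keywords): invented heuristic knowledge listing
% the keywords that together indicate a document on Topic.
keyword_set(logic_programming, [unification, backtracking, clause]).

% contains_all(Source, Keywords): every keyword occurs in the atom Source.
contains_all(_, []).
contains_all(Source, [Keyword|Keywords]) :-
    once(sub_atom(Source, _, _, _, Keyword)),
    contains_all(Source, Keywords).

% A document is relevant to Topic if its text contains the whole
% set of keywords associated with Topic.
relevant_document(Topic, Source) :-
    keyword_set(Topic, Keywords),
    contains_all(Source, Keywords).
```

For example, relevant_document(logic_programming, 'unification and backtracking over each clause') succeeds, whereas the same goal with the text 'unification only' fails.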

In a recently proposed logic-based Web query language called WebLog (Lakshmanan et al. 1996), rules refer to pages (and their components) using URLs, which are treated as first-class objects. It allows the formulation of queries over pages like the above example. However, that work is based on a database query language and does not view pages as modules. Our view of pages as modules allows us to use module composition as an abstraction for gleaning information from different sources.

3.2 Lightweight Deductive Databases

Incorporating logic programs in Web pages enhances the Web with structured information which is more amenable to sophisticated automated searching, extraction and processing. These logic programs can be treated as deductive databases consisting of relations defined by the rules and facts. These databases, which are distributed on the Web, have the benefit of widespread accessibility, distributed maintenance and incremental development. Since we do not aim for full transaction processing capabilities, we term these lightweight deductive databases (Loke et al. 1996a), following the terminology of Dobson & Burrill (1995).

LogicWeb provides operators to dynamically combine modules containing databases in queries. For example, assume that three institutions have their own databases of academics and their research interests (represented as facts in LogicWeb modules) but all conforming to the following schema:

interested(staff_name,list_of_research_topics).

Suppose we would like to find out which academics from the institutions are interested in a given topic (for example, databases). The query is expressed in the form of a goal evaluated against the union of the modules from the institutions, as shown in the following rule:

 
academic_interest(DatabaseModules,Topic,Name) :-
	lw_union(DatabaseModules)#>interested(Name,ResearchTopics),
	member(Topic,ResearchTopics).  
This rule is invoked with the goal:
academic_interest([m_id("http://institution1/research_db1.html"),
	m_id("http://institution2/research_db2.html"),
	m_id("http://institution3/research_db3.html")], 
	databases,Name).  
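
The effect of evaluating a goal against a union of fact-only modules can be imitated in plain Prolog, as in the minimal sketch below. It is not the implementation of lw_union: module_fact/2 and union_holds/2 are hypothetical stand-ins for retrieved modules and for the lw_union(...)#>Goal construct, the staff names and topics are invented, and the sketch covers only modules made of facts, whereas a union of logic programs would also let rules in one module use clauses from another.

```prolog
:- use_module(library(lists)).   % member/2

% module_fact(ModuleId, Fact): hypothetical local copy of the facts
% stored in each institution's module.
module_fact(m_id('http://institution1/research_db1.html'),
            interested(alice, [databases, logic_programming])).
module_fact(m_id('http://institution2/research_db2.html'),
            interested(bob, [graphics])).
module_fact(m_id('http://institution3/research_db3.html'),
            interested(carol, [networks, databases])).

% union_holds(Modules, Goal): Goal is a fact of at least one of the
% listed modules (stand-in for lw_union(Modules)#>Goal).
union_holds(Modules, Goal) :-
    member(Module, Modules),
    module_fact(Module, Goal).

% Same structure as the LogicWeb rule, with union_holds/2 in place
% of lw_union and #>.
academic_interest(DatabaseModules, Topic, Name) :-
    union_holds(DatabaseModules, interested(Name, ResearchTopics)),
    member(Topic, ResearchTopics).
```

Invoked with the three module identifiers and the topic databases, as in the goal above, this sketch yields Name = alice and, on backtracking, Name = carol.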

The locations of the modules containing the databases can themselves be obtained from databases; that is, we can build databases about databases. We can also specify relationships between databases or build a semantic net whose nodes are collections of related databases. Such a semantic net provides an alternative structure for the Web. Besides the databases, knowledge bases can be built to process user queries.

Lightweight deductive databases allow users to express information, and how it can be manipulated, in a more structured and formal way than ordinary HTML text. The LogicWeb framework also allows interfaces to be built over these databases. For instance, a LogicWeb module can have clauses that process user queries by retrieving the modules containing the databases, performing the matching, and displaying the results.

Search rules can be used to search the Web for the modules with the databases relevant to a given query. Hence, LogicWeb provides a single language for specifying both the search for the modules and the retrieval of the information from within modules.

3.3 Distributed Software Engineering

In a distributed software engineering environment, timely communication between project members is imperative. The Web is a suitable platform to facilitate this (Baentsch et al. 1995) since up-to-date information is easily made available via the Web.

Logic programming has advantages for software engineering. Modular abstractions are needed for programming-in-the-large (Bugliesi et al. 1994, Brogi 1993). LogicWeb provides such abstractions and also allows a module to retrieve (and use) other modules on the fly during goal evaluation.

Sedlock & Jörg (1996) recommend writing Prolog programs in the literate programming style, together with their documentation and specifications, in HTML files. The programs are extracted when needed. This benefits not only the programmer but also other team members, who can then browse the code. They describe an implementation in Prolog of a software project management system where all the code is written in HTML files. However, all the source code resides on a single server. We take their recommendation further and suggest LogicWeb as a distributed programming platform.

4 LogicWeb and Mobile Code

Mobile code refers to programs transferred over the network between hosts. Languages such as Java and Safe-Tcl are being used as mobile code languages where programs are transferred from the server to the client. This reduces server load by moving computations to the client, enables convenient software distribution, and enhances the Web with dynamic behaviour.

Mobile code languages must deal with the issues of security (since foreign code is executed locally), architecture-independence and resource control. Hence, a main feature of these languages is that they are interpreted, which enables control over the instructions executed. LogicWeb modules are executed locally and hence must not be allowed to run damaging code. Fortunately, interpreters for logic programs can be written easily, and as logic programs themselves. A meta-interpreter has control over every goal that is executed and can therefore refuse to run goals that are considered unsafe. Automatic memory management in Prolog also means that users do not deal with pointers, which could potentially access arbitrary memory locations.
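
The idea of a goal-checking meta-interpreter can be illustrated with the following minimal sketch in plain Prolog. It is not the interpreter used in the prototype: solve/1, safe_builtin/1, program_predicate/1 and the small ancestor program are our own illustrative names, the whitelist is deliberately tiny, and control constructs other than conjunction are simply rejected.

```prolog
% Clauses of the downloaded program must be accessible to clause/2.
:- dynamic parent/2, ancestor/2.

% A small program standing in for downloaded code (invented example).
parent(tom, bob).
parent(bob, ann).
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).

% safe_builtin(Goal): built-in goals the interpreter agrees to run.
safe_builtin(_ = _).
safe_builtin(_ \= _).
safe_builtin(_ is _).
safe_builtin(_ < _).
safe_builtin(_ > _).

% program_predicate(Goal): goals defined by the downloaded program.
program_predicate(parent(_, _)).
program_predicate(ancestor(_, _)).

% solve(Goal): prove Goal, inspecting every subgoal before it runs.
solve(Goal) :- var(Goal), !, throw(unsafe_goal(Goal)).
solve(true) :- !.
solve((A, B)) :- !, solve(A), solve(B).
solve(Goal) :- safe_builtin(Goal), !, call(Goal).
solve(Goal) :- program_predicate(Goal), !,
    clause(Goal, Body),
    solve(Body).
solve(Goal) :- throw(unsafe_goal(Goal)).   % anything else is refused
```

Here solve(ancestor(tom, X)) gives X = bob and then X = ann, while a goal outside the whitelist, such as solve(delete_file(f)), is never called and instead raises unsafe_goal(delete_file(f)). Because every goal passes through solve/1, the policy is stated in one place and can be tightened without touching the downloaded code.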

5 Summary and Further Work

We have implemented a prototype system to test the above ideas (Loke & Davison 1996). The system utilises the Common Client Interface to enable Mosaic to communicate with a Prolog engine. Downloaded programs are interpreted. We are investigating an implementation of LogicWeb as a Netscape plug-in.

We have outlined how LogicWeb can be used in three broad application areas. LogicWeb enables information providers to set up downloadable programs which help users search for information in their repositories. These programs become interfaces to their sites and can take the form of browsing aids, queryable knowledge bases, or information on related sites. We intend to build specific applications utilising the above techniques and to explore new application areas.

The Web is a dynamic information resource. We would like to examine operators with a temporal notion such as latest(ModuleId) and refresh(ModuleId,TimeInterval) which can be used to retrieve the latest version of a module and to periodically update a module, respectively.

Acknowledgements

I am indebted to my supervisors, Dr. Andrew Davison and Prof. Leon Sterling, for valuable comments and suggestions that have improved the paper and for their ideas that have significantly influenced this work. LogicWeb began as Andrew's conceptualisation.

Bibliography

1
Sterling, L. & Shapiro, E. (1994) The Art of Prolog, 2nd edn, MIT Press.

2
Lazarev, G. (1989) Why Prolog? Justifying Logic Programming for Practical Applications, Prentice-Hall.

3
Bugliesi, M., Lamma, E. & Mello, P. (1994) Modularity in Logic Programming. Journal of Logic Programming, pp. 443-502.

4
Brogi, A. (1993) Program Construction in Computational Logic, PhD thesis, Universita di Pisa-Genova-Udine, Pisa, Italy.

5
Sally Khudairi (nd) The World Wide Web Consortium (W3C)
URL: http://www.w3.org/

6
Sun Microsystems, Inc. (nd) Java(tm) - Programming for the Internet, Sun Microsystems, Inc.
URL: http://java.sun.com/

7
Andrew Davison (1996) Proceedings of the 1st Workshop on Logic Programming Tools for INTERNET Applications (in conjunction with Joint International Conference and Symposium on Logic Programming '96)
URL: http://www.cs.mu.oz.au/~ad/lp-internet/archive.html

8
Loke, S.W. & Davison, A. (1996) Logic Programming with the World Wide Web, Proceedings of the 7th ACM Conference on Hypertext, ACM Press, pp. 235-245.
URL: http://www.cs.unc.edu/~barman/HT96/P14/lpwww.html

9
Loke, S.W., Davison, A. & Sterling, L. (1996a) Lightweight Deductive Databases on the World Wide Web, in Proceedings of the 1st Workshop on Logic Programming Tools for INTERNET Applications (in conjunction with Joint International Conference and Symposium on Logic Programming '96), eds P. Tarau, A. Davison, K. De Bosschere & M. Hermenegildo, Bonn, Germany.
URL: http://www.cs.mu.oz.au/~ad/lp-internet/lwddbs/lwddbs.html

10
Loke, S.W., Davison, A. & Sterling, L. (1996b) CiFi: An Intelligent Agent for Citation Finding on the World Wide Web, Proceedings of the Fourth Pacific Rim International Conference on Artificial Intelligence, Cairns, Australia, pp. 580-591.
Loke, S.W., Davison, A. & Sterling, L. (1996b) CiFi: An Intelligent Agent for Citation Finding on the World Wide Web, Lecture Notes in Artificial Intelligence (LNAI 1114), eds N. Foo & R. Goebel, Springer-Verlag.

11
Lakshmanan, L. V. S., Sadri, F. & Subramaniam, I. N. (1996) A Declarative Language for Querying and Restructuring the Web, Post-ICDE IEEE Workshop on Research Issues in Data Engineering (RIDE-NDS '96), New Orleans, USA.
URL: ftp://ftp.cs.concordia.ca/pub/laks/papers/ride96.ps.gz

12
Dobson, S. A. & Burrill, V. A. (1995) Lightweight Deductive Databases, Proceedings of the Third International World Wide Web Conference, Darmstadt, Germany.
URL: http://www.igd.fhg.de/www/www95/proceedings/papers/54/darm.html
Dobson, S. A. & Burrill, V. A. (1995) Lightweight Deductive Databases, Computer Networks and ISDN Systems, Elsevier, 27(6).

13
Baentsch, M., Molter, G. & Sturm, P. (1995) WebMake: Integrating distributed software development in a structure-enhanced Web, Proceedings of the Third International World Wide Web Conference, Darmstadt, Germany.
URL: http://www.igd.fhg.de/www/www95/proceedings/papers/51/WebMake/WebMake.html
Baentsch, M., Molter, G. & Sturm, P. (1995) Webmake: Integrating distributed software development in a structure-enhanced Web, Computer Networks and ISDN Systems, Elsevier, 27(6).

14
Sedlock, D. & Jörg, J. (1996) Managing Software Projects with Prolog and the WWW, Proceedings of the Fourth International Conference on the Practical Applications of PROLOG, London, UK.
URL: http://www.franken.de/users/nicklas/das/papers/pap96/pap96.html

15
Dan Connolly (nd) W3C: On Mobile Code, World Wide Web Consortium
URL: http://www.w3.org/pub/WWW/MobileCode/


organised by: 
AUUG'96 & CSU