Local time: Monday, 24-Nov-2008 03:47:11 EST
Last update: at /special/conference/apwww95 , Friday, 21-May-2004 09:47:19 EST
![]()
An Interactive Relational Database Gateway with Load Balancing
Michael BJÖRNUniversity of Tsukuba
Doctoral Program of Socio-Economic Planning
1-1-1 Tennodai, Tsukuba, Ibaraki 305, JAPAN
Phone +81 298 53 5424, Fax +81 298 53 5070
E-mail: Finally, WDB by Rasmussen [Ras94] is a relational gateway with an approach similar to GSQL. It uses form definition files ("fdf") instead of GSQL's proc files, but the end effect is the same: Using external files for form definition is naturally much better than hard-coding names of tables and rows, etc., directly into the gateway. But defining forms in external files still means duplication of information about the database schema, which can be retrieved directly from the database. The problem of static and dynamic HTML is discussed further in section 2.Rasmussen also explains the differences between a typical database connection, (where the user logs in, performs arbitrarily many transactions and then logs out) and a typical connection over the WWW, where the user only can perform one single transaction per connection. We discuss this in-depth in sections 3 and 4.
2. Dynamic Generation of HTML and SQL
It lies in the nature of CGI gateway programs to be able to generate HTML code dynamically. It can be anything from simply generating redirections when users request badly formed URLs to providing complicated page layouts. However, among the relational gateways in use today HTML code generation is mostly a one-way street: All relational gateways support dynamic HTML generation for display of results. This is a necessity, since it is impossible to judge the size of returned data beforehand. But very few relational gateways attempt to dynamically generate HTML and the corresponding SQL code when preparing transactions (query, insert, delete and update) with the database system. This can lead to situations where a lot of coding effort goes into building gateways which make many assumptions about the structure of the database to be used. In the worst case, the gateway has to be totally rebuilt if the structure of the database changes, and in other cases the (static) HTML-files have to be rewritten, often using specialized naming conventions.
We instead take a fully dynamic approach where the only HTML file physically present on the server is the login.html file, where the user specifies name and password for the database system, as well as e-mail address. For this reason, we need make no assumptions whatsoever about the structure of the databases that are accessed.
A schematic overview of how a totally dynamic relational gateway works looks as follows:
Figure 2.1. Static and dynamic HTML
One system which does take the same approach is OraForm [Ole95]. OraForm queries the database system for all tables and column names of a default database, and lets users dynamically perform multi-table SELECTs in the database. OraForm however neither lets the user switch between databases, nor does it allow for more advanced features, such as repeated queries, inserts, updates or deletes of tables or data.
3. Single Transaction Pair - State Information Not Necessary
The transaction protocols in HTTP [Ber91] and in SQL are quite different. The main purpose of the relational gateway program is to provide SQL sessions using HTTP as the transport protocol.
The HTTP protocol consists of 4 steps:
This protocol implements a transaction consisting of a single request-response pair in a session with a WWW-server. Most relational gateway implementations that we have seen so far use this single transaction for database access as well. A session using such a protocol has the following structure:
- Establishment of a connection by the client to the server.
- A client request to the server.
- A response from the server to the client.
- The closing of the connection by client or server.
Figure 3.1. Schematic view of a relational gateway following the HTTP protocol.
3.1 Single Transaction Database Sessions
Let us take a look at a simple example where a user creates a table called suppliers in the database company, and then populates the table with data. Although the following transactions make sense in the order they are presented, they are not connected - each transaction finishes execution within HTTP's single request-response pair (assuming that database username and password is sent with each transaction).
![]()
![]()
Figure 3.2. The client chooses a database (company) and an action (Insert a table). Figure 3.3. The client fills out a table definition form. In figures 3.2 and 3.3 the first HTTP transaction is used for providing data dictionary information, and the second transaction is used for inserting the user specified table. When the user clicks Insert Table, a result message (not shown) is returned.
Next, we might want to populate the table with some data. The user selects Insert values in figure 3.2, and presses the Prepare Transaction button again, whereupon he/she is presented with the following screens:
![]()
![]()
Fig. 3.4. The client chooses a table (suppliers) to insert values into. Fig. 3.5. The client inserts data values (5090, Acme Suppliers Co., Littletown). Again, the first HTTP transaction is used for providing data dictionary information. The second transaction is used for adding the data that the user has entered in figure 3.5 to the database. When Insert Values is clicked, a result message (not shown) is returned.
Finally, we want to query the table that we have just constructed. The user selects Query in figure 3.2, and clicks the Prepare Transaction button yet again, whereupon he/she is presented with the screen in figure 3.6.
As in the two previous cases, the first HTTP transaction is used for providing data dictionary information so that the user can build the query from existing tables and columns by choosing them in pop-up menus or selecting in scrolling list panes. The user also selects match conditions and matching values in the same screen. The second transaction executes the query and presents the user with the results:
![]()
![]()
Figure 3.6. The client makes a query and joins the suppliers and customers tables. Figure 3.7. The query results are returned. As can be seen, we can do fairly much within the confines of a single transaction. However, no interaction with the user is allowed. Let's say that we would like to use the result of the above query to further query the database. This is exactly what we will discuss in the next section.
4. Multiple Transaction Pairs - State Information Necessary
One of the advantages of the HTTP protocol is that it is comparatively fast. There are, however, disadvantages to this simplicity as well: HTTP is stateless, as opposed to for instance the FTP protocol. This means that a user always logs in "from scratch" and is only allowed one command before he/she is logged out again. The statelessness of HTTP is hidden insofar as connecting usually means retrieving a new multimedia page, and thus the user gets the illusion that state is preserved - the next time he/she connects it will be based upon the retrieval of a previous page, and the user can thus browse material as if the servers were actually preserving state. This setting was described in section 3.
But if the user would like to make a query based upon results of a previous query, those results would have to be uploaded together with the new query from the user's browser back to the database system again. The DBMS would then process the query based on comparisons with data from the first query. No wonder that most database gateways today only allow for simple queries...
However, there is one other way to go, which is surprisingly straightforward: Let the database system itself take care of state preservation for the users.
4.1 Multiple Transactions Database Session: Login
A connection with the database system is maintained by the gateway for each user, so that the user does not need to provide login information for every single HTTP transaction. When the user first logs in to the system, a unique session number is generated. The connection between the server and the browser is broken several times, but each time it is reestablished, the session number allows the relational gateway to verify the user. The gateway each time controls that the session number was resent from the same IP address, so that some security is achieved even when no encryption is used. Neither user name nor password is passed after login, so there is no additional security risk involved.
At login, a private table is created which preserves state information for the user. We call this table a VIEW since it will invariably contain derived data: If the user updates the database (by inserting or deleting tables or data) the global state of the database is changed and thus state is preserved anyway. The private VIEWs are transitional and are automatically deleted when the user logs out from the database or if he/she does not contact the database again within a preset time period (typically 10 minutes).
Since VIEWs contain derived data, their most common use is to query the database based on results from previous results (as shown below) but VIEW data can equally well be used for conditionally inserting data into or deleting data from the database.
4.2 Multiple Transactions Database Session: Repeated Queries
In a database system which preserves user specific state information, we can allow the user to repeatedly query the database using intermediate results or combinations of intermediate results with other parts of the database. This use of repeated queries is an attempt to somewhat capture the power of nested SELECTs.
When preserving personal state information, we incur a performance penalty. The extra work which has to be performed is to save the results of each SELECT statement into a view. However, we actually have to perform this step anyway, to ensure that results do not get lost when time out errors occur. The WWW-server can time out the connection or the WWW-client may not be able to reconnect to complete the transaction; both these problems have been discussed in [Bjo95/1]. In short, by providing necessary time out error handling, we get a mechanism for providing repeated queries at no extra cost.
Each time the user queries the database, a personal view is saved in the database. To use results of a previous query in a new query, we can simply join the VIEW table with other tables of the database using SELECT statements. To illustrate, we continue the example from section 3.
In section 3, we created a new table called suppliers, which has the columns supp_num, supp_name and supp_city. We then inserted a single tuple and made a SELECT. We thus have a personal VIEW, which can be accessed from all menus listing the tables of the current database. We directly continue the example and perform the following query:
![]()
![]()
Figure 4.1. Query involving a join with a VIEW table. Figure 4.2. Results of the query, linking previous results to tables in the database. We select the salesrep from the table orders, and join it with supp_name and cust_name from the VIEW that was generated in section 3, where orders.cust_nr and VIEW.supp_num are equal. When we click Do Query in figure 4.1 the VIEW is updated and the results (in this case that the supplier Acme and the customer Nicolo have been involved in two orders together, with two different sales representatives) are automatically displayed on-screen. We can now in figure 4.2 choose to continue working with the database, or log off, since the relational gateway is still maintaining an open connection with the database for us.
5. Load Balancing
After having implemented the relational gateway as described above, we started testing it and found that it would sometimes weaken the WWW-server's performance unnecessarily much under heavy load situations. We found two reasons for this:
- The WWW-server is kept waiting with an open connection while SQL code is generated, database transactions execute and while HTML pages are generated.
- The WWW-server must devote processing time to maintaining connections as well as performing message and data interchange with the relational gateway.
The number of users who can not establish a connection with the WWW-server naturally increases with load. In the worst case scenario, a particular user might be able to log in to the database and perform some initial transactions, but not be able to get a connection with the server to finish the session, since each new round of interaction with the DBMS requires a new HTTP connection to the WWW-server. The WWW illusion of interactivity then breaks down, leaving frustrated users unable to accomplish their tasks.
To distribute the load on the WWW-server, we implemented a batch mode for return of results. We specify certain criteria for the relational gateway to switch from on-line return of data to batch return by e-mail. When a switch to batch mode occurs, the user is notified and can directly continue the database session (perform other queries, etc.) or log off without having to wait. Three criteria for switching to batch mode were implemented:
(1) User choice
The user might be on a slow connection or impatient to go on with other tasks, and is offered the possibility of manually establishing batch mode. Each time a query has been built and the user is ready to send it off to the server, he/she is presented with a choice of viewing results on-screen or having it sent later by e-mail, as in the following picture:
If the user selects e-mail, processing is given lower priority and the relational gateway maintains the user's connection with the database system until the transaction is finished. When free processing time is available, the gateway then connects to the mail program and returns the results. The technical details for e-mail returns have been explained in [Bjo95/2]. Figure 5.1. Detail of query page.
(2) Time limit
Batch mode is automatically established to give control back to the user within a reasonable time period. The default limit is 45 seconds, since WWW-servers typically time out after 60 seconds and a few seconds are needed to notify the user of the switch.
When the time limit is reached, the user is presented with a notification that the time limit has been reached and that results will be forwarded by e-mail. The user is then returned to the home page.
Figure 5.2. The time limit has been reached
Processing of the transaction continues in the background and the user is not logged out from the database system until the transaction is completed.
(3) Data volume limit
If data volumes are large it naturally puts a strain on the server, and such results may also be cumbersome to view in a WWW-browser. We have set the default value to as low as 20 kB of data for testing purposes, but the value can be changed by the system administrator.
When the data volume limit is reached, the user is presented with a notification that the data volume limit has been reached and that results will be forwarded by e-mail. The user is then returned to the home page.
Figure 5.3. The data volume limit is reached.
Processing of the transaction continues in the background and the user is not logged out from the database system until the transaction is completed.
The gateway administers all batch files, formats them and then sends them by e-mail when processing time is available. In an early version of the gateway we waited until no users were logged in to the DBMS, but we found that this approach would sometimes lead to rather long e-mail delays without the system actually being put under heavy load.
6. Conclusions and further study
In this paper we have presented a relational gateway implemented as a standard CGI which for multiple concurrent users supports insertion and deletion of tables and columns, addition and deletion of data, simple and repeated queries, return of results by e-mail, and automatic handling of transitional state information and load balancing with return of results by e-mail.
We have discussed the advantages of letting the gateway dynamically generate HTML pages and shown how HTTP is used as a transport protocol for SQL sessions. For some transactions the single request-response pair of HTTP is sufficient, but for others the gateway has to act as a pseudo-client to the DBMS and establish an open connection for each user independent of the HTTP connections. In the latter case, the gateway must also use the DBMS to store transitional state information which cannot be captured within the HTTP protocol.
Finally we have discussed how load balancing can be implemented by establishing automatic cut-off points for time and data volume, where the user loses interactive control and results are instead processed in batch mode, and returned via e-mail.
We have used repeated queries as a means to capture some of the power of advanced SQL queries, but since we do not fully capture the expressional power of nested queries, more research is needed in this area. One problem here is to build a dynamically generated and graphically oriented interface which handles nesting of SELECT statements without becoming to cluttered. An easy way out would be to simply provide a text area where users can input SQL code directly - but this would fit badly with the point-and-click philosophy of the WWW.
Currently, only repeated queries make use of the state information preserved by the gateway, but conditional inserts, deletes and updates are also dependent on transitional state information and can be implemented over HTTP using this method. The implementation is straightforward, and will be carried out as soon as possible.
Our relational gateway only handles simple data types. A problem with current browsers is that users can receive various forms of data but the only user inputs allowed except for specifications of links are text-based. If HTML is going to be used as the sole interface for a multimedia database, there have to be some provisions for the input of pictures, sound etc., or retrieval of multimedia links specified as inputs through WWW browsers.
7. References
- [App90]
- Apple Computer Inc (1990). Data Access Language Programmer's Reference.
- [Ber91]
- Berners-Lee, T. (1991, updated -94). HTTP: A Protocol for Networked Information. http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTTP2.html, CERN.
- [Ber94]
- Berners-Lee, T. (1994). Hypertext Markup Language (HTML) ver. 1.2. http://info.cern.ch/hypertext/WWW/MarkUp/HTML.html, CERN.
- [Ber92]
- Berners-Lee, T. and R. Cailliau (1992). World-Wide Web. Computing in High Energy Physics 92, Anney, France.
- [Bjo95/1]
- Björn, M. and R. Hotaka (1995/1). A WWW Gateway for Interactive Relational Database Managament. http://www.scu.edu.au/ausweb95/papers/integrating/bjorn/index.html, AusWeb95, Ballina Beach, Australia, Norsearch Publishing.
- [Bjo95/2]
- Björn, M. and R. Hotaka (1995/2 (in printing)). "An HTTP Gateway for Interactive Relational Database Query with Deferred Reply." IEICE Transactions on Information and Systems, Special Issue on Advanced Database Technologies.
- [Decoux]
- Decoux Extension of Sectret's work. http://moulon.inra.fr/oracle/www_oracle_eng.html. Paris, France, INRA.
- [Dor94]
- Dorner, S. (1994). Eudora Manual. ftp://ftp.qualcomm.com/pub/mac/eudora/documentation/, Qualcomm Inc.
- [Eic94]
- Eichmann, D., T. McGregor, et al. (1994). Integrating Structured Databases Into the Web: The MORE System. http://www.elsevier.nl/cgi-bin/WWW94link/01/overview,The First International Conference on the World-Wide Web, Geneva, Switzerland, CERN.
- [Eve92]
- EveryWare Development Corp (1992). Butler User's Guide.
- [McC93]
- McCool, R. (1993, updated -94). The Common Gateway Interface. http://hoohoo.ncsa.uiuc.edu/cgi/, NCSA, University of Illinois at Urbana-Champaign.
- [NgJ93]
- Ng, J. (1993). GSQL: A Mosaic-SQL Gateway. http://www.ncsa.uiuc.edu/SDG/People/jason/pub/qsql/starthere.html, NCSA, University of Illinois at Urbana-Champaign.
- [Ole95]
- Oleary, R. and A. Pollard (1995). OraForm: Generically FORM'ing a multi-table query for the Oracle RDMBS. http://www.scu.edu.au/ausweb95/papers/tools/oleary/index.html, AusWeb95, Ballina Beach, Australia, Norsearch Publishing.
- [Oracle]
- Oracle Corp Oracle for Macintosh User's Guide.
- [Pro93]
- Prograph International Inc (1993). Prograph CPX ABC Reference. Halifax, Canada.
- [Ras94]
- Rasmussen, B. F. and B. Pirenne (1994). WDB - A WWW to Sybase Interface.http://arch-http.hq.eso.org/bfrasmus/wdb/poster/wdb_poster/wdb_poster.ps, ADASS'94.
- [Sho95]
- Shotton, C. (1995). http://www.starnine.com/webstar.html, Starnine Technologies Inc.
- [Var94]
- Varela, C. and C. Hayes (1994). Zelig: Schema-Based Generation of Soft WWW Database Applications. http://www1.cern.ch/PapersWWW94/cvarel.ps,The First International Conference on the WWW, Geneva, Switzerland, CERN.
!-- On-line HTML Copy Editor: Sandy Tse, stse@golum.riv.csu.edu.au -->
[Return to Table of Contents]
COPYRIGHT © 1995 by AUUG95 and APWWW95 Charles Sturt University. ALL RIGHTS RESERVED. ISBN 1 875781 43 9