Gloucestershire Local History banner

A Method for Putting a 'Free Format' Database On-line

Guidelines No. 10         Issue 1.3:         June 2007

The purpose of these guidelines is to show how it is now possible for local historians to put 'free format' databases on-line without suffering the large costs that this currently entails when using commercial organisations. Examples are provided of four systems under development. An outline is given of the basic requirements to build such a system and groups wishing to learn more are invited to contact the author for more detailed information.

 

1. Introduction
2. A User's View
3. Outline Technical Description
4. Building Your Own System
5. Concluding Remarks

1.    Introduction          [top]

Most web sites created by local historians consist of static pages (just like this one) where the user can navigate between pages but the content of each page remains fixed. This is fine in many cases where the site contains a relatively small amount of information. However once a lot of information is involved some means is needed to select and display the information that is relevant at the time.

One solution can be found on most E-commerce sites. For example, on a travel agent's web site a potential holiday-maker might type in the holiday locations and dates they are interested in. The software on the web server (the travel agent's computer) then finds which holidays and dates on offer match the user's requirements and the web server sends just these details to the holiday-maker's screen. This is a simple example of just one of many uses of on-line databases.

Until recently the necessary software and access to web servers to do this were expensive and consequently outside the reach of the local historian who wished to provide access to historical data such as census information or indexes to publications. However, free, legal and robust software is now available for this type of application and most importantly a few of the free Internet Service Providers (ISPs) now permit the necessary software and data to be loaded on their web servers.

The author has four such applications under development and it is suggested that the methods employed may be of interest to local history groups who might like to try something similar. It should be noted that the data in these particular applications are treated as 'free format' - that is each item (or record) in the database is not split into separate fields as in the tables in a relational database such as MS Access or Lotus Approach.

These guidelines set out the basic principles of the method and no attempt has been made to give full details as each case will have slightly different requirements and potential users are invited to contact the author to discuss their particular application.

2.    A User's View           [top]

User Interface

A key consideration of any computer system is to make it as 'user friendly' as possible. When a user opens one of the authors on-line database applications described below they see a simple web page which is similar to the Google search engine as it contains little more than just a Search Text Input Box.

Just as in Google the user types into the input box relevant key words or phrases for the items they wish to search for. The user then presses the Enter key (or clicks on the Submit Query button). After a few seconds all items that match the query are displayed in the user's browser (up to a preset limit of typically 200 items). This is different from Google which normally displays only ten items at a time. The total number of 'hits' or matches are displayed at the top of the output.

The opportunity to try out three of these applications is provided below.

Applications Under Development

Items displayed from a search will have been found in the potentially huge data file the developer has installed on the web server belonging to their Internet Service Provider. Each item returned is technically a 'paragraph' in the data file. Here are some examples taken from the data file for projects which are currently under development. Note they are simply examples of individual data items and not the result of any particular search.

A.     General Index to the Transactions of the Bristol and Gloucestershire Archaeological Society (1876-1991)
(Five sample items)     To try this application click here.

 

Astell, Vicar of Whittington, [4], 18 Asterby, Family, Arms of, [31], 239 Asterly, Lord of, [7], 174 Asthall (Oxon.), manorial records, [77], 165 Asthall Church, Ikenild Street at, [4], 212

 

B.     List of Contents of the Transactions of the Bristol and Gloucestershire Archaeological Society (1876-2002) (Three sample items)     To try this application click here.

Proceedings at Annual General Meeting, Bristol, 1918. 41 (1918-19) 1-10 The Work of the Society. Address. by J. E. Pritchard. 41 (1918-19) 11-26 Gloucestershire Fonts: (b) 15th Century. by A. C. Fryer. 41 (1918-19) 27-35

 

C.     Coaley Censuses 1841 - 1901 (Three sample items)     (Not yet available on-line)

1851, Coaley Glos., William WILKINS, 32, Male, Head, Widow, Living at: Coaley, Occupation: Butcher and Inn keeper, Born: Glos. Coaley, Schedule: 2-45a (9), PRO Ref.: HO107/1958 (281) 1851, Coaley Glos., Hannah WILKINS, 30, Female, Wife, Widow, Living at: Coaley, Occupation: Butcher and Inn keepers wife, Born: Glos. Coaley, Schedule: 2-45b (9), PRO Ref.: HO107/1958 (281) 1851, Coaley Glos., William A WILKINS, 7, Male, Son, Unmarried, Living at: Coaley, Occupation: Scholar, Born: Glos. Coaley, Schedule: 2-45c (9), PRO Ref.: HO107/1958 (281)

 

D.     A Historical Gazetteer of Cheltenham (Two sample items) To try this application click here.

The Bank, Prestbury. Short stretch of road linking High Street with Mill Street. Its naming appears recent; probably from the banked ground on north side of Prestbury High Street at this point, more obvious in older prints. Two houses opposite were formerly known as Bank Cottages. Barratt's Mill Lane, off High Street. Line present (unnamed) on 1806-10 map. 'Road which leads to Barrett's Mill' (PCM, 3 Dec. 1823). The main Cheltenham mill had been in the hands of the Barratt or Barrett family since 1763 or earlier (Goding, p. 261); they held a significant influence over municipal use of the water of the Chelt in the early 1800s.

There are about 134,000 items in the TBGAS General Index, over 2000 articles listed in the TBGAS contents list and about 5400 items in the Coaley Censuses. The Historical Gazetteer of Cheltenham in its printed form runs to about 200 pages.

More Detailed Instructions for Carrying Out Searches

Click here to visit the Search page for the General Index to the Transactions of the Bristol and Gloucestershire Archaeological Society (1876-1991) to try out the system for yourself. On this page you will see basic instructions on how to carry out a search. The instructions are also shown below.

Searches may be made using single keywords and phases (in double quotes e.g. "long barrow").  Use the usual three operators [and , or, not] between single keywords and phases to limit the range of the search.  If you do not put an "and" or "or" between items then an "and" is automatically assumed. You can also use brackets (.....), to group items e.g. (North or South) and Cerney.

Keywords will match all words containing the keyword, e.g. castle will match   castle, castle-mounds, Newcastle etc.  Where this is a problem you can achieve an exact match (but for one word only) by adding a # to the end of the word, e.g.   castle#   which will only match castle.  Click on volume numbers in the results to view the contents page for that volume. If page numbers in the results have a link you can click on the page numbers to go directly to the title of the article the item refers to.  The title of the article will appear at the very top of your browser window.

Local Use of a Database System

Although envisaged to normally operate on-line the software and data files can of course be used locally on a computer that is set up to act as a web server. In this case all activities take place purely on the local computer without the need for any connection to the Internet by telephone or any other means. The disadvantage of this is that all updates to the data and software must be carried out at each location that is running the application. However, this is a small price to pay for the database always appearing to be on-line without the cost or inconvenience of the telephone tied up continuously. The relevant software is free and can be obtained for Windows from the IndigoPerl web site.

3.    Outline Technical Description           [top]

The system relies on four main things over and above the usual things such as access to the Internet and tools for editing files.

The four key requirements are:

  1. A free dial-up Internet account with one of the few ISPs that allow the use of customer supplied CGI scripts on their web server.
  2. A web-page on that same ISP's site which displays the Search Text Input Box and possibly help information.
  3. A data file in plain ASCII text containing all the data to be searched for each query.
  4. A suitable CGI (Common Gateway Interface) script written in the computer language Perl.

After the data file the key to the whole process is the CGI script which is effectively a computer program. The script accepts the search key words and phrases from the box on the web page in item 2 above. It then searches through the data file for items that match the search criteria. Finally it sends a count of the items found and the items themselves (a series of paragraphs) to be displayed on the user's web browser such as Internet Explorer. A general purpose script has been written by the author such that normally only minor changes are required to adapt it for any new application.

4.    Building Your Own System           [top]

The last section gave the four key requirements and this section considers how a group might set about meeting these.

  1. A free dial-up Internet account with one of the few ISPs that allow the use of customer supplied CGI scripts on their web server may be set up almost instantly. Please contact the author for advice on which ISPs you could consider using at that time. If you prefer you could get the facility to run CGI scripts by paying for one of the low cost Internet hosting packages that most ISPs provide aimed at small business users.
  2. The web-page which displays the Search Text Input Box and possibly help information can be constructed by anyone who has a reasonable experience of creating basic web pages. They will need to know about 'forms' on web pages as this is how the Search Input Text Box is handled.
  3. The data file in plain ASCII text containing all the data to be searched for each query and displayed according to the search criteria is clearly the key item and the sole responsibility of the group setting up the system. The examples in Section 2 show the sort of data that can be used with this approach. The basic rule is that each item in the data file is separated by a newline character. Simply put, each item appears to be on a single line in the file although the line might be several thousand characters wide! The output of long items by be conveniently broken into paragraphs when displayed if the data file contains two HTML break codes (<br><br>) at the appropriate places in the text. It is important not to confuse these two different uses of paragraph. There is more on data files at the end of this section.
  4. A suitable CGI script written in Perl could be written by a member who has the necessary experience and time (the basic script is more than 500 lines of Perl code). The script developed for the current applications should be readily adaptable for most new applications. The copyright of the script would remain with the present author and a very small charge may be levied in respect of the necessary modifications as a contribution to the costs of developing the methodology.

More About the the Data file and Searching it.

The potential long length of the items in the data file should not normally be a problem as it is envisaged that other computer programs such as relational databases like MS Access and Lotus Approach may be used to generate the data file. Such an application may be described as a free format database even though the 'free format' data may actually be derived from several fields from an Access database as in fact is the case for examples B, C and D in Section 2 above.

In a museum catalogue the inclusion of an extra keyword in a search like book, photograph or painting will (more or less) guarantee that the items returned will be that type or object. Similarly, for example the keyword 1997. (note the dot) will return all the items added to the collection in 1997 where the accessions for that year have identifiers 1997.1, 1997.2 etc.

The method under discussion is not generally suitable for the 'rule based' queries which are possible with relational databases such as Access where, for example, you could count (or at least estimate!) the number of inhabitants over 50 from census data.

It is possible to manipulate and display on-line data that is stored in tables in relational databases like Access. The MySQL package is an extremely powerful and robust piece of free software which does this. It is based on the industry standard SQL language (pronounced Sequel) that has been used on mainframes for many years. Unfortunately, there appears to be only a few ISPs, at present, who offer MySQL facilities free of change although a number provide it as a 'paid for' service similar to the provision of CGI facilities discussed above.

5.    Concluding Remarks           [top]

The improved availability of facilities to run CGI scripts has made it possible for local historians to build powerful on-line free format databases. The databases are fully searchable and can be applied to a variety of data such as the Censuses, gazetteers, indexes to publications and simple museum catalogues. Once a group has built a system for one set of data then it is normally a simple matter to build systems for other data that the group might wish to treat in the same way. Where it is more convenient it is also possible to run the same database on a locally based computer rather than on the Internet.

Feedback on these notes will be welcomed. Please send them to the Author at ray.wilson@coaley.net          [top]