ARCHAEOLOGY AND XML NEWSLETTER
NUMBER 6

December 2005
CONTENTS
Introduction
News
Adding Value To Grey Literature - Gail Falkingham
MIDAS And The Geography Markup Language - Ian Painter

INTRODUCTION
Mark Bell

Welcome to newsletter number six on archaeology and XML. I am pleasantly
surprised that there is still an interest in this newsletter. I am sure that the interest in
XML in the heritage sector will continue to grow in the future. The MIDAS initiative
is definitely raising the profile of XML and in this issue Ian Painter contributes a
piece on MIDAS. Our other main item is by Gail Falkingham on Grey Literature and
XML markup. Curators, please take note: this is definitely an area that needs major
development by the archaeology profession.

The plan is to get the next newsletter out in mid 2006, as ever depending on
contributions and events.

NEWS

CIDOC annual meeting 2006 Gothenburg, Sweden

The CIDOC annual meeting and conference 2006 will take place in
Gothenburg, Sweden, September 10-14 2006. Further info at:
http://cidoc06.se

CAA 2006, Fargo, North Dakota, USA

James Landrum emails a reminder that the Computer Applications in Archaeology
Conference (CAA) 2006 in Fargo, North Dakota will have XML-related presentations
and that the organizing committee welcomes submissions of other abstracts for
presentations on XML and all other projects of relevance to digital heritage. The
CAA2006 web site is at
http://www.caa2006.org

ADDING VALUE TO GREY LITERATURE:
Presentation, Preservation and Reuse Of Unpublished Archaeological Reports Using
The XML Version Of The TEI Guidelines

Gail Falkingham Senior Archaeologist, North Yorkshire County Council

Readers of Archaeology and XML Newsletter number 2 will be familiar with the
work of Christiane Meckseper in applying XML encoding to archaeological
excavation reports using the DTD of the Text Encoding Initiative (TEI
www.tei-c.org.uk; Meckseper and Warwick 2003; Meckseper 2004). My own
research, for an MSc in Archaeological Information Systems at the University
of York completed in 2004, has taken this concept a stage further. I have
explored the potential that XML encoding offers for the multi-layered
presentation of archaeological grey literature on the Web, and how this
might assist with the repurposing of report content for other uses, such as
the population of heritage database records (Falkingham 2005).

In the present climate of developer-funded archaeology in England, there are
literally thousands of archaeological projects being carried out each year
resulting in a similarly high number of unpublished grey literature reports.
There are concerns about the restricted accessibility of this information as
these reports generally have a limited distribution. Moreover, whilst these
reports are 'born digital' as word-processed documents, it is a printed hard
copy that has traditionally been circulated and then deposited with the
project archive. The electronic documents are often seen as a by-product,
with little thought given either to their future preservation or their
potential for reuse.

Within the archaeological profession, the importance of these digital
reports and their preservation must be recognised. Likewise, the potential
offered by electronic means of delivery and dissemination via the World Wide
Web should be taken advantage of. Since the completion of my research, there
have been further developments in this area, including increasing use of the
OASIS III project Online Data Collection Form, first launched in April 2004,
which enables the uploading of a digital copy of a report to accompany an
event record (http://ads.ahds.ac.uk/project/oasis/). In addition, the
Archaeology Data Service (ADS) online Library of Unpublished Fieldwork
Reports is making available a growing number of reports
(http://ads.ahds.ac.uk/catalogue/library/greylit/index.cfm). The digital
file formats most commonly adopted for this purpose are Microsoft Word and
Adobe PDF files. The pros and cons of these, and other, file formats were
considered as part of my research alongside those of XML. I concluded that
XML encoding of reports offers a number of advantages over these more
commonly used proprietary file formats, particularly with regard to the
preservation, manipulation and presentation of electronic text.

As Meckseper's research showed, there was no pre-existing archaeological DTD
or schema suitable for the task of encoding reports (Meckseper 2004).
However, the XML version of the TEI Guidelines (TEI P4), widely used in the
humanities for literary and linguistic text encoding, and one of the most
accessible standards for doing so, can be applied for this purpose
(Sperberg-McQueen and Burnard eds 2002). It was this, therefore, that I
chose to use in the practical case study for my research.

My first step was to undertake document analysis of a sample of three
archaeological grey literature reports from the North Yorkshire Historic
Environment Record (HER), following the guidance of Morrison et al (2002).
Structure outlines were created to look at the similarities and differences
between reports and to identify the large structural units. I also
considered the intended readership and users of these reports and what they
might want from electronic access to them. The CBA Publication User Needs
Survey (PUNS) report showed that archaeological reports are rarely read in
their entirety from start to finish; certain sections are more popular than
others (Jones et al 2001). This influenced my decisions about the level of
detail to apply to the encoding of the reports' structure and content, as
did a consideration of the potential for export of selected data for use in
the population of other heritage datasets.

Three categories of user were identified: the general public, the curatorial
archaeologist and the specialist researcher. For a general user, the
retrieval of the summary and conclusion sections of the reports was
considered the most relevant, along with the ability to view the images
within the reports. For curatorial archaeologists, the retrieval of data
suitable for the population of an OASIS Project record
(http://ads.ahds.ac.uk/project/oasis/) and/or an HBSMR HER event record
(http://www.esdm.co.uk/HBSMR.asp) was deemed appropriate. The ability to
retrieve selected specialist appendices of a report was also considered
desirable for a specialist researcher.

The DTD used in my case study was downloaded using the TEI Pizza Chef
(http://www.tei-c.org/pizza.html). A TEI Header containing relevant metadata
was compiled for each of the three reports and the main text divided into
front, body and back matter. More detailed encoding was added to a selection
of the reports' structure and content, guided by the aims for the retrieval
of data referred to above. For the purposes of interoperability, nationally
accepted archaeological vocabularies were used in the mark-up of the report
content wherever possible. Accordingly, relevant keyword schemes from the
Forum on Information Standards in Heritage (FISH) 'Inscription' list of
wordlists were referenced in my encoding
(http://www.fish-forum.info/i_lists.htm).

Once I had created the XML documents, I looked at how these could be
displayed in different ways for the needs of the different user groups I had
identified, and how selected data could be extracted and presented by means
of simple styling and transformation performed firstly by the application of
CSS, and then by XSL style sheets with XPath queries. The latter of these
two options was my preferred method, as it also offers the potential to
output data in a variety of formats, such as HTML, XML and plain text,
depending upon user requirements.
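
As a rough illustration of the kind of selective retrieval described above, the sketch below pulls named sections out of an encoded report. It is written in Python with the standard library rather than XSL, so it is a stand-in for the approach and not the style sheets used in the case study. The sample report, the div type values ("summary", "conclusion", "appendix") and the function name extract_sections are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified TEI P4-style report. The div "type" values
# and the report text are invented for illustration only.
REPORT = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example evaluation report</title></titleStmt>
      <publicationStmt><p>Unpublished grey literature report</p></publicationStmt>
      <sourceDesc><p>Born-digital document</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front>
      <div type="summary"><p>Two trial trenches were excavated.</p></div>
    </front>
    <body>
      <div type="results"><p>Trench 1 contained a ditch.</p></div>
      <div type="conclusion"><p>The ditch is probably medieval.</p></div>
    </body>
    <back>
      <div type="appendix"><p>Pottery report.</p></div>
    </back>
  </text>
</TEI.2>"""


def extract_sections(xml_text, wanted_types):
    """Return {div type: text} for the requested sections of a report."""
    root = ET.fromstring(xml_text)
    sections = {}
    for div_type in wanted_types:
        # ElementTree supports a limited XPath subset, including attribute tests.
        for div in root.findall(f".//div[@type='{div_type}']"):
            # Join all text inside the div and normalise whitespace.
            sections[div_type] = " ".join("".join(div.itertext()).split())
    return sections


# One source document, two different views of it.
general_view = extract_sections(REPORT, ["summary", "conclusion"])
specialist_view = extract_sections(REPORT, ["appendix"])

for name, text in general_view.items():
    print(f"{name.upper()}: {text}")
```

The same function serves each user group simply by changing the list of section types requested, which is the single-source, multiple-output idea in miniature.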

Having demonstrated the potential of these methods, I considered the
practical application of such an approach in the 'real world'. I concluded
that the most effective stage for the encoding to be applied would be at
source, by the original authors at the time the report is being written.
This would enable the benefits of having XML-encoded documents to be gained
by both the author and all subsequent users. Clearly, however, there would
need to be support for such an initiative throughout the archaeological
community, including access to training and guidance as well as adequate
resources.

Should this approach be adopted, there are options as to how an online
digital XML report archive might be achieved, either as a centralised Web
resource or as a federation of distributed resources. A centralised archive
of encoded reports could be made available in a variety of formats, with
search capabilities offering the potential to download selected content
dependent upon the level of mark-up applied. Alternatively, producers of
archaeological grey literature may prefer to make encoded reports available
through their own websites, online HERs, or other national project websites.
Each of these resources could hyperlink to one another, or, if
interoperable, could be harvested into a global virtual archive (Richards
2000).

There is a call for the development of new methods of electronic publication
for archaeological reports, in order to promote access to, and reuse of,
data. The practical work I have undertaken shows how XML technology offers
several advantages for data dissemination and presentation over the
proprietary file formats favoured at present. From a single XML document,
content may be transformed in different ways for a variety of users in a
variety of formats. Such an approach may also assist in repurposing this
data for other uses.

This has the potential to minimise the widespread duplication of effort that
can currently be seen amongst a range of users who are re-keying similar
information from reports into a variety of different heritage datasets at
both local and national levels.

However, the resources and skills for an XML-based approach to encoding
archaeological grey literature are not yet widely available within the
profession. As the archaeological community becomes more familiar with the
use and potential of XML and the importance of preserving digital versions
of fieldwork reports, it is hoped that the approach I have outlined in this
article could contribute towards the development of an innovative, flexible
and sustainable means of facilitating online access to, and dissemination
of, archaeological reports and their content.

References

Falkingham, G. 2005 'A Whiter Shade of Grey: A new approach to archaeological grey literature
using the XML version of the TEI Guidelines'
Internet Archaeology 17 http://intarch.ac.uk/journal/issue17/falkingham_index.html

Jones, S., MacSween, A., Jeffrey, S., Morris, R. and Heyworth, M. 2001 From the Ground
Up. The Publication of Archaeological Projects, a user needs survey (PUNS), Council for
British Archaeology. http://www.britarch.ac.uk/pubs/puns/index.html.

Meckseper, C. 2004 'The Mark-up of Archaeological Excavation Reports using the DTD of
the Text Encoding Initiative (TEI)', Archaeology and XML Newsletter 2
http://archweb.co.uk/archaeology_and_xml_newsletter_2

Meckseper, C. and Warwick, C. 2003 'The Publication of Archaeological Excavation Reports
using XML', Literary and Linguistic Computing 18 (1), 63-75.

Morrison, A., Popham, M. and Wikander, K. 2002 Creating and Documenting Electronic
Text: A Guide to Good Practice. AHDS Guides to Good Practice. Oxford: Arts and
Humanities Data Service http://ota.ahds.ac.uk/documents/creating/.

Richards, J.D. 2000 'Integrated Access to Historic Environment Information Resources',
April 2000 SAA Session 'Digital Data: preservation and re-use'.
http://www.csanet.org/saa/saa-ads.html.

Sperberg-McQueen, C.M. and Burnard, L. (eds) 2002 TEI P4: Guidelines for electronic text
encoding and interchange. Text Encoding Initiative Consortium. XML Version: Oxford,
Providence, Charlottesville, Bergen, http://www.tei-c.org/P4X/.

MIDAS AND THE GEOGRAPHY MARKUP LANGUAGE
Ian Painter

The big IT players agree on very little these days; however, one thing they do all
agree on is exchanging data using XML. XML has undoubtedly broken through as a
ubiquitous open standard. XML stands for eXtensible Markup Language and was
developed by the World Wide Web Consortium, or W3C for short. Born in November 1996,
XML quickly established itself as the platform-independent means of data transfer,
opening up a wide range of opportunities, particularly in the business-to-business
(B2B) arena.

Covering all the benefits and opportunities of XML is far too big a subject for this
short piece, so I'd like to just focus on the eXtensible nature of XML. One of the
reasons why XML has been so successful is the fact that it can be extended to fit a
particular purpose or application. This extensibility is achieved through a companion
W3C standard, namely XML Schema. I like to think of XML Schema as a
machine-readable User Guide. Let me explain. Before XML, data was transferred in
many different proprietary file formats, and in order to read the incoming file the
receiver would need to study a detailed user guide to find out how the file
was structured: for example, what values were being passed and whether they were
strings, integers etc. This was very much a human process and it was very easy to make
mistakes; after all, these types of documents are not exactly an exciting read! XML
Schema provides an electronic user guide that is machine-readable and removes the
need for human interaction in data transfer. Here's a simple example of an XML file
and its associated XML Schema:

XML:

<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

XML Schema:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com"
    xmlns="http://www.w3schools.com"
    elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
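
To make the "machine-readable user guide" idea concrete, here is a minimal Python sketch that reads the note schema and reports which child elements, of which types, a receiver should expect. It then checks that an instance document lists those elements in the declared order. This is not a schema validator (the Python standard library does not include one); the function names declared_children and follows_sequence are invented for this example, and the instance is assumed to declare the schema's target namespace.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XML Schema namespace, in ElementTree notation

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com"
    xmlns="http://www.w3schools.com"
    elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Assumption: the instance is placed in the schema's target namespace.
NOTE = """<note xmlns="http://www.w3schools.com">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def declared_children(schema_text, element_name):
    """List (name, type) pairs the schema declares, in order, for one element."""
    schema = ET.fromstring(schema_text)
    path = (
        f"{XS}element[@name='{element_name}']"
        f"/{XS}complexType/{XS}sequence/{XS}element"
    )
    return [(el.get("name"), el.get("type")) for el in schema.findall(path)]


def follows_sequence(instance_text, schema_text):
    """Check only that the child element names match the declared sequence."""
    root = ET.fromstring(instance_text)
    target_ns = ET.fromstring(schema_text).get("targetNamespace")
    local_root = root.tag.replace(f"{{{target_ns}}}", "")
    expected = [
        f"{{{target_ns}}}{name}"
        for name, _ in declared_children(schema_text, local_root)
    ]
    return [child.tag for child in root] == expected


for name, type_name in declared_children(SCHEMA, "note"):
    print(f"{name}: {type_name}")
print("instance follows declared sequence:", follows_sequence(NOTE, SCHEMA))
```

A real receiving system would hand both files to a full XML Schema validator, which also checks data types, occurrence counts and namespaces; the point here is only that the schema itself can be queried by a program.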

Now that we all know XML Schema, I'll move on and start talking heritage. The
schema above is known as an application schema - in this case it's the 'note'
application schema as it defines the content of a note. In the heritage field we have
the MIDAS (the Monument Inventory Data Standard) application schema. MIDAS
sets out an agreed list of the items or 'units' of information that should be included in
an inventory or other systematic record of the historic environment. These cover areas
such as Monument Character, Events, People and Organisation etc. It is a 'content'
standard or 'metadata' standard for historic environment information.

As a company specialising in geographic XML tools, one of the things that initially
drew our attention to MIDAS was the fact that it had geographic content in its
application schema. This interest came about after we received numerous inquiries
from our customer base asking if our GO Loader product would support MIDAS. In
theory we should have been able to; after all, MIDAS had geographic content and we
have an XML Schema-aware loading tool (GO Loader). Unfortunately this wasn't the case.
The reason? Well, it all comes down to the eXtensibility of XML. As I discussed
earlier, XML Schema defines native types such as strings, integers, dates etc, but it
needs to be extended in order to support geographic types such as points, lines and
polygons. Here's where GML comes in.

GML came about after a group of companies and government bodies created a
standards organisation called the Open Geospatial Consortium (OGC). Amongst other
things, OGC set out to create an abstract XML Schema that extended XML to define
geographic types. This schema became known as the Geography Markup Language
(GML). The idea behind GML was to provide an XML Schema that application
schemas could extend in order to support geographic information. This would also give
vendors a common understanding of how geographic elements were coded in XML,
which would further encourage the development of generic GML tools. Since its
inception in 2000, GML has rapidly established itself and is now in its 3rd major
release. Undoubtedly its biggest success has been its migration into a ratified ISO
standard, namely ISO 19136, which is part of the TC211 programme.

Getting back to MIDAS: whilst it's great to see heritage data being exchanged as
XML, unfortunately MIDAS doesn't extend GML and so defines its own geographic
types in midas_spatial.xsd. This makes MIDAS difficult to support for us GML
vendors. If MIDAS did extend GML then not only would it be more standards
compliant, but a whole host of off-the-shelf tools would be able to work with the data
without requiring software change. This would undoubtedly encourage the adoption
of MIDAS, as most government bodies (local and central) already have GML tools in
order to support Ordnance Survey OS MasterMap data. It would also open up a host
of standards-based web services that are based on GML, for example Web Map
Services and Web Feature Services.
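
The benefit of shared geometry types can be sketched in a few lines of Python. The record below is entirely hypothetical: the monument, name and location elements and the http://example.org/heritage namespace are invented for this example and are not taken from MIDAS, and simply embedding a GML geometry is a simplification of how an application schema formally extends GML. Only the gml:Point and gml:pos elements, the srsName attribute and the GML namespace come from GML itself. Because the geometry is plain GML, the code can find it without knowing anything about the surrounding application schema.

```python
import xml.etree.ElementTree as ET

GML = "{http://www.opengis.net/gml}"  # GML namespace, in ElementTree notation

# Hypothetical heritage record; only the gml:* parts are standard GML.
RECORD = """<monument xmlns="http://example.org/heritage"
          xmlns:gml="http://www.opengis.net/gml">
  <name>Example barrow</name>
  <location>
    <gml:Point srsName="EPSG:27700">
      <gml:pos>450000 470000</gml:pos>
    </gml:Point>
  </location>
</monument>"""


def find_points(xml_text):
    """Return (srsName, x, y) for every gml:Point, wherever it is nested."""
    root = ET.fromstring(xml_text)
    points = []
    for point in root.iter(f"{GML}Point"):
        pos = point.find(f"{GML}pos")
        if pos is None or not pos.text:
            continue  # skip points encoded in some other way
        x, y = (float(value) for value in pos.text.split()[:2])
        points.append((point.get("srsName"), x, y))
    return points


print(find_points(RECORD))
```

The same find_points function would work unchanged on any other document that encodes its locations this way, which is the reuse argument in a nutshell.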

In conclusion, designing an application schema such as MIDAS can be a difficult task
and it's easy for an outsider like me to say that MIDAS should extend GML. I know
GML; as a company we work with it day in, day out. Everybody agrees reuse is better
than reinvention but the difficulty is in knowing what's there to reuse and how to
reuse it.

To find out more about my company and what we do visit:
www.snowflakesoftware.co.uk

You can find my original thread and comments concerning MIDAS on the FISH
forum here:

http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0511&L=fish&T=0&F=&S=&P=2187

To find out more about XML and XML Schema visit the excellent W3Schools
website:

For XML visit:
http://www.w3schools.com/xml/default.asp

For XML Schema visit:
http://www.w3schools.com/schema/default.asp

For a simple, if somewhat dated, introduction to GML visit:
http://www.w3.org/Mobile/posdep/GMLIntroduction.html

For more information on the Open Geospatial Consortium visit:
http://www.opengeospatial.org/

END OF NEWSLETTER NUMBER 6

This newsletter is copyright © Mark Bell and the individual authors, 2005.
Please contact the editor before reproducing material from this
newsletter.
