Archaeology and XML Newsletter 5

March 2005

CONTENTS OF THIS NEWSLETTER

INTRODUCTION
NEWS AND LINKS
THE FISH TOOLKIT. WHAT'S IN THE BOX?
IMPLEMENTATION OF XML AND WEB SERVICES IN ARCHAEOLOGY


INTRODUCTION Mark Bell

Welcome to newsletter number 5. As I've stated before, this is an irregular newsletter that depends on what material is available. The original idea was to do four newsletters per year, but that has proved impractical. I will now try to do two a year, with the next one out before the end of 2005, all being well.

As ever contributions are welcome, not just articles but news of conferences, web resources, XML teaching resources and practical examples of use of XML - not just in archaeology but in related areas such as history and classics.

An important event this year will be the launch of the FISH toolkit at the IFA conference in Winchester on the 24th March. This really shows the practical importance of XML in 'mainstream' archaeology. XML terminology is going to be on the lips of many more archaeologists in future. I'm pleased to have an article by Ed Lee in this issue which explains more about the FISH toolkit.

Mark Bell

NEWS AND LINKS

Conferences The IFA conference at Winchester from 22nd to 24th of March 2005 will see the launch of the FISH toolkit - see below. The session abstract is not on the IFA website (!) but can be found on the FISH mailing list at: http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0502&L=fish&T=0&F=&S=&P=3458

The International Conference on Dublin Core and Metadata Applications is to be held at the University Carlos III of Madrid from 12 to 15 September 2005. See http://dc2005.uc3m.es/

CSA newsletter The Winter, 2005 Newsletter from the CSA (The Center for the Study of Architecture/Archaeology http://csanet.org) has an article "Past, present and future: XML, archaeology and digital preservation - Searching for a common language" by William Kilbride at http://csanet.org/newsletter/winter05/nlw0502.html

Open archaeology The open archaeology website at http://www.openarchaeology.org/ is "an XML-based repository of archaeological and related information" though there is not much information about who is behind the website. Further information would be welcome.

Excavating the Hard Drive The full text of "Excavating the Hard Drive: Archaeological Research, XML, and 3D Graphics", a poster presented at the 16th International Congress of Classical Archaeology, Boston, in August 2003, is available in PDF format at http://www.perseus.tufts.edu/Articles/20030823_Milbank_AIAC.html

Classics @ Classics @ is "The Electronic Journal of the Center for Hellenic Studies of Harvard University" and volume 2 for 2004 has no fewer than three papers on XML:

Rebecca Frost Davis, "Collaborative Classics: Technology and the Small Liberal Arts College"
Susan Guettel Cole, "From GML to XML"
Sandra Boero-Imwinkelried, "Vicus Unquentarius: Perfume, Epigraphy, and XML"

They can be found at: http://www.chs.harvard.edu/classicsat/issue_2/index.html

THE FISH TOOLKIT. WHAT'S IN THE BOX? Edmund Lee English Heritage FISH Interoperability Toolkit Project Manager edmund.lee@english-heritage.org.uk

The FISH Interoperability Toolkit, funded by English Heritage and the National Trust, has been developed by the UK Forum on Information Standards in Heritage (www.fish-forum.info) to assist with the process of moving information between the many different information systems used to record the historic environment.

The Toolkit metaphor has been used to convey the approach that additional 'tools' will be added as resources allow. For its initial launch (24th March 2005 at the conference of the Institute of Field Archaeologists in Winchester, UK) the following tools are in the box.

A. Technical tools:

1) MIDASXML. The heart of the Toolkit is a set of W3C XML schemas which provide a common format for the storage and exchange of historic environment information. It covers all the information currently included in the MIDAS standard issued by FISH.

2) Data Validator Tool. This online application validates the content of MIDASXML files. The presence or absence of data required by standards such as the English HER Level 1 Benchmark is checked, and compliance to terminology standards issued by FISH in the INSCRIPTION standard is verified. Reports are issued on the compliance of each entry to the relevant standards.

3) Data Validator maintenance tool. This online database stores the terminology standards and processing rules that support the operation of the Data Validator. Access is limited to Toolkit Consortium members.

4) Historic Environment Exchange Protocol (HEEP). This protocol provides a specification for IT developers to use in developing internet enabled versions of datasets ('data servers' in the jargon). It is a specification for a 'Web Service' which is a widely used IT industry standard for creating online access to data. HEEP standardizes the manner in which historic environment information resources can be queried remotely, and the format in which the requested data is delivered to 'clients' (i.e. those machines requesting the data). It will also standardize how HEEP-enabled servers report their capabilities and permissions required for access, and the format in which exceptions (problems) are reported.
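The kind of checking described for the Data Validator Tool (item 2 above) can be illustrated with a short Python sketch. This is not the Toolkit's own code: the element names (`record`, `name`, `type`, `period`), the list of required fields and the approved terms are all hypothetical and are not taken from MIDAS XML, the HER Level 1 Benchmark or INSCRIPTION. It only shows the general idea of reporting, entry by entry, on missing data and unapproved terminology.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for illustration only; not taken from any FISH standard.
REQUIRED_FIELDS = ["name", "type", "period"]
CONTROLLED_TERMS = {"period": {"ROMAN", "MEDIEVAL", "POST MEDIEVAL"}}

# Invented sample data in an invented format.
SAMPLE_RECORDS = """<records>
  <record id="1"><name>Villa</name><type>VILLA</type><period>Roman</period></record>
  <record id="2"><name>Barrow</name><type>ROUND BARROW</type><period>Olden days</period></record>
  <record id="3"><name>Mill</name><period>Medieval</period></record>
</records>"""


def validate_records(xml_text):
    """Return a list of (record id, problem) tuples, one per failed check."""
    problems = []
    root = ET.fromstring(xml_text)
    for record in root.findall("record"):
        record_id = record.get("id", "(no id)")
        for field in REQUIRED_FIELDS:
            element = record.find(field)
            value = (element.text or "").strip() if element is not None else ""
            if not value:
                problems.append((record_id, f"missing required field '{field}'"))
                continue
            # Only fields with a term list are checked against terminology.
            allowed = CONTROLLED_TERMS.get(field)
            if allowed is not None and value.upper() not in allowed:
                problems.append(
                    (record_id, f"'{value}' is not an approved term for '{field}'")
                )
    return problems


if __name__ == "__main__":
    for record_id, problem in validate_records(SAMPLE_RECORDS):
        print(f"record {record_id}: {problem}")
```

Keeping the rules in data structures separate from the checking code loosely mirrors the split between the Data Validator and its maintenance tool, where the terminology and processing rules are stored apart from the application that applies them.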

B. Documentation and presentation materials:

Supporting reference information, presentations, information sheets and training materials will be developed as experience with the Toolkit progresses. All these will be made freely available online via the project web-site.

A dedicated website is in preparation. For now, links to the Toolkit are via the main FISH web-site at www.fish-forum.info

IMPLEMENTATION OF XML AND WEB SERVICES IN ARCHAEOLOGY Russell Gant Systems Developer Wessex Archaeology Ltd

Any archaeological organisation will have a large number of databases, Word documents and spreadsheets all holding information. Because the data is held in such a fragmented way, it is not possible to run queries across multiple databases to return, for instance, information on the presence of a particular pottery type at different sites. Access to the data can sometimes be hampered by very old database formats, and data structures often do not follow the best standards and rules of normalisation.

Likewise, data exchanged between organisations is sometimes difficult to handle. What is referred to as data might be better described as electronic versions of paper reports. It can be nearly impossible to extract information from PDF files, Word documents and countless other document types for insertion into databases, except by manual copying and pasting. For short reports this is manageable, but for thousands of pages of 'data' it can have a serious impact on projects in terms of time and manpower wasted.

At Wessex we consume vast amounts of data provided by external bodies and often face the difficulties outlined above. I have developed Visual Basic routines where necessary to handle some formats of data, parsing them for insertion into databases via ADO. Sometimes there are clear delimiters and sometimes not. It is never a straightforward process.
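For readers who have not written this kind of routine, here is a minimal sketch of the easy case, where the incoming text does have clear delimiters. It uses Python's `csv` and `sqlite3` modules rather than the Visual Basic and ADO mentioned above, and the table name, column names and sample rows are invented for illustration. Once the rows are in one table, the cross-site pottery query mentioned earlier becomes a single SQL statement.

```python
import csv
import io
import sqlite3

# Invented pipe-delimited sample with a header row.
POTTERY_TEXT = (
    "site|context|pottery_type\n"
    "Site A|101|Samian ware\n"
    "Site B|204| Samian ware \n"
    "Site B|205|Black-burnished ware\n"
)


def load_delimited(text, delimiter, connection):
    """Parse delimited text and insert each row into a 'finds' table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS finds "
        "(site TEXT, context TEXT, pottery_type TEXT)"
    )
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = [
        (row["site"].strip(), row["context"].strip(), row["pottery_type"].strip())
        for row in reader
    ]
    # Parameterised inserts stop stray quotes in the data from breaking the SQL.
    connection.executemany("INSERT INTO finds VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


def sites_with_pottery(connection, pottery_type):
    """Return the sorted list of sites where the given pottery type occurs."""
    cursor = connection.execute(
        "SELECT DISTINCT site FROM finds WHERE pottery_type = ? ORDER BY site",
        (pottery_type,),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print(load_delimited(POTTERY_TEXT, "|", db), "rows loaded")
    print(sites_with_pottery(db, "Samian ware"))
```

The hard cases, where there are no reliable delimiters, are exactly the ones this sketch does not cover, which is the argument for an agreed exchange format.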

A solution to the data problem may lie in the use of XML. Combined with fast, robust centralised relational databases, web services and web applications, you can enable information to flow effectively both internally and externally - so long as the XML format is a recognized standard.

MIDAS XML is a new standard, funded by English Heritage, for data transfer between heritage bodies, and is based on the solid foundations of the CIDOC CRM. Due to be officially unveiled at this year's IFA conference, it promises a standard format of data exchange. I hope that there will be a general uptake of this new standard as it will help to deliver an end to the hundreds of wasted man-hours spent deciphering poorly formed 'data'.

XML can also be transformed with XSL into a multitude of formats. You can generate delimited text files for download, XHTML with accompanying CSS for display over the Internet, or generate SVG charts and graphs for more visual impact - all using the same XML document as input. This output can be streamed to the end user by web services using ubiquitous protocols.
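The 'one input, many outputs' idea can be shown in a few lines. The sketch below is not XSLT: Python's standard library has no XSLT processor, so two plain functions stand in for two stylesheets. The `sites` document and its element names are invented for illustration; the point is only that the delimited text and the XHTML table are both derived from the same XML.

```python
import xml.etree.ElementTree as ET

# Invented source document; the element names are illustrative only.
SOURCE_XML = """<sites>
  <site><name>Site A</name><period>Roman</period></site>
  <site><name>Site B</name><period>Medieval</period></site>
</sites>"""

FIELDS = ("name", "period")


def to_tab_delimited(root):
    """Render every <site> as one tab-delimited line under a header row."""
    lines = ["\t".join(FIELDS)]
    for site in root.findall("site"):
        lines.append("\t".join(site.findtext(field, "") for field in FIELDS))
    return "\n".join(lines)


def to_xhtml_table(root):
    """Render every <site> as one row of an XHTML table fragment."""
    table = ET.Element("table")
    for site in root.findall("site"):
        row = ET.SubElement(table, "tr")
        for field in FIELDS:
            ET.SubElement(row, "td").text = site.findtext(field, "")
    return ET.tostring(table, encoding="unicode")


# Choosing an output format is then just a lookup by name.
OUTPUTS = {"tab": to_tab_delimited, "xhtml": to_xhtml_table}

if __name__ == "__main__":
    document = ET.fromstring(SOURCE_XML)
    for name, render in OUTPUTS.items():
        print(f"--- {name} ---")
        print(render(document))
```

In a real XSLT setup each function would be a separate stylesheet file, and adding a new output format would mean adding a stylesheet rather than changing the data or the code that fetches it.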

At Wessex we are currently developing several in-house systems utilising SQL Server databases, Web Services, XML and XSLT:

Molluscs Database An online repository for snail data from environmental sampling. The data held in a central database is transmitted via a web service to an ASP.NET front end. As well as querying the database, the web service also carries out statistical calculations. The resultant datasets are returned as XML and formatted into HTML for display. It is intended to also generate SVG charts from the same XML data feed.
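As a rough illustration of a service that calculates a statistic and returns the result as XML, the sketch below works out the relative abundance of each species in a sample. It is written in Python rather than ASP.NET, it is not the Wessex code, and the statistic, the element and attribute names and the counts are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def abundance_xml(sample_id, counts):
    """Return an XML string giving each species' count and percentage share."""
    total = sum(counts.values())
    root = ET.Element("sample", id=sample_id, total=str(total))
    for species, count in sorted(counts.items()):
        # Guard against an empty sample so the division cannot fail.
        percent = 100.0 * count / total if total else 0.0
        ET.SubElement(
            root,
            "species",
            name=species,
            count=str(count),
            percent=f"{percent:.1f}",
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Invented counts for two placeholder species.
    print(abundance_xml("S1", {"Species A": 30, "Species B": 10}))
```

Because the result is XML, the same response could feed an HTML table today and an SVG chart later without the calculation being repeated.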

Survey Vault An experimental look at generating SVG mapping from site survey data. SVG files are generated on the fly in response to queries on associated metadata against a centralised database holding survey data.
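Generating SVG on the fly is less exotic than it sounds, because an SVG file is itself XML. The sketch below assumes the survey data is a list of (easting, northing, label) points, which is an invented format and not the Survey Vault schema, and scales them into a small square drawing.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def survey_to_svg(points, size=200, margin=10):
    """Scale (easting, northing, label) points into a square SVG drawing."""
    # Serialise SVG elements without a prefix, using a default xmlns.
    ET.register_namespace("", SVG_NS)
    eastings = [point[0] for point in points]
    northings = [point[1] for point in points]
    min_e, min_n = min(eastings), min(northings)
    # One scale for both axes keeps the plan undistorted.
    span = max(max(eastings) - min_e, max(northings) - min_n) or 1.0
    scale = (size - 2 * margin) / span
    svg = ET.Element(f"{{{SVG_NS}}}svg", width=str(size), height=str(size))
    for easting, northing, label in points:
        x = margin + (easting - min_e) * scale
        # The SVG y axis points down, so flip northings to keep north at the top.
        y = size - margin - (northing - min_n) * scale
        circle = ET.SubElement(
            svg, f"{{{SVG_NS}}}circle", cx=f"{x:.1f}", cy=f"{y:.1f}", r="3"
        )
        ET.SubElement(circle, f"{{{SVG_NS}}}title").text = label
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    # Invented coordinates for two survey stations.
    print(survey_to_svg([(100.0, 200.0, "Station A"), (110.0, 220.0, "Station B")]))
```

In a live system the list of points would come from a database query on the survey metadata, and the returned string would be sent to the browser with the `image/svg+xml` content type.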

Warrior A system for querying summaries of past excavations and assessments. The user chooses the output format on screen and the matching XSLT file is applied to produce it. Possible outputs are long and short HTML reports, and tab-delimited or MIDAS XML files for download.

I hope that it is apparent that at Wessex we are embracing new technologies and concepts that are of benefit not only to ourselves but also the wider community. The uptake of standard XML formats and the use of web services to deliver this information will help open up hidden repositories of knowledge. It is dependent on resources being available to organisations to implement the changes and it will take time, but it is a future worth the wait.

END OF NEWSLETTER NUMBER 5

This newsletter is copyright © Mark Bell and the individual authors, 2005.
Please contact the editor before reproducing material from this newsletter.
