W3C

XML and Databases Day (Giornata XML e database)

Pisa, 25 October 2002 - CNR Research Area (Area della Ricerca CNR)
Programme of the day

The XCDE Library: XML Compressed Document Engine

Paolo Ferragina
(Department of Computer Science, Pisa)

Abstract

In this talk we survey the implementation of a C library that offers a set of algorithms and data structures for indexing and searching a collection of compressed XML documents.

The documents must be well-formed and may conform to different DTDs. The library stores and manages these XML files in native, compressed form, operating directly at the file-system level. Its main features are state-of-the-art algorithms and data structures for text indexing, compressed space occupancy, and novel succinct data structures for managing the hierarchical structure of the XML document.
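The abstract does not describe XCDE's own succinct tree representation. As a point of reference, the classic balanced-parentheses encoding stores an n-element tree in 2n bits: a 1 when an element opens and a 0 when it closes, in document order. The following is a minimal sketch of that standard technique, not XCDE's implementation. All function names (`to_parens`, `find_close`, `subtree_size`, `children`, `tag_at`) are hypothetical. Navigation here uses linear scans, whereas a real succinct structure answers `find_close` and rank queries with small auxiliary directories.

```python
import xml.etree.ElementTree as ET

def to_parens(root):
    """Encode the element tree under `root` as balanced parentheses:
    1 = entering an element, 0 = leaving it. Also return tags in preorder."""
    bits, tags = [], []

    def visit(node):
        bits.append(1)
        tags.append(node.tag)
        for child in node:
            visit(child)
        bits.append(0)

    visit(root)
    return bits, tags

def find_close(bits, i):
    """Position of the 0 matching the 1 at position i (linear scan in this sketch)."""
    depth = 0
    for j in range(i, len(bits)):
        depth += 1 if bits[j] else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")

def subtree_size(bits, i):
    """Number of elements in the subtree whose root opens at position i."""
    return (find_close(bits, i) - i + 1) // 2

def children(bits, i):
    """Opening positions of the children of the element that opens at i."""
    result, j = [], i + 1
    while bits[j] == 1:  # each child starts with a 1; the parent's 0 ends the list
        result.append(j)
        j = find_close(bits, j) + 1
    return result

def tag_at(bits, tags, i):
    """Tag of the element opening at i: its preorder rank is the count of 1s up to i."""
    return tags[sum(bits[: i + 1]) - 1]
```

For `<a><b><c/></b><d/></a>` the encoding is `[1, 1, 1, 0, 0, 1, 0, 0]`. The root's children open at positions 1 and 5, and `tag_at(bits, tags, 5)` is `"d"`. Only the bit sequence and the tag list need to be stored. Everything else is derived from them.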

As a result, substring, regular-expression, approximate, and proximity searches on the textual content of the XML document, as well as on attribute values, can be executed efficiently. Structural queries on (partially specified) tag paths can also be resolved efficiently, thanks to a novel implementation of the hierarchical structure of the XML document.
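To make substring search concrete, here is a minimal sketch using a plain, uncompressed suffix array, which is a standard full-text index. It is not the compressed index used by XCDE, and the names `build_suffix_array` and `find_occurrences` are hypothetical. The key property is that all suffixes starting with a given pattern are contiguous in sorted order, so two binary searches delimit the occurrences. Attribute values could be searched the same way by indexing them as separate text, which is also an assumption here.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically.
    Fine for illustration; real indexes use linear-time or compressed builders."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of `pattern` in `text`."""
    m = len(pattern)
    # First binary search: leftmost suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Second binary search: leftmost suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # sa[start:lo] are exactly the suffixes that begin with the pattern.
    return sorted(sa[start:lo])
```

For example, with `text = "abracadabra"`, searching for `"abra"` yields positions 0 and 7. Each query costs O(m log n) character comparisons, independent of how many times the pattern occurs.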

Overall, the compressed XML document together with all of its indices occupies no more space than the original file. The XCDE library is intended only as the kernel of a more complex XML query engine or XML document engine: it may be used to implement most of the basic functionality of XQuery, and it may support IR-like searches. We are currently using the XCDE library to design an XML search engine for a collection of Italian literary texts marked up with TEI.

For details see http://sbrinz.di.unipi.it/~xcde/xcdelib.html.

(Joint work with Andrea Mastroianni.)

The presentation


Webmaster · Last modified: 20-Jun-2003 07:30 PM
