The STAR format

The Self-defining Text Archive and Retrieval (STAR) format has become a standard in structural biology. Several scientific databases (e.g. PDB, CCDC, ICDD, BioMagResBank) use the STAR format to store structural, crystallographic diffraction and NMR data. A growing number of programs (e.g. CNS, NMRView, MODELFREE) can write their output in the STAR format.

References

STAR specification publications:

  • The STAR File: A new Format for Electronic Data Transfer and Archiving
    S. R. Hall
    J. Chem. Inf. Comput. Sci. 31, 326-333 (1991)
  • The STAR File: Detailed Specifications
    S. R. Hall and N. Spadaccini
    J. Chem. Inf. Comput. Sci. 34, 505-508 (1994)
  • STAR Dictionary Definition Language: Initial Specification
    S. R. Hall and A. P. F. Cook
    J. Chem. Inf. Comput. Sci. 35, 819-825 (1995)

CIF and mmCIF specification publications:

  • The Crystallographic Information File (CIF): a New Standard Archive File for Crystallography
    S. R. Hall, F. H. Allen and I. D. Brown
    Acta Cryst. A47, 655-685 (1991)
  • Macromolecular Crystallographic Information File
    P. E. Bourne, H. M. Berman, B. McMahon, K. D. Watenpaugh, J. D. Westbrook and P. M. D. Fitzgerald
    Methods in Enzymology 277, 571-590 (1997)


XML

The eXtensible Markup Language (XML) is a standard for semantic markup of data that is independent of any particular application domain. Because of this independence, many parties develop XML parsers, and the resulting software is tested thoroughly across many different problem domains. In addition, parser implementations exist in a wealth of programming languages (including scripting languages), which gives scientists more freedom in choosing the tools they use to analyze their data.
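
As a small illustration of using an off-the-shelf parser from a scripting language, the following sketch reads a document with the DOM parser from Python's standard library. The document itself, including the element name `atom` and the attributes `name`, `residue` and `shift`, is invented for this example and does not follow any real schema.

```python
from xml.dom.minidom import parseString

# Hypothetical document: element and attribute names are assumptions
# made up for this illustration, not taken from any real data format.
XML_TEXT = """<entry id="demo">
  <atom name="CA" residue="ALA" shift="52.3"/>
  <atom name="CB" residue="ALA" shift="19.1"/>
</entry>"""


def atom_shifts(xml_text):
    """Return a dict mapping atom name to chemical shift, read via the DOM API."""
    document = parseString(xml_text)
    shifts = {}
    # getElementsByTagName walks the whole tree and returns matching elements.
    for atom in document.getElementsByTagName("atom"):
        shifts[atom.getAttribute("name")] = float(atom.getAttribute("shift"))
    return shifts


shifts = atom_shifts(XML_TEXT)
print(shifts)
```

No parsing code had to be written here; the script only states which elements and attributes it is interested in.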

Here's a list of the advantages of XML compared to STAR:

  • Standard XML parsers: As XML is used in a broad spectrum of application domains, many different parser implementations are available. This enables the programmer to choose a well-tested parser for a given problem.
  • Standard XML viewers/editors: With the advent of the next generation of Web browsers, XML will be supported as a standard format for data exchange over the web. Hierarchical information contained in an XML file can be displayed and edited in general-purpose XML viewers (such as Microsoft's Internet Explorer 5) or editors (such as IBM's Xeena).
  • XML query languages: Several query languages for extracting information from XML sources have been proposed. These query languages enable users to formulate ad-hoc queries in a structured way.
    Two proposed standards are XQL and XML-QL. For both proposals, working prototype implementations are available.
  • Validity of documents: XML documents can be validated against a Document Type Definition (DTD). In this way, the integrity of the data can be checked as the document is generated.
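
The idea of an ad-hoc query can be sketched with Python's standard library. This is not XQL or XML-QL: it uses the limited XPath subset supported by `xml.etree.ElementTree` as a stand-in, and the document structure (`residue`, `atom`, `shift`) is hypothetical. Note also that the standard-library parser only checks well-formedness, which is a weaker property than validity against a DTD; DTD validation needs a validating parser and is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are assumptions
# invented for this illustration.
XML_TEXT = """<entry id="demo">
  <residue name="ALA" number="1">
    <atom name="CA" shift="52.3"/>
    <atom name="CB" shift="19.1"/>
  </residue>
  <residue name="GLY" number="2">
    <atom name="CA" shift="45.2"/>
  </residue>
</entry>"""


def ca_shifts(xml_text):
    """Ad-hoc query: map residue number to the shift of its CA atom."""
    root = ET.fromstring(xml_text)
    result = {}
    for residue in root.findall("residue"):
        # The predicate [@name='CA'] selects only atoms with that attribute value.
        for atom in residue.findall("atom[@name='CA']"):
            result[int(residue.get("number"))] = float(atom.get("shift"))
    return result


def is_well_formed(xml_text):
    """Return True if the text parses as XML (well-formedness, not DTD validity)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(ca_shifts(XML_TEXT))
print(is_well_formed("<a><b></a>"))
```

The query is expressed as a path with a condition rather than as hand-written tree traversal, which is the kind of structured formulation the proposed query languages aim to provide.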

Further information

Here's a collection of links relevant to XML:

  • The W3C XML site is where you'll find all the standards and official activities.
  • XML.com is a commercial portal containing good articles and news about XML.
  • Cafe con Leche is a weblog-style portal with updates on unofficial (read: where the work is done) XML activities.
  • The Python XML topic guide is the starting point for XML processing in Python.


Jens Linge, Lutz Ehrlich