Reconstitution of Meaning: MARC Fields as Morphemes (Gerry McKiernan) ERCELAA@ctrvax.Vanderbilt.Edu 30 Jun 1997 13:09 UTC
Date: Sun, 29 Jun 1997 15:03:56 -0500 (CDT)
From: Gerry McKiernan <JL.GJM@ISUMVS.IASTATE.EDU>
Subject: Reconstitution of Meaning: MARC Fields as Morphemes

_Reconstitution of Meaning: MARC Fields as Morphemes_

In considering how to make better use of the intellectual content embedded in MARC fields, as part of my review of the potential application of Data Mining and Knowledge Discovery in Databases (KDD), it has occurred to me that such an investigation might prove useful if MARC fields were viewed as _morphemes_ [no, not morphine [:->]].

The morpheme is considered by (many) linguists to be a basic unit of meaning within a language. In the cataloging process, meaning is embedded within a defined structure using accepted rules of grammar (e.g. AACR2) - syntax, if you will [:->]. In such a process, a message about an individual work is conveyed, using this grammar and an associated lexicon. Here the physical 'meaning' and intellectual 'meaning' of an item are translated into a message that is intended to describe the item and its content.

While this process of bibliographic control has enabled users to identify 'meaningful' items relevant to an information need, most of the existing and (even) New Age OPACs I've identified and compiled in my Onion Patch (sm) clearinghouse at URL

http://www.public.iastate.edu/~CYBERSTACKS/Onion.htm

do not, I believe, make full use of the meaning explicit or implicit within these records.

To identify items that are most relevant to users [BTW: 'Relevance' is a 'meaning-full' concept [:->]], we need to contemplate the creation of OPACs that enable (or allow) users to 'reconstitute' the meaning within these records.
We need to develop systems that can present users with items (i.e., records of cataloged items within the OPAC) that best meet their needs using an 'optimal syntax' determined by the 'meaningful' associations uncovered by a Data Mining or KDD process, or that give users the ability to select a different syntax (e.g. subject and publisher associations) to identify that 'good book' on the subject.

Likewise, we need to provide users with the ability to 'cross-tabulate' associations within MARC fields, so that they are provided with a ranked listing of items by publisher-author-call number, or call number-publisher, or subject heading-publisher, or any other potentially meaningful association of their choosing. [I have sketched out a mock-up interface for this function and will certainly let the list(s) know when it is available]

In addition to the associations revealed by applying Data Mining and KDD to an appropriate catalog database (e.g., the OCLC cataloging database) or to selected local OPAC databases of peer groups (e.g. RLG), and the desired associations of users themselves, comprehensive log data should also reveal useful associations that might provide a new syntax, or enhance one already considered. [Here circulation data would be very important, as would OPAC transaction log data, as Larson has demonstrated in his study of subject access in OPACs]

One could envision the methods of Computational Linguistics applied to MARC records, or even (perhaps) Transformational/Generative Grammar [:->] [Long Live Noam Chomsky!]!

Once again, as always, any reactions to such musings would be most welcome. [In particular, I am interested in any literature relating to the application of linguistic theories/practices to bibliographic and MARC record structure.]

Regards,

Gerry McKiernan
Curator, CyberStacks(sm)
Iowa State University
Ames IA 50011

gerrymck@iastate.edu
http://www.public.iastate.edu/~CYBERSTACKS/

"Oh No!, Not Another Project"

P.S.
One could certainly apply these envisioned methodologies to any metadata regime (e.g. the Dublin Core or TEI).
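
To make the 'cross-tabulation' idea a little more concrete, here is a rough sketch in Python of what a subject heading-publisher association might look like. It is only an illustration under several assumptions: the records are hand-made placeholders (not real catalog data), the keys "subject" and "publisher" merely stand in for MARC 650 and 260 $b, and the function names cross_tabulate and rank_records are hypothetical. Ranking by raw co-occurrence count is one simple choice among many; a real Data Mining or KDD process would likely use something more refined.

```python
from collections import Counter
from itertools import product

# Hypothetical, hand-made records. The keys stand in for MARC fields
# (650 subject headings, 260 $b publisher); the values are placeholders.
RECORDS = [
    {"id": 1, "subject": ["Linguistics"], "publisher": ["Press A"]},
    {"id": 2, "subject": ["Linguistics"], "publisher": ["Press A"]},
    {"id": 3, "subject": ["Linguistics"], "publisher": ["Press B"]},
    {"id": 4, "subject": ["Cataloging"], "publisher": ["Press C"]},
    {"id": 5, "subject": ["Linguistics", "Cataloging"], "publisher": ["Press A"]},
]


def cross_tabulate(records, field_a, field_b):
    """Count how often each (field_a value, field_b value) pair co-occurs in a record."""
    counts = Counter()
    for rec in records:
        # set() so a pair repeated within one record is counted only once
        pairs = set(product(rec.get(field_a, []), rec.get(field_b, [])))
        counts.update(pairs)
    return counts


def rank_records(records, field_a, field_b):
    """Order records by the frequency of their most common (field_a, field_b) pair."""
    counts = cross_tabulate(records, field_a, field_b)

    def score(rec):
        pairs = product(rec.get(field_a, []), rec.get(field_b, []))
        return max((counts[p] for p in pairs), default=0)

    # sorted() is stable, so records with equal scores keep their input order
    return sorted(records, key=score, reverse=True)


if __name__ == "__main__":
    table = cross_tabulate(RECORDS, "subject", "publisher")
    for (subject, publisher), n in table.most_common():
        print(f"{subject} / {publisher}: {n}")
    ranked = rank_records(RECORDS, "subject", "publisher")
    print("Ranked record ids:", [rec["id"] for rec in ranked])
```

Passing a different pair of field names (say, a call number and a publisher key, if the records carried them) would give the user a different 'syntax' over the same records, which is the point of letting users choose the association themselves.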