Bradford Distribution... (2 messages) Marcia Tuttle 06 Jun 2003 13:44 UTC
----------1
Date: Thu, 5 Jun 2003 14:53:30 -0700
From: jmcdonald@library.caltech.edu
Subject: Re: Bradford distribution, 80/20, and larger samples

Thanks to Steve for bringing up these intriguing observations. When considering the Bradford distribution and the 80/20 rule, it is important to remember two things. First, the shape of the distribution curve depends on the internal structure of the data (Oluic-Vukovic 1997). Second, as the number of accesses increases towards infinity, the concentration of titles increases as well (Egghe 1987). An aggregated database with 500,000 accesses spread over 8,000 titles (62.5 accesses per title on average) has a very large number of accesses relative to its number of titles, and therefore a high concentration, so a smaller proportion of its titles accounts for 80% of the accesses. In addition, the internal structure of the data skews the distribution, since an aggregated database contains unequal numbers of articles per journal. An institution with 1 million accesses on the same database will have an even lower percentage of titles accounting for 80% of the accesses.

John McDonald
Acquisitions Librarian
California Institute of Technology

Oluic-Vukovic, Vesna (1997). "Bradford's Distribution: From the Classical Bibliometric 'Law' to the More General Stochastic Models." Journal of the American Society for Information Science 48(9): 833-842.

Egghe, Leo (1987). "Pratt's Measure for Some Bibliometric Distributions and Its Relation with the 80/20 Rule." Journal of the American Society for Information Science 38(4): 288-297.

----------2
Date: Thu, 05 Jun 2003 18:21:59 -0400
From: Kent Mulliner <mulliner@ohio.edu>
Subject: Re: Bradford distribution, 80/20, and larger samples

In a paper I presented at the Charleston Conference last November, I offered, without any pretense at statistical sophistication, statistics on long-term (four-year), statewide electronic journal usage for OhioLINK. For Elsevier titles, the data indicated a 70:30 usage ratio.
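Both messages rest on one computation: rank titles by use and find how small a share of them accounts for a given share of accesses (80% in the classic rule; 70% if the OhioLINK 70:30 figure is read as "70% of use from 30% of titles", which is an assumption). The following is a minimal sketch of that calculation. The function name `title_share_for_usage_share` and the sample counts are hypothetical, not taken from either study.

```python
from typing import Sequence


def title_share_for_usage_share(counts: Sequence[int], target: float = 0.8) -> float:
    """Hypothetical helper: the smallest fraction of titles, taken from most- to
    least-used, whose combined accesses reach `target` of all accesses."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one access")
    ranked = sorted(counts, reverse=True)  # most-used titles first
    cumulative = 0
    for k, c in enumerate(ranked, start=1):
        cumulative += c
        if cumulative >= target * total:
            return k / len(ranked)
    return 1.0  # defensive fallback; the loop always returns for valid input


# Hypothetical per-title access counts for ten titles (100 accesses in all).
usage = [40, 25, 12, 8, 5, 4, 3, 2, 1, 0]
print(title_share_for_usage_share(usage, 0.8))  # → 0.4
print(title_share_for_usage_share(usage, 0.7))  # → 0.3
```

Running it on real title-level usage from different databases or institutions is one way to see McDonald's point that the 80/20 split is not a fixed property of the collection.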
Importantly, a closer analysis of the data showed that variation in usage was greatest in the first year (a learning curve and a kid in a candy shop were competing hypotheses): the correlation coefficients between first-year usage and usage after four years were .29 and .42 for the middle and bottom thirds, respectively. Variation in usage, both year to year and over the full four years, was also greatest in the middle third of the title distribution.

Because this was based on the full Elsevier buffet (the same 1,182 titles throughout), it suggested two things. There were, of course, titles at the bottom, mostly (though not in every case) ones to which we had not subscribed, that could be lost with little impact. Similarly, the most-used list was fairly clear and consisted of titles to which we subscribed (or would have, if we could afford them). It was exactly in the middle, where (outside the big deal) we would have liked to keep some titles and shed others, that usage varied most from year to year. This year's cherry would become next year's lemon and the following year's prune.

One caveat I stressed: while the usage data extend over four years (nearly an eon in e-journal time), usage continues to grow year after year (by about 200,000 per year), so we are hardly looking at a mature situation.

K. Mulliner
Collection Development Coordinator
Ohio University Libraries
Athens, OH 45701-2978, USA
PHONE: 740-593-2707
FAX: 740-593-2692
mulliner@ohio.edu