Bradford Distribution... (2 messages) Marcia Tuttle 06 Jun 2003 13:44 UTC

----------1
Date: Thu, 5 Jun 2003 14:53:30 -0700
From: jmcdonald@library.caltech.edu
Subject: Re: Bradford distribution, 80/20, and larger samples

Thanks to Steve for bringing up these intriguing observations.  When
considering the Bradford distribution and the 80/20 rule, it is important to
remember that the shape of the distribution curve depends on the
internal structure of the data (Oluic-Vukovic 1997) and that, as the number
of accesses increases towards infinity, the concentration of titles increases
as well (Egghe 1987).  An aggregated database that features 500,000 accesses
and 8,000 titles has an extremely high concentration per number of accesses,
resulting in a lower proportion of items covering 80% of the accesses.  In
addition, the internal structure of the data skews the distribution since an
aggregated database will contain unequal numbers of articles per journal.

An institution that has 1 million accesses on the same database will have an
even lower percentage of titles accounting for 80% of the accesses.
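
As an illustration of the measure discussed above (the proportion of titles
that covers 80% of accesses), here is a minimal Python sketch.  The function
name and the ten access counts are hypothetical and are not taken from any
real database; the sketch only computes the proportion for a given set of
counts and does not model how it changes as accesses grow.

```python
def share_of_titles_for_coverage(access_counts, coverage=0.8):
    """Return the fraction of titles needed to reach `coverage` of all accesses.

    Titles are ranked from most to least used, and accesses are accumulated
    until the requested share of the total is reached.
    """
    if not access_counts:
        raise ValueError("access_counts must not be empty")
    total = sum(access_counts)
    if total <= 0:
        raise ValueError("total accesses must be positive")

    target = coverage * total
    running = 0
    for rank, count in enumerate(sorted(access_counts, reverse=True), start=1):
        running += count
        if running >= target:
            return rank / len(access_counts)
    return 1.0


# Hypothetical access counts for ten titles (illustrative only).
counts = [500, 200, 150, 60, 40, 20, 15, 10, 3, 2]
share = share_of_titles_for_coverage(counts)
print(f"{share:.0%} of titles account for 80% of accesses")
```

A lower returned value means usage is more concentrated in a few titles.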

John McDonald
Acquisitions Librarian
California Institute of Technology

Bradford's Distribution: From the Classical Bibliometric "Law" to the More
General Stochastic Models
Vesna Oluic-Vukovic
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 48(9):833-842, 1997

Pratt's Measure for some Bibliometric Distributions and its Relation with
the 80/20 Rule
Leo Egghe
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 38(4):288-297, 1987

----------2
Date: Thu, 05 Jun 2003 18:21:59 -0400
From: Kent Mulliner <mulliner@ohio.edu>
Subject: Re: Bradford distribution, 80/20, and larger samples

Without any pretense at statistical sophistication, in a paper I presented at
the Charleston Conference last November, I offered statistics on long-term
(four years), statewide electronic journal usage data for OhioLINK.  For
Elsevier titles, the data indicated a 70:30 usage ratio.  Importantly, a
closer analysis indicated that variation in usage was greatest in the first
year (a learning curve and a kid in a candy shop were the competing
hypotheses), with correlation coefficients of .29 and .42 for the middle and
bottom thirds between first-year usage and usage after four years.  Also,
variation in usage, both year to year and over the four years, was greatest
in the middle third of the title distribution.  Since this was based on the
full Elsevier buffet (the same 1,182 titles), it suggested that there were,
of course, titles at the bottom, to which in most (but not all) cases we had
not subscribed, that could be lost with little impact.  Similarly, the
most-used list was fairly clear, made up of titles to which we subscribed (or
would have if we could have afforded them).  It was exactly in the middle,
where we (outside of the big deal) would have liked to keep some titles and
shed others, that usage varied most from year to year.  This year's cherry
would become next year's lemon and the following year's prune.
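
For readers who want to run a similar check on their own usage data, here is
a minimal Python sketch of a per-third correlation between first-year and
final-year usage.  It is not the method used in the paper: the function
names, the sample numbers, and the choice to form the thirds by ranking
titles on final-year usage are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence has no variation.
    return cov / sqrt(var_x * var_y)


def correlation_by_third(first_year, final_year):
    """Correlate first-year and final-year usage within each third of titles.

    Assumption: thirds are formed by ranking titles on final-year usage.
    Any remainder titles fall into the bottom third.
    """
    pairs = sorted(zip(first_year, final_year), key=lambda p: p[1], reverse=True)
    size = len(pairs) // 3
    thirds = {
        "top": pairs[:size],
        "middle": pairs[size:2 * size],
        "bottom": pairs[2 * size:],
    }
    return {
        name: pearson([a for a, _ in group], [b for _, b in group])
        for name, group in thirds.items()
    }


# Hypothetical usage counts for nine titles (not OhioLINK data).
first_year = [90, 120, 100, 40, 10, 55, 5, 9, 2]
final_year = [300, 280, 260, 60, 50, 40, 8, 6, 4]
for name, r in correlation_by_third(first_year, final_year).items():
    print(f"{name}: r = {r:.2f}")
```

A low coefficient within a third means that a title's first-year usage says
little about its usage four years on.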

One caveat that I stressed: while the usage data extend over four years
(nearly an eon in e-journal time), usage continues to grow year after
year (by about 200,000 per year), so we are hardly looking at a mature
situation.

K. Mulliner       Collection Development Coordinator

Ohio University Libraries                       PHONE:  740-593-2707
Athens, OH 45701-2978, USA      FAX:    740-593-2692
mulliner@ohio.edu