| How Much Information? 2003 | |||
| Summary | Stored Information | Information Flows | Wrap-up |
| Thanks | Printable (PDF) | |||
|
Executive Summary
|
| 878
KB, 103 pages |
108 KB, 14 pages |
How much new information is created each year? Newly created information is stored in four physical media – print, film, magnetic and optical – and seen or heard in four information flows through electronic channels – telephone, radio and TV, and the Internet. This study of information storage and flows analyzes the year 2002 in order to estimate the annual size of the stock of new information recorded in storage media, and heard or seen each year in information flows. Where reliable data was available we have compared the 2002 findings to those of our 2000 study (which used 1999 data) in order to describe a few trends in the growth rate of information.
In 2000 we conducted a study to estimate how much information is produced every year (see http://www.sims.berkeley.edu/research/projects/how-much-info/). We then estimated that in 1999 the world produced between 1 and 2 exabytes of unique information. In Summer 2003 we repeated the study, using 2002 data, in order to begin to identify trends in the production and consumption of information. Some of the 1999 data has been revised in this study because new information sources were identified; our revised estimate is that in 1999 the world produced between 2 and 3 exabytes of new information.
As in 1999, we have estimated the magnitudes of information flows (TV, Radio, Telephone, Internet) that are currently not systematically archived, but may well be in the future. This year we have added two studies of the Internet. We have sampled the World Wide Web, to determine the size of the surface web and to define the source, functions and content of Web pages. And we have studied desktop disk drives, to determine how people consume information on the Internet.
Because information is created and distributed in different media or formats there is no common standard with which to measure the amount of information created each year, thus we have translated the vast array of information formats and media to a single standard – terabytes. Terabytes used as a common standard of measurement of the amount of new information is particularly useful given that most new information is in digital form, and other formats are increasingly giving way to digital form (i.e., digital images replacing film based photographs), or are archived in digital form (i.e., print newspapers also published on the Web).
However, this methodology measures only the volume of information, not the quality of information in a given format or its utility for different purposes, i.e., the relative value of information in an edited book or peer-eviewed journal articles when compared to digital storage of raw data.
Table 1.1: How Big is an Exabyte? |
|
Kilobyte (KB) |
1,000 bytes OR 103bytes |
Megabyte (MB) |
1,000,000 bytes OR 106 bytes |
Gigabyte (GB) |
1,000,000,000 bytes OR 109 bytes |
Terabyte (TB) |
1,000,000,000,000 bytes OR 1012 bytes |
Petabyte (PB) |
1,000,000,000,000,000 bytes OR 1015 bytes |
Exabyte (EB) |
1,000,000,000,000,000,000 bytes OR 1018 bytes |
Source: Many of these examples were taken from Roy Williams “Data Powers of Ten” web page at Caltech.
After production of original content by media type was estimated, the key problem was to identify a common standard of comparison. We have translated the volume of original content into a common standard by figuring the size of analog content in terabytes if it were to be digitized using industry standard practices (‘upper bound’ estimates). We have then determined how much storage each type would take using industry standards for compression, and defined working assumptions to adjust for duplication of content (‘lower bound’ estimates). See the Qualifications section for some of the problems of this methodology.
Information is recorded, stored and distributed in four physical media – paper, film, magnetic, and optical. Good data is available for the worldwide production of each storage medium, providing an upper bound for the potential production of original information and copies. There are often good estimates for how much original content is produced in each of these different storage formats, particularly for the advanced economies that produce the most information. Where those data don’t exist we have adopted working assumptions to make our estimates; these assumptions are documented in the full report and, as in 2000, we welcome suggestions for improving them.
Table 1.1 summarizes yearly worldwide production of original stored content “circa 2002,” because Paper and Film statistics are largely from 2001 while Magnetic and Optical are largely from 2002. Detailed source information and the inferences that were used to produce these calculations are presented in detail in the web pages on Paper, Film, Magnetic, and Optical accessible from the links at the top of this page.
|
Table 1.2: Worldwide production of original information, if stored digitally, in terabytes circa 2002. Upper estimates assume information is digitally scanned, lower estimates assume digital content has been compressed. |
|||||
|
Storage Medium |
2002 Terabytes Upper Estimate |
2002 Terabytes Lower Estimate |
1999-2000 Upper Estimate |
1999-2000 Lower Estimate |
% Change Upper Estimates |
|
Paper |
1,634 |
327 |
1,200 |
240 |
36% |
|
Film |
420,254 |
76,69 |
431,690 |
58,209 |
-3% |
|
Magnetic |
5187130 |
3,416,230 |
2,779,760 |
2,073,760 |
87% |
|
Optical |
103 |
51 |
81 |
29 |
28% |
|
TOTAL: |
5,609,121 |
3,416,281 |
3,212,731 |
2,132,238 |
74.5% |
Source: How much information 2003
Summary estimates show that the storage of new information has been growing at a rate of over 30% a year (upper estimate, uncompressed). There has been dramatic growth in storage of new information over the past two years in every storage medium except film. Film-based content – especially photographs – is migrating to digital media, both optical and magnetic.
A tree can produce about 80,500 sheets of paper, thus it requires about 786 million trees to produce the world's annual paper supply. The UNESCO Statistical Handbook for 1999 estimates that paper production provides 1,510 sheets of paper per inhabitant of the world on average. But paper consumption is not equal; annually each of the inhabitants of North America consumes 11,916 sheets of paper (24 reams), and inhabitants of the European Union consume 7,280 sheets of paper (15 reams). At least half of this paper is used in printers and copiers to produce office documents.
|
Table 1.3: Worldwide production of printed original content, if stored digitally in terabytes circa 2002. Upper estimate is scanned; lower estimate is compressed. |
||||||
|
Storage Medium |
Type of Content |
Terabytes/Yr Upper Estimate |
Terabytes/Yr Lower Estimate |
1999 Upper Estimate |
1999 Lower Estimate |
% Change Upper Estimates |
|
Paper |
Books |
39 |
8 |
39 |
8 |
0 |
|
Newspapers |
138.4 |
27.7 |
124 |
25 |
12% |
|
|
Office Documents |
1,397.5 |
279.5 |
975 |
195 |
43% |
|
|
Mass market periodicals |
52 |
10 |
52 |
10 |
0 |
|
|
Journals |
6 |
1.3 |
9 |
2 |
-33% |
|
|
Newsletters |
0.9 |
0.2 |
0.8 |
0.2 |
0 |
|
|
Subtotal |
1,633.8 |
326.7 |
1,199.8 |
240.2 |
36% |
|
Source: How much information 2003, Table 2.3
The amount of new original information stored on paper increased 36% between 1999 and 2002.
For details on this data, our sources and calculations see Paper.
|
Table 1.4: United States production of printed original content, if stored digitally in terabytes circa 2002. Upper estimate is scanned; lower estimate is compressed. |
||||
|
Storage Medium |
Type of Content |
Terabytes/Yr |
1999 |
%
Change |
|
Paper |
Books |
5.5 |
3 |
83% |
|
Newspapers |
13.5 |
13 |
4% |
|
|
Office Documents |
559 |
390 |
43% |
|
|
Mass market periodicals |
3.5 |
13 |
-73% |
|
|
Journals |
1.6 |
2 |
-20% |
|
|
Newsletters |
0.3 |
0.2 |
50% |
|
U.S. Mail |
6,230 |
5,940 |
4.8% |
|
|
Subtotal |
6,813 |
6,361 |
7.1% |
|
Source: How much information 2003, Table 2.5
The U.S. produces 35% of the world’s new printed information each year and 40% of the world's card and letter postal volume. About half of all postal mail in the United States is currently first class and about half is junk mail. If we assume 2 pages per piece of mail, digitized at 15 kilobytes per page, 2002 U.S. mail was about 6.23 petabytes per year. This represents an increase of about one-half of a petabyte over 1999 estimates.
Film is a storage medium for analog images that is evolving towards digital images stored on magnetic and optical media.
For details on this data, our sources and calculations see Film.
|
Table 1.5: Worldwide production of filmed original content, if stored digitally, in terabytes circa 2002. |
||||||
|
Storage Medium |
Type of Content |
Terabytes/Yr Upper Estimate |
Terabytes/Yr Lower Estimate |
1999 Upper Estimate |
1999 Lower Estimate |
% Change Upper Estimates |
|
Film |
Photographs |
375,000 |
37,500 |
410,000 |
41,000 |
-9% |
|
Cinema |
6,078 |
12 |
4,490 |
9 |
35% |
|
|
Made for TV films |
2,531 |
2,530 |
N/A |
N/A |
N/A |
|
|
TV series |
14,155 |
14,155 |
N/A |
N/A |
N/A |
|
Direct to video |
2,490 |
2,490 |
N/A |
N/A |
N/A |
|
|
X-Rays |
20,000 |
20,000 |
17,200 |
17,200 |
16% |
|
|
Subtotal |
420,254 |
74,202 |
431,690 |
58,209 |
-2.6% |
|
Source: How much information 2003, Table 3.2
Worldwide production of new information recorded on magnetic storage media has grown 87% since 1999.
For details on this data, our sources and calculations, see Magnetic.
|
Table 1.6: Worldwide production of magnetic original content, if stored digitally using standard compression methods, in terabytes circa 2002. |
||||||
|
Storage Medium |
Type of Content |
Terabytes/Yr Upper Estimate |
Terabytes/Yr Lower Estimate |
1999 Report Upper Estimate |
1999 Report Lower Estimate |
% Change Upper Estimates |
|
Magnetic |
Videotape |
1,340,000 |
1,340,000 |
1,420,000 |
1,420,000 |
-6% |
|
Audiotape |
128,800 |
128,800 |
182,000 |
182,000 |
-30% |
|
|
Digital tape |
250,000 |
250,000 |
250,000 |
250,000 |
0 |
|
|
MiniDV |
1,265,000 |
1,265,000 |
N/A |
N/A |
N/A |
|
|
Floppy disc |
80 |
80 |
70 |
70 |
14% |
|
|
Zip |
350 |
350 |
1,690 |
1,690 |
-79% |
|
|
Audio MD |
17,000 |
17,000 |
N/A |
N/A |
N/A |
|
|
Flash |
12,000 |
12,000 |
N/A |
N/A |
N/A |
|
|
Hard Disk |
1,986,000 |
403,000 |
926,000 |
220,000 |
114% |
|
|
TOTAL |
4,999,230 |
3,416,230 |
2,779,760 |
2,073,760 |
80% |
|
Source: How much information 2003
Optical storage media are the medium of choice for the distribution of software, data, cinema and music -- although a small proportion of digital information overall.
For details on this data, our sources and calculations see Optical.
|
Table 1.7: Worldwide production of optical original content, if stored digitally using standard compression methods, in terabytes circa 2002. |
||||||
|
Storage Medium |
Genre |
Terabytes/Yr Upper Estimate |
Terabytes/Yr Lower Estimate |
1999 Upper Estimate |
1999 Lower Estimate |
% Change Upper Estimates |
|
Optical |
Audio CD |
58 |
6 |
58 |
6 |
0 |
|
CD ROM |
1.1 |
1.1 |
0.7 |
0.7 |
57% |
|
|
DVD |
43.8 |
43.8 |
22 |
22 |
99% |
|
|
Subtotal |
102.9 |
50.9 |
80.7 |
28.7 |
28% |
|
Source: How much information 2003, Table 5.2
The U.S. produces 37% of the world’s audio CD titles, 50% of the CD ROM titles, and 40% of the DVD titles.
|
Table 1.8: United States production of optical original content, if stored digitally using standard compression methods, in terabytes circa 2002. |
||||||
|
Storage Medium |
Genre |
Terabytes/Yr |
Terabytes/Yr Lower Estimate |
1999 |
1999 Lower Estimate |
% Change |
|
Optical |
Audio CD |
22 |
2 |
22 |
2 |
0 |
|
CD ROM |
0.55 |
0.06 |
1 |
0.3 |
-45% |
|
|
DVD |
18 |
18 |
13 |
13 |
38% |
|
|
Subtotal |
40.55 |
20.06 |
36 |
15.3 |
13% |
|
Source: How much information 2003, Table 5.2
Communication flows through four electronic channels: radio and television broadcasting, telephone calls, and the Internet. Each channel requires access to a form of information technology: radios, television sets, telephones, and computers. Thus like storage media, information flows are distributed unequally around the world.
Information stored on paper, film, optical, and magnetic media totals about 5 exabytes of new information each year; this is less than one third of the new information that is communicated through electronic information flows – telephone, radio and TV, and the Internet – which is about 17.7 exabytes.
|
Table 1.9: Summary of electronic flows of new information in 2002 in terabytes. |
|
|
Medium |
2002 Terabytes |
|
Radio |
3,488 |
|
Television |
68,955 |
|
Telephone |
17,300,000 |
Internet |
532,897 |
|
TOTAL |
17,905,340 |
Source: How much information 2003
The striking finding here is that most of the total volume of new information flows is derived from the volume of voice telephone traffic, most of which is unique content. The second largest component of information flows is the Internet.
World radio stations produce 320 million hours of radio broadcasting, which would require 16,000 terabytes to store; we estimate 70 million hours are original programming, which would require an annual storage requirement of about 3,500 terabytes. World television stations produce about 123 million hours total programming; we estimate about 31 million hours are original programming, requiring about 70,000 terabytes of storage.
|
Table 1.10: World – annual production of original broadcast media items – 2003 sources |
|||||
|
Media Type |
Number of Stations |
Unique Items per Year |
Conversion Factor |
Total Terabytes (Annual) |
|
|
Lower Bound |
Upper Bound |
||||
|
Radio |
47,776 |
70 million hours of original programming |
0.05 GB/hour |
3,488 |
3,488 |
|
Television |
21,264 |
31 million hours of original programming |
1.3 GB - 2.25 GB hour |
39,841 |
68,955 |
|
Total: |
43,329 |
72,443 |
|||
Source: How much information 2003, Table 6.1
In the United States, there are 13,261 radio stations producing 19.7 million hours of original programming, or about 987 terabytes of original programming. As of 2002 there were 1686 broadcast TV stations in the United States producing about 14.5 million hours of content a year; about 3.6 million hours are original information, equivalent to between 4,700 and 8,200 terabytes (depending upon the compression standard used).
For details on this data, our sources and calculations see Broadcast.
|
Table 1.11: United States – Comparison of production of original broadcast media items – 2003 sources |
|||||
|
Media Type |
Number of Stations |
Unique Items per Year |
Conversion Factor |
Total Terabytes (Annual) |
|
|
Lower Bound |
Upper Bound |
||||
|
Radio stations (2002, FCC) |
|
|
|
|
|
|
Commercial - AM |
4,811 |
7.2 million hours |
0.05 GB/hour |
358 TB |
358 TB |
|
Commercial - FM |
6,147 |
9.2 million hours |
0.05 GB/hour |
458 TB |
458 TB |
|
Educational |
2,303 |
3.4 million hours |
0.05 GB/hour |
171 TB |
171 TB |
Total |
13,261 |
19.8 million hours |
0.05 GB/hour |
987 TB |
987 TB |
|
Television (2002, FCC) |
|
|
|
|
|
|
Broadcast Stations |
1,686 |
3.1 million hours |
1.3 GB - 2.25 GB hour |
4,000 TB |
6,923 TB |
|
Cable Stations |
308 |
.6 million hours |
1.3 GB - 2.25 GB hour |
731 TB |
1,265 TB |
Total |
1,994 |
3.6 million hours |
1.3 GB - 2.25 GB hour |
4,731 TB |
8,188 TB |
|
Total (radio + TV): |
5,718 TB |
9,175 TB |
|||
Source: How much information 2003, Table 6.3
|
Table 1.12: The size of world telephone calls in terabytes. |
|
|
Line calls |
15,000,000 |
|
Wireless calls |
2,300,000 |
|
TOTAL |
17,300,000 |
Source: How much information 2003
There are 1.1 billion main telephone lines in the world as of 2002; it is estimated that each line carries an average of 3,441 minutes a year, or 3,785 billion minutes. At 64 kilobits/second, it would take 15 exabytes to store this much information - most of it original. There are 190 million main telephone lines in the U.S., each of them used over an hour a day for all types of calls (i.e., mostly local, including modems, faxes, etc). It would take 9.25 exabytes of storage to hold all U.S. calls each year. The number of landline phones in the U.S. has dropped by more than 5 million, as mobile phones have grown to 43% of all U.S. phones. Mobile phones used more than 600 billion minutes in 2002, an equivalent of 2.3 exabytes of storage.
For details on this data, our sources and calculations see Telephony.
Although the Internet is the newest medium for information flows, it is the fastest growing new medium of all time, becoming the information medium of first resort for its users. Note that the Web consists of the surface web (fixed web pages) and what Bright Planet calls the deep web (the database driven websites that create web pages on demand).
|
Table 1.13: The size of the Internet in terabytes. |
|
|
Medium |
2002 Terabytes |
|
Surface Web |
167 |
|
Deep Web |
91,850 |
|
Email (originals) |
440,606 |
Instant messaging |
274 |
|
TOTAL |
532,897 |
Source: How much information 2003
Around the world about 600 million people have access to the Internet, about 30% of them in North America.
|
Table 1.14: World Distribution of Internet Users (in millions) |
|
|
Africa |
6.31 |
|
Asia Pacific |
187.24 |
|
Europe |
190.91 |
Middle East |
5.12 |
Canada and USA |
182.67 |
Latin America |
33.35 |
Source: Nielsen / NetRatings via CyberAtlas
According to Nielsen/NetRatings, the average Internet user spends 11 hours and 24 minutes online per month; the average user in the United States spends more than twice that amount of time online: 25 hours and 25 minutes at home and 74 hours and 26 minutes at work. In the United States, Internet access is used to send email (52%), get news (32%), use a search engine to find information (29%), surf the web (23%), do research for work (19%), check the weather (17%) or send an instant message (14%) (Source: Pew Internet and American Life Project).
For details on this data, our sources and calculations see Internet.
In 2000 we estimated the volume of information on the public Web at 20 to 50 terabytes; in 2003 we measured the volume of information on the Web at 167 terabytes - at least triple the amount of information. The surface web is about 167 terabytes as of Summer 2003; BrightPlanet estimates the deep web to be 400 to 450 times larger, thus between 66,800 and 91,850 terabytes.
About 31 billion emails are sent daily, on the Internet and elsewhere, a figure which is expected to double by 2006 (source: International Data Corporation (IDC). The average email is about 59 kilobytes in size, thus the annual flow of emails worldwide is 667,585 terabytes.
Nearly 40% of U.S. Internet users at home logged onto one of the instant messaging (IM) networks at least once in May 2002, while 31% of U.S. business Internet users used IM (source: Nielsen/NetRatings).
A significant new source of storing, creating and exchanging media and data on the Internet is through P2P file sharing networks. KaZaA, the most popular of these applications, has recently reached over 230 million downloads worldwide, with an average of 2 million more per week (source: Download.com). Users on KaZaA share almost 5,000 terabytes of information, over 600 million files and have over 3 million users active on average at any given time (source: KaZaA.com). Looking at a sample of 400,000 users over a 24 hour period we found about 9% of users (38,256) sharing files. We found 1,980,426 files consisting of 14754 GB (14.4 TB) of information. Files ranged in size from 1 Byte to 1.97 GB, with a mean size of 7.6 MB. Using this sample, we were able to describe how P2P users consume information.
We have had to make various working assumptions in order to construct these estimates, and some data sources are contradictory or simply not available, thus our estimates are often rough. Here we list some of the most serious methodological qualifications, each of which offers interesting challenges for those who would seek to refine these estimates.
Our documentary research methodology is to estimate yearly U.S. and world production of originals and copies for the most common forms of information media - paper, film, magnetic, and optical. The data supporting these estimates is often difficult to find, or does not exist at all, and key questions often cannot be answered because no data is collected (e.g., about third world information production). Estimates are marked with three question marks [???] to signal our caution about their reliability. For those reasons we have documented our sources in these reports and defined the working assumptions we have made in producing these estimates, hoping that our readers will help us to identify better sources and to improve our working assumptions.
It is very difficult to distinguish "copies" from "original" information. A newspaper, for example, is published on paper, often published on the Web as well, and is generally archived on microfilm. In fact, most printed materials are produced and/or archived magnetically. There is also lot of duplication within each medium: many newspapers reproduce stock prices, wire stories, advertisements and so on. Ideally, we would like to measure the storage required for the unique content in the newspaper, but it is very hard to determine that number. As indicated above, the duplication issue is particularly serious for digital storage, since little of what is stored on individual hard drives is unique. We've tried to adjust for this the best we can, and documented our assumptions in the detailed treatment of each medium.
The advantage of using a single measurement standard such as terabytes to compare the volume of information in different formats is obvious. However, unlike paper or film, there is no unambiguous way to measure the size of digital information. A 600 dot per inch scanned digital image of text can be compressed to about one hundredth of its original size. A DVD version of a movie can be 1000 times smaller than the original digital image. We've made what we thought were sensible choices with respect to compression, steering a middle course between the high estimate (based on ``reasonable'' compression) and the low estimate (based on highly compressed content). It is worth noting that the fact that digital storage can be compressed to different degrees depending on needs is a significant advantage for digital over analog storage.
We view this report as a "living document" and intend to revise it based on comments, corrections, and suggestions. Please send comments to how-much-info@sims.berkeley.edu.
Many thanks to our sponsors. Financial support for this study was provided by:
UC Berkeley's School of Information Management and Systems is the first school in the nation to explicitly address the growing need to manage information more effectively.
With respect to education, we are training a new type of professional: "information managers". Our graduates are familiar with the latest and most powerful techniques for locating, organizing, retrieving, manipulating, protecting, and presenting information. They study not only technology, but also the institutional, legal, economic and organizational factors necessary for creating information systems that meet peoples' needs.
With respect to research, we are examining ways to build more effective tools and systems for managing information. This effort is inherently multidisciplinary, involving computer science, information science, social science, cognitive science, and legal studies.
Release date: October 27, 2003. © 2003 Regents of the University of California