Showing posts with label data. Show all posts

Thursday, 27 July 2023

Information Explosion

 From Wikipedia, the free encyclopedia

The information explosion is the rapid increase in the amount of published information or data and the effects of this abundance.[1] As the amount of available data grows, managing the information becomes more difficult, which can lead to information overload. The Online Oxford English Dictionary records use of the phrase in a March 1964 New Statesman article.[2] The New York Times first used the phrase in its editorial content in an article by Walter Sullivan on June 7, 1964, in which he described the phrase as "much discussed" (p. 11).[3] The earliest known use of the phrase was in a speech about television by NBC president Pat Weaver at the Institute of Practitioners of Advertising in London on September 27, 1955. The speech was rebroadcast on radio station WSUI in Iowa and excerpted in the Daily Iowan newspaper two months later.[4]

Many sectors, such as healthcare, supermarkets, and even government (with birth certificate information and immunization records), are seeing this rapid increase in available information.[5] Another sector affected by this phenomenon is journalism: a profession that was once responsible for disseminating information may now be suppressed by today's overabundance of information.[6]

Techniques for gathering knowledge from an overabundance of electronic information (e.g., data fusion, which may help in data mining) have existed since the 1970s. Another common technique for dealing with such an amount of information is qualitative research.[7] Such approaches aim to organize information by synthesizing, categorizing, and systematizing it so that it is more usable and easier to search.

Growth patterns

  • The world's technological capacity to store information grew from 2.6 (optimally compressed) exabytes in 1986 to 15.7 in 1993, over 54.5 in 2000, and 295 exabytes in 2007.[8]
  • The world's technological capacity to receive information through one-way broadcast networks was 432 exabytes of (optimally compressed) information in 1986, 715 exabytes in 1993, 1,200 exabytes in 2000, and 1,900 exabytes in 2007.[8]
  • The world's effective capacity to exchange information through two-way telecommunication networks was 0.281 exabytes of (optimally compressed) information in 1986, 0.471 in 1993, 2.2 in 2000, and 65 exabytes in 2007.[8]
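
The figures above can be turned into average annual growth rates with the compound annual growth rate formula, (end / start)^(1 / years) - 1. The sketch below applies it to the 1986 and 2007 values quoted in the list; the function name `cagr` is illustrative, and the result is a simple two-point average that ignores the intermediate years.

```python
def cagr(start, end, start_year, end_year):
    """Compound annual growth rate between two observations."""
    years = end_year - start_year
    return (end / start) ** (1 / years) - 1

# Optimally compressed exabytes in 1986 and 2007, as listed above.
capacities = {
    "storage": (2.6, 295),
    "one-way broadcast": (432, 1900),
    "two-way telecommunication": (0.281, 65),
}

for name, (eb_1986, eb_2007) in capacities.items():
    rate = cagr(eb_1986, eb_2007, 1986, 2007)
    print(f"{name}: about {rate:.1%} per year")
```

Run as is, this gives roughly 25% a year for storage, 7% for broadcast, and 30% for two-way telecommunication, so the networks that started from the smallest base grew fastest.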

A new metric used in an attempt to characterize the growth in person-specific information is disk storage per person (DSP), measured in megabytes per person (where a megabyte is 10^6 bytes, abbreviated MB). Global DSP (GDSP) is the total rigid disk drive space (in MB) of new units sold in a year divided by the world population in that year. The GDSP metric is a crude measure of how much disk storage could possibly be used to collect person-specific data on the world population.[5] In 1983, one million fixed drives with an estimated total of 90 terabytes were sold worldwide; 30 MB drives accounted for the largest market segment.[9] In 1996, 105 million drives totaling 160,623 terabytes were sold, with 1 and 2 gigabyte drives leading the industry.[10] By the year 2000, with 20 GB drives leading the industry, rigid drives sold for the year were projected to total 2,829,288 terabytes.[10]
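
The GDSP definition is a single division once the units are aligned. The sketch below assumes decimal units (1 TB = 10^6 MB, matching the 10^6-byte megabyte defined above) and uses rough world-population figures of about 4.7 billion for 1983 and 5.8 billion for 1996; those population values are approximations supplied for illustration, not figures from the cited sources.

```python
MB_PER_TB = 10**6  # decimal units: 1 TB = 10^12 bytes = 10^6 MB

def gdsp(terabytes_sold, world_population):
    """Global disk storage per person (MB/person) for one year's new drives."""
    return terabytes_sold * MB_PER_TB / world_population

# Drive totals come from the text; the populations are ASSUMED approximations.
print(f"1983: {gdsp(90, 4.7e9):.3f} MB/person")
print(f"1996: {gdsp(160_623, 5.8e9):.1f} MB/person")
```

Under these assumptions GDSP rises from roughly 0.02 MB to roughly 28 MB per person in thirteen years, more than a thousandfold.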

According to Latanya Sweeney, there are three trends in data gathering today:

Type 1. Expansion of the number of fields being collected, known as the “collect more” trend.

Type 2. Replace an existing aggregate data collection with a person-specific one, known as the “collect specifically” trend.

Type 3. Gather information by starting a new person-specific data collection, known as the “collect it if you can” trend.[5]

Related terms

Since "information" in electronic media is often used synonymously with "data", the term information explosion is closely related to the concept of data flood (also dubbed data deluge). Sometimes the term information flood is used as well. All of these terms refer to the ever-increasing amount of electronic data exchanged per unit of time. Awareness of unmanageable amounts of data grew along with the advent of ever more powerful data processing from the mid-1960s onward.[11]

Challenges

Even though the abundance of information can be beneficial on several levels, it raises concerns such as privacy, legal and ethical guidelines, filtering, and data accuracy.[12] Filtering refers to finding useful information amid so much data, which relates to the job of data scientists. A typical example of the need for data filtering (data mining) is healthcare, where patients' EHRs (Electronic Health Records) are expected to become available in the coming years. With so much information available, doctors will need to be able to identify patterns and select the data that matter for a patient's diagnosis.[12] On the other hand, according to some experts, having so much public data available makes it difficult to provide data that are actually anonymous.[5] Another point to take into account is legal and ethical guidelines: who will own the data, how frequently the owner is obliged to release them, and for how long.[12] With so many sources of data, another problem will be their accuracy. An untrusted source may be challenged by others who order a new set of data, causing a repetition of the information.[12] According to Edward Huth, another concern is the accessibility and cost of such information.[13] Accessibility could be improved either by reducing costs or by increasing the utility of the information. According to Huth, costs could be reduced by associations that assess which information is relevant and gather it in a more organized fashion.
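
The anonymity concern can be made concrete with k-anonymity, a privacy measure Sweeney helped introduce: a table is k-anonymous if every combination of quasi-identifying attributes (here ZIP code, birth year, and sex) is shared by at least k records. The records and field names below are hypothetical. A k of 1 means at least one person is unique on those attributes and could be re-identified by linking the table with another public dataset that contains them.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest group size when records are grouped by quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical "anonymized" health records: names removed, diagnosis kept.
records = [
    {"zip": "02138", "birth_year": 1945, "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02138", "birth_year": 1945, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1962, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1962, "sex": "M", "diagnosis": "influenza"},
    {"zip": "02141", "birth_year": 1978, "sex": "F", "diagnosis": "migraine"},
]

QI = ["zip", "birth_year", "sex"]
print(k_anonymity(records, QI))      # 1: the 1978 record is unique
print(k_anonymity(records[:4], QI))  # 2: every combination appears twice
```

Removing names alone is therefore not enough; whether a table is "actually anonymous" depends on which other data can be joined to it.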

Web servers

As of August 2005, there were over 70 million web servers.[14] As of September 2007 there were over 135 million web servers.[15]

Blogs

According to Technorati, the number of blogs doubled about every 6 months, reaching a total of 35.3 million blogs as of April 2006.[16] Because blogs were then a recent innovation, this is an example of the early stage of logistic growth, in which growth is approximately exponential. As the number of blogs approaches the number of possible producers (humans), saturation occurs, growth declines, and the number of blogs eventually stabilizes.
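
The logistic argument can be sketched numerically. The code compares unbounded exponential growth with the closed-form logistic curve, starting from the 35.3 million blogs quoted above and using the growth rate implied by doubling every six months (r = ln 2 / 0.5 per year). The carrying capacity `K` of one billion blogs is a hypothetical value chosen only to show the shape of the curve, not an estimate.

```python
import math

def exponential(t, n0, r):
    """Unbounded exponential growth: n0 * e^(r t)."""
    return n0 * math.exp(r * t)

def logistic(t, n0, r, k):
    """Closed-form logistic growth with carrying capacity k."""
    return k / (1 + ((k - n0) / n0) * math.exp(-r * t))

N0 = 35.3e6              # blogs in April 2006 (Technorati figure quoted above)
R = math.log(2) / 0.5    # per-year rate implied by doubling every 6 months
K = 1e9                  # HYPOTHETICAL saturation level, for illustration only

for years in (0, 0.5, 1, 2, 4, 8):
    e = exponential(years, N0, R)
    l = logistic(years, N0, R, K)
    print(f"t={years:>4} yr  exponential={e / 1e6:12.1f}M  logistic={l / 1e6:8.1f}M")
```

Over the first year the two curves stay fairly close, but by year 4 the exponential curve exceeds the number of people on Earth, while the logistic curve levels off just below `K`.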

References

  1. ^ Hilbert, M. (2015). Global Information Explosion: https://www.youtube.com/watch?v=8-AqzPe_gNs&list=PLtjBSCvWCU3rNm46D3R85efM0hrzjuAIg. Digital Technology and Social Change [Open Online Course at the University of California], freely available at: https://canvas.instructure.com/courses/949415
  2. ^ "Information." http://dictionary.oed.com. Accessed January 4, 2008.
  3. ^ "U. S. WILL REMOVE REACTOR IN ARCTIC; Compacting Snow Squeezes Device Under Ice Sheet". The New York Times. 7 June 1964.
  4. ^ Weaver, Sylvester (22 Nov 1955). "The Impact of TV in the U.S." Daily Iowan. p. 2. Retrieved 18 Aug 2021. "I believe that in the last few years we have set in motion an information explosion. To each man there is flooding more information than he can presently handle, but he is learning how to handle it and, as he learns, it will do him good."
  5. ^ Sweeney, Latanya. "Information explosion." Confidentiality, disclosure, and data access: Theory and practical applications for statistical agencies (2001): 43-74.
  6. ^ Fuller, Jack. What is happening to news: The information explosion and the crisis in journalism. University of Chicago Press, 2010.
  7. ^ Major, Claire Howell, and Maggi Savin-Baden. An introduction to qualitative research synthesis: Managing the information explosion in social science research. Routledge, 2010.
  8. ^ "The Wo" martinhilbert.net/WorldInfoCapacity.html "free access to the study" and "video animation".
  9. ^ "Disk/Trend report 1983," Computer Week. Mountain View, CA. (46) 11/11/83.
  10. ^ "Rigid disk drive sales to top $34 billion in 1997," Disk/Trend News. Mountain View, CA: Disk/Trend, Inc., 1997.
  11. ^ Google Books Ngram viewer for the terms mentioned here
  12. ^ Berner, Eta S., and Jacqueline Moss. "Informatics challenges for the impending patient information explosion." Journal of the American Medical Informatics Association 12.6 (2005): 614-617.
  13. ^ Huth, Edward J. "The information explosion." Bulletin of the New York Academy of Medicine 65.6 (1989): 647.
  14. ^ Robert H Zakon (15 December 2010). "Hobbes' Internet Timeline 10.1". zakon.org. Retrieved 27 August 2011.
  15. ^ "August 2011 Web Server Survey". netcraft.com. August 2011. Retrieved 27 August 2011.
  16. ^ "State of the Blogosphere, April 2006 Part 1: On Blogosphere Growth". Sifry's Alerts (sifry.com). April 17, 2006. Archived from the original on 9 January 2013. Retrieved 27 August 2011.

Thursday, 16 June 2022

Data

 

From Wikipedia, the free encyclopedia
These are some of the different types of data.

Data (US: /ˈdætə/; UK: /ˈdeɪtə/) are individual facts, statistics, or items of information, often numeric.[1] In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects,[1] while a datum (singular of data) is a single value of a single variable.[2]

Although the terms "data" and "information" are often used interchangeably, they have distinct meanings. In some popular publications, data are said to be transformed into information when they are viewed in context or in post-analysis.[3] However, in academic treatments of the subject, data are simply units of information. Data are used in scientific research, business management (e.g., sales data, revenue, profits, stock price), finance, governance (e.g., crime rates, unemployment rates, literacy rates), and in virtually every other form of human organizational activity (e.g., censuses of the number of homeless people by non-profit organizations).

In general, data are atoms of decision making: they are the smallest units of factual information that can be used as a basis for reasoning, discussion, or calculation. Data can range from abstract ideas to concrete measurements, including statistics. Data are measured, collected, reported, and analyzed, and used to create data visualizations such as graphs, tables, or images. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing.

Raw data ("unprocessed data") is a collection of numbers or characters before it has been "cleaned" and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature). Data processing commonly occurs in stages, and the "processed data" from one stage may be considered the "raw data" of the next stage. Field data is raw data collected in an uncontrolled "in situ" environment. Experimental data is data generated within the context of a scientific investigation by observation and recording.
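
The staged cleaning described above can be sketched in a few lines. The readings, the plausibility bounds, and the function names are hypothetical; a fixed range check stands in for whatever outlier rules a real study would justify, and the output of the first stage becomes the input of the second.

```python
from statistics import mean

# Hypothetical raw readings (°C) from an outdoor Arctic thermometer; 31.5 is the
# kind of "tropical" value treated as an instrument or data entry error.
raw_readings = [-28.4, -27.9, -29.1, 31.5, -28.7, None, -30.2]

# ASSUMED plausibility bounds for this site; real bounds need domain knowledge.
PLAUSIBLE_MIN_C = -60.0
PLAUSIBLE_MAX_C = 15.0

def clean(readings, lo, hi):
    """Stage 1: drop missing values and readings outside the plausible range."""
    return [r for r in readings if r is not None and lo <= r <= hi]

def summarize(readings):
    """Stage 2: treats stage 1's processed output as its raw input."""
    return {"n": len(readings), "mean": round(mean(readings), 2)}

stage1 = clean(raw_readings, PLAUSIBLE_MIN_C, PLAUSIBLE_MAX_C)
stage2 = summarize(stage1)
print(stage1)  # [-28.4, -27.9, -29.1, -28.7, -30.2]
print(stage2)  # {'n': 5, 'mean': -28.86}
```

Keeping each stage as a separate function makes it easy to rerun later stages when the cleaning rules change.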

Data has been described as "the new oil of the digital economy".[4][5]

Etymology and terminology

The first English use of the word "data" is from the 1640s. The word "data" was first used to mean "transmissible and storable computer information" in 1946. The expression "data processing" was first used in 1954.[6]

The Latin word data is the plural of datum, "(thing) given," the neuter past participle of dare, "to give".[6] In English, data may be used as a plural noun in this sense, with some writers (usually those working in the natural sciences, life sciences, and social sciences) using datum in the singular and data for the plural, especially in the 20th century and in many cases also in the 21st; for example, APA style as of the 7th edition still requires "data" to be plural.[7] However, in everyday language and in much of software development and computer science, "data" is most commonly used in the singular as a mass noun (like "sand" or "rain"). The term big data takes the singular.

Meaning

Adrien Auzout's "A TABLE of the Apertures of Object-Glasses" from a 1665 article in Philosophical Transactions

Data, information, knowledge, and wisdom are closely related concepts, but each has its own role in relation to the others and its own meaning. According to a common view, data are collected and analyzed; data only become information suitable for making decisions once they have been analyzed in some fashion.[8] One can say that the extent to which a set of data is informative to someone depends on the extent to which it is unexpected by that person. The amount of information contained in a data stream may be characterized by its Shannon entropy.
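
Shannon entropy makes the link between unexpectedness and information measurable. The sketch below computes the empirical (plug-in) entropy of a symbol stream, H = Σ p · log2(1/p), estimating each probability p from symbol frequencies and treating symbols as independent, which is a simplifying assumption. The helper `surprisal` gives the information in a single outcome, log2(1/p) bits, so rarer outcomes carry more information.

```python
import math
from collections import Counter

def shannon_entropy(stream):
    """Empirical Shannon entropy in bits per symbol: sum of p * log2(1/p)."""
    counts = Counter(stream)
    total = len(stream)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def surprisal(p):
    """Information content, in bits, of one outcome with probability p."""
    return math.log2(1 / p)

print(shannon_entropy("AAAAAAAA"))  # 0.0: fully predictable, nothing unexpected
print(shannon_entropy("ABABABAB"))  # 1.0: two equally likely symbols
print(shannon_entropy("ABCDABCD"))  # 2.0: four equally likely symbols
print(surprisal(0.5), surprisal(1 / 1024))  # 1.0 10.0
```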

Knowledge is the understanding based on extensive experience dealing with information on a subject. For example, the height of Mount Everest is generally considered data. The height can be measured precisely with an altimeter and entered into a database. This data may be included in a book along with other data on Mount Everest to describe the mountain in a manner useful for those who wish to decide on the best method to climb it. An understanding, based on experience climbing mountains, that could advise people on how to reach Mount Everest's peak may be seen as "knowledge". The practical climbing of Mount Everest's peak based on this knowledge may be seen as "wisdom". In other words, wisdom refers to the practical application of a person's knowledge in circumstances where good may result. Thus wisdom complements and completes the series "data", "information", and "knowledge" of increasingly abstract concepts.

Data are often assumed to be the least abstract concept, information the next least, and knowledge the most abstract.[9] In this view, data become information by interpretation; e.g., the height of Mount Everest is generally considered "data", a book on Mount Everest's geological characteristics may be considered "information", and a climber's guidebook containing practical information on the best way to reach Mount Everest's peak may be considered "knowledge". "Information" bears a diversity of meanings that ranges from everyday usage to technical use. This view, however, has also been argued to reverse how data emerge from information, and information from knowledge.[10] Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation. Beynon-Davies uses the concept of a sign to differentiate between data and information: data are a series of symbols, while information occurs when the symbols are used to refer to something.[11][12]

Before the development of computing devices and machines, people had to collect data manually and impose patterns on it. Since then, these devices can also collect data. By the 2010s, computers were widely used to collect, sort, and process data in disciplines ranging from marketing and the analysis of citizens' use of social services to scientific research. These patterns in data are seen as information that can be used to enhance knowledge. These patterns may be interpreted as "truth" (though "truth" can be a subjective concept) and may be authorized as aesthetic and ethical criteria in some disciplines or cultures. Events that leave behind perceivable physical or virtual remains can be traced back through data. Marks are no longer considered data once the link between the mark and observation is broken.[13]

Mechanical computing devices are classified according to how they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A digital computer represents a piece of data as a sequence of symbols drawn from a fixed alphabet. The most common digital computers use a binary alphabet, that is, an alphabet of two characters typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from the binary alphabet. Some special forms of data are distinguished. A computer program is a collection of data, which can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a description of the contents of books.
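
The idea that familiar symbols are built from a binary alphabet, and that a program is itself data, can be shown in Python. The sketch assumes UTF-8 as the text encoding (one common choice, not the only one), and `to_bits` is a hypothetical helper name. Python is not Lisp, so the last lines only show the weaker point that a program held as a string can be interpreted as instructions.

```python
def to_bits(text):
    """Encode text as UTF-8 bytes and write each byte as 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))

# A number and two letters expressed in the two-symbol alphabet {0, 1}.
print(format(42, "b"))  # 101010
print(to_bits("Hi"))    # 01001000 01101001

# A program stored as ordinary string data, then interpreted as instructions.
program = "x * 2 + 1"
print(eval(program, {"x": 20}))  # 41
```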

Data documents

Whenever data need to be registered, they exist in the form of data documents. Kinds of data documents include data repositories, data studies, data sets, software, and data papers.

Some of these data documents (data repositories, data studies, data sets, and software) are indexed in Data Citation Indexes, while data papers are indexed in traditional bibliographic databases, e.g., Science Citation Index. For more detail, see Schöpfel et al.[14]

Data collection

Data can be gathered from a primary source (the researcher is the first person to obtain the data) or a secondary source (the researcher obtains data that have already been collected by others, such as data disseminated in a scientific journal). Data analysis methodologies vary and include data triangulation and data percolation.[15] The latter offers a structured method of collecting, classifying, and analyzing data using five possible angles of analysis, of which at least three should be used, to maximize the research's objectivity and allow as complete an understanding as possible of the phenomena under investigation: qualitative methods, quantitative methods, literature reviews (including scholarly articles), interviews with experts, and computer simulation. The data are then "percolated" through a series of predetermined steps to extract the most relevant information.

In other fields

Although data are also increasingly used in other fields, it has been suggested that the highly interpretive nature of data in those fields might be at odds with the ethos of data as "given". Peter Checkland introduced the term capta (from the Latin capere, "to take") to distinguish between an immense number of possible data and the subset of them to which attention is directed.[16] Johanna Drucker has argued that since the humanities affirm knowledge production as "situated, partial, and constitutive," using data may introduce counterproductive assumptions, for example that phenomena are discrete or observer-independent.[17] The term capta, which emphasizes the act of observation as constitutive, is offered as an alternative to data for visual representations in the humanities.

References

  1. ^ OECD Glossary of Statistical Terms. OECD. 2008. p. 119. ISBN 978-92-64-025561.
  2. ^ "Statistical Language - What are Data?". Australian Bureau of Statistics. 2013-07-13. Archived from the original on 2019-04-19. Retrieved 2020-03-09.
  3. ^ "Data vs Information - Difference and Comparison | Diffen". www.diffen.com. Retrieved 2018-12-11.
  4. ^ Yonego, Joris Toonders (July 23, 2014). "Data Is the New Oil of the Digital Economy". Wired. Via www.wired.com.
  5. ^ "Data is the new oil". July 16, 2018. Archived from the original on 2021-10-27.
  6. ^ "data | Origin and meaning of data by Online Etymology Dictionary". www.etymonline.com.
  7. ^ American Psychological Association (2020). "6.11". Publication Manual of the American Psychological Association: the official guide to APA style. American Psychological Association. ISBN 9781433832161.
  8. ^ "Joint Publication 2-0, Joint Intelligence" (PDF). Joint Chiefs of Staff, Joint Doctrine Publications. Department of Defense. 23 October 2013. pp. I-1. Retrieved July 17, 2018.
  9. ^ Akash Mitra (2011). "Classifying data for successful modeling". Archived from the original on 2017-11-07. Retrieved 2017-11-05.
  10. ^ Tuomi, Ilkka (2000). "Data is more than knowledge". Journal of Management Information Systems. 6 (3): 103–117. doi:10.1080/07421222.1999.11518258.
  11. ^ P. Beynon-Davies (2002). Information Systems: An introduction to informatics in organisations. Basingstoke, UK: Palgrave Macmillan. ISBN 0-333-96390-3.
  12. ^ P. Beynon-Davies (2009). Business information systems. Basingstoke, UK: Palgrave. ISBN 978-0-230-20368-6.
  13. ^ Sharon Daniel. The Database: An Aesthetics of Dignity.
  14. ^ Schöpfel et al. 2020. "Data Documents". ISKO Encyclopedia of Knowledge Organization, https://www.isko.org/cyclo/data_documents
  15. ^ Mesly, Olivier (2015). Creating Models in Psychological Research. United States: Springer Psychology. 126 pages. ISBN 978-3-319-15752-8
  16. ^ P. Checkland and S. Holwell (1998). Information, Systems, and Information Systems: Making Sense of the Field. Chichester, West Sussex: John Wiley & Sons. pp. 86–89. ISBN 0-471-95820-4.
  17. ^ Johanna Drucker (2011). "Humanities Approaches to Graphical Display". Digital Humanities Quarterly. 005 (1).
