♀basic info♀

The official site of ☻☺♀Junrey Ramos♀☺☻.

☻♀basic information about computer♀☻

A computer is a general purpose device that can be programmed to carry out a finite set of arithmetic or logical operations. Since a sequence of operations can be readily changed, the computer can solve more than one kind of problem.

Conventionally, a computer consists of at least one processing element, typically a central processing unit (CPU) and some form of memory. The processing element carries out arithmetic and logic operations, and a sequencing and control unit that can change the order of operations based on stored information. Peripheral devices allow information to be retrieved from an external source, and the result of operations saved and retrieved.

The first electronic digital computers were developed between 1940 and 1945 in the United Kingdom and United States. Originally they were the size of a large room, consuming as much power as several hundred modern personal computers (PCs).[1] In this era mechanical analog computers were used for military applications.

Modern computers based on integrated circuits are millions to billions of times more capable than the early machines, and occupy a fraction of the space.[2] Simple computers are small enough to fit into mobile devices, and mobile computers can be powered by small batteries. Personal computers in their various forms are icons of the Information Age and are what most people think of as "computers". However, the embedded computers found in many devices from mp3 players to fighter aircraft and from toys to industrial robots are the most numerous.

Contents [hide]

1 History of computing
2 Programs
3 Components
4 Misconceptions
- 4.1 Required technology
5 Further topics
6 See also
7 Notes
8 References
9 External links

History of computing The Jacquard loom, on display at the Museum of Science and Industry in Manchester, England, was one of the first programmable devices. Main article: History of computing hardware The first use of the word "computer" was recorded in 1613, referring to a person who carried out calculations, or computations, and the word continued with the same meaning until the middle of the 20th century. From the end of the 19th century the word began to take on its more familiar meaning, a machine that carries out computations.[3]

Limited-function early computers The history of the modern computer begins with two separate technologies, automated calculation and programmability, but no single device can be identified as the earliest computer, partly because of the inconsistent application of that term. A few devices are worth mentioning though, like some mechanical aids to computing, which were very successful and survived for centuries until the advent of the electronic calculator, like the Sumerian abacus, designed around 2500 BC[4] of which a descendant won a speed competition against a modern desk calculating machine in Japan in 1946,[5] the slide rules, invented in the 1620s, which were carried on five Apollo space missions, including to the moon[6] and arguably the astrolabe and the Antikythera mechanism, an ancient astronomical computer built by the Greeks around 80 BC.[7] The Greek mathematician Hero of Alexandria (c. 10–70 AD) built a mechanical theater which performed a play lasting 10 minutes and was operated by a complex system of ropes and drums that might be considered to be a means of deciding which parts of the mechanism performed which actions and when.[8] This is the essence of programmability.

Around the end of the 10th century, the French monk Gerbert d'Aurillac brought back from Spain the drawings of a machine invented by the Moors that answered either Yes or No to the questions it was asked.[9] Again in the 13th century, the monks Albertus Magnus and Roger Bacon built talking androids without any further development (Albertus Magnus complained that he had wasted forty years of his life when Thomas Aquinas, terrified by his machine, destroyed it).[10]

In 1642, the Renaissance saw the invention of the mechanical calculator,[11] a device that could perform all four arithmetic operations without relying on human intelligence.[12] The mechanical calculator was at the root of the development of computers in two separate ways. Initially, it was in trying to develop more powerful and more flexible calculators[13] that the computer was first theorized by Charles Babbage [14][15] and then developed.[16] Secondly, development of a low-cost electronic calculator, successor to the mechanical calculator, resulted in the development by Intel [17] of the first commercially available microprocessor integrated circuit.

First general-purpose computers In 1801, Joseph Marie Jacquard made an improvement to the textile loom by introducing a series of punched paper cards as a template which allowed his loom to weave intricate patterns automatically. The resulting Jacquard loom was an important step in the development of computers because the use of punched cards to define woven patterns can be viewed as an early, albeit limited, form of programmability.

The Most Famous Image in the Early History of Computing[18]
This portrait of Jacquard was woven in silk on a Jacquard loom and required 24,000 punched cards to create (1839). It was only produced to order. Charles Babbage owned one of these portraits ; it inspired him in using perforated cards in his analytical engine [19] The Zuse Z3, 1941, considered the world's first working programmable, fully automatic computing machine. It was the fusion of automatic calculation with programmability that produced the first recognizable computers. In 1837, Charles Babbage was the first to conceptualize and design a fully programmable mechanical computer, his analytical engine.[20] Limited finances and Babbage's inability to resist tinkering with the design meant that the device was never completed—nevertheless his son, Henry Babbage, completed a simplified version of the analytical engine's computing unit (the mill) in 1888. He gave a successful demonstration of its use in computing tables in 1906. This machine was given to the Science museum in South Kensington in 1910.

In the late 1880s, Herman Hollerith invented the recording of data on a machine-readable medium. Earlier uses of machine-readable media had been for control, not data. "After some initial trials with paper tape, he settled on punched cards ..."[21] To process these punched cards he invented the tabulator, and the keypunch machines. These three inventions were the foundation of the modern information processing industry. Large-scale automated data processing of punched cards was performed for the 1890 United States Census by Hollerith's company, which later became the core of IBM. By the end of the 19th century a number of ideas and technologies, that would later prove useful in the realization of practical computers, had begun to appear: Boolean algebra, the vacuum tube (thermionic valve), punched cards and tape, and the teleprinter.

During the first half of the 20th century, many scientific computing needs were met by increasingly sophisticated analog computers, which used a direct mechanical or electrical model of the problem as a basis for computation. However, these were not programmable and generally lacked the versatility and accuracy of modern digital computers.

Alan Turing is widely regarded as the father of modern computer science. In 1936 Turing provided an influential formalisation of the concept of the algorithm and computation with the Turing machine, providing a blueprint for the electronic digital computer.[22] Of his role in the creation of the modern computer, Time magazine in naming Turing one of the 100 most influential people of the 20th century, states: "The fact remains that everyone who taps at a keyboard, opening a spreadsheet or a word-processing program, is working on an incarnation of a Turing machine".[22]

The ENIAC, which became operational in 1946, is considered to be the first general-purpose electronic computer. EDSAC was one of the first computers to implement the stored-program (von Neumann) architecture. The Atanasoff–Berry Computer (ABC) was the world's first electronic digital computer, albeit not programmable.[23] Atanasoff is considered to be one of the fathers of the computer.[24] Conceived in 1937 by Iowa State College physics professor John Atanasoff, and built with the assistance of graduate student Clifford Berry,[25] the machine was not programmable, being designed only to solve systems of linear equations. The computer did employ parallel computation. A 1973 court ruling in a patent dispute found that the patent for the 1946 ENIAC computer derived from the Atanasoff–Berry Computer.

The first program-controlled computer was invented by Konrad Zuse, who built the Z3, an electromechanical computing machine, in 1941.[26] The first programmable electronic computer was the Colossus, built in 1943 by Tommy Flowers.

George Stibitz is internationally recognized as a father of the modern digital computer. While working at Bell Labs in November 1937, Stibitz invented and built a relay-based calculator he dubbed the "Model K" (for "kitchen table", on which he had assembled it), which was the first to use binary circuits to perform an arithmetic operation. Later models added greater sophistication including complex arithmetic and programmability.[27]

A succession of steadily more powerful and flexible computing devices were constructed in the 1930s and 1940s, gradually adding the key features that are seen in modern computers. The use of digital electronics (largely invented by Claude Shannon in 1937) and more flexible programmability were vitally important steps, but defining one point along this road as "the first digital electronic computer" is difficult.Shannon 1940 Notable achievements include:

Konrad Zuse's electromechanical "Z machines". The Z3 (1941) was the first working machine featuring binary arithmetic, including floating point arithmetic and a measure of programmability. In 1998 the Z3 was proved to be Turing complete, therefore being the world's first operational computer.[28]
The non-programmable Atanasoff–Berry Computer (commenced in 1937, completed in 1941) which used vacuum tube based computation, binary numbers, and regenerative capacitor memory. The use of regenerative memory allowed it to be much more compact than its peers (being approximately the size of a large desk or workbench), since intermediate results could be stored and then fed back into the same set of computation elements.
The secret British Colossus computers (1943),[29] which had limited programmability but demonstrated that a device using thousands of tubes could be reasonably reliable and electronically reprogrammable. It was used for breaking German wartime codes.
The Harvard Mark I (1944), a large-scale electromechanical computer with limited programmability.[30]
The U.S. Army's Ballistic Research Laboratory ENIAC (1946), which used decimal arithmetic and is sometimes called the first general purpose electronic computer (since Konrad Zuse's Z3 of 1941 used electromagnets instead of electronics). Initially, however, ENIAC had an inflexible architecture which essentially required rewiring to change its programming.

Stored-program architecture This section does not cite any references or sources. (July 2012) Several developers of ENIAC, recognizing its flaws, came up with a far more flexible and elegant design, which came to be known as the "stored-program architecture" or von Neumann architecture. This design was first formally described by John von Neumann in the paper First Draft of a Report on the EDVAC, distributed in 1945. A number of projects to develop computers based on the stored-program architecture commenced around this time, the first of which was completed in 1948 at the University of Manchester in England, the Manchester Small-Scale Experimental Machine (SSEM or "Baby"). The Electronic Delay Storage Automatic Calculator (EDSAC), completed a year after the SSEM at Cambridge University, was the first practical, non-experimental implementation of the stored-program design and was put to use immediately for research work at the university. Shortly thereafter, the machine originally described by von Neumann's paper--EDVAC—was completed but did not see full-time use for an additional two years.

Nearly all modern computers implement some form of the stored-program architecture, making it the single trait by which the word "computer" is now defined. While the technologies used in computers have changed dramatically since the first electronic, general-purpose computers of the 1940s, most still use the von Neumann architecture.

Die of an Intel 80486DX2 microprocessor (actual size: 12×6.75 mm) in its packaging Beginning in the 1950s, Soviet scientists Sergei Sobolev and Nikolay Brusentsov conducted research on ternary computers, devices that operated on a base three numbering system of −1, 0, and 1 rather than the conventional binary numbering system upon which most computers are based. They designed the Setun, a functional ternary computer, at Moscow State University. The device was put into limited production in the Soviet Union, but supplanted by the more common binary architecture.

Semiconductors and microprocessors Computers using vacuum tubes as their electronic elements were in use throughout the 1950s, but by the 1960s they had been largely replaced by transistor-based machines, which were smaller, faster, cheaper to produce, required less power, and were more reliable. The first transistorised computer was demonstrated at the University of Manchester in 1953.[31] In the 1970s, integrated circuit technology and the subsequent creation of microprocessors, such as the Intel 4004, further decreased size and cost and further increased speed and reliability of computers. By the late 1970s, many products such as video recorders contained dedicated computers called microcontrollers, and they started to appear as a replacement to mechanical controls in domestic appliances such as washing machines. The 1980s witnessed home computers and the now ubiquitous personal computer. With the evolution of the Internet, personal computers are becoming as common as the television and the telephone in the household[citation needed].

Modern smartphones are fully programmable computers in their own right, and as of 2009 may well be the most common form of such computers in existence[citation needed].

Programs Alan Turing was an influential computer scientist. The defining feature of modern computers which distinguishes them from all other machines is that they can be programmed. That is to say that some type of instructions (the program) can be given to the computer, and it will process them. While some computers may have strange concepts for "instructions" and "output" (see quantum computing), modern computers based on the von Neumann architecture often have machine code in the form of an imperative programming language.

In practical terms, a computer program may be just a few instructions or extend to many millions of instructions, as do the programs for word processors and web browsers for example. A typical modern computer can execute billions of instructions per second (gigaflops) and rarely makes a mistake over many years of operation. Large computer programs consisting of several million instructions may take teams of programmers years to write, and due to the complexity of the task almost certainly contain errors.

Stored program architecture Main articles: Computer program and Computer programming Replica of the Small-Scale Experimental Machine (SSEM), the world's first stored-program computer, at the Museum of Science and Industry in Manchester, England This section applies to most common RAM machine-based computers.

In most cases, computer instructions are simple: add one number to another, move some data from one location to another, send a message to some external device, etc. These instructions are read from the computer's memory and are generally carried out (executed) in the order they were given. However, there are usually specialized instructions to tell the computer to jump ahead or backwards to some other place in the program and to carry on executing from there. These are called "jump" instructions (or branches). Furthermore, jump instructions may be made to happen conditionally so that different sequences of instructions may be used depending on the result of some previous calculation or some external event. Many computers directly support subroutines by providing a type of jump that "remembers" the location it jumped from and another instruction to return to the instruction following that jump instruction.

Program execution might be likened to reading a book. While a person will normally read each word and line in sequence, they may at times jump back to an earlier place in the text or skip sections that are not of interest. Similarly, a computer may sometimes go back and repeat the instructions in some section of the program over and over again until some internal condition is met. This is called the flow of control within the program and it is what allows the computer to perform tasks repeatedly without human intervention.

Comparatively, a person using a pocket calculator can perform a basic arithmetic operation such as adding two numbers with just a few button presses. But to add together all of the numbers from 1 to 1,000 would take thousands of button presses and a lot of time, with a near certainty of making a mistake. On the other hand, a computer may be programmed to do this with just a few simple instructions. For example:

mov No. 0, sum ; set sum to 0 mov No. 1, num ; set num to 1 loop: add num, sum ; add num to sum add No. 1, num ; add 1 to num cmp num, #1000 ; compare num to 1000 ble loop ; if num <= 1000, go back to 'loop' halt ; end of program. stop running

Once told to run this program, the computer will perform the repetitive addition task without further human intervention. It will almost never make a mistake and a modern PC can complete the task in about a millionth of a second.[32]

Bugs Main article: Software bug The actual first computer bug, a moth found trapped on a relay of the Harvard Mark II computer Errors in computer programs are called "bugs". They may be benign and not affect the usefulness of the program, or have only subtle effects. But in some cases they may cause the program or the entire system to "hang" – become unresponsive to input such as mouse clicks or keystrokes – to completely fail, or to crash. Otherwise benign bugs may sometimes be harnessed for malicious intent by an unscrupulous user writing an exploit, code designed to take advantage of a bug and disrupt a computer's proper execution. Bugs are usually not the fault of the computer. Since computers merely execute the instructions they are given, bugs are nearly always the result of programmer error or an oversight made in the program's design.[33]

Rear Admiral Grace Hopper is credited for having first used the term "bugs" in computing after a dead moth was found shorting a relay in the Harvard Mark II computer in September 1947.[34]

Machine code In most computers, individual instructions are stored as machine code with each instruction being given a unique number (its operation code or opcode for short). The command to add two numbers together would have one opcode, the command to multiply them would have a different opcode and so on. The simplest computers are able to perform any of a handful of different instructions; the more complex computers have several hundred to choose from, each with a unique numerical code. Since the computer's memory is able to store numbers, it can also store the instruction codes. This leads to the important fact that entire programs (which are just lists of these instructions) can be represented as lists of numbers and can themselves be manipulated inside the computer in the same way as numeric data. The fundamental concept of storing programs in the computer's memory alongside the data they operate on is the crux of the von Neumann, or stored program, architecture. In some cases, a computer might store some or all of its program in memory that is kept separate from the data it operates on. This is called the Harvard architecture after the Harvard Mark I computer. Modern von Neumann computers display some traits of the Harvard architecture in their designs, such as in CPU caches.

While it is possible to write computer programs as long lists of numbers (machine language) and while this technique was used with many early computers,[35] it is extremely tedious and potentially error-prone to do so in practice, especially for complicated programs. Instead, each basic instruction can be given a short name that is indicative of its function and easy to remember – a mnemonic such as ADD, SUB, MULT or JUMP. These mnemonics are collectively known as a computer's assembly language. Converting programs written in assembly language into something the computer can actually understand (machine language) is usually done by a computer program called an assembler.

A 1970s punched card containing one line from a FORTRAN program. The card reads: "Z(1) = Y + W(1)" and is labelled "PROJ039" for identification purposes. Programming language Main article: Programming language Programming languages provide various ways of specifying programs for computers to run. Unlike natural languages, programming languages are designed to permit no ambiguity and to be concise. They are purely written languages and are often difficult to read aloud. They are generally either translated into machine code by a compiler or an assembler before being run, or translated directly at run time by an interpreter. Sometimes programs are executed by a hybrid method of the two techniques.

Low-level languages Main article: Low-level programming language Machine languages and the assembly languages that represent them (collectively termed low-level programming languages) tend to be unique to a particular type of computer. For instance, an ARM architecture computer (such as may be found in a PDA or a hand-held videogame) cannot understand the machine language of an Intel Pentium or the AMD Athlon 64 computer that might be in a PC.[36]

Higher-level languages Main article: High-level programming language Though considerably easier than in machine language, writing long programs in assembly language is often difficult and is also error prone. Therefore, most practical programs are written in more abstract high-level programming languages that are able to express the needs of the programmer more conveniently (and thereby help reduce programmer error). High level languages are usually "compiled" into machine language (or sometimes into assembly language and then into machine language) using another computer program called a compiler.[37] High level languages are less related to the workings of the target computer than assembly language, and more related to the language and structure of the problem(s) to be solved by the final program. It is therefore often possible to use different compilers to translate the same high level language program into the machine language of many different types of computer. This is part of the means by which software like video games may be made available for different computer architectures such as personal computers and various video game consoles.

Program design This section does not cite any references or sources. (July 2012) Program design of small programs is relatively simple and involves the analysis of the problem, collection of inputs, using the programming constructs within languages, devising or using established procedures and algorithms, providing data for output devices and solutions to the problem as applicable. As problems become larger and more complex, features such as subprograms, modules, formal documentation, and new paradigms such as object-oriented programming are encountered. Large programs involving thousands of line of code and more require formal software methodologies. The task of developing large software systems presents a significant intellectual challenge. Producing software with an acceptably high reliability within a predictable schedule and budget has historically been difficult; the academic and professional discipline of software engineering concentrates specifically on this challenge.

Components Main articles: Central processing unit and Microprocessor A general purpose computer has four main components: the arithmetic logic unit (ALU), the control unit, the memory, and the input and output devices (collectively termed I/O). These parts are interconnected by busses, often made of groups of wires.

Inside each of these parts are thousands to trillions of small electrical circuits which can be turned off or on by means of an electronic switch. Each circuit represents a bit (binary digit) of information so that when the circuit is on it represents a "1", and when off it represents a "0" (in positive logic representation). The circuits are arranged in logic gates so that one or more of the circuits may control the state of one or more of the other circuits.

The control unit, ALU, registers, and basic I/O (and often other hardware closely linked with these) are collectively known as a central processing unit (CPU). Early CPUs were composed of many separate components but since the mid-1970s CPUs have typically been constructed on a single integrated circuit called a microprocessor.

Control unit Main articles: CPU design and Control unit Diagram showing how a particular MIPS architecture instruction would be decoded by the control system. The control unit (often called a control system or central controller) manages the computer's various components; it reads and interprets (decodes) the program instructions, transforming them into a series of control signals which activate other parts of the computer.[38] Control systems in advanced computers may change the order of some instructions so as to improve performance.

A key component common to all CPUs is the program counter, a special memory cell (a register) that keeps track of which location in memory the next instruction is to be read from.[39]

The control system's function is as follows—note that this is a simplified description, and some of these steps may be performed concurrently or in a different order depending on the type of CPU:

Read the code for the next instruction from the cell indicated by the program counter.
Decode the numerical code for the instruction into a set of commands or signals for each of the other systems.
Increment the program counter so it points to the next instruction.
Read whatever data the instruction requires from cells in memory (or perhaps from an input device). The location of this required data is typically stored within the instruction code.
Provide the necessary data to an ALU or register.
If the instruction requires an ALU or specialized hardware to complete, instruct the hardware to perform the requested operation.
Write the result from the ALU back to a memory location or to a register or perhaps an output device.
Jump back to step (1).

Since the program counter is (conceptually) just another set of memory cells, it can be changed by calculations done in the ALU. Adding 100 to the program counter would cause the next instruction to be read from a place 100 locations further down the program. Instructions that modify the program counter are often known as "jumps" and allow for loops (instructions that are repeated by the computer) and often conditional instruction execution (both examples of control flow).

The sequence of operations that the control unit goes through to process an instruction is in itself like a short computer program, and indeed, in some more complex CPU designs, there is another yet smaller computer called a microsequencer, which runs a microcode program that causes all of these events to happen.

Arithmetic logic unit (ALU) Main article: Arithmetic logic unit The ALU is capable of performing two classes of operations: arithmetic and logic.[40]

The set of arithmetic operations that a particular ALU supports may be limited to addition and subtraction, or might include multiplication, division, trigonometry functions such as sine, cosine, etc., and square roots. Some can only operate on whole numbers (integers) whilst others use floating point to represent real numbers, albeit with limited precision. However, any computer that is capable of performing just the simplest operations can be programmed to break down the more complex operations into simple steps that it can perform. Therefore, any computer can be programmed to perform any arithmetic operation—although it will take more time to do so if its ALU does not directly support the operation. An ALU may also compare numbers and return boolean truth values (true or false) depending on whether one is equal to, greater than or less than the other ("is 64 greater than 65?").

Logic operations involve Boolean logic: AND, OR, XOR and NOT. These can be useful for creating complicated conditional statements and processing boolean logic.

Superscalar computers may contain multiple ALUs, allowing them to process several instructions simultaneously.[41] Graphics processors and computers with SIMD and MIMD features often contain ALUs that can perform arithmetic on vectors and matrices.

Memory Main article: Computer data storage Magnetic core memory was the computer memory of choice throughout the 1960s, until it was replaced by semiconductor memory. A computer's memory can be viewed as a list of cells into which numbers can be placed or read. Each cell has a numbered "address" and can store a single number. The computer can be instructed to "put the number 123 into the cell numbered 1357" or to "add the number that is in cell 1357 to the number that is in cell 2468 and put the answer into cell 1595". The information stored in memory may represent practically anything. Letters, numbers, even computer instructions can be placed into memory with equal ease. Since the CPU does not differentiate between different types of information, it is the software's responsibility to give significance to what the memory sees as nothing but a series of numbers.

In almost all modern computers, each memory cell is set up to store binary numbers in groups of eight bits (called a byte). Each byte is able to represent 256 different numbers (2^8 = 256); either from 0 to 255 or −128 to +127. To store larger numbers, several consecutive bytes may be used (typically, two, four or eight). When negative numbers are required, they are usually stored in two's complement notation. Other arrangements are possible, but are usually not seen outside of specialized applications or historical contexts. A computer can store any kind of information in memory if it can be represented numerically. Modern computers have billions or even trillions of bytes of memory.

The CPU contains a special set of memory cells called registers that can be read and written to much more rapidly than the main memory area. There are typically between two and one hundred registers depending on the type of CPU. Registers are used for the most frequently needed data items to avoid having to access main memory every time data is needed. As data is constantly being worked on, reducing the need to access main memory (which is often slow compared to the ALU and control units) greatly increases the computer's speed.

Computer main memory comes in two principal varieties: random-access memory or RAM and read-only memory or ROM. RAM can be read and written to anytime the CPU commands it, but ROM is pre-loaded with data and software that never changes, therefore the CPU can only read from it. ROM is typically used to store the computer's initial start-up instructions. In general, the contents of RAM are erased when the power to the computer is turned off, but ROM retains its data indefinitely. In a PC, the ROM contains a specialized program called the BIOS that orchestrates loading the computer's operating system from the hard disk drive into RAM whenever the computer is turned on or reset. In embedded computers, which frequently do not have disk drives, all of the required software may be stored in ROM. Software stored in ROM is often called firmware, because it is notionally more like hardware than software. Flash memory blurs the distinction between ROM and RAM, as it retains its data when turned off but is also rewritable. It is typically much slower than conventional ROM and RAM however, so its use is restricted to applications where high speed is unnecessary.[42]

In more sophisticated computers there may be one or more RAM cache memories, which are slower than registers but faster than main memory. Generally computers with this sort of cache are designed to move frequently needed data into the cache automatically, often without the need for any intervention on the programmer's part.

Input/output (I/O) Main article: Input/output Hard disk drives are common storage devices used with computers. I/O is the means by which a computer exchanges information with the outside world.[43] Devices that provide input or output to the computer are called peripherals.[44] On a typical personal computer, peripherals include input devices like the keyboard and mouse, and output devices such as the display and printer. Hard disk drives, floppy disk drives and optical disc drives serve as both input and output devices. Computer networking is another form of I/O.

I/O devices are often complex computers in their own right, with their own CPU and memory. A graphics processing unit might contain fifty or more tiny computers that perform the calculations necessary to display 3D graphics[citation needed]. Modern desktop computers contain many smaller computers that assist the main CPU in performing I/O.

Multitasking Main article: Computer multitasking While a computer may be viewed as running one gigantic program stored in its main memory, in some systems it is necessary to give the appearance of running several programs simultaneously. This is achieved by multitasking i.e. having the computer switch rapidly between running each program in turn.[45]

One means by which this is done is with a special signal called an interrupt, which can periodically cause the computer to stop executing instructions where it was and do something else instead. By remembering where it was executing prior to the interrupt, the computer can return to that task later. If several programs are running "at the same time", then the interrupt generator might be causing several hundred interrupts per second, causing a program switch each time. Since modern computers typically execute instructions several orders of magnitude faster than human perception, it may appear that many programs are running at the same time even though only one is ever executing in any given instant. This method of multitasking is sometimes termed "time-sharing" since each program is allocated a "slice" of time in turn.[46]

Before the era of cheap computers, the principal use for multitasking was to allow many people to share the same computer.

Seemingly, multitasking would cause a computer that is switching between several programs to run more slowly, in direct proportion to the number of programs it is running, but most programs spend much of their time waiting for slow input/output devices to complete their tasks. If a program is waiting for the user to click on the mouse or press a key on the keyboard, then it will not take a "time slice" until the event it is waiting for has occurred. This frees up time for other programs to execute so that many programs may be run simultaneously without unacceptable speed loss.

Multiprocessing Main article: Multiprocessing Cray designed many supercomputers that used multiprocessing heavily. Some computers are designed to distribute their work across several CPUs in a multiprocessing configuration, a technique once employed only in large and powerful machines such as supercomputers, mainframe computers and servers. Multiprocessor and multi-core (multiple CPUs on a single integrated circuit) personal and laptop computers are now widely available, and are being increasingly used in lower-end markets as a result.

Supercomputers in particular often have highly unique architectures that differ significantly from the basic stored-program architecture and from general purpose computers.[47] They often feature thousands of CPUs, customized high-speed interconnects, and specialized computing hardware. Such designs tend to be useful only for specialized tasks due to the large scale of program organization required to successfully utilize most of the available resources at once. Supercomputers usually see usage in large-scale simulation, graphics rendering, and cryptography applications, as well as with other so-called "embarrassingly parallel" tasks.

Networking and the Internet Main articles: Computer networking and Internet Visualization of a portion of the routes on the Internet. Computers have been used to coordinate information between multiple locations since the 1950s. The U.S. military's SAGE system was the first large-scale example of such a system, which led to a number of special-purpose commercial systems such as Sabre.[48]

In the 1970s, computer engineers at research institutions throughout the United States began to link their computers together using telecommunications technology. The effort was funded by ARPA (now DARPA), and the computer network that resulted was called the ARPANET.[49] The technologies that made the Arpanet possible spread and evolved.

In time, the network spread beyond academic and military institutions and became known as the Internet. The emergence of networking involved a redefinition of the nature and boundaries of the computer. Computer operating systems and applications were modified to include the ability to define and access the resources of other computers on the network, such as peripheral devices, stored information, and the like, as extensions of the resources of an individual computer. Initially these facilities were available primarily to people working in high-tech environments, but in the 1990s the spread of applications like e-mail and the World Wide Web, combined with the development of cheap, fast networking technologies like Ethernet and ADSL saw computer networking become almost ubiquitous. In fact, the number of computers that are networked is growing phenomenally. A very large proportion of personal computers regularly connect to the Internet to communicate and receive information. "Wireless" networking, often utilizing mobile phone networks, has meant networking is becoming increasingly ubiquitous even in mobile computing environments.

Computer architecture paradigms There are many types of computer architectures:

Quantum computer vs Chemical computer
Scalar processor vs Vector processor
Non-Uniform Memory Access (NUMA) computers
Register machine vs Stack machine
Harvard architecture vs von Neumann architecture
Cellular architecture

The quantum computer architecture holds the most promise to revolutionize computing.[50]

Logic gates are a common abstraction which can apply to most of the above digital or analog paradigms.

The ability to store and execute lists of instructions called programs makes computers extremely versatile, distinguishing them from calculators. The Church–Turing thesis is a mathematical statement of this versatility: any computer with a minimum capability (being Turing-complete) is, in principle, capable of performing the same tasks that any other computer can perform. Therefore any type of computer (netbook, supercomputer, cellular automaton, etc.) is able to perform the same computational tasks, given enough time and storage capacity.

Misconceptions A computer does not need to be electronic, nor even have a processor, nor RAM, nor even a hard disk. While popular usage of the word "computer" is synonymous with a personal computer, the definition of a computer is literally "A device that computes, especially a programmable [usually] electronic machine that performs high-speed mathematical or logical operations or that assembles, stores, correlates, or otherwise processes information."[51] Any device which processes information qualifies as a computer, especially if the processing is purposeful.

Required technology Main article: Unconventional computing Historically, computers evolved from mechanical computers and eventually from vacuum tubes to transistors. However, conceptually computational systems as flexible as a personal computer can be built out of almost anything. For example, a computer can be made out of billiard balls (billiard ball computer); an oft-quoted example. More realistically, modern computers are made out of transistors made of photolithographed semiconductors.

There is active research to make computers out of many promising new types of technology, such as optical computers, DNA computers, neural computers, and quantum computers. Most computers are universal, and are able to calculate any computable function, and are limited only by their memory capacity and operating speed. However different designs of computers can give very different performance for particular problems; for example quantum computers can potentially break some modern encryption algorithms (by quantum factoring) very quickly.

Further topics

Glossary of computers

Artificial intelligence A computer will solve problems in exactly the way it is programmed to, without regard to efficiency, alternative solutions, possible shortcuts, or possible errors in the code. Computer programs that learn and adapt are part of the emerging field of artificial intelligence and machine learning.

Hardware Main articles: Computer hardware and Personal computer hardware The term hardware covers all of those parts of a computer that are tangible objects. Circuits, displays, power supplies, cables, keyboards, printers and mice are all hardware.

History of computing hardware Main article: History of computing hardware First Generation (Mechanical/Electromechanical) Calculators Antikythera mechanism, Difference engine, Norden bombsight Programmable Devices Jacquard loom, Analytical engine, Harvard Mark I, Z3 Second Generation (Vacuum Tubes) Calculators Atanasoff–Berry Computer, IBM 604, UNIVAC 60, UNIVAC 120 Programmable Devices Colossus, ENIAC, Manchester Small-Scale Experimental Machine, EDSAC, Manchester Mark 1, Ferranti Pegasus, Ferranti Mercury, CSIRAC, EDVAC, UNIVAC I, IBM 701, IBM 702, IBM 650, Z22 Third Generation (Discrete transistors and SSI, MSI, LSI Integrated circuits) Mainframes IBM 7090, IBM 7080, IBM System/360, BUNCH Minicomputer PDP-8, PDP-11, IBM System/32, IBM System/36 Fourth Generation (VLSI integrated circuits) Minicomputer VAX, IBM System i 4-bit microcomputer Intel 4004, Intel 4040 8-bit microcomputer Intel 8008, Intel 8080, Motorola 6800, Motorola 6809, MOS Technology 6502, Zilog Z80 16-bit microcomputer Intel 8088, Zilog Z8000, WDC 65816/65802 32-bit microcomputer Intel 80386, Pentium, Motorola 68000, ARM architecture 64-bit microcomputer[52] Alpha, MIPS, PA-RISC, PowerPC, SPARC, x86-64 Embedded computer Intel 8048, Intel 8051 Personal computer Desktop computer, Home computer, Laptop computer, Personal digital assistant (PDA), Portable computer, Tablet PC, Wearable computer Theoretical/experimental Quantum computer, Chemical computer, DNA computing, Optical computer, Spintronics based computer Other hardware topics Peripheral device (input/output) Input Mouse, keyboard, joystick, image scanner, webcam, graphics tablet, microphone Output Monitor, printer, loudspeaker Both Floppy disk drive, hard disk drive, optical disc drive, teleprinter Computer busses Short range RS-232, SCSI, PCI, USB Long range (computer networking) Ethernet, ATM, FDDI Software Main article: Computer software Software refers to parts of the computer which do not have a material form, such as programs, data, protocols, etc. When software is stored in hardware that cannot easily be modified (such as BIOS ROM in an IBM PC compatible), it is sometimes called "firmware" to indicate that it falls into an uncertain area somewhere between hardware and software.

Operating system Unix and BSD UNIX System V, IBM AIX, HP-UX, Solaris (SunOS), IRIX, List of BSD operating systems GNU/Linux List of Linux distributions, Comparison of Linux distributions Microsoft Windows Windows 95, Windows 98, Windows NT, Windows 2000, Windows Me, Windows XP, Windows Vista, Windows 7 DOS 86-DOS (QDOS), PC-DOS, MS-DOS, DR-DOS, FreeDOS Mac OS Mac OS classic, Mac OS X Embedded and real-time List of embedded operating systems Experimental Amoeba, Oberon/Bluebottle, Plan 9 from Bell Labs Library Multimedia DirectX, OpenGL, OpenAL Programming library C standard library, Standard Template Library Data Protocol TCP/IP, Kermit, FTP, HTTP, SMTP File format HTML, XML, JPEG, MPEG, PNG User interface Graphical user interface (WIMP) Microsoft Windows, GNOME, KDE, QNX Photon, CDE, GEM, Aqua Text-based user interface Command-line interface, Text user interface Application Office suite Word processing, Desktop publishing, Presentation program, Database management system, Scheduling & Time management, Spreadsheet, Accounting software Internet Access Browser, E-mail client, Web server, Mail transfer agent, Instant messaging Design and manufacturing Computer-aided design, Computer-aided manufacturing, Plant management, Robotic manufacturing, Supply chain management Graphics Raster graphics editor, Vector graphics editor, 3D modeler, Animation editor, 3D computer graphics, Video editing, Image processing Audio Digital audio editor, Audio playback, Mixing, Audio synthesis, Computer music Software engineering Compiler, Assembler, Interpreter, Debugger, Text editor, Integrated development environment, Software performance analysis, Revision control, Software configuration management Educational Edutainment, Educational game, Serious game, Flight simulator Games Strategy, Arcade, Puzzle, Simulation, First-person shooter, Platform, Massively multiplayer, Interactive fiction Misc Artificial intelligence, Antivirus software, Malware scanner, Installer/Package management systems, File manager Languages There are thousands of different programming languages—some intended to be general purpose, others useful only for highly specialized applications.

Programming languages Lists of programming languages Timeline of programming languages, List of programming languages by category, Generational list of programming languages, List of programming languages, Non-English-based programming languages Commonly used Assembly languages ARM, MIPS, x86 Commonly used high-level programming languages Ada, BASIC, C, C++, C#, COBOL, Fortran, Java, Lisp, Pascal, Object Pascal Commonly used Scripting languages Bourne script, JavaScript, Python, Ruby, PHP, Perl Professions and organizations As the use of computers has spread throughout society, there are an increasing number of careers involving computers.

Computer-related professions Hardware-related Electrical engineering, Electronic engineering, Computer engineering, Telecommunications engineering, Optical engineering, Nanoengineering Software-related Computer science, Desktop publishing, Human–computer interaction, Information technology, Information systems, Computational science, Software engineering, Video game industry, Web design The need for computers to work well together and to be able to exchange information has spawned the need for many standards organizations, clubs and societies of both a formal and informal nature.

Organizations Standards groups ANSI, IEC, IEEE, IETF, ISO, W3C Professional Societies ACM, AIS, IET, IFIP, BCS Free/Open source software groups Free Software Foundation, Mozilla Foundation, Apache Software Foundation See also Information technology portal

Notes

^ In 1946, ENIAC required an estimated 174 kW. By comparison, a modern laptop computer may use around 30 W; nearly six thousand times less. "Approximate Desktop & Notebook Power Usage". University of Pennsylvania. http://www.upenn.edu/computing/provider/docs/hardware/powerusage.html. Retrieved 20 June 2009.
^ Early computers such as Colossus and ENIAC were able to process between 5 and 100 operations per second. A modern "commodity" microprocessor (as of 2007) can process billions of operations per second, and many of these operations are more complicated and useful than early computer operations. "Intel Core2 Duo Mobile Processor: Features". Intel Corporation. http://www.intel.com/cd/channel/reseller/asmo-na/eng/products/mobile/processors/core2duo_m/feature/index.htm. Retrieved 20 June 2009.
^ computer, n.. Oxford English Dictionary (2 ed.). Oxford University Press. 1989. http://dictionary.oed.com/. Retrieved 10 April 2009.
^ * Ifrah, Georges (2001). The Universal History of Computing: From the Abacus to the Quantum Computer. New York: John Wiley & Sons. ISBN 0-471-39671-0. From 2700 to 2300 BC, Georges Ifrah, pp.11
^ Berkeley, Edmund (1949). Giant Brains, or Machines That Think. John Wiley & Sons. p. 19. Edmund Berkeley
^ According to advertising on Pickett's N600 slide rule boxes."Pickett Apollo Box Scans". Copland.udel.edu. http://copland.udel.edu/~mm/sliderule/lem/. Retrieved 20 February 2010.
^ "Discovering How Greeks Computed in 100 B.C.". The New York Times. 31 July 2008. http://www.nytimes.com/2008/07/31/science/31computer.html?hp. Retrieved 27 March 2010.
^ "Heron of Alexandria". http://www.mlahanas.de/Greeks/HeronAlexandria2.htm. Retrieved 15 January 2008.
^ Felt, Dorr E. (1916). Mechanical arithmetic, or The history of the counting machine. Chicago: Washington Institute. p. 8. http://www.archive.org/details/mechanicalarithm00feltrich. Dorr E. Felt
^ "Speaking machines". The parlour review, Philadelphia 1 (3). 20 January 1838. http://books.google.co.uk/books?id=Xt4PAAAAYAAJ&pg=PT38&dq=the+parlour+review+january+1838&hl=en&ei=0yqzTN3kLMTHswa2wMjSDQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCsQ6AEwAA#v=onepage&q&f=false. Retrieved 11 October 2010.
^ Felt, Dorr E. (1916). Mechanical arithmetic, or The history of the counting machine. Chicago: Washington Institute. p. 10. http://www.archive.org/details/mechanicalarithm00feltrich. Dorr E. Felt
^ "Pascal and Leibnitz, in the seventeenth century, and Diderot at a later period, endeavored to construct a machine which might serve as a substitute for human intelligence in the combination of figures" The Gentleman's magazine, Volume 202, p.100
^ Babbage's Difference engine in 1823 and his Analytical engine in the mid-1830s
^ "It is reasonable to inquire, therefore, whether it is possible to devise a machine which will do for mathematical computation what the automatic lathe has done for engineering. The first suggestion that such a machine could be made came more than a hundred years ago from the mathematician Charles Babbage. Babbage's ideas have only been properly appreciated in the last ten years, but we now realize that he understood clearly all the fundamental principles which are embodied in modern digital computers" Faster than thought, edited by B. V. Bowden, 1953, Pitman publishing corporation
^ "...Among this extraordinary galaxy of talent Charles Babbage appears to be one of the most remarkable of all. Most of his life he spent in an entirely unsuccessful attempt to make a machine which was regarded by his contemporaries as utterly preposterous, and his efforts were regarded as futile, time-consuming and absurd. In the last decade or so we have learnt how his ideas can be embodied in a modern digital computer. He understood more about the logic of these machines than anyone else in the world had learned until after the end of the last war" Foreword, Irascible Genius, Charles Babbage, inventor by Maboth Moseley, 1964, London, Hutchinson
^ In the proposal that Aiken gave IBM in 1937 while requesting funding for the Harvard Mark I we can read: "Few calculating machines have been designed strictly for application to scientific investigations, the notable exceptions being those of Charles Babbage and others who followed him ... After abandoning the difference engine, Babbage devoted his energy to the design and construction of an analytical engine of far higher powers than the difference engine ... Since the time of Babbage, the development of calculating machinery has continued at an increasing rate." Howard Aiken, Proposed automatic calculating machine, reprinted in: The origins of Digital computers, Selected Papers, Edited by Brian Randell, 1973, ISBN 3-540-06169-X
^ "Intel Museum – The 4004, Big deal then, Big deal now". Intel.com. http://www.intel.com/about/companyinfo/museum/exhibits/4004/index.htm. Retrieved 29 January 2012.
^ From cave paintings to the internet HistoryofScience.com
^ See: Anthony Hyman, ed., Science and Reform: Selected Works of Charles Babbage (Cambridge, England: Cambridge University Press, 1989), page 298. It is in the collection of the Science Museum in London, England. (Delve (2007), page 99.)
^ The analytical engine should not be confused with Babbage's difference engine which was a non-programmable mechanical calculator.
^ "Columbia University Computing History: Herman Hollerith". Columbia.edu. http://www.columbia.edu/acis/history/hollerith.html. Retrieved 11 December 2010.
^ a b "Alan Turing – Time 100 People of the Century". Time Magazine. http://205.188.238.181/time/time100/scientist/profile/turing.html. Retrieved 13 June 2009. "The fact remains that everyone who taps at a keyboard, opening a spreadsheet or a word-processing program, is working on an incarnation of a Turing machine"
^ "John Vincent Atanasoff and the Birth of Electronic Digital Computing". Cs.iastate.edu. http://www.cs.iastate.edu/jva/jva-archive.shtml. Retrieved 29 January 2012.
^ "John Vincent Atanasoff – the father of the computer". Columbia.edu. http://www.columbia.edu/~td2177/JVAtanasoff/JVAtanasoff.html. Retrieved 29 January 2012.
^ "Atanasoff-Berry Computer". http://energysciencenews.com/phpBB3/viewtopic.php?f=1&t=98&p=264#p264. Retrieved 20 November 2010.
^ "Spiegel: The inventor of the computer's biography was published". Der Spiegel. 28 September 2009. http://www.spiegel.de/netzwelt/gadgets/0,1518,651776,00.html. Retrieved 11 December 2010.
^ "Inventor Profile: George R. Stibitz". National Inventors Hall of Fame Foundation, Inc.. http://www.invent.org/hall_of_fame/140.html.
^ Rojas, R. (1998). "How to make Zuse's Z3 a universal computer". IEEE Annals of the History of Computing 20 (3): 51–54. doi:10.1109/85.707574.
^ B. Jack Copeland, ed., Colossus: The Secrets of Bletchley Park's Codebreaking Computers, Oxford University Press, 2006
^ "Robot Mathematician Knows All The Answers", October 1944, Popular Science. Google Books. http://books.google.com/books?id=PyEDAAAAMBAJ&pg=PA86&dq=motor+gun+boat&hl=en&ei=LxTqTMfGI4-bnwfEyNiWDQ&sa=X&oi=book_result&ct=result&resnum=6&ved=0CEIQ6AEwBQ#v=onepage&q=motor%20gun%20boat&f=true. Retrieved 11 December 2010.
^ Lavington 1998, p. 37
^ This program was written similarly to those for the PDP-11 minicomputer and shows some typical things a computer can do. All the text after the semicolons are comments for the benefit of human readers. These have no significance to the computer and are ignored. (Digital Equipment Corporation 1972)
^ It is not universally true that bugs are solely due to programmer oversight. Computer hardware may fail or may itself have a fundamental problem that produces unexpected results in certain situations. For instance, the Pentium FDIV bug caused some Intel microprocessors in the early 1990s to produce inaccurate results for certain floating point division operations. This was caused by a flaw in the microprocessor design and resulted in a partial recall of the affected devices.
^ Taylor, Alexander L., III (16 April 1984). "The Wizard Inside the Machine". TIME. http://www.time.com/time/printout/0,8816,954266,00.html. Retrieved 17 February 2007.
^ Even some later computers were commonly programmed directly in machine code. Some minicomputers like the DEC PDP-8 could be programmed directly from a panel of switches. However, this method was usually used only as part of the booting process. Most modern computers boot entirely automatically by reading a boot program from some non-volatile memory.
^ However, there is sometimes some form of machine language compatibility between different computers. An x86-64 compatible microprocessor like the AMD Athlon 64 is able to run most of the same programs that an Intel Core 2 microprocessor can, as well as programs designed for earlier microprocessors like the Intel Pentiums and Intel 80486. This contrasts with very early commercial computers, which were often one-of-a-kind and totally incompatible with other computers.
^ High level languages are also often interpreted rather than compiled. Interpreted languages are translated into machine code on the fly, while running, by another program called an interpreter.
^ The control unit's role in interpreting instructions has varied somewhat in the past. Although the control unit is solely responsible for instruction interpretation in most modern computers, this is not always the case. Many computers include some instructions that may only be partially interpreted by the control system and partially interpreted by another device. This is especially the case with specialized computing hardware that may be partially self-contained. For example, EDVAC, one of the earliest stored-program computers, used a central control unit that only interpreted four instructions. All of the arithmetic-related instructions were passed on to its arithmetic unit and further decoded there.
^ Instructions often occupy more than one memory address, therefore the program counter usually increases by the number of memory locations required to store one instruction.
^ David J. Eck (2000). The Most Complex Machine: A Survey of Computers and Computing. A K Peters, Ltd.. p. 54. ISBN 978-1-56881-128-4.
^ Erricos John Kontoghiorghes (2006). Handbook of Parallel Computing and Statistics. CRC Press. p. 45. ISBN 978-0-8247-4067-2.
^ Flash memory also may only be rewritten a limited number of times before wearing out, making it less useful for heavy random access usage. (Verma & Mielke 1988)
^ Donald Eadie (1968). Introduction to the Basic Computer. Prentice-Hall. p. 12.
^ Arpad Barna; Dan I. Porat (1976). Introduction to Microcomputers and the Microprocessors. Wiley. p. 85. ISBN 978-0-471-05051-3.
^ Jerry Peek; Grace Todino, John Strang (2002). Learning the UNIX Operating System: A Concise Guide for the New User. O'Reilly. p. 130. ISBN 978-0-596-00261-9.
^ Gillian M. Davis (2002). Noise Reduction in Speech Applications. CRC Press. p. 111. ISBN 978-0-8493-0949-6.
^ However, it is also very common to construct supercomputers out of many pieces of cheap commodity hardware; usually individual computers connected by networks. These so-called computer clusters]] can often provide supercomputer performance at a much lower cost than customized designs. While custom architectures are still used for most of the most powerful supercomputers, there has been a proliferation of cluster computers in recent years. (TOP500 2006)
^ Agatha C. Hughes (2000). Systems, Experts, and Computers. MIT Press. p. 161. ISBN 978-0-262-08285-3. "The experience of SAGE helped make possible the first truly large-scale commercial real-time network: the SABRE computerized airline reservations system..."
^ "A Brief History of the Internet". Internet Society. http://www.isoc.org/internet/history/brief.shtml. Retrieved 20 September 2008.
^ "Computer architecture: fundamentals and principles of computer design" by Joseph D. Dumas 2006. page 340.
^ "Definition of computer". Thefreedictionary.com. http://thefreedictionary.com/computer. Retrieved 29 January 2012.
^ Most major 64-bit instruction set architectures are extensions of earlier designs. All of the architectures listed in this table, except for Alpha, existed in 32-bit forms before their 64-bit incarnations were introduced.

References

a Kempf, Karl (1961). Historical Monograph: Electronic Computers Within the Ordnance Corps. Aberdeen Proving Ground (United States Army). http://ed-thelen.org/comp-hist/U-S-Ord-61.html.
a Phillips, Tony (2000). "The Antikythera Mechanism I". American Mathematical Society. http://www.math.sunysb.edu/~tony/whatsnew/column/antikytheraI-0400/kyth1.html. Retrieved 5 April 2006.
a Shannon, Claude Elwood (1940). A symbolic analysis of relay and switching circuits. Massachusetts Institute of Technology. http://hdl.handle.net/1721.1/11173.
Digital Equipment Corporation (1972) (PDF). PDP-11/40 Processor Handbook. Maynard, MA: Digital Equipment Corporation. http://bitsavers.vt100.net/dec/www.computer.museum.uq.edu.au_mirror/D-09-30_PDP11-40_Processor_Handbook.pdf.
Verma, G.; Mielke, N. (1988). Reliability performance of ETOX based flash memories. IEEE International Reliability Physics Symposium.
Meuer, Hans; Strohmaier, Erich; Simon, Horst; Dongarra, Jack (13 November 2006). "Architectures Share Over Time". TOP500. http://www.top500.org/lists/2006/11/overtime/Architectures. Retrieved 27 November 2006.
Lavington, Simon (1998). A History of Manchester Computers (2 ed.). Swindon: The British Computer Society. ISBN 978-0-902505-01-8.
Stokes, Jon (2007). Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture. San Francisco: No Starch Press. ISBN 978-1-59327-104-6.
Felt, Dorr E. (1916). Mechanical arithmetic, or The history of the counting machine. Chicago: Washington Institute. http://www.archive.org/details/mechanicalarithm00feltrich.
Ifrah, Georges (2001). The Universal History of Computing: From the Abacus to the Quantum Computer. New York: John Wiley & Sons. ISBN 0-471-39671-0.
Berkeley, Edmund (1949). Giant Brains, or Machines That Think. John Wiley & Sons.

External links

basic database management

Q. 1. What do you mean by database?

Ans. A database is a collection of occurrence of multiple record types containing the relationship between records, data aggregate and data items. A database may be defined as

A database is a collection of interrelated data store together without harmful and unnecessary redundancy (duplicate data) to serve multiple applications

The data is stored so that they are independent of programs, which use the data. A common and control approach is used in adding the new data, modifying and retrieving existing data or deletion of data within the database A running database has function in a corporation, factory, government department and other organization. Database is used for searching the data to answer some queries. A database may be design for batch processing, real time processing or on line processing.

DATABASE SYSTEM

Database System is an integrated collection of related files along with the detail about their definition, interpretation, manipulation and maintenance. It is a system, which satisfied the data need for various applications in an organization without unnecessary redundancy. A database system is based on the data. Also a database system can be run or executed by using software called DBMS (Database Management System). A database system controls the data from unauthorized access.

Foundation Data Concept

A hierarchy of several levels of data has been devised that differentiates between different groupings, or elements, of data. Data are logically organized into:

Character

It is the most basic logical data element. It consists of a single alphabetic, numeric, or other symbol.

Field

It consists of a grouping of characters. A data field represents an attribute (a characteristic or quality) of some entity (object, person, place, or event).

Record

The related fields of data are grouped to form a record. Thus, a record represents a collection of attributes that describe an entity. Fixed-length records contain, a fixed number of fixed-length data fields. Variable-length records contain a variable number of fields and field lengths.

File

A group of related records is known as a data file, or table. Files are frequently classified by the application for which they ar primarily used, such as a payroll file or an inventory file, or the type of data they contain, such as a document file or a graphical image file. Files are also classified by their permanence, for example, a master file versus a transaction file. A transaction file would contain records of

all transactions occurring during a period, whereas a master file contains all the permanent records. A history file is an obsolete transaction or master file retained for backup purposes or for long-term historical storage called archival storage.

Database

It is an integrated collection of logically related records or objects. A database consolidates records previously stored in separate files into a common pool of data records that provides data for many applications. The data stored in a database is independent of the application programs using it and o the ‘type of secondary storage devices on which it is stored.

Q. 2. What are the various characteristics of DBMS?

Ans. The major characteristics of database approach are:

• Self-describing Nature of a Database System

• Insulation between Programs and Data, and Data Abstraction

• Support of Multiple Views of the Data

• Sharing of Data and Multi user Transaction Processing

Q. 3. What are the various characteristics of DBMS approach?

Ans.

1. Self-contained nature

DBMS system contains data plus a full description of the data (called “metadata”) “metadata” is data about data - data formats, record structures, locations, how to access, indexes metadata is stored in a catalog and is used by DBMS software to know how to access the data. Contrast this with the file processing approach where application programs need to know the structure and format of records and data.

2. Program-data independence

Data independence is immunity of application programs to changes in storage structures and access techniques. E.g. adding a new field, changing index structure, changing data format, In a DBMS environment these changes are reflected in the catalog. Applications aren’t affected. Traditional file processing programs would all have to

change, possibly substantially.

3. Data abstraction

A DBMS provides users with a conceptual representation of data (for example, as objects with properties and inter-relationships). Storage details are hidden. Conceptual representation is provided in terms of a data model.

4. Support for multiple views

DBMS may allow different users to see different “views” of the DB, according to the perspective each one requires. E.g. a subset of the data - For example; the people using the payroll system need not/should not see data about students and class schedules. E.g. data presented in a different form from the way it is stored - For example someone interested in student transcripts might get a view which is formed by combining information from separate files or tables.
5. Centralized control of the data resource

The DBMS provides centralized control of data in an organization.

This brings a number of advantages:

(a) reduces redundancy

(b) avoids inconsistencies

(c) data can be shared

(d) standards can be enforced

(e) security restrictions can be applied

(f) integrity can be maintained

a, b. Redundancy and Inconsistencies

Redundancy is unnecessary duplication of data. For example if accounts department and registration department both keep student name, number and address.

Redundancy wastes space and duplicates effort in maintaining the data.

Redundancy also leads to inconsistency.

Inconsistent data is data which contradicts itself - e.g. two different addresses for a given student number. Inconsistency cannot occur if data is represented by a single entry (i.e. if there is no redundancy).

Controlled redundancy: Some redundancy may be desirable (for efficiency). A DBMS should be aware of it, and take care of propagating updates to all copies of a data item.

This is an objective, not yet currently supported.

c. Sharing

• Need concurrency control

• Multiple user views

d. Standards

E.g. data formats, record structures, naming, documentation

International, organizational, departmental ... standards

e. Security

- restricting unauthorized access

DBMS should perform security checks on all accesses.

f. Integrity

Maintaining validity of data;

e.g. employee numbers must be in some range

e.g. every course must have an instructor

e.g.. student number must be unique

e.g. hours worked cannot be more than 150

These things are expressed as constraints.

DBMS should perform integrity checks on all updates. Currently DBMSs provide limited integrity checks.

Q. 3. What are the various types of databases?

Ans. Types of Databases

Continuing developments in information technology and its business applications have resulted in the evolution of several major types of databases. Several major conceptual categories of databases that may be found in computer-using organizations include:

Operational Databases

The databases store detailed data needed to support the operations of the entire organization. They are also called subject area databases (SADB), transaction databases, and production databases: Examples are customer databases, personnel databases, inventory databases, and other databases containing data generated by business operations

Distributed Databases

Many organizations replicate and distribute copies or parts of databases to network

sewers at a variety of sites. These distributed databases can reside on network servers

on the World Wide Web, on corporate Intranets or extranets, or on other company networks. Distributed databases may be copies of operational or analytical. databases,

hypermedia or discussion databases, or any other type of database. Replication and distribution of databases is done to improve database performance and security.

External Databases

Access to external, privately owned online databases or data banks is available for a fee to end users and organizations from commercial online services, and with or without charge from many sources on the Internet, especially the Web.

Hypermedia Databases

It consists of hyperlinked pages of multimedia (text, graphics, and photographic images, video clips, audio segments, etc.). From a database management point of view, the set of interconnected multimedia pages at a website is a database of interrelated hypermedia page elements, rather than interrelated data records.

Q. 4. What do you mean by DBMS?

Ans. A DBMS is best described as a collection of programs that manage the database structure and that control shared access to the data in the database. Current DBMSes also store the relationships between the database components; they also take care of defining the required access paths to those components

A database management system (DBMS) is the combination of data, hardware, software and users to help an enterprise manage its operational data.

The main function of a DBMS is to provide efficient and reliable methods of data retrieval to many users. Efficient data retrieval is an essential function of database systems. DBMS must be able to deal with several users who try to simultaneously access several items and most frequently, the same data item A DBMS is a set of programs that is used to store and manipulation data that include the following:

• Adding new data, for example adding details of new student.

• Deleting unwanted data, for example deleting the details of students who have

completed course.

• Changing existing data, for example modifying the fee paid by the student.

A database is the information to be stored whereas the database management system is the system used to manage the database. . This structure may be regarded in terms of its hardware implementation, called the physical structure, or this structure may be regarded independently of its hardware implementation, called the logical structure. In either case, the data structure is regarded as static because a database cannot ‘process anything. The DBMS is regarded as dynamic because it is through the DBMS that all database processing takes place. How the DBMS presents data to the user is called the view structure.

There are two general modes for data use: queries and transactions. Both forms use the DBMS for processing. The query is processed for presentation in views and none of these processes are written to the database. The transactional is processed for updating values in the database variables. These updates are written to the database. A DBMS provides various functions like data security, data integrity, data sharing, data concurrence, data independence, data recovery etc. However, all database management systems that are now available in the market like Sybase, Oracle, and MS-Access do not provide the same set of functions, though all are meant for data management.

Q. 5. What are the various components of DBMS?

Ans. Basic Components: A database system has four components. These four

components are important for understanding and designing the database system. These

are:

1. Data

2. Hardware

3. Software

4. Users

1. Data

As we have discussed above, data is raw hand information collected by us. Data is made up of data item or data aggregate. A Data item is the smallest unit of named data: It may consist of bits or bytes. A Data item is often referred to as field or data element. A Data aggregate is the collection of data items within the record, which is given a name and referred as a whole. Data can be collected orally or written. A database can be integrated and shared. Data stored in a system is partition into one or two databases. So if by chance data lost or damaged at one place, then it can be accessed from the second place by using the sharing facility of data base system. So a shared data also cane be reused according to the user’s requirement. Also data must be in the integrated form. Integration means data should be in unique form i.e. data collected by using a well-defined manner with no redundancy, for example Roll number in a class is non-redundant form and so these have unique resistance, but names in class may be in the redundant form and can create lot of problems later on in using and accessing the data.

2. Hardware

Hardware is also a major and primary part of the database. Without hardware nothing can be done. The definition of Hardware is “which we can touch and see”, i.e. it has physical existences. All physical quantity or items are in this category. For example, all the hardware input/output and storage devices like keyboard, mouse, scanner, monitor, storage devices (hard disk, floppy disk, magnetic disk, and magnetic drum) etc. are commonly used with a computer system.

3. Software

Software is another major part of the database system. It is the other side of hardware. Hardware and software are two sides of a coin. They go side by side. Software is a system. Software are further subdivided into two categories, First type is system software (like all the operating systems, all the languages and system packages etc.) and second one is an application software (payroll, electricity billing, hospital management and hostel administration etc.). We can define software as which we cannot touch and see. Software only can execute. By using software, data can be manipulated, organized and stored. -

4. Users

Without user all of the above said components (data, hardware & software) are meaning less. User can collect the data, operate and handle the hardware. Also operator feeds the data and arranges the data in order by executing the software. Other components

  1. People - Database administrator; system developer; end user.

  2. CASE tools: Computer-aided Software Engineering (CASE) tools.

3. User interface - Microsoft Access; PowerBuilder.

4. Application Programs - PowerBuilder script language; Visual Basic; C++; COBOL.

  5. Repository - Store definitions of data called METADATA, screen and report formats, menu definitions, etc.

6. Database - Store actual occurrences data.

7. DBMS - Provide tools to manage all of this - create data, maintain data, control security access to data and to the repository, etc.

Q. 6.What are the various functions of DBMS?

Ans. These functions will include support for at least all of the following:

  • Data definition: The DBMS must be able to accept data definitions (external schemas, the conceptual schema, the internal schema, and all associated mappings) in source form and convert them to the appropriate object form.

• Data manipu1ation: The DBMS must be able to handle requests from the users to retrieve, update, or delete existing data the database, or to add new data to the database. In other words, the DBMS must include a data manipulation language (DML) processor component.

  • Data security and integrity: The DBMS must monitor user requests and reject

any attempt to violate the security and integrity rules defined by the DBA.

  • Data recovery and concurrency: The DBMS - or else some other related software component, usually called the transaction manager - must enforce certain recovery and concurrency controls.

• Data Dictionary: The DBMS must provide a data dictionary function. The data dictionary can be regarded as a database in its own right (but a system database, rather than a user database). The dictionary contains “data about the data” (sometimes called metadata) - that is, definitions of other objects in the system - rather than just”raw data.” In particular, all the various schemas and mapping (external, conceptual, etc.) will physically be stored, in both source and object form, in the dictionary. A comprehensive dictionary will also include cross- reference information, showing, for instance, which programs use which pieces of the database, which users require which reports, which terminals are connected to the system, and so on. The dictionary might even - in fact, probably should — be integrated into the database it defines, and thus include its own definition. It should certainly be possible to query the dictionary just like any other database, so that, for example, it is possible to tell which programs and or users are likely to be affected by some proposed change to the system.

  Performance: It goes without saying that the DBMS should perform all of the functions identified above as efficiently as possible.

Q7. What are the advantages and disadvantages of a database approach?

Ans. ADVANTAGES OF DBMS

One of the major advantages of using a database system is that the organization

can be handled easily and have centralized management and control over the data by

the DBA. Some more and main advantages of database management system are given

below:

The main advantages of DBMS are:

1. Controlling Redundancy

In a DBMS there is no redundancy (duplicate data). If any type of duplicate data arises, then DBA can control and arrange data in non-redundant way. It stores the data on the basis of a primary key, which is always unique key and have non-redundant information. For example, Roll no is the primary key to store the student data.

In traditional file processing, every user group maintains its own files. Each group independently keeps files on their db e.g., students. Therefore, much of the data is stored twice or more. Redundancy leads to several problems:

• Duplication of effort

• Storage space wasted when the same data is stored repeatedly

Files that represent the same data may become inconsistent (since the updates are applied independently by each users group).We can use controlled redundancy.

2. Restricting Unauthorized Access

A DBMS should provide a security and authorization subsystem.

• Some db users will not be authorized to access all information in the db (e.g., financial data).

• Some users are allowed only to retrieve data.

• Some users are allowed both to retrieve and to update database.

3. Providing Persistent Storage for Program Objects and Data Structures

Data structure provided by DBMS must be compatible with the programming language’s data structures. E.g., object oriented DBMS are compatible with programming languages such as C++, SMALL TALK, and the DBMS software automatically performs conversions between programming data structure and file formats.

4. Permitting Inferencing and Actions Using Deduction Rules

Deductive database systems provide capabilities for defining deduction rules for inferencing new information from the stored database facts.

5. Inconsistency can be reduced

In a database system to some extent data is stored in, inconsistent way. Inconsistency is another form of delicacy. Suppose that an em1oyee “Japneet” work in department “Computer” is represented by two distinct entries in a database. So way inconsistent data is stored and DBA can remove this inconsistent data by using DBMS.

6. Data can be shared

In a database system data can be easily shared by different users. For example, student data can be share by teacher department, administrative block, accounts branch arid laboratory etc.

7. Standard can be enforced or maintained

By using database system, standard can be maintained in an organization. DBA is overall controller of database system. Database is manually computed, but when DBA uses a DBMS and enter the data in computer, then standard can be enforced or maintained by using the computerized system.

8. Security can be maintained

Passwords can be applied in a database system or file can be secured by DBA. Also in a database system, there are different coding techniques to code the data i.e. safe the data from unauthorized access. Also it provides login facility to use for securing and saving the data either by accidental threat or by intentional threat. Same recovery procedure can be also maintained to access the data by using the DBMS facility.

9. Integrity can be maintained

In a database system, data can be written or stored in integrated way. Integration means unification and sequencing of data. In other words it can be defined as “the data contained in the data base is both accurate and consistent”. ‘Data can be accessed if it is

compiled in a unique form. We can take primary key ad some secondary key for integration of data. Centralized control can also ensure that adequate checks are

incorporated in the DBMS to provide data integrity.

10. Confliction can be removed

In a database system, data can be written or arranged in a well-defined manner by DBA. So there is no confliction between the databases. DBA select the best file structure and accessing strategy to get better performance for the representation and use of the

data.

11. Providing Multiple User Interfaces

For example query languages, programming languages interfaces, forms, menu- driven interfaces, etc.

12. Representing Complex Relationships Among Data

It is used to represent Complex Relationships Among Data

13. Providing Backup and Recovery

The DBMS also provides back up and recovery features.

DISADVANTAGES OF DBMS

Database management system has many advantages, but due to some major problem

arise in using the DBMS, it has some disadvantages. These are explained as:

1.Cost

A significant disadvantage of DBMS is cost. In addition to the cost of purchasing or developing the software, the organization *111 also purchase or upgrade the hardware

and so it becomes a costly system. Also additional cost occurs due to migration of data

from one environment of DBMS to another environment.

2. Problems associated with centralization

Centralization also means that data is accessible from a single source. As we know the centralized data can be accessed by each user, so there is no security of data from unauthorized access and data can be damaged or lost.

3. Complexity of backup and recovery

Backup and recovery are fairly complex in DBMS environment. As in a DBMS, if you take a backup of the data then it may affect the multi-user database system which is in operation. Damage database can be recovered from the backup floppy, but iterate duplicacy in loading to the concurrent multi-user database system.

4. Confidentiality, Privacy and Security

When information is centralized and is made available to users from remote locations, the possibilities of abuse are often more than in a conventional system. To reduce the chances of unauthorized users accessing sensitive information, it is necessary to take technical, administrative and, possibly, legal measures. Most, databases store valuable information that must be protected against deliberate trespass and destruction.

5. Data Quality

Since the database is accessible to users remotely, adequate controls are needed to control users updating data and to control data quality. With increased number of users accessing data directly, there are enormous opportunities for users to damage the data. Unless there are suitable controls, the data quality may be compromised.

6. Data Integrity

Since a large number of users could be using .a database concurrently, technical safeguards are necessary to ensure that the data remain correct during operation. The main threat to data integrity comes from several different users attempting to update the same data at the same time. The database therefore needs to be protected against inadvertent changes by the users.

7. Enterprise Vulnerability

Centralizing all data of an enterprise in one database may mean that the database becomes an indispensable resource. The survival of the enterprise may depend on reliable information being available from its database. The enterprise therefore becomes vulnerable to the destruction of the database or to unauthorized modification of the database.

8. The Cost of using a DBMS

Conventional data processing systems are typically designed to run a number of well-defined, preplanned processes. Such systems are often “tuned” to run efficiently for the processes that they were designed for. Although the conventional systems are usually fairly inflexible in that new applications may be difficult to implement and/or expensive to run, they are usually very efficient for the applications they are designed for.

The database approach on the other hand provides a flexible alternative where new applications can be developed relatively inexpensively. The flexible approach is not without its costs and one of these costs is the additional cost of running applications that the conventional system was designed for. Using standardized software is almost always less machine efficient than specialized software.

Q. 8. List five significant differences between a file-processing system and a DBMS.

Ans. Before differentiating between file and database systems, there be need to understand the DBMS and its component. Let us consider an organization have a huge amount (collection) of data on its different departments, its employees, its products, sale and purchase order etc. As we know such type of data is accessed simultaneously by different and several employees. Now some users apply number of queries and want answers quickly. If data is stored in the files, then it will create a problem of slow processing. As we try to deal with this type of data management problem by storing the data in a collection of operating system files. Such type of techniques creates number of problems or drawbacks, which are discussed as below:

1. As we have not 1000GB main memory (primary memory) to store the data, so we store the data in some permanent storage device (secondary memory) like magnetic disk or magnetic tape etc. So file-oriented system fails in primary memory cases and we apply data base management system to store the data files permanently.

2. Suppose if we have such a large amount of primary memory on a 16 bit or 32 bit computer system, then there be a problem occur in file based system to use the data by direct or random addressing. Also we cannot call more then 2GB or 4Gb of data direct to the primary memory at a time. So there be need a database program to identify the data.

3. Some programs are too lengthy and complex which cannot store large amount of data in the files related to the operating systems. But a database system made it simple and fast.

4. We cannot change and access file-oriented data simultaneously, so we have requirement a type of system which can be used to access the large amount of data concurrently.

5. Also we cannot recall or recover the file-oriented data, but centralized database management solve such type of problem.

6. File oriented operating system provide only a password mechanism for security, but this is not successful in case of number of users are accessing the same data by using the same login.

At end we can sat that a DBMS is a piece of software that is designed to make the processing faster and easier.

Q 9 Describe major advantages of a database system over file system Or Discuss the DBMS and File processing system Also give the limitations of file processing system

Ans. TRADITIONAL FILE PROCESSING

Data are organized, stored, and processed in independent files of data records. In the traditional file processing approach, each business application was designed to use one or more specialized data files containing only specific types of data records

  TRADITIONAL FILE SYSTEM OR FILE ORIENTED APPROACH

The business computers of 1980 were used in processing of business records and produce information using file oriented approach or file processing environment At that time that system was reliable and faster than the manual system of record keeping and processing In this system the data is organized in the form of different files. Since that system was the collection of files - so we can say it was a file-oriented system. Following terms was commonly used in this approach or the features of File oriented system.

1. Master file

The file that is created only once i.e. at the starting of computerization or a file which rarely changes. For example: In a bank master file the account no, name and balance are entered only once and less frequently changes.

2. File activity ratio

The number of records processed one run divided by total number of records. For example: if we changes 100 records from a bank file containing 200 records then file activity ratio is 100/200 0.5. It should be noted that this ratio of master file is less.

3. Transaction file

A file that is created repeatedly after regular interval of time. For example: the payroll file of employee is updated at the end of every month.

4. File volatility ratio

It is the number of records updated in a transaction file divided by total number of records. The file volatility ratio of transaction file is very high.

5. Work file

A temporary file that helps in sorting and merging of records from one file to other.

6. File organization

It means the arrangement of records in a particular order. There were three types of file organizations

Sequential
Direct
Indexed sequential

7. Data island

In this system each dept has its own files designed for local applications. Each department has its own data processing staff, set of policies, working rules and report formats. It means programs were depending on the file structure or format of file. If the structure of file changes, the program has also to be changed. These days the file oriented approach is still used but has following limitations:

LIMITATIONS OF FILE ORIENTED APPROACH

• Duplicate data

Since all the files are independent of each other. So some of the fields or files are stored more than once. Hence duplicacy is more in case of file approach but dbms has controlled duplicacy.

• Separated and isolated data

To make a decision, a user might need data from two separate files. First, analysts and programmers to determine the specific data required from each file and the relationships between the data evaluated the files. Then applications could be written in a third generation language to process and extract the needed data. Imagine the work involved if data from several files was needed!

• Inconsistency

In this system, data is not consistent. If a data item is changed the all the files containing that data item need to be changed and updated properly. If all the files are not updated properly there may be high risk of inconsistency. DBMS have data consistency.

• Poor data integrity

A collection of data has integrity. A file is said to be have data integrity - it means a item is not be stored in duplicate manner. It has been seen that file oriented system have poor data integrity control. Data integrity has been achieved in DBMS.

• Every operation is programmable

The processing tasks like searching, editing, deletion etc should have separate programs. It means there were no functions available for these operations. DBMS have ready-made commands for such operations.

• Data inflexibility

Program-data interdependency and data isolation limited the flexibility of file processing systems in providing users with ad hoc information requests. Because designing applications was so programming-intensive, MIS department staff usually restricted information requests Therefore, users often resorted to manual methods to obtain needed information.

• Concurrency problem

It means using a same record at same time. This problem was common in file approach but can be controlled in DBMS.

• Application programs are dependent on the file format:

In file processing system the physical formats of the files are entered in the programs. The change in file means change in program and vice versa. No such problem in DBMS.

• Poor data security

All the files are stored in the flat form or text files. These files can be easily located and trapped because file approach, has no data security.

• Difficult to represent the complex objects:

Some the objects may be of variable length records can be computerized using this approach. DBMS has capability to handle fixed-length records as well as variable-length records.

• Can not support heavy databases:

The databases on the Internet can be handled by the files system - but DBMS like oracle is used for heavy data base applications. On the other hand the DBMS have following advantages.

• Difficulty in representing data from the user’s view

To create useful applications for the user, often data from various files must be combined. In file processing it was difficult to determine relationships between isolated data in order to meet user requirements.

PROBLEMS OF FILE PROCESSING

The file processing approach finally became too cumbersome, costly, and inflexible to supply the information needed to manage modem businesses. It was replaced by the database management approach. File processing systems had the following major problems:

• Data Redundancy

Independent data files included a lot of duplicated data; the same data was recorded

and stored in several files. This data redundancy caused problems when data had to

be updated, since separate file maintenance programs had to be developed and

coordinated to ensure that each file was properly updated. Unfortunately, a lot of

inconsistencies occurred among data stored in separate files.

• Lack of Data Integration

Having independent files made it difficult to provide end users with information for

ad hoc requests that required accessing data stored in several different files. Special

computer programs had to be written to retrieve data from each independent file. This

was so difficult, time-consuming, and costly for some organizations that it was

impossible to provide end users or management with such information.

• Data Dependence

In file processing systems, major components of the system - the organization of files,

their physical locations of storage hardware, and the application software used to

access those files — depended on one another in significant ways. Changes in the

format and structure of data and records in a file required that changes be made to all

of the programs that used that file. This program maintenance effort was a major

burden of file processing systems.

• Other Problems

It was easy for data elements to be defined differently by different end users and

applications. Integrity of the data was suspect because there was no control over their

use and maintenance by authorized end users.

Q.10. What are the various types of database uses?

Ans. Without user all o the above said components (data, hardware & software) are meaning less. User can collect the data, operate and handle the hardware. Also operator feeds the data and arranges the data in order by executing the software. Users are of mainly of four types. These are:

(a) Naïve user

Naïve user has no knowledge of database system and its any supporting software. These are used at the end form. These are like a layman, which have little bit knowledge or computer system. These users are mainly used for collecting the data on the notebooks or on the pre-deigned forms. An automated teller machine (ATMs) user are in these categories. Naïve user can work on any simple GUI base menu driven system. Internet using non-computer based person are in this form.

(b) End User or Data Entry Operators

Data entry operators are preliminary computer based users. The function of data entry operators are only to operate the computer (start! stop the computer) and feed or type the collected information (data) in menu driven application program and to execute it according to the analyst’ requirement. These user are also called On line users. These user communicate the database directly via an on line terminal or indirectly via a user interface. These users require certain amount of expertise in the computer programming language, but require complete knowledge of computer operations.

(c) Application programmer

He is also called simple programmer. The working of application programmer is to develop a new project i.e. program for a particular application or modify an existing program. Application programmer works according to some instructions given by database administrator (DBA). Application programmer can handle all the programming language like Fortran, Cobol, dbase etc.

(d) DBA (Data Base Administrator)

DBA is a major user. DBA either a single person or a group of persons. DBA is only the custodian of the business firm or organization but not the owner of the organization. As bank manager is the DBA of a bank, who takes care about the bank money and not use it. Only DBA can handle the information collected by end user and give the instructions to the application programmer for developing a new program or modifying an existing program. DBA is also called an overall controller of the organization. In computer department of a firm either system analysts or an EDP (Electronic Data Processing) Manager works as DBA. In other words DBA is the overall controller of complete hardware and software.

RESPONSIBILITIES OF DBA

As we know DBA is the overall commander of a computer system, so it has number of duties, but some of his/her major responsibilities are as follows:

DBA can control the data, hardware, and software and gives the instructions to the application programmer, end user and naive user.
DBA decides the information contents of the database. He decides the suitable database file structure for arrangement of data. He/She uses the proper DDL techniques.
DBA compiles the whole data in a particular order and sequence.
DBA decides where data can be stored i.e. take decision about the storage structure.
DBA decides which access strategy and technique should be used for accessing the data.
DBA communicates with the user by appropriate meeting, DBA co-operates with

user.

DBA also define and, apply authorized checks and validation procedures.
DBA also takes backup of the data on a backup storage device so that if data can be lost then it can be again recovered and compiled. DBA also recovers the damaged data.
DBA also changes the environment according to user or industry requirement and monitor the performance.
DBA should be good decision-maker. The decision taken by DBA should be correct, accurate & efficient.
DBA should have leadership quality.
DBA liaise with the user in the business to take confidence of the customer about availability of data.

Q11. Discuss the architecture of database management system.

Ans. DBMS ARCHITECTURE

There are many different framework have been suggested for the DBMS over the last several year. The generalized architecture of a database system is called ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee) model.

In 1972, a final report about database is submitted by ANSI (American National Standard Institute) and SPARC (Standard Planning And Requirement Committee). According to this approach, three levels of a database system was suggested and they are:

• External view (Individual user view)

• Conceptual View (Global or community user view)

• Internal level (physical or storage view).

For the system to be usable, it must retrieve data efficiently. This concern has led to the design of complex data structures for the representation of data in the database. Since many database systems users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users’ interactions with the system.

These three views or levels of the architecture are as shown in the diagram as follows:

OBJECTIVES OF THREE LEVEL ARCHITECTURE

The database views were suggested because of following reasons or objectives of levels of a database:

1. Make the changes easy in database when some changes needed by environment.

2. The external view or user views do not depend upon any change made ii other view. For example changes in hardware, operating system or internal view should not change the external view.

3. The users of database should not worry about the physical implementation and internal working of database system.

4. The data should reside at same place and all the users can access it as per their requirements.

5. DBA can change the internal structure without effecting the user’s view.

6. The database should be simple and changes can be easily made.

7. It is independent of all hardware and software.

All the three levels are shown below

External/View level

The highest level of abstraction where only those parts of the entire database are included which are of concern to a user. Despite the use of simpler structures at the logical level, some complexity remains, because of the large size of the database. Many users of the database system will not be concerned with all this information. Instead, such users need to access only a part of the database. So that their interaction with the system is simplified, the view level of abstraction is defined. The system may provide many views for the same database.

Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema. Schemas are changed infrequently, if at all.

Database systems have several schemas, partitioned according to the levels of abstraction that we discussed. At the lowest level is the physical schema; at the intermediate level is the logical schema and at the highest level is a subschema.

The features of this view are

• The external or user view is at the highest level of database architecture.

• Here only one portion of database will be given to user.

• One portion may have many views.

• Many users and program can use the interested part of data base.

• By creating separate view of database, we can maintain security.

• Only limited access (read only, write only etc) can be provided in this view.

For example: The head of account department is interested only in accounts but in library information, the library department is only interested in books, staff and students etc. But all such data like student, books, accounts, staff etc is present at one place and every department can use it as per need.

Conceptual/Logical level

Database administrators, who must decide what information is to be kept in the database, use this level of abstraction. One conceptual view represents the entire database. There is only one conceptual view per database.

The description of data at this level is in a format independent of its physical representation. It also includes features that specify the checks to retain data consistence and integrity.

The features are:

• The conceptual or logical view describes the structure of many users.

• Only DBA can be defined it.

• It is the global view seen by many users.

• It is represented at middle level out of three level architecture.

• It is defined by defining the name, types, length of each data item. The create table

commands of Oracle creates this view.

• It is independent of all hardware and software.

Internal/Physical level

The lowest level of abstraction describes how the data are stored in the database, and what relationships exist among those data. The entire database is thus described in terms of a small number of relatively simple structures, although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity.

The features are :

• It describes the actual or physical storage of data.

• It stores the data on hardware so that can be stored in optimal time and accessed

in optimal time.

• It is the third level in three level architecture.

• It stores the concepts like:

• B-tree and Hashing techniques for storage of data.

• Primary keys, secondary keys, pointers, sequences for data search.

• Data compression techniques.

• It is represented as

FILE EMP [

INDEX ON EMPNO

FIELD = {

(EMPNO: BYTE (4),

ENAME BYTE(25))]

Mapping between views

• The conceptual/internal mapping:

o defines conceptual and internal view correspondence

• specifies mapping from conceptual records to their stored counterparts

o An external/conceptual mapping:

• defines a particular external and conceptual view correspondence

• A change to the storage structure definition means that the conceptual/internal

mapping must be changed accordingly, so that the conceptual schema may remain

invariant, achieving physical data independence.

• A change to the conceptual definition means that the conceptual/external mapping

must be changed accordingly, so that the external schema may remain invariant,

achieving logical data independence.



Q. 12. Write a note on Database Language And Interfaces.

Ans. Some main types of languages and facilities are provided by DBMS.

1. Programming Language

2. Data Manipulation Language

3. Data Definition Language

4. Schema Description Language

5. Sub-Schema Description Language

6. SQL (Structured Query Language)

1. Programming Language

All the programming language like Cobol, Fortran, C, C++, Pascal etc. has syntax and semantics. These all have structured and logical structure, so these all commonly used to solve general and scientific problems. All the business-oriented problems can be solved by the three GL and Fourth Gt.

2. DML

Some language that gives instructions to the programming language and other languages is called data manipulation language (DML). This language creates interface (linkage) between user and application program. This is extension of the program of the language used to manipulate data in the database. DML involves’ retrieval of data from the database, insertion of new data into the database and deletion or modification of the existing data. Some data manipulation operations are also called QUERY’ or• QUERY OPERATIONS. A Query is a statement in DML that request the retrieval of data from the database i.e. to search the data according to the user requirement. The subset of the DML used to operate the query is known as Query Language. DML provides commands to select & retrieve data from the database. Commands used in the DML are to insert, to update & to delete the records. The commands have different syntax for different programming language. For example, Fortran, Cobol, C etc. provide such type of facility with the help of database management system. The data manipulation function provided by DBMS can be invoked in a application program directly by procedural calls or by processors statement. This procedure can be done by the compiler. The DML can become

procedural language according to the user requirement. If the DML is non-procedural than user will indicate only what is to be retrieved. In both the cases the DBMS optimize the exact answer by using DML.

3. DDL

Database management system provides a facility known as Data Definition Language or data description language (DDL). DDL can be used to define conceptual schema (Global) and also give some details about how to implement this schema in the physical devices used to store the data. The definition includes all the entity sets and their associated attributes as well as the relationship among the entities set. The definitions also have some constraints which are used in DML. DDL also have some meta-data (it is data about the data in database). Meta-data have data dictionary, directory, system catalog to describe data about data. The dictionary contains the information about the data stored in the database and it is consulted by DBMS before any data manipulation operations. The DBMS maintain the information on the file structure and also used some access method to access the data efficiently. DDL is used for the help of DML.

We can say that there is another language - Data Sub Language (DSL) which is the

combination of both DML and DDL.

DSL = DML + DDL

4. Schema Description Language (SDL) or Schema

It is necessary to describe the organization of the data in a formal manner. The logical and physical database descriptions are used by DBMS software. The complete and overall description of data is referred to as schema. The schema and subschema words are brought into DBMS by CODASYL (Conference on data system language committee) and also by the CODASYL’s database task group. Schema is also referred to as conceptual model or global view (community view) of data. Suppose a complete description of collected data having all classes and student data, all employees (teaching & non-teaching) data and other concept of data related to the college is called Schema of the college. We can say that we relate whole college data logically, which is called schema.

5 Sub Schema Description language

   The term schema is used to mean an overall chart of the data items, types and record type stored in a database. The term sub-schema refers to an application programmer’s view of data he uses. Sub-schema is the part of schema. Many different sub-schemas can be derived from one schema. An application programmer does not use whole data i.e. full schema, e.g. As in an organization, purchase-order for the maintenance department is the sub-schema of the whole schema description of the purchase department in the hole industry. Two or more than two application- programmers use the different sub-schemas. One person named A uses the sub-schema purchase-order whereas programmer B uses the sub-schema supplier. Their operations and views are different according to their own sub-schema but both combined these two sub-schemas on the basis of a common key.

6. Structured Query Language (SQL):

SQL organized with the system R. System R means it is relational language. SQL is also called Structure Query Language. This language was developed in 1974 at IBM’s San Jose Research Center. The purpose of this language is to provide such non-procedural commands which are used for validation of the data and for searching the data. By using this language we can do any query about the data. SQL is sometimes named by SQUARE language. This language was helpful for both DDL and DML for the system R. Some SQL are also called Relational languages and used in a commercial RDBMS. Some commonly used SQL are ORACLE, INGRES, SYBASE etc. SQL resembles relational algebra and relational calculus in a relational system approach.

DBMS INTERFACES

Types of interfaces provided by the DBMS include:

Menu-Based interfaces for Web Clients or Browsing

• Present users with list of options (menus)

• Lead user through formulation of request

• Query is composed of selection options from menu displayed by system.

Forms-Based Interfaces

• Displays a form to each user

• User can fill out form to insert new data or fill out only certain entries.

• Designed and programmed for naïve users as interfaces to canned transactions.

Graphical User Interfaces

• Displays a schema to the user in diagram form. The user can specify a query by manipulating the diagram. GUIs use both forms and menus.

Natural Language Interfaces

• Accept requests in written English or other languages and attempt to understand them.

• Interface has its own schema, and a dictionary of important words. Uses the schema and dictionary to interpret a natural language request.

Interfaces for Parametric Users

• Parametric users have small set of operations they perform.

• Analysts and programmers design and implement a special interface for each class of naïve users.

• Often a small set of commands included to minimize the number of keystrokes required. (I.e. function keys)

Interfaces for the DBA

• Systems contain privileged commands only for DBA staff.

• Include commands for creating accounts, setting parameters, authorizing accounts,

changing the schema, reorganizing the storage structures etc.

Q.13. Describe the Classification of Database Management Systems.

Ans. Categories of DBMS

DBMS (Database Management System)

It is software to manage many databases. A DBMS is a software component or logical tool to handle the databases. All the queries from user about the data stored in the database will be handled by DBMS. There are many DBMSs available in market like dBase, FoxBASE, FoxPro, Oracle, Unify, Access etc.

RDBMS (Relational Data Base Management System)

Each database system uses a approach to store and maintain the data. For this purpose three data models were developed like Hierarchical model, Network Model and Relational Model. In the hierarchical model the data were arranged in the form of trees, in network model the data was arranged in the form of pointers and network and in relational model the data was arranged in the form of tables. The data stored in the form tables is easy to stored, maintain and understand. Many DBMS has been developed using approach of hierarchical and network models. Any DBMS that uses the relational data model for data storage and modeling Is called RDBMS. In RDBMS we can create relations among tables and can access the information from tables - while tables store stored in separately file and may or may not have identical structures. The RDBMS is based upon the rules given by Dr. Codd known as Dr. Codd’s Rules.

HDBMS (Heterogeneous DBMS)

In RDBMS we store the information related to the same kind of data like student data, teacher data, employee data etc. In HDBMS we store the data in the database which is entirely different.

DDBMS (Distributed DBMS)

During 1950s & 1960s there was trend to use independent or decentralized system. There was a duplication of hardware and facilities. In a centralized database system, the DBMS & data reside at a single place and all the control & location is limited to a single location, but the PCs are distributed geographically. Distributed system is parallel computing using multiple independent computers communicating over a network to accomplish a common objective or task. The type of hardware, programming languages, operating systems and other resources may vary drastically. It is similar to computer clustering with the main difference being a wide geographic dispersion of the resources

For example an organization may have an office in a building and have many sub- buildings that are connected using LAN. The current trend is towards distributed systems. This is a centralized system connected to intelligent remote sites. Each remote site have own storage and processing capabilities - but in a centralized or network there is a single storage.

OODBMS (Object Oriented DBMS)

Object-Oriented Database Management Systems (OODBMSs) have been developed to support new kinds of applications for which semantic and content are represented more efficiently with the object model. Therefore, the OODBMSs present the two main problems:

• Impedance mismatch: It is basically due to two reasons. Firstly, the no suitable abstractions of the operating systems, so when a client object has to invoke a method that is offered by a server object, and both objects are not into the same address space, it is necessary to use the mechanisms that are offered by the operating system, and these mechanisms do not became proper to the object oriented paradigm since they are oriented to communicate processes. In order to solve this problem intermediate software is included (e.g. COM or CORBA).In the second place, an impedance mismatch is also caused every time that the object-oriented applications need to use the operating system services.

• Interoperability problem between object models: Although different system elements use the object-oriented paradigm, an interoperability problem can exist between them. So, an application implemented using the C++ language, with the C++ object model, can easily interact with its objects, but when it wants to use objects that have been created with another programming language or another object-oriented database an interoperability problem appears.

The programming LANGUAGES like C, FORTRAN, PASCAL & FORTRAN use the POP (Procedure Oriented Approach) to develop applications, but the current trend is towards OOP (Object Oriented Programming). The languages like C++, Java, Oracle, C# (C Sharp). Visual Basic 6 use this approach. Many databases have been developed that follows this approach (OI approach) like Oracle. So the DBMS which follow OOP approach is called OODBMS.

Q. 14. Explain the difference between physical and logical data independence.

Ans. One of the biggest advantages of database is data independence. It means we can change the conceptual schema at one level without affecting the data at other level. It means we can change the structure of a database without affecting the data required by users and program. This feature was not available in file oriented approach. There are two types of data independence and they are:

1. Physical data independence

2. Logical data independence

Data Independence The ability to modify schema definition in on level without affecting schema definition in the next higher level is called data independence. There are two levels of data independence:

  1. Physical data independence is the ability to modify the physical schema without

causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance. It means we change the physical storage/level without affecting the conceptual or external view of the data. The new changes are absorbed by mapping techniques.

2. Logical data independence in the ability to modify the logical schema without

causing application program to be rewritten. Modifications at the logical level are necessary whenever the logical structure of the database is altered (for example, when money-market accounts are added to banking system).

Logical Data independence means if we add some new columns or remove some columns from table then the user view and programs should not changes. It is called the logical independence. For example: consider two users A & B. Both are selecting the empno and ename. If user B add a new column salary in his view/table then it will not effect the external view user; user A, but internal view of database has been changed for both users A & B. Now user A can also print the salary.

It means if we change in view then program which use this view need not to be changed.

Logical data independence is more difficult to achieve than is physical data independence, since application programs are heavily dependent on the logical structure of the data that they access.

Logical data independence means we change the physical storage/level without effecting the conceptual or external view of the data. Mapping techniques absorbs the new changes.

Q. 15. What is physical data independence?

Ans. Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance. It means we change the physical storage/level without affecting the conceptual or external view of the data. The new changes are absorbed by mapping techniques.

Q. 16. What do you mean by data redundancy?

Ans. Redundancy is unnecessary duplication of data. For example if accounts department and registration department both keep student name, number and address.

Redundancy wastes space and duplicates effort in maintaining the data.

Redundancy also leads to inconsistency.

Inconsistent data is data which contradicts itself - e.g. two different addresses for a given student number. Inconsistency cannot occur if data is represented by a single entry (i.e. if there is no redundancy).

Controlled redundancy

Some redundancy may be desirable (for efficiency). A DBMS should be aware of it, and take care of propagating updates to all copies of a data item.

This is an objective, not yet currently supported.

Q. 17. What do you man by database schema?

Ans. It is necessary to describe the organization, of the data in a formal manner. The logical and physical database descriptions are used by DBMS software. The complete and overall description of data is referred to as schema, The schema and subschema words are brought into DBMS by CODASYL (Conference on data system language1 committee) and also by the CODASYL’s database task group. Schema is also referred to as conceptual model or global view (community view) of data. Suppose a complete description of collected data having all classes and student data4 all employees (teaching & non-teaching) data and other concept of data related to the college is called Schema of the college. We can say that we relate whole college data logically, which is called schema.

Q. 18. Explain the distinctions among the terms primary key, candidate key and superkey.

Or

What is the significance of foreign key? Or What are the various keys?

Ans. Keys: As there are number of keys can be defined, but some commonly and mainly used keys are explained as below:

1. Primary Key

A key is a single attribute or combination of two or more, attributes of an entity that is used to identify one or more instances of the set. The attribute Roll # uniquely identifies an instance of the entity set STUDENT. It tells about student Amrita having address 101, Kashmir Avenue and phone no. 112746 and have paid fees 1500 on basis of Roll No. 15. The 15 is unique value and it gives unique identification of students So here Roll No is unique attribute and such a unique entity identifies called Primary Key. Primary key cannot be duplicate.

From the definition of candidate key, it should be clear that each relation must have at least one candidate key even if it is the combination of all the attributes in the relation since all tuples in a relation are distinct. Some relations may have more t one candidate keys.

As discussed earlier, the primary key of a relation is an arbitrarily but permanently selected candidate key. The primary key is important since it is the sole identifier for the tuples in a relation. Any tuple in a database may be identified by specifying relation name, primary key and its value. Also for a tuple to exist in a relation, it must be identifiable and therefore it must have a primary key. The relational data model therefore imposes the following two integrity constraints:

(a) No component of a primary key value can be null;

(b) Attempts to change the value of a primary key must be carefully controlled.

The first constraint is necessary because if we want to store information about some entity, then we must be able to identify it, otherwise difficulties are likely to arise. For example, if a relation

CLASS (STUNO, LECTURER, CNO)

has (STUNO, LECTURER) as the primary key then allowing tuples like

3123 NULL CP302

NULL SMITH CP302

is going to lead to ambiguity since the two tuples above may or may not be identical and the integrity of the database may be compromised. Unfortunately most commercial database systems do not support the concept of primary key and it would be possible to have a database state when integrity of the database is violated.

The second constraint above deals with changing of primary key values. Since the primary key is the tuple identifier, changing it needs very careful controls. Codd has suggested three possible approaches:

Method 1

Only a select group of users be authorised to change primary key values.

  Method 2

Updates on primary key values be banned. If it was necessary to change a primary key, the tuple would first be deleted and then a new tuple with new primary key value but same other values would be inserted. Of course, this does require that the old values of attributes be remembered and be reinserted in the database.

  Method 3

A different command for updating primary keys be made available. Making a distinction in altering the primary key and another attribute of a relation would remind users that care needs to be taken in updating primary keys.

2. Secondary Key

The ke1 which is not giving the unique identification and have duplicate infonna6o is called secondary key, e g in a STUDENT entity if Roll Number is the primary key, then Name of the student, address of the student, Phone number of the student and the fees paid by the student all are secondary keys. A secondary key is an attribute or combination of attributes that not be primary key and have duplicate data. In otherworlds secondary key is used after the identification of the primary key. Also we can identify the data from the combination of the secondary keys.

3. Super Key

If we add additional attributes to a primary key, the resulting combination would still uniquely identify an instance of the entity set Such keys are called super keys A primary key is therefore a minimum super key For example, if DOB (date of birth field or attribute) is the primary key, then by adding some additional information about the day of the month key in the DOB field, this field or attribute becomes more powerful and useful Such type of key is called super key Super key are less used in a small database file. Now these days it has less importance, but due to its feature, this key gives the complete description of the database.

4. Candidate Key

There may be two or more attributes or combination of attributes that uniquely identify an instance of an entity set These attributes or combination of attributes are called candidate keys. Candidate key also gives unique identification. Candidate key comes with primary key. A candidate is a combination of two or more attributes e.g. if Roll No. and student name are two different attributes then we combine these two attribute and form a single attribute Roll No. & Name, then this combination is the candidate key and it is unique and gives unique identification about a particular roll no. and about particular name.

5. Alternative Key

A candidate key which is not the primary key is called alternative key, e.g. if Roll No. and Name combination is the candidate key, then if Roll No, is the primary key, other key in the candidate key is Name. Name attribute work as the alternative key.

6 Foreign Key

Suppose there are some relations as: SP (S#, P#, QTY), relation S (S#, S Name, status, city) and relation P (P#, PName, Color, Weight, City). We know entity SP is defined as the relationship of the relation S and the relation P. These two relations has sand P# as the Primary Keys in relation S and P respectively, but in the relation SP we can take either # as the primary key or P# as the primary key. Suppose if we take P# as the primary key, then other primary key S# which is actually the primary key, but do not work as primary key in the relation SF is called the Foreign Key. If S# is the primary key then P# is the Foreign Key. Similarly in the relation ASSIGNMENT, attribute Emp #, Prod #, Job # are given and if S# and P# are the primary keys, then the Job # key is the Foreign Keys.

  Q. 19. What are the major functions of a database administrator?

Ans. RESPONSIBILITIES OF DBA

As we know DBA is the overall commander of a computer system, so it has number of duties, but some of his/her major responsibilities are as follows:

1. DBA can control the data, hardware, and software and gives the instructions to the application programmer, end user and naive user.

2. DBA decides the information contents of the database. He decides the suitable database file structure for arrangement of data. He/She uses the proper DDL techniques.

3. DBA compiles the whole data in a particular order and sequence.

4. DBA decides where data can be stored i.e. take decision about the storage structure.

5. DBA decides which access strategy and technique should be used for accessing the data.

6. DBA communicates with the user by appropriate meeting. DBA co-operates with user.

7. DBA also define and apply authorized checks and validation procedures.

8. DBA also takes backup of the data on a backup storage device so that if data can then lost then it can be again recovered and compiled. DBA also recovers the damaged data.

9. DBA also changes the environment according to user or industry requirement and monitor the performance.

10. DBA should be good decision-maker. The decision taken by DBA should be correct, accurate & efficient.

11. DBA should have leadership quality.

12. DBA liaise with the user in the business to take confidence of the customer about the availability of data.

  Q. 20. What do you mean by relationships? Explain different types of relationships.

Ans. Relationships: One table (relation) may be linked with another in what is known as a relationship. Relationships may be built into the database structure to facilitate the operation of relational joins at runtime.

A relationship is between two tables in what is known as a one-to-many or parent-child or master-detail relationship where an occurrence on the ‘one’ or ‘parent’ or ‘master’ table may have any number of associated occurrences on the ‘many’ or ‘child’ or ‘detail’ table. To achieve this, the child table must contain fields which link back the primary key on the parent table. These fields on the child table are known as a foreign key, and the parent table is referred to as the foreign table (from the viewpoint of the child).
It is possible for a record on the parent table to exist without corresponding records on the child table, but it should not be possible for an entry on the child table to exist without a corresponding entry on the parent table.
A child record without a corresponding parent record is known as an orphan.
It is possible for a table to be related to itself. For this to be possible it needs a foreign key which points back to the primary key. Note that these two keys cannot be comprised of exactly the same fields otherwise the record could only ever point to itself.
A table may be the subject of any number of relationships, and it may be the

parent in some and the child in others.

Some database engines allow a parent table to be linked via a candidate key, but if this were changed it could result in the link to the child table being broken.
Some database engines allow relationships to be managed by rules known as referential integrity or foreign key restraints. These will prevent entries onchild tables from being created if the foreign key does not exist on the parent table, or will deal with entries on child tables when the entry on the parent table is updated or deleted.

Relational Joins

The join operator is used to combine data from two or more relations (tables) in order to satisfy a particular query. Two relations may be joined when they share at least one common attribute. The join is implemented by considering each row in an instance of each relation. A row in relation R1 is joined to a row in relation R2 when the value of the common attribute(s) is equal in the two relations. The join of two relations is often called a binary join.

The join of two relations creates a new relation. The notation ‘R1 x R2’ indicates the join of relations R1 and R2. For example, consider the following:

Note that the instances of relation RI and R2 contain the same data values for attribute B. Data normalisation is concerned with decomposing a relation (e.g. R(A,B,C,D,E) into smaller relations (e.g. R1 and R2). The data values for attribute B in this context will be identical in R1 and R2. The instances of R1 and R2 are projections of

the instances of R(A,B,C,D,E) onto the attributes (A,B,C) and (B,D,E) respectively. A projection will not eliminate data values duplicate rows are removed, but this will not remove a data value from any attribute.

The join of relations RI and R2 is possible because B is a common attribute. The result of the join is:

The row (2 4 5 7 4) was formed by joining the row (2 4 5) from relation R1 to the row (4 7 4) from relation R2. The two rows were joined since each contained the same value for the common attribute B. The row (2 4 5) was not joined to the row (6 2 3) since the values of the common attribute (4 and 6) are not the same.

The relations joined in’ the preceding example shared exactly one common attribute. However, relations may share multiple common attributes. All of these common attributes must be used in creating a join. For example, the instances of relations R1 and R2 in the following example are joined using the common attributes B and C:

The row (6 1 4 9) was formed by joining the row (6 1 4) from relation R1 to the row

(1 4 9) from relation R2. The join was created since the common set of attributes (B and

C) contained identical values (1 and 4). The row (6 1 4) from R1 was not joined to the

row (1 2 1) from R2 since the common attributes did not share identical values - (1 4) in

R1 and (1 2) in R2.

The join operation provides a method for reconstructing a relation that was decomposed into two relations during the normalisation process. The join of two rows, however, can create a new row that was not a member of the original relation. Thus invalid information can be created during the join process.

Now suppose that a list of courses with their corresponding room numbers is required. Relations R1 and R4 contain the necessary information and can be joined using the attribute HOUR. The result of this join is:
This join creates the following invalid information (denoted by the coloured rows):

• Smith, Jones, and Brown take the same class at the same time from two different instructors in two different rooms.

• Jenkins (the Maths teacher) teaches English.

• Goldman (the English teacher) teaches Maths.

• Both instructors teach different courses at the same time.

Another possibility for a join is R3 and R4 (joined on INSTRUCTOR). The result would be:

   This join creates the following invalid information:

• Jenkins teaches Math I and Algebra simultaneously at both 8:00 and 9:00.

A correct sequence is to join R1 and R3 (using COURSE) and then join the resulting relation with R4 (using both INSTRUCTOR and HOUR). The result would be:

Extracting the COURSE and ROOM attributes (and eliminating the duplicate row produced for the English course) would yield the desired result:
The correct result is obtained since the sequence (R1 x r3) x R4 satisfies the lossless (gainless?) join property

A relational database is in 4th normal form when the lossless join property can be used to answer unanticipated queries. However, the choice of joins must be evaluated carefully. Many different sequences of joins will recreate an instance of a relation. Some sequences are more desirable since they result in the creation of less invalid data during the join operation.

Suppose that a relation is decomposed using functional dependencies and multi- valued dependencies. Then at least one sequence of joins on the resulting relations exists that recreates the original instance with no invalid data created during any of the join operations.

For example, suppose that a list of grades by room number is desired. This question, which was probably not anticipated during database design, can be answered without creating invalid data by either of the following two join sequences:

The required information is contained with relations R2 and R4, but these relations

cannot be joined directly. In this case the solution requires joining all 4 relations.

The database may require a ‘lossless join’ relation, which is constructed to assure that any ad hoc inquiry’ can be answered with relational operators. This relation may contain attributes that are not logically related to each other. This occurs because the relation must serve as a bridge between the other relations in the database. For example, the lossless join relation will contain all attributes that appear only on the left side of a functional dependency. Other attributes may also be required, however, in developing the lossless join relation.

Consider relational schema R (A, B, C, D), A B and CD. Relations  and  are in 4th normal form. A third relation  however, is required to satisfy the lossless join property. This relation can be used to join attributes B and D. This is accomplished by joining relations R1 and R3 and then joining the result to relation

R2. No invalid data is created during these joins. The relation  is the lossless join relation for this database design.

A relation is usually developed by combining attributes about a particular subject or entity. The lossless join relation, however, is developed to represent a relationship among various relations. The lossless join relation may be difficult to populate initially and difficult to maintain - a result of including attributes that are not logically associated with each other.

The attributes within a lossless join relation often contain multi-valued dependencies. Consideration of 4th normal form is important in this situation. The lossless join relation can sometimes be decomposed into smaller relations by eliminating the multi-valued dependencies. These smaller relations are easier to populate and maintain.

Q. 21. What is an ER-diagram? Construct an ER diagram for a hospital with a set of patients and a set of doctors. Associate with each patient a log o1 the various tests and examinations conducted.

Or

Discuss in detail the ER diagram.

Or

What is one to many relationship? Give examples.

Or

Draw an ER diagram for a library management system, make suitable assumptions. Describe various symbols used in ER. diagram.

Or

Construct an ER diagram for a university registrar’s office. The office maintains data about each class, including the instructor, the enrollment and the time and place of the class meetings. For each student class pair, a grade is recorded also design a relational database for the said I.R. diagram.

Ans. E-R model grew out of the exercise of using commercially available DBMS to model application database. Earlier DBMS were based on hierarchical and network approach. E-R is a generalization of these models. Although it has some means of describing the physical database model, it is basically useful in the design of logical database model. This analysis is then used to organize data as a relation, normalizing relations and finally obtaining a relational database model.

The entity-relationship model for data uses three features to describe data. These are:

1. Entities, which specify distinct real-world items in an application.

2. Relationships, which connect entities and represent meaningful dependencies

between them.

3. Attributes, which specify properties of entities and relationships.

We illustrate these terms with an example. A vendor supplying items to a company, for example, is an entity. The item he supplies is another entity. A vendor supplying items are related in the sense that a vendor supplies an item. The act of supplying• defines a relationship between a vendor and an item. An entity set is a collection of similar entities. We can thus define a vendor set and an item set. Each member of an entity set is described by some attributes. For example, a vendor may be described by the attributes:

(vendor code, vendor name, address)

An item may be described by the attributes:

(item code, item name)

Relationship also can be characterized by a number of attributes. We can think of the relationship as supply between vendor and item entities: The relationship supply can be described by the attributes: (order no. date of supply)

Relationship between Entity Sets

The relationship between entity sets may be many-to-many (M: N), one-to-many (1: M), many-to-one (M: 1) or one-to-one (1:1). The 1:1 relationship between entity sets E1 and E2 indicates that for each entity in either set there is at most one entity in the second set that is associated with it. The 1: M relationship from entity set E1 to E2 indicates that for an occurrence of the entity from the set E1, there could be zero, one or more entities from the entity set E2 associated with it. Each entity in E2 is associated with at most one entity in the entity set E1. In the M: N relationship between entity sets E1 and E2, there is no restriction to the number of entities in one set associated with an entity in the other set. The database structure, employing the E-R model is usually shown pictorially using entity-relationship (E-R) diagram.

To illustrate these different types of relationships consider the following entity sets: DEPARTMENT, MANAGER, EMPLOYEE, and PROJECT

The relationship between a DEPARTMENT and a MANAGER is usually one-to- one; there is only one manager per department and a manager manages only one department. This relationship between entities is shown in Figure. Each entity is represented by a rectangle and the relationship between them is indicated by a direct line. The relationship for MANAGER to DEPARTMENT and from DEPARTMENT to MANAGER is both 1:1. Note that a one-to-one relationship between two entity sets does not imply that for an occurrence of an entity from one set at any time there must be an occurrence of an entity in the other set. In the case of an organization, there could be times when a department is without a manager or when an employee who is classified as a manager may be without a department to manage. Figure shows some instances of one-to-one relationships between the entities DEPARTMENT and MANAGER.

A one-to-many relationship exists from the entity MANAGER to the entity EMPLOYEE because there are several employees reporting to the manager. As we just pointed out, there could be an occurrence of the entity type MANAGER having zero occurrences of the entity type EMPLOYEE reporting to him or her. A reverse relationship, from EMPLOYEE to MANAGER, would be many to one, since many employees may be supervised by a single manager. However, given an instance of the entity set EMPLOYEE, there could be only one instance of the entity set MANAGER to whom that employee reports (assuming that no employee reports to more than one manager). The relationship between entities is illustrated in Figures shows some instances of this relationship.
Figure: Instances of 1: M Relationship

The relationship between the entity EMPLOYEE and the entity PROJECT can be derived as follows: Each employee could be involved in a number of different projects, and a number of employees could be working on a given project. This relationship between EMPLOYEE and PROJECT is many-to-many. It is illustrated in Figures shows some instances of such a relationship.

Figure: M : N Relationship

In the entity-relationship (E-R) diagram, entities are represented by rectangles, relationships by a diamond-shaped box and attributes by ellipses or ovals. The following

E-R diagram for vendor, item and their relationship is illustrated in Figure (a).

Representation of Entity Sets in the form of Relations

The entity relationship diagrams are useful in representing the relationship among entities they show the logical model of the database. E-R diagrams allow us to have an overview of the important entities for developing an information system and other relationship. Having obtained E-R diagrams, the next step is to replace each entity set and relationship set by a table or a relation. Each table has a name. The name used is the entity name. Each table has a number of rows and columns. Each row contains a number of the entity set. Each column corresponds to an attribute. Thus in the E-R diagram, the vendor entity is replaced by table below.

Table: Table For the Entity Vendor

The above table is also known as a relation. Vendor is the relation name. Each row of a relation is called a tuple. The titles used for the columns of a relation are known as relation attributes. Each tuple in the above example describes one vendor. Each element of a tuple gives specific property of that vendor. Each property is identified by the title used for an Attribute column. In a relation the rows may be in any order. The columns may also be depicted in any order. No two rows can be identical.

Since it is inconvenient to show the whole table corresponding to a relation, a more concise notation is used to depict a relation. It consists of the relation name and its attributes. The identifier of the relation is shown in bold face.

A specified value of a relation identifier uniquely identifies the row of a relation.

If a relationship is M: N, then the identifier of the relationship entity is a composite identifier, which includes the identifiers of the entity sets, which are related. On the other hand, if the relationship is 1:N, then the identifier of the relationship entity is the identifier of one of the entity sets in the relationship.. For example, the relations and identifiers corresponding to the E-R diagram of Figure are as shown:

Figure: E-R Diagram for Teacher, Student and their relationship

Teacher (Teacher-id, name, department, address)

Teaches (Teacher-id, Student-id)

Student (Student-id, name, department, address)

One may ask why an entity set is being represented as a relation. The main reasons

are case of storing relations as flat files in a computer and, more importantly, the existence of a sound theory on relations, which ensures good database design. The raw relations obtained as a first step in the above examples are transformed into normal relations. The rules for transformations called normalization are based on sound theoretical principles and ensure that the final normalized relations obtained reduce duplication of data, ensure that no mistake occur when data are added or, deleted and simplify retrieval of required data.

Q. 22. Discuss relational approach of database management system? Explain with the help of suitable relational operations to demonstrate insert, delete and update functions.

Or

What is relational model compare and contrast it with network and hierarchical model.

Ans. Database models are collection of conceptual tools for describing data

semantics and data constraints.

DBMS has number of ways to represent the data, But some important and commonly

used model are of four types, among which three are mainly used. These are:

I. Relational Model or Relational Approach

II. Hierarchical Model or Hierarchical Approach

III. Network Model or Network Approach

I. Relational Data Model

Relational Data Model has been developed from the research in deep and by testing and by trying through many stages. This model has advantages that it is simple to implement and easy to understand. We can express queries by using query language in this model. In this model relation is only constructed by setting the association among the attributes of an entity as well the relationship among different entities. One of the main reasons for introducing this model was to increase the productivity of the application programmers by eliminating the need to change application programmer, when a change is mode to the database. In this user need not know the exact physical structure. Data structure used in the data model represented by both entities and relationship between them. We can explain relation view of data on relational approach on the basis of following example.

Suppose there are three tables in which data is organized. These tables are Supplier tables or S table or S relation, Part table or P table or P relation, Shipment table of SP table or SP relation. The S table further has some fields or attributes. These are supplier number (S#), supplier name, status of the supplier and the city in which the supplier resides. Similarly P table has field part number (P#), part name, part color, weight of the part and location where the part is stored. Also SP table contains field supplier number (S#), part number (P#) and the quantity which supplier can ship. Each supplier s unique supplier number S# and similarly each part has unique part umber P#. These three tables are called relational table. S table is also called S-relation because it gives the relationship between different attributes. These attributes are field name and in the form of column. Rows of such table are called tuples. Pool of values in a particular w and attributes called domain. In other words domain is a pool of values from which actual value appearing in a given column are drawn. For example, in S table - S#, Sname, S-status are the attributes and s1, s2, s3 are domains. A relational table or relationship can be defined as:

Definition: A relation represented by table having n column, defined on domain Dl, D2, .... Dn is a subset of cartesian product DI x D2 x……x Dn.

Another definition is : It is collection of Dl, D2, D3,…….Dn then R is relation on these n sets if these n sets are ordered in n tuples such that each value of attribute belong to Dl, D2,…….Dn. These three relations are represented by diagram:

S table (Entity) or S Relation:

As in the .S table insertion, deletion and modifications can be done easily.

II. Hierarchical Model

It is a tree structure. It has one root and many branches, we call it parent child relationship. In this a single file has relation with many files and similarly we can say that it is the arrangement of individual data with group data. In an organization chart manager is the parent root and employees working under the manager are their children The representation of this model is expressed by linking different tables. Such type of representation is better for a linkage have many relationships with one. Some times it will create ambiguity in designing and defining the association and relationship between

SP table (Entity) or SP Relation:

In hierarchical approach, insertion can be done if a child has a parent and insertion on the child side is easy. Deletion and insertion is easy, but you can’t delete a parent: parent has one or more child. In the parent child relationship updation in parent and child both are difficult.

III. Network Approach

It is a complex approach of DBMS. In this we link all the records by using a chain or pointer. It has many to many relationships. Network approach is created when there are more than one relations in the database system. Network approach starts from on point and after connecting similar type of data it returns back to the same record.

Network approach is more symmetric than the hierarchical structure. In network model insertion at any point is very complex. We can insert only by creating a new record having linkage with other record. Similarly deletion is also complex if we delete any record than chain disconnect and whole structure vanish. Updation is also complex because we cannot change name or any data record because it connected with each other.

Difference between Relational, Hierarchical and Network Approaches:

(A) Relational Approach: Relational Approach (RA) has relationship between different entities and attribute in a particular entity. RA is in tabular form. RA

has one to one relationships. R-A has table in asymmetric form. Insertion, deletion,

updation in R table is very easy. Languages used in RA are SQL, Ingress, Oracle, Sybase. RA is simple in nature. Relational approach creates relationship between different entities and different attributes in the same entity. It is the best approach to represent the data than the other models.

(B) Hierarchical Approach: Hierarchical Approach (HA) creates a linkage between two or more entities. HA has parent child relationship. HA has one to many relationships. HA relationship is in symmetric form by defining parent and their child. Insertion, deletion, updation is little difficult than the RA. HA has IMS language, which is theoretical. It is Complex in nature.

(C) Network Approach: Network Approach (NA) has chain among many entities. NA has chaining technique or pointer technique. NA has many to many relationships. NA relationship is full or completely symmetric form because it has one chain symmetry. Insertion, deletion, updation is very difficult. NA has DBTG (Database Task Group) set hiving different classes & members. More complex than RA & HA.

  Q. 23. What is the usage of unified modelling language (UML)?

Ans. UML is a graphical language for visualizing, specifying, constructing and documenting an object oriented software-intensive system’s artifacts.

Q. 24. What are graphical user interfaces?

Ans. A graphical user interface (GUI) is sometime pronounced “gooey” is a method of interacting with a computer through a metaphor of direct manipulation of graphical images and widgets in addition to text. GUI display visual elements such as icon, Windows and other gadgets

Q. 25. Define the term dangling pointer.

Ans. The pointers that points to nothing is called dangling pointer.

Q. 26. Write a short note on Mapping.

Ans. Mappings

• The conceptual/internal mapping:

defines conceptual and internal view correspondence specifies mapping from conceptual records to their stored counterparts

• An external/conceptual mapping:

defines a particular external and conceptual view correspondence

• A change to the storage structure definition means that the conceptual/internal mapping must be changed accordingly, so that the conceptual schema may remain invariant, achieving physical data independence.

• A change to the conceptual definition means that the conceptual/external mapping must be changed accordingly, so that the external schema may remain invariant,

achieving logical data independence.

Q. 27. Distinguish between RDBMS and DBMS.

http://ptucse.loremate.com/dbms/node/2