TOOLS FOR BUILDING INFORMATION SYSTEMS:DATABASE MANAGEMENT TOOLS

DATABASE MANAGEMENT TOOLS

DBM Concepts

One of the most important functions that an information system must perform is data management (i.e., record keeping). Prior to the advent of true database software, computer data management was characterized by excessive data redundancy (or duplication), large data dependence (or coupling),

and diffused data ownership (McLeod 1998). In early data processing systems, the data ﬁles were often created with ‘‘little thought as to how those ﬁles affected other systems. Perhaps much, or even all, of the new data in a new ﬁle was already contained in an existing ﬁle’’ (McLeod 1998). This resulted in signiﬁcant data redundancy and several associated problems. Obviously, data duplication results in wasted space, but worse than this are synchronization problems. ‘‘Multiple copies of data were often updated on different frequencies. One ﬁle would be updated daily, another weekly, and a third monthly. Reports based on one ﬁle could conﬂict with reports based on another, with the user being unaware of the differences’’ (McLeod 1998). These early systems also exhibited a high degree of dependence between the data speciﬁcations and the programs written to manipulate the data. Every time the representation of the data or the ﬁle storage format changed, all of the programs that depended on that data would also need to be changed. Other problems resulted from a lack of agreement on data ownership and standardization. ‘‘This situation of the 1950s and early 1960s may have been the reason why many of the ﬁrst management information systems failed in the late 1960s. Information specialists appeared unable to deliver systems that provided consistent, accurate, and reliable information’’ (McLeod 1998).

Database systems sought to address these issues by separating the logical organization of the data from the physical organization. The logical organization is how the user of the system might view the data. The physical organization, however, is how the computer views the data. A database, then, may be deﬁned as ‘‘an integrated collection of computer data, organized and stored in a manner that facilitates easy retrieval’’ by the user (McLeod 1998). It seeks to minimize data redundancy and data dependence. A database is often considered to consist of a hierarchy of data: ﬁles, records, and ﬁelds. A ﬁeld is the smallest organized element of data, consisting of a group of characters that has a speciﬁc meaning. For example, a person’s last name and telephone number would be common ﬁelds. Related ﬁelds are collected and stored in a structure referred to as a record. A speciﬁc person’s record might consist of his or her name, address, telephone number, and so on (all values associated with that individual). Related records are grouped into a ﬁle, such as a ﬁle of employees or customers. Keep in mind that the physical representation of the database will be shielded from the user and should not limit the user’s ability retrieve information from the system.

A database management system (DBMS) is a ‘‘collection of programs that manage the database structure and controls access to the data stored in the database. The DBMS makes it possible to share the data in the database among multiple applications or users’’ (Rob and Coronel 1997). In a sense, it sits between the user and the data, translating the users’ requests into the necessary program code needed to accomplish the desired tasks. General Electric’s Integrated Data Store (IDS) system, intro- duced in 1964, is considered by most to be the ﬁrst generalized DBMS (Elmasri and Navathe 2000).

Database systems are often characterized by their method of data organization. Several popular models have been proposed over the years. Conceptual models focus on the logical nature of the data organization, while implementation models focus on how the data are represented in the database itself. Popular conceptual models include the entity-relationship (E-R) model and the object-oriented model discussed below. These models typically describe relationships or associations among data as one-to-one, one-to-many, or many-to-many. For example, in most organizations an employee is as- sociated with one department only but a department may have several employees. This is a one-to- many relationship between department and employee (one department has many employees). But the relationship between employee and skills would be many-to-many, as a single employee might pos- sess several skills and a single skill might be possessed by several employees (many employees have many skills).

Implementation models include the hierarchical model, network model, and relational model. Each model has advantages and disadvantages that made it popular in its day. The relational model is easily understood and has gained widespread acceptance and use. It is currently the dominant model in use by today’s database management systems.

Relational DBMS

The relational database model, introduced by E. F. Cod in 1970, is considered a major advance for users and database designers alike. At the time, however, many considered the model impractical because its ‘‘conceptual simplicity was bought at the expense of computer overhead’’ (Rob and Coronel 1997). As processing power has increased, the cost of implementing the relational model has dropped rapidly. Now you will ﬁnd relational DBMS implementations from mainframes down to microcomputers in products like DB2, Oracle, and even Microsoft’s Access.

To the user, the relational database consists of a collection of tables (or relations in the formal language of the model). Each table may be thought of as a matrix of values divided into rows and columns. Each row represents a record and is referred to as a tuple in the model. Each column is a ﬁeld and is called an attribute in the model. Below are some simple tables for Employee and De- partment.

Pay special attention to how relationships between tables are established by sharing a common ﬁeld value. In this example, there is a one-to-many relationship between the Employee and Depart- ment tables established by the common DepartmentID ﬁeld. Here, Robert Adams and Mary Roberts are in the Engineering department (DepartmentID 20). The tables represented here are independent of one another yet easily connected in this manner. It is also important to recognize that the table structure is only a logical structure. How the relational DBMS chooses to represent the data physically is of little concern to the user or database designer. They are shielded from these details by the DBMS.

Processing

One of the main reasons for the relational model’s wide acceptance and use is its powerful and ﬂexible ad hoc query capability. Most relational DBMS products on the market today use the same query language, Structured Query Language (SQL). SQL is considered a fourth-generation language (4GL) that ‘‘allows the user to specify what must be done without specifying how it must be done’’ (Rob and Coronel 1997). The relational DBMS then translates the SQL request into whatever program actions are needed to fulﬁll the request. This means far less programming for the user than with other database models or earlier ﬁle systems.

SQL statements exist for deﬁning the structure of the database as well as manipulating its data. You may create and drop tables from the database, although many people use tools provided by the DBMS software itself for these tasks. SQL may also be used to insert, update, and delete data from existing tables. The most common task performed by users, however, is data retrieval. For this, the SELECT statement is used. The typical syntax for SELECT is as follows:

For example, to produce a list of employee names and telephone numbers from the Engineering department using the tables introduced earlier, you would write:

In this example, the rows returned by the query are established by matching the common values of DepartmentID stored in both the Employee and Department tables. These joins may be performed across several tables at a time, creating meaningful reports without requiring excessive data dupli- cation.

It is also possible to sort the results returned by a query by adding the ORDER BY clause to the SELECT statement. For example, to create an alphabetical listing of department names and IDs, you would write:

These examples demonstrate the ease with which SQL may be used to perform meaningful ad hoc queries of a relational database. Of course, SQL supports many more operators and statements than those shown here, but these simple examples give a hint of what is possible with the language.

Object-Oriented Databases

The evolution of database models (from hierarchical to network to relational) is driven by the need to represent and manipulate increasingly complex real-world data (Rob and Coronel 1997). Today’s systems must deal with more complex applications that interact with multimedia data. The relational approach now faces a challenge from new systems based on an object-oriented data model (OODM). Just as OO concepts have inﬂuenced programming languages, so too are they gaining in popularity with database researchers and vendors.

There is not yet a uniformly accepted deﬁnition of what an OODM should consist of. Different systems support different aspects of object orientation. This section will explore some of the similar characteristics shared by most of these new systems. Rob and Coronel (1997) suggest that, at the very least, an OODM

1. Must support the representation of complex objects

2. Must be extensible, or capable of deﬁning new data types and operations

3. Must support encapsulation (as described in Section 2)

4. Must support inheritance (as described in Section 2)

5. Must support the concept of object identity, which is similar to the notion of a primary key from the relational model that uniquely identiﬁes an object. This object identity is used to relate objects and thus does not need to use JOINs for this purpose.

Consider an example involving data about a Person (from Rob and Coronel 1997). In a typical relational system, Person might become a table with ﬁelds for storing name, address, date of birth, and so on as strings. In an OODM, Person would be a class, describing how all instances of the class (Person objects) will be represented. Here, additional classes might be used to represent name, address, date of birth, and so on. The Address class might store city, state, street, and so on as separate attributes. Age might be an attribute of a Person object but whose value is controlled by a method, or function of the class. This inclusion of methods makes objects an active component in the system. In traditional systems, data are considered passive components waiting to be acted upon. In an OODM, we can use inheritance to create specialized classes based on the common character- istics of an existing class, such as the Person class. For example, we could create a new class called Employee by declaring Person as its superclass. An Employee object might have additional attributes of salary and social security number. Because of inheritance, an Employee is a Person and, hence, would also gain the attributes that all Person objects share—address, date of birth, name, and so on. Employee could be specialized even further to create categories of employees, such as manager, secretary, cashier, and the like, each with its own additional data and methods for manipulating that data.

Rob and Coronel (1997) summarize the advantages and disadvantages of object-oriented database systems well. On the positive side, they state that OO systems allow the inclusion of more semantic information than a traditional database, providing a more natural representation of real-world objects. OO systems are better at representing complex data, such as required in multimedia systems, making them especially popular in CAD / CAM applications. They can provide dramatic performance im- provements over relational systems under certain circumstances. Faster application development time can be achieved through the use of inheritance and reuse of classes. These OO systems are not without problems, however. Most of them are based on immature technology, which can lead to product instability. This is exacerbated by the lack of standards and an agreed-upon theoretical foun- dation. OO database systems especially lack a standard ad hoc query language (like SQL for relational systems). In addition, the initial learning curve is steep, particularly for individuals who are well accustomed to the relational model. On the whole, it seems clear that object-oriented databases are

worth consideration. As standards develop over time and people become more familiar with object- oriented systems in general, OO databases could prove to be the successor to the relational model.

Data Warehouses

One of the major changes in the information systems area within the past decade is a recognition that there are two fundamentally different types of information systems in all enterprises: operational systems and informational systems. Operational systems are just what their name implies: they are the systems that help us run the enterprise day-to-day. These are the backbone systems of any enterprise: the order entry, inventory, manufacturing, payroll, and accounting systems. Because of their importance to the organization, operational systems were almost always the ﬁrst parts of the enterprise to be computerized. Over the years, these operational systems have been enhanced and maintained to the point that they are completely integrated into the organization. Indeed, most large enterprises around the world today couldn’t operate without their operational systems and the data that these systems maintain. Operational data normally is current (the present) and is focused on an area within the enterprise.

On the other hand, other functions go on within the enterprise that have to do with planning, forecasting, and managing the organization. These functions are also critical to the survival of the organization, especially in our fast-paced world. Functions like marketing planning, engineering plan- ning, and ﬁnancial analysis also require information systems to support them. But these functions are different from operational ones, and the types of systems and information required are also different. The information needed to aid such major decisions that will affect how the enterprise will operate, now and in the future, typically is obtained by analyzing operational data over a long time period and covering many different areas within an enterprise.

A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data to support management decision making. The data in a data warehouse are stored in a manner that will support the information and analytical processing needs of the organization over a long historical time period. They are typically physically separated from the operational databases that an organi- zation must maintain and update to function effectively on a day-to-day basis (Singh 1998; Barqu´ın and Edelstein 1997).

There are many data ﬂows from the operational databases to the data warehouse and from the data warehouse to users. Data must ﬂow from the enterprise’s legacy systems to a more compact form, then to the data warehouse storage, and ﬁnally out to consumers. The ﬂows necessary to manage the warehouse itself (metaﬂow) must also be considered (Hackathorn 1995). For each ﬂow, data warehouses require tools to make these processes function effectively (Mattison 1996).

The tools for developing data warehouses can be grouped into three categories, based on their activities: acquisition tools (for inﬂow), storage tools (for upﬂow and downﬂow), and access products (for outﬂow) (Mattison 1996). Acquisition tools are necessary to perform tasks such as modeling, designing, and populating data warehouses. These tools extract data from various sources (operational databases and external sources) and transform it (i.e., condition it, clean it up, and denormalize it) to make the data usable in the data warehouse. They also establish the metadata, where information about the data in the warehouse is stored (Francett 1994).

Storage is usually managed by relational DBMSs and other special tools in ways that enable the use of data for effective decision support (Mattison 1996). Access to warehouse contents is provided by data mining tools using such methods as neural networks and data discovery mechanisms (Mat- tison 1996), plus multidimensional analysis software that supports end users in accessing and ana- lyzing the data in the warehouse in various ways (Francett 1994). Data mining is the process of making a discovery from large amounts of detailed data (Barry 1995) by drilling down through detailed data in the warehouse. Data mining tools sift through large volumes of data to ﬁnd patterns or similarities in the data. Data mining and online analytical processing tools translate warehouse contents into business intelligence by means of a variety of statistical analyses and data visualization methods (Brown 1995; Fogarty 1994). Table 1 provides a list of some of the common terms used within the data warehouse community.

A number of commercial products attempt to fulﬁll warehousing needs, including the SAS Insti- tute’s SAS / Warehouse Administrator, IBM’s Visual Warehouse, Cognos’s PowerPlay, Red Brick Systems’ Red Brick Warehouse, and Information Builders’ FOCUS. A discussion of these tools is beyond the scope of this chapter.

Search This Blog

Operation Research course