TOOLS FOR BUILDING INFORMATION SYSTEMS: PROGRAMMING LANGUAGES
PROGRAMMING LANGUAGES
Overview
This section describes how programming languages may be used to build information systems. First, a brief historical review of programming languages helps explain how programming tools have become more powerful and easier to use over the years. We characterize modern programming languages, such as C++ and Visual Basic, by considering the advantages and disadvantages of each. Then we review some basic programming concepts by showing some typical examples from these languages. Finally, we consider how program development is affected when the World Wide Web is targeted as a platform. This includes a comparison of Internet programming tools, such as HTML, Java, and CGI scripting.
Historical Review
In the earliest days of program development, programmers worked directly with the computer’s own machine language, using a sequence of binary digits. This process was tedious and error-prone, so programmers quickly started to develop tools to make programming easier for the humans involved. The first innovation allowed programmers to use mnemonic codes instead of actual binary digits. These codes, such as LOAD, ADD, and STORE, correspond directly to operations in the computer’s instruction set but are much easier to remember and use than the sequence of binary digits that the machines required. A programming tool, called an assembler, translates mnemonics into machine codes for the computer to execute. This extremely low-level approach to programming is hence referred to as assembly language programming. It forces the programmer to think in very small steps because the operations in most computers’ instruction sets are very simple tasks, such as reading from a memory location or adding two numbers. A greater problem with this approach is that each computer architecture has its own assembly language, due to differences in each CPU’s instruction set. This meant that early programmers had to recreate entire information systems from scratch each time they needed to move a program from one computer to another that had a different architecture.
The next major innovation in computer programming was the introduction of high-level languages in the late 1950s. These languages, such as COBOL and FORTRAN, allow a programmer to think in terms of larger, more complex steps than the computer’s instruction set allows. In addition, these languages were designed to be portable from machine to machine without regard to the underlying computer architecture. A tool called a compiler translates the high-level statements into machine instructions for the computer to execute. The compiler’s job is more complex than an assembler’s simple task of one-to-one translation. As a result, the machine code that is generated is often less efficient than that produced by a well-trained assembly programmer. For this reason, programming in assembly language is still done today when execution time is absolutely critical. The cost of developing and maintaining high-level code is significantly lower, however. This is especially important given another trend in computing: hardware keeps getting cheaper and more powerful, while programmers become more expensive and harder to find.
As high-level languages became more prevalent, programmers changed to a new development paradigm, called structured programming. This approach, embodied in languages such as Pascal and C from the 1970s and 1980s, emphasizes functional decomposition, that is, breaking large programming tasks into smaller and more manageable blocks, called functions. Data used within a function block is local to that function and may not generally be seen or modified by other blocks of code. This style of programming is better suited to the development of large-scale information systems because different functions can be assigned to different members of a large development team. Each function can be thought of as a black box whose behavior can be described without revealing how the work is actually being done but rather in terms of input and output. This principle of information hiding is fundamental to the structured programming approach. Another advantage is code reuse, achieved by using the more general-purpose functions developed for one project over again in other systems.
As the information systems being developed became more complex, development with structured programming languages became increasingly difficult. A new paradigm was required to overcome the problem, and it arrived in the late 1980s in the form of object-oriented programming (OOP). This continues to play an important role today. In structured programming, functions are the dominant element and data are passed around from one function to another, with each function having a dependency on the structure of the data. If the way the data are represented changes, each function that manipulates the data must, in turn, be modified. In OOP, data are elevated to the same level of importance as the functions. The first principle of object-oriented programming is called encapsulation. It involves joining data together with the functions that manipulate that data into an inseparable unit, usually referred to as a class. A class is a blueprint for actual objects that exist in a program. For example, the class Clock would describe how all clocks (objects or instances of the class) in a program will behave. It does not create any clocks; it just describes what one would be like if you built one, much as an architect’s blueprint describes what a building would look like if you built one.
The importance of encapsulation is seen when one considers the well-known Year 2000 problem encountered in many programs written with structured languages, such as COBOL. In many of these information systems, literally thousands of functions passed dates around as data, with only two digits reserved for storing the year. When the structure of the data had to be changed to use four digits for the year, each of these functions had to be changed as well. Of course, each function had to be identified as having a dependency on the date. In an OOP language, a single class would exist where the structure of the data representing a date is stored along with the only functions that may manipulate that data directly. Encapsulation, then, makes it easier to isolate the changes required when the structure of data must be modified. It strengthens information hiding by making it difficult to create a data dependency within a function outside a class.
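The date example can be made concrete with a short C++ sketch. The class name Date, its member names, and the helper years_between() are all hypothetical and chosen only for illustration. The point is that the stored form of the year is private, so a change from two digits to four would be confined to the class itself.

```cpp
#include <cassert>

// Hypothetical Date class: the representation of the year is private,
// so only the methods of this class depend on how it is stored.
class Date {
public:
    Date(int year, int month, int day) { set(year, month, day); }

    // The only way to change the stored date; simple range checks only.
    void set(int year, int month, int day) {
        assert(year >= 0);
        assert(1 <= month && month <= 12);
        assert(1 <= day && day <= 31);
        year_ = year;    // full four-digit year
        month_ = month;
        day_ = day;
    }

    int year() const { return year_; }
    int month() const { return month_; }
    int day() const { return day_; }

private:
    int year_;
    int month_;
    int day_;
};

// Code outside the class sees only the public methods, never the fields.
int years_between(const Date& earlier, const Date& later) {
    return later.year() - earlier.year();
}
```

Because years_between() calls year() rather than reading a field, it would not need to change if the class later stored the year differently.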
OOP also makes code reuse easier than structured programming allows. In a structured environment, if you need to perform a task that is similar but not identical to an existing function, you must create a new function from scratch. You might copy the code from the original function and use it as a foundation, but the functions will generally be independent. This approach is error-prone and makes long-term program maintenance difficult. In OOP, one may use a principle called inheritance to simplify this process. With this approach, an existing class, called the base class or superclass, is used to create a new class, called the derived class or subclass. This new derived class is like its parent base class in all respects except what the programmer chooses to make different. Only the differences are programmed in the new class. Those aspects that remain the same need not be developed from scratch or even through copying and pasting the parent’s code. This saves time and reduces errors.
The first OOP languages, such as Smalltalk and Eiffel, were rarely used in real-world projects, however, and were largely relegated to university research settings. As is often the case when system development and maintenance are made easier, program execution speed suffered. With the advent of C++, an OOP hybrid language, however, performance improved dramatically and OOP took off. Soon thereafter, new development tools appeared that simplified development further. Rapid Application Development (RAD) tools speed development by incorporating a more visual development environment for creating graphical user interface (GUI) programs. These tools rely on wizards and code generators to create frameworks based on a programmer’s screen layout, which may be easily modified. Examples include Borland’s Delphi (a visual OOP language based on Pascal), Borland’s C++ Builder (a visual C++ language), and Microsoft’s Visual Basic.
C++
The C++ language was originally developed by Bjarne Stroustrup (Stroustrup 1994) but is now controlled and standardized by the American National Standards Institute (ANSI) together with the International Organization for Standardization (ISO). It is an extension (literally, an increment) of the C programming language. C is well known for the speed of its compiled executable code, and Stroustrup strove to make C++ into a similarly efficient object-oriented language (see Stroustrup 1994 for a description of the development of the language). Technically, C++ is a hybrid language in that it supports both structured (from C) and object-oriented development. In order to achieve the speed that earlier OOP languages could not, C++ makes some sacrifices to the purity of the OO model. The tradeoffs required, however, helped make OOP popular with real-world developers and sparked a revolution in modern program development.
Like C before it, C++ is a language with a rich abundance of operators and a large standard library of code for a programmer’s use. In its current form, C++ includes the Standard Template Library, which offers most of the important data structures and algorithms required for program development, including stacks, queues, vectors, lists, sets, sorts, and searches (Stroustrup 1997). Programs written using C++’s standard components are easily ported from one platform to another. The language lacks, however, a standard library of graphics routines for creating a graphical user interface program under different operating systems. Instead, each compiler vendor tends to offer its own classes and functions for interacting with a specific operating system’s windowing routines. For example, Microsoft offers the Microsoft Foundation Classes (MFC) for developing Windows applications with its compiler, while Borland offers its Object Windows Library (OWL) for the same purpose. This makes it difficult to port GUI programs from one vendor’s compiler on one operating system to another vendor’s compiler on another operating system.
A simple example (Main and Savitch 1997) is shown below that declares a basic Clock class. It is an abstraction of a real-world clock used for telling time.
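What follows is a minimal sketch of such a declaration, not the cited listing itself. The member names (set_time, get_hour, get_minute, is_morning, and the my_ data members) are assumptions chosen to match the method names mentioned in the text, and the default time of 12:00 a.m. is likewise an assumption.

```cpp
#include <cassert>  // assert(), for exercising the class

// A basic clock abstraction with a 12-hour face.
class Clock {
public:
    // A new clock starts at 12:00 in the morning (assumed default).
    Clock() : my_hour(12), my_minute(0), my_morning(true) {}

    // Modifies the stored time; declared here, defined separately.
    void set_time(int hour, int minute, bool morning);

    // Read-only access to the stored time.
    int get_hour() const { return my_hour; }
    int get_minute() const { return my_minute; }
    bool is_morning() const { return my_morning; }

private:
    int my_hour;      // 1..12
    int my_minute;    // 0..59
    bool my_morning;  // true for a.m., false for p.m.
};
```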
This class is typical of the structure of most classes in C++. It is divided into two sections, public and private. In keeping with information hiding, the data are normally kept private so that others outside of the class may not examine or modify the values stored. The private data are exclusively manipulated by the class functions, normally referred to as methods in OOP. The functions are declared in the public section of the class, so that others may use them. For example, one could ask a clock object its minute by using its get_minute() function. This might be written as in the following code fragment:
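A short sketch of such a fragment is given here, wrapped in a hypothetical function current_minute() so that it compiles on its own. The compact Clock stand-in and its member names are assumptions made for illustration.

```cpp
#include <cassert>  // assert(), for exercising the fragment

// Compact stand-in for a Clock class (hypothetical member names).
class Clock {
public:
    Clock() : my_hour(12), my_minute(0), my_morning(true) {}
    int get_minute() const { return my_minute; }
private:
    int my_hour;
    int my_minute;
    bool my_morning;
};

// Hypothetical wrapper around the fragment itself.
int current_minute() {
    Clock my_clock;                       // an object (instance) of the class
    int minute = my_clock.get_minute();   // ask the object for its minute
    // my_clock.my_minute would not compile here: the data member is private.
    return minute;
}
```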
Of course, this simple class also provides a method for modifying the time stored. The public method set_time() is used for this purpose. By forcing the use of this method, the class designer can ensure that the data are manipulated in accordance with appropriate rules. For example, the set_time() method would not allow the storage of an illegal time value. This could not be guaranteed if the data were public and directly available for modification. The implementation of this method is shown below:
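A minimal sketch of such an implementation, assuming a 12-hour clock and the hypothetical member names my_hour, my_minute, and my_morning, might read as follows.

```cpp
#include <cassert>

// Stand-in Clock declaration (member names are assumptions).
class Clock {
public:
    Clock() : my_hour(12), my_minute(0), my_morning(true) {}
    void set_time(int hour, int minute, bool morning);
    int get_hour() const { return my_hour; }
    int get_minute() const { return my_minute; }
    bool is_morning() const { return my_morning; }
private:
    int my_hour;      // 1..12
    int my_minute;    // 0..59
    bool my_morning;  // true for a.m.
};

void Clock::set_time(int hour, int minute, bool morning) {
    // Check the arguments before any stored value is changed.
    assert(1 <= hour && hour <= 12);
    assert(0 <= minute && minute <= 59);

    my_hour = hour;
    my_minute = minute;
    my_morning = morning;
}
```

Note that assert() checks are compiled out when the NDEBUG macro is defined, so a production class might prefer an explicit test that reports the error in another way.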
Note how the values passed to the method are tested via assert() before they are used. Only if each assertion proves true are the values used; otherwise the program halts with a runtime error message.
To show an example of inheritance, consider extending the Clock class to create a new class, CuckooClock. The only difference is that this type of clock has a bird that chirps on the hour. In all other respects it is identical to a regular clock.
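A sketch of such a derived class is given here. The stand-in Clock base class and all member names are assumptions made for illustration.

```cpp
#include <cassert>

// Stand-in base class (member names are assumptions).
class Clock {
public:
    Clock() : my_hour(12), my_minute(0), my_morning(true) {}
    void set_time(int hour, int minute, bool morning) {
        assert(1 <= hour && hour <= 12);
        assert(0 <= minute && minute <= 59);
        my_hour = hour;
        my_minute = minute;
        my_morning = morning;
    }
    int get_hour() const { return my_hour; }
    int get_minute() const { return my_minute; }
    bool is_morning() const { return my_morning; }
private:
    int my_hour;
    int my_minute;
    bool my_morning;
};

// CuckooClock inherits everything from Clock and adds one method.
class CuckooClock : public Clock {
public:
    // The bird chirps exactly on the hour.
    bool is_cuckooing() const { return get_minute() == 0; }
};
```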
Note that we declare the new CuckooClock class in terms of the existing Clock class. The only methods we need to describe and implement are those that are new to or different from the base class. Here there is only one new function, called is_cuckooing(), which returns true on the hour. It is important to understand that CuckooClock is a Clock. In important respects, an instance of CuckooClock can do anything that a Clock object can do. In fact, the compiler will allow us to send a CuckooClock object anywhere a Clock object is expected. For example, we might have a function written to compare two Clock objects to see if one is “equal to” the other.
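Such a comparison function might be sketched as below, assuming the same hypothetical Clock interface (get_hour, get_minute, is_morning). It is written for Clock objects only, yet it also accepts a CuckooClock.

```cpp
#include <cassert>

// Stand-in classes (member names are assumptions).
class Clock {
public:
    Clock() : my_hour(12), my_minute(0), my_morning(true) {}
    void set_time(int hour, int minute, bool morning) {
        assert(1 <= hour && hour <= 12);
        assert(0 <= minute && minute <= 59);
        my_hour = hour;
        my_minute = minute;
        my_morning = morning;
    }
    int get_hour() const { return my_hour; }
    int get_minute() const { return my_minute; }
    bool is_morning() const { return my_morning; }
private:
    int my_hour;
    int my_minute;
    bool my_morning;
};

class CuckooClock : public Clock {
public:
    bool is_cuckooing() const { return get_minute() == 0; }
};

// Two clocks are equal when they show the same time.
bool operator==(const Clock& c1, const Clock& c2) {
    return c1.get_hour() == c2.get_hour()
        && c1.get_minute() == c2.get_minute()
        && c1.is_morning() == c2.is_morning();
}
```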
We may confidently send a CuckooClock to this function even though it is written explicitly to expect Clock objects. Because of inheritance, we do not need to write a different version of the function for each class derived from Clock: CuckooClock is a Clock. This simplifies things considerably for the programmer. Another interesting note about this code segment is that C++ allows us to provide definitions for most of its built-in operators in the context of a class. This is referred to as operator overloading, and C++ is rare among programming languages in allowing this kind of access to operators. Here we define a meaning for the equality operator (==) for Clock objects.
In summary, C++ is a powerful and rich object-oriented programming language. Although widely recognized as difficult to work with, it offers efficient execution times to programmers who can master its ways. If you want portable code, you must stick to creating console applications. If this is not a concern, modern compilers assist in creating GUI applications through wizards, such as in Microsoft’s Visual C++. This is still more difficult to accomplish using C++ than a RAD tool such as Visual Basic, which will be discussed next.
Visual Basic
Visual Basic (VB) is a programming language and development tool from Microsoft designed primarily for rapidly creating graphical user interface applications for Microsoft’s Windows operating system. First introduced in 1991, the language is an extension of BASIC, the Beginner’s All-purpose Symbolic Instruction Code (Eliason and Malarkey 1999). BASIC has been in use since John Kemeny and Thomas Kurtz introduced it in 1964 (Brookshear 1999). Over the years, Visual Basic has evolved more and more towards an object-oriented programming model. In its current version, 6.0, VB provides programmers with the ability to create their own classes using encapsulation, but it does not yet support inheritance. It simplifies the creation of Windows applications by making the enormously complex Windows Application Program Interface (consisting of over 800 functions) available through easy-to-use objects such as forms, labels, command buttons, and menus (Eliason and Malarkey 1999).
As its name suggests, Visual Basic is a highly visual tool. This implies not only that the development environment is GUI-based but also that the tool allows you to design a program’s user interface by placing components directly onto windows and forms. This significantly reduces development time. Figure 1 provides a snapshot of the Visual Basic development environment.
Another feature that makes Visual Basic into a Rapid Application Development tool is that it supports both interpreted and compiled execution. When VB is used in interpreted mode, the tool allows the programmer to quickly see the effects of their code changes without a lengthy compilation directly to machine code. Of course, run-time performance is better when the code is actually compiled, and this can be done before distributing the application.
In Visual Basic, objects encapsulate properties, methods, and events. Properties are an object’s data, such as a label’s caption or a form’s background color. Methods are an object’s functions, as in C++ and other OOP languages. Events are typically user-initiated actions, such as clicking a form’s button with the mouse or making a selection from a menu, that require a response from the object. Figure 1 shows some of the properties of the highlighted command button object, Command1. Here the button’s text is set to “Press Me” via the Caption property. In the code window for the form, the Click event’s code is displayed for this command button. This is where you would add the code to respond when the user clicks on the form’s button. The tool only provides the outline for the code, as seen here. The programmer must add actual VB statements to perform the required action. Visual Basic does provide many wizards, however, to assist in creating different kinds of applications, including forms that are connected back to a database. These wizards can significantly reduce development time and improve reliability.
The fundamental objects that are important to understand in Visual Basic development are forms and controls. A form serves as a container for controls. It is the “canvas” upon which the programmer “paints” the application’s user interface. This is accomplished by placing controls onto the form. Controls are primarily GUI components such as command buttons, labels, text boxes, and images. In Figure 1, the standard toolbox of controls is docked on the far left of the screen. These controls are what the user interacts with when using the application. Each control has appropriate properties and events associated with it that affect its appearance and behavior. For example, consider the form shown in Figure 2.
This form shows the ease with which you can create a database-driven application. There are three controls on this form. At the bottom of the form is a data control. Its properties allow the programmer to specify a database and table to connect to. In this example, it is attached to a simple customer database. At the top are two textbox controls, used for displaying and editing textual information. In this case, these controls are bound to the data control, indicating that their text will be coming from the associated database table. The data control allows the user to step through each record in the table by using the left and right arrows. Each time the user advances to another record from the table, the bound controls update their text to display that customer’s name and address. In the code fragment below, you can see some of the important properties of these controls.
In the data control object, Data1, there are several properties to consider. The Connect property specifies the type of database being attached, here an Access database. The DatabaseName property specifies the name and location of the database. RecordSource specifies the table or stored query that will be used by the data control. In this example, we are connecting to the Customer table of the example database. For the bound text controls, the programmer need only specify the name of the data control and the field to be used for the text to be displayed. The properties that need to be set for this are DataSource and DataField, respectively.
Figure 2 Database Example.
As these examples demonstrate, Visual Basic makes it relatively easy to create complex GUI applications that run exclusively under the Windows operating system. When these programs are compiled into machine code, the performance of these applications is quite acceptable, although not yet as fast as typical C++ programs. The object-oriented model makes development easier, especially since most of the properties can be set visually, with the tool itself writing the necessary code. Unfortunately, VB still does not support inheritance, which limits a programmer’s ability to reuse code. Recent additions to the language, however, make it easier to target the Internet as an application platform. By using ActiveX controls, which run under Windows and within Microsoft’s Internet Explorer (IE) web browser, and VBScript (a subset of VB that runs in IE), you can create applications that may be ported to the World Wide Web. Your choice of controls is somewhat limited and the user must have Microsoft’s own web browser, but in a corporate intranet environment (where the use of IE can be ensured) this might be feasible. For other situations a more flexible solution is required. The next section will explore some of the tools for achieving this.
Web-Based Programming
Developing applications where the user interface appears in a web browser is an important new skill for programmers. The tools that enable a programmer to accomplish this type of development include HTML, Java, CGI, Perl, ColdFusion, ASP, etc.—a veritable cornucopia of new acronyms. This section explains these terms and shows how these new tools may be used to create web pages that behave more like traditional software applications.
HTML
The HyperText Markup Language (HTML) is the current language of the World Wide Web. HTML is a markup language, not a programming language. Very simply, a document is “marked up” to define its appearance. This involves placing markup tags (or commands) around text, pictures, sound, and video in a document. The general syntax for HTML tags is: <tagname options> affected content </tagname>
These tags indicate how to display portions of the document. Opening tags, along with available options, are enclosed in < and > and closing tags include the / before the tagname. In some instances both opening and closing tags are required to indicate what parts of the documents the tags should affect. In other instances no closing tag is required. The contents of the file (including HTML tags and web page contents) that generates the simple web page shown in Figure 3 are as follows:
The latest version of HTML is 4.0 (http://www.w3.org/MarkUp/), and several of today’s browsers support most of its features. In addition, the leading producers of web browsers, Microsoft and Netscape, frequently include support for proprietary features not specified in the W3C’s version of HTML. These extensions can make it difficult to create extremely complicated documents that will work on both browsers. This is especially true for dynamic HTML documents, or documents whose contents change even after the page is loaded into a browser (Castro 1999). HTML 4.0 includes support for Cascading Style Sheets (http://www.w3.org/Style/), which gives page designers easier control of large and complicated websites. HTML is essentially a system for formatting documents. It includes both structuring and formatting tags, which makes it difficult to maintain complex websites. The introduction of cascading style sheets allows a designer to separate structuring and formatting tags. Formatting tags are maintained in cascading style sheets so that HTML is used only for structuring documents. Style sheets are very powerful but not as easy to use as basic HTML. The future, however, lies in Extensible Markup Language (XML) (http://www.w3.org/XML/). XML is designed to allow groups to create their own markup languages, like HTML, specifically suited to their needs (Castro 1999). It is especially important for its ability to put structured data into a text format. This might make it easy to view and edit a spreadsheet or database via the Web, for example. In fact, the W3C is rewriting HTML (as XHTML) and its Cascading Style Sheets (as XSL) using XML (see http://www.w3.org/TR/xhtml1/ and http://www.w3.org/Style/XSL/).
HTML, by itself, does not provide the capabilities required for electronic commerce. For e-commerce, it is necessary to develop applications that are interactive in nature, permitting visitors to dynamically obtain information they need (e.g., search a product catalog) or complete some task (e.g., submit an order). Such capabilities typically require that user-provided data be processed in real time and require interaction with databases. This type of processing may be accomplished via programming languages such as Perl, Visual Basic, and C++. Recently, several tools have emerged that reduce the need for traditional programming while essentially achieving the same results (e.g., ColdFusion and Tango). These capabilities are often achieved via the Common Gateway Interface (CGI).
CGI
To create truly powerful websites, a programmer must be able to access current data, typically from a corporate database. For example, commerce sites need to be able to tell a user whether a product is in stock and record how many items a customer would like to purchase. This cannot be done with static HTML files. Instead, processing is required on the server-side before the contents of a web page are delivered to a user’s web browser. One widely used approach is the Common Gateway Interface (CGI). CGI specifies a common method for allowing Web pages to communicate with programs running on a Web server.
A typical CGI transaction begins when a user submits a form that he or she has filled out on a web page. The user may be searching for a book by a certain author at an online bookstore and have just entered the author’s name on a search page. The HTML used for displaying the form also specifies where to send the data when the user clicks on the submit button. In this case, the data are sent to a CGI program for processing. Now the bookstore’s CGI program searches its database of authors and creates a list of books. In order to return this information to the user, the CGI program must create a response in HTML. Once the HTML is generated, control may once again be returned to the web server to deliver the dynamically created web page. CGI programs may be written in nearly any programming language supported by the operating system but are often written in specialized scripting languages like Perl.
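The response-building half of this transaction can be sketched in C++, one of the many languages a CGI program may use. The function names html_escape(), query_from_environment(), and build_response() are hypothetical, and the sketch assumes a form submitted with the GET method, for which CGI supplies the form data in the QUERY_STRING environment variable. The database search and the decoding of URL-encoded characters are omitted.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Replace characters that have special meaning in HTML, so that
// user-supplied text cannot inject markup into the page.
std::string html_escape(const std::string& text) {
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        switch (text[i]) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += text[i];  break;
        }
    }
    return out;
}

// Read the raw form data the web server placed in the environment.
std::string query_from_environment() {
    const char* raw = std::getenv("QUERY_STRING");
    return raw ? std::string(raw) : std::string();
}

// Build the complete response: a header line, a blank line,
// and then the dynamically generated HTML page.
std::string build_response(const std::string& query) {
    std::string page = "Content-Type: text/html\r\n\r\n";
    page += "<html><body><p>You searched for: ";
    page += html_escape(query);
    page += "</p></body></html>\n";
    return page;
}
```

A complete CGI program would write build_response(query_from_environment()) to standard output from main(), and the web server would relay that output to the browser.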
CGI provides a flexible but inefficient mechanism for creating dynamic responses to Web queries. The major problem is that the web server must start a new program for each request, and that incurs significant overhead processing costs. Several more efficient alternatives are available, such as writing programs directly to the Web server’s application program interface. These server APIs, such as Microsoft’s ISAPI and Netscape’s NSAPI, allow more efficient transactions by reducing overhead through the use of dynamic link libraries, integrated databases, and so on. Writing programs with the server APIs is not always an easy task, however. Other alternatives include using products like Allaire’s ColdFusion or Microsoft’s Active Server Pages (ASP).
Java
Java is a programming language from Sun Microsystems that is similar to C/C++ in syntax and may be used for general-purpose applications and, more significantly, for embedding programs in web pages (Hall 1998). Although Java is an extremely young language, it has gained widespread acceptance because of its powerful and easy-to-use features. It is not yet a mature product, however, and experiences some stability problems as the language continues to evolve.
Java has a lot in common with C++, so much so that it is often referred to as “C++ Lite” by experienced programmers. It is an object-oriented programming language that was built from the ground up, unlike the hybrid C++. It is designed to be a cross-platform development tool; in other words, the code is easy to port from one computer architecture and operating system to another. It does a better job of this than C++ does by including a standard set of classes for creating graphical user interfaces. This allows a GUI application designed on a Macintosh to be run under Windows or Linux, for example. Extremely complex interfaces may still require tweaking to achieve complete interchangeability of the interface, but new additions to the language (the Swing classes) make this less necessary. Java is widely considered a simpler language to use than C++ because it forgoes some of the more troublesome (though powerful) C++ features. For example, Java does not explicitly use pointers and has automatic memory management, areas where C++ programmers often introduce bugs into their programs. This makes Java much easier to work with and somewhat less error-prone. Java only supports single inheritance, which means that a derived class may only have a single parent base class. C++ allows the use of the more powerful form, multiple inheritance, which allows several parents for a derived class. Java does not support templates, which allow C++ programmers to create functions and classes that work for many different types of data without needing to be implemented separately for each. Java also does not support operator overloading. Like C++, however, Java does have a rich and powerful set of standard libraries. Java’s libraries even include components for network programming and database access (through JDBC), which standard C++ lacks. Even with all these improvements to C++, Java would be a footnote in the annals of programming if it were not for its web features.
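The template feature mentioned above can be illustrated with a small C++ sketch. The function name larger() is hypothetical. It is written once and works for any type that supports the < operator.

```cpp
#include <cassert>
#include <string>

// One definition serves integers, floating-point values, strings,
// and any other type for which operator< is defined.
template <typename T>
T larger(const T& a, const T& b) {
    return (a < b) ? b : a;
}
```

The compiler generates a separate version of larger() for each type it is used with, so the programmer does not write one implementation per type.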
Java allows the creation of specialized applications called applets, which are designed to be run within a web browser. Most modern web browsers support Java in some form, and Sun offers plug-ins that provide the most up-to-date language features. Early applets focused on creating more dynamic web pages through animations and other graphical components easily created in Java but hard to reproduce in pure HTML. Today, applets are used for delivering entire software applications via the Web. This has many advantages over a traditional development and distribution model. For example, programmers do not have to worry about the end-user platform because applets run in a browser. Users don’t have to worry about applying patches and bug fixes because they always get the latest version from the Web. Security is an obvious concern for applets because it might be possible to write a malicious Java applet and place it on a web page for unsuspecting users. Luckily, the Java community has created a robust and ever-improving security model to prevent these situations. For example, web browsers generally restrict applets from accessing memory locations outside their own programs and prevent writing to the user’s hard drive. For trusted applets, such as those being run from a company’s intranet, the security restrictions can be relaxed by the end user to allow more powerful functionality to be included.
ColdFusion
ColdFusion is a tool for easily creating web pages that connect to a database. Developed by Allaire (http://www.allaire.com), ColdFusion is being used in thousands of Web applications by leading companies, including Reebok, DHL, Casio, and Siemens. The ColdFusion Markup Language (CFML) allows a user to create Web applications that are interactive and interface with databases. The syntax of CFML is similar to HTML, which is what makes it relatively easy to use; it makes ColdFusion appealing to Web designers who do not have a background in programming. ColdFusion encapsulates in a single tag what might take ten or a hundred lines of code in a CGI or ASP program. This allows for more rapid development of applications than competing methods provide. However, if a tag does not do exactly what you need it to, then you must either change your application or resort to a programmatic approach. Although ColdFusion is not a programming language per se, knowledge of fundamental programming concepts and SQL is essential for sophisticated applications. CFML allows a user to display output based on the results of a query to a database and update databases based on user input. In order to use CFML, a ColdFusion Application Server (CFAS) must be installed. CFAS works in conjunction with a web server. The CFAS can do much more than query and update databases; it interfaces with many different Internet services to provide the functionality that a website designer desires. Figure 4 shows how a ColdFusion Application Server interacts with a web server, databases, and other services it provides.
The following sequence of actions illustrates how ColdFusion functions to produce dynamic results.
1. A user submits a form via a browser. The form is designed to be processed by a file containing CFML code. This file must have a ‘‘cfm’’ extension.
2. The web server recognizes the cfm extension and hands the file to the CFAS for processing.
3. The CFAS executes the instructions per the CFML code. This typically results in information from a database being included in the web page. In addition, it may interact with other Internet services, such as other web servers, e-mail, etc.
4. The web page generated by the CFAS is then returned to the web server.
5. The web server sends this page to the browser.
Allaire offers versions that run under Microsoft Windows NT and the Unix-based operating systems Solaris and HP-UX. ColdFusion programs interface with many different Web server APIs to provide greater efficiency than traditional CGI programs.
ASP
One of Microsoft’s methods for creating more dynamic web pages is through its web server’s support of Active Server Pages. ASP lets programmers use the VBScript programming language on the server side to create HTML content on the fly. Because ASP is built into the Web server, it is more efficient than traditional CGI programs. It offers greater flexibility and control than a product like ColdFusion but requires greater programming knowledge to achieve results.