Database Models

The application we are developing has to be considered from the point of view of its main goal and function: it is a software package focused on storing data related to its users' field of activity, medicine, and making that data available to them. The other features can be ignored for the time being, as they do not interfere with the discussion in this paragraph. The success of the application is tightly correlated with an appropriate choice of data storage and retrieval techniques. In a small program that is not data intensive, efficiency and execution time are not affected dramatically by a primitive way of storing data. On the other hand, in the case of a drug reference catalog that keeps track of more than ten thousand drugs, each with several fields of information, this choice is essential.
Much work and study has been done in the field of database models, and there are many divergent views concerning the area. One widely shared point of view, which can also be read as a definition, is given by Celko [1], who points out very clearly that a model is not reality itself but rather a reduced and simplified version of it. When building a model, whether a database model, an object-oriented model or any other software abstraction, the main point to keep in mind is that a model more complex than the thing it attempts to model is inefficient, useless and not reusable. The real world that is the subject of modeling is nothing more than a reference against which to verify the validity of the resulting database model.

The main database models that have emerged since the very beginning of the field are summarized in [2]. We will briefly look at each of them, pointing out their advantages and disadvantages, their concepts and their best usage scenarios. This helps build the context in which Epocrates lies and presents the theoretical considerations on which it is based.

The Hierarchical Model is basically a tree structure of data segments linked by hierarchical parent-child relationships. The parent-child relationship is one-to-many, which introduces one important restriction: a child segment can have only one parent segment. Hierarchical DBMSs (Database Management Systems) were at their peak in the late 1960s and throughout the 1970s, starting with IBM's Information Management System.
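
The single-parent restriction can be illustrated with a minimal Python sketch. The Segment class, its methods and the drug catalog entries below are hypothetical names chosen for illustration; they are not taken from any particular hierarchical DBMS.

```python
class Segment:
    """A data segment in a tree-shaped (hierarchical) data store."""

    def __init__(self, name):
        self.name = name
        self.parent = None    # at most one parent segment
        self.children = []    # any number of child segments

    def add_child(self, child):
        # Enforce the restriction: a child segment has only one parent.
        if child.parent is not None:
            raise ValueError(
                f"{child.name} already has parent {child.parent.name}"
            )
        child.parent = self
        self.children.append(child)
        return child

    def path(self):
        # Walk up to the root; the path is unique because of the single parent.
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))


# Hypothetical sample data for a drug catalog.
catalog = Segment("catalog")
drug_class = catalog.add_child(Segment("analgesics"))
drug = drug_class.add_child(Segment("ibuprofen"))
print(drug.path())
```

In this sketch, attaching the same drug segment under a second class raises an error, which is exactly the kind of data the hierarchical model cannot represent without duplication.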

The Network Model came to life in the same period as its predecessor, as a solution to the limitation of the hierarchical model's parent-child relationships: it can represent data with more than one parent per child. It accomplishes this by relating files in a parent-child manner, as owners and members, with the difference that each member file can have more than one owner. Another difference is that the set construct became the basic data modeling construct. We will not dwell on these early, obsolete models, which are important nowadays only for their principles.
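
The owner-member relationship can be sketched in Python along the following lines. This is only an illustration of the principle: the Record class, the connect method and the sample records are assumptions made for the example, not the interface of an actual network DBMS.

```python
class Record:
    """A record that can own a set of members and belong to several owners."""

    def __init__(self, name):
        self.name = name
        self.owners = []     # a member may have more than one owner
        self.members = []    # records in the set owned by this record

    def connect(self, member):
        # Add `member` to the set owned by this record (no duplicates).
        if member not in self.members:
            self.members.append(member)
            member.owners.append(self)


# Hypothetical sample data: one drug belonging to two classes.
analgesics = Record("analgesics")
anti_inflammatories = Record("anti-inflammatories")
ibuprofen = Record("ibuprofen")

analgesics.connect(ibuprofen)
anti_inflammatories.connect(ibuprofen)

owner_names = sorted(owner.name for owner in ibuprofen.owners)
print(owner_names)
```

Unlike a tree, the resulting structure is a general graph: the same member record is reachable from each of its owners.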

The Relational Model is perhaps the biggest step forward in the field of database design. It was formally introduced by Dr. E. F. Codd (1923-2003) in 1970 and has evolved since then thanks to a series of writings that added improvements while leaving Codd's principles untouched. In the Theoretical Foundation chapter we will deal in detail with all aspects of the relational model, since it is the most appropriate choice for the purpose of this project. For now it is enough to say that the relational model is conceptually a mathematical model whose fundamental assumption is that all data are represented as mathematical n-ary relations, where each relation is a subset of the Cartesian product of n sets. The most important aspect, however, is without doubt that consistency in the logical representation of information can be easily achieved by using this model.
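
The definition of a relation as a subset of a Cartesian product can be made concrete with a small Python sketch. The three sets and the tuples below are invented sample values for a ternary (n = 3) relation, used only to illustrate the definition.

```python
from itertools import product

# Three hypothetical sets from which attribute values are drawn.
names = {"ibuprofen", "aspirin"}
forms = {"tablet", "syrup"}
strengths_mg = {200, 500}

# The Cartesian product contains every possible combination: 2 * 2 * 2 tuples.
cartesian = set(product(names, forms, strengths_mg))

# A relation is any subset of that product: here, the combinations
# that are actually recorded in the (hypothetical) catalog.
drug = {
    ("ibuprofen", "tablet", 200),
    ("aspirin", "tablet", 500),
}

is_relation = drug <= cartesian   # subset test
print(len(cartesian), is_relation)
```

Representing the relation as a Python set also reflects two properties of mathematical relations: tuples have no inherent order and cannot be duplicated.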

The general data model proposed by Codd in 1970 was embraced by many others, among whom we will refer to Chris Date and Hugh Darwen [4] and their paper The Third Manifesto, first published in 1995 and still an ongoing research work. It is worth noting that the third edition of the paper was last updated on 28 May 2006. To sum up their work in a couple of sentences (it will again be the object of a more detailed analysis in the following chapter), they showed how the relational model can implement object-oriented features without altering its fundamental basis. They also pointed out the few valid ideas from object-oriented modeling that do not fit with relational modeling. Chris Date even proposed in [5] a slightly different relational model based on the following: "No Duplicates, No NULLs". Currently, Chris Date and Hugh Darwen are widely regarded as the principal maintainers and developers of the relational model.
Since the relational model is the common choice of businesses today, we will give it special attention. We anticipate that it will be the model used for the database tier of Epocrates. For a better understanding of its principles and the theories surrounding it, we go back in time for a short history.