
Abstract
Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.
Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model.
Key Words and Phrases
data bank, data base, data structure, data organization, hierarchies of data, networks of data, relations, derivability, redundancy, consistency, composition, join, retrieval language, predicate calculus, security, data integrity
1. Relational Model and Normal Form
--------------------------------------------------------------------------------
1.1. Introduction
This paper is concerned with the application of elementary relation theory to systems which provide shared access to large banks of formatted data. Except for a paper by Childs [1], the principal application of relations to data systems has been to deductive question-answering systems. Levein and Maron [2] provide numerous references to work in this area.
In contrast, the problems treated here are those of data independence (the independence of application programs and terminal activities from growth in data types and changes in data representation) and certain kinds of data inconsistency which are expected to become troublesome even in nondeductive systems.
The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model [3, 4] presently in vogue for noninferential systems. It provides a means of describing data with its natural structure only, that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other.
A further advantage of the relational view is that it forms a sound basis for treating derivability, redundancy, and consistency of relations (these are discussed in Section 2). The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of relations (see remarks in Section 2 on the "connection trap").
Finally, the relational view permits a clearer evaluation of the scope and logical limitations of present formatted data systems, and also the relative merits (from a logical standpoint) of competing representations of data within a single system. Examples of this clearer perspective are cited in various parts of this paper. Implementations of systems to support the relational model are not discussed.
1.2. Data Dependencies in Present Systems
The provision of data description tables in recently developed information systems represents a major advance toward the goal of data independence [5, 6, 7]. Such tables facilitate changing certain characteristics of the data representation stored in a data bank. However, the variety of data representation characteristics which can be changed without logically impairing some application programs is still quite limited. Further, the model of data with which users interact is still cluttered with representational properties, particularly in regard to the representation of collections of data (as opposed to individual items). Three of the principal kinds of data dependencies which still need to be removed are: ordering dependence, indexing dependence, and access path dependence. In some systems these dependencies are not clearly separable from one another.
1.2.1. Ordering Dependence
Elements of data in a data bank may be stored in a variety of ways, some involving no concern for ordering, some permitting each element to participate in one ordering only, others permitting each element to participate in several orderings. Let us consider those existing systems which either require or permit data elements to be stored in at least one total ordering which is closely associated with the hardware-determined ordering of addresses. For example, the records of a file concerning parts might be stored in ascending order by part serial number. Such systems normally permit application programs to assume that the order of presentation of records from such a file is identical to (or is a subordering of) the stored ordering. Those application programs which take advantage of the stored ordering of a file are likely to fail to operate correctly if for some reason it becomes necessary to replace that ordering by a different one. Similar remarks hold for a stored ordering implemented by means of pointers.
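The hazard can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Part` record, the sample data, and both function names are hypothetical and are not drawn from any of the systems discussed. One function relies on the stored ordering; the other states the ordering it needs.

```python
from typing import NamedTuple


class Part(NamedTuple):
    serial: int
    name: str


def highest_serial_assuming_stored_order(stored: list[Part]) -> Part:
    # Order-dependent: assumes the file is stored in ascending
    # serial-number order, so the last record has the highest serial.
    return stored[-1]


def highest_serial_order_independent(stored: list[Part]) -> Part:
    # Order-independent: asks for the ordering it needs instead of
    # relying on how the records happen to be stored.
    return max(stored, key=lambda part: part.serial)


# Hypothetical file, stored in ascending order by serial number.
ascending = [Part(1, "washer"), Part(2, "bolt"), Part(3, "nut")]

# The same records after the stored ordering is replaced (here: by name).
by_name = sorted(ascending, key=lambda part: part.name)
```

With the `ascending` file both functions agree. Once the stored ordering is replaced by `by_name`, the order-dependent function returns the wrong record without raising any error, which is the kind of logical impairment the paragraph above describes.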
It is unnecessary to single out any system as an example, because all the well-known information systems that are marketed today fail to make a clear distinction between order of presentation on the one hand and stored ordering on the other. Significant implementation problems must be solved to provide this kind of independence.
1.2.2. Indexing Dependence
In the context of formatted data, an index is usually thought of as a purely performance-oriented component of the data representation. It tends to improve response to queries and updates and, at the same time, slow down response to insertions and deletions. From an informational standpoint, an index is a redundant component of the data representation. If a system uses indices at all and if it is to perform well in an environment with changing patterns of activity on the data bank, an ability to create and destroy indices from time to time will probably be necessary. The question then arises: Can application programs and terminal activities remain invariant as indices come and go?
Present formatted data systems take widely different approaches to indexing. TDMS [7] unconditionally provides indexing on all attributes. The presently released version of IMS [5] provides the user with a choice for each file: a choice between no indexing at all (the hierarchic sequential organization) or indexing on the primary key only (the hierarchic indexed sequential organization). In neither case is the user's application logic dependent on the existence of the unconditionally provided indices. IDS [8], however, permits the file designers to select attributes to be indexed and to incorporate indices into the file structure by means of additional chains. Application programs taking advantage of the performance benefit of these indexing chains must refer to those chains by name. Such programs do not operate correctly if these chains are later removed.
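For contrast, here is a minimal sketch of what index independence looks like from the program's side. It assumes SQLite through Python's standard `sqlite3` module, a much later relational engine that is not one of the systems named above; the table `part`, the index `part_name_idx`, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_no INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?)",
    [(1, "washer"), (2, "bolt"), (3, "nut")],
)

# The query names only the table and its columns, never an index.
QUERY = "SELECT part_no FROM part WHERE name = ? ORDER BY part_no"


def find_part_numbers(name: str) -> list[int]:
    return [row[0] for row in conn.execute(QUERY, (name,))]


without_index = find_part_numbers("bolt")

# An index comes...
conn.execute("CREATE INDEX part_name_idx ON part (name)")
with_index = find_part_numbers("bolt")

# ...and goes, while the program text stays the same.
conn.execute("DROP INDEX part_name_idx")
after_drop = find_part_numbers("bolt")
```

Because `find_part_numbers` never refers to the index by name, creating or dropping it can change performance but not the result, unlike a program that must name an indexing chain.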
1.2.3. Access Path Dependence
Many existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. Application programs developed to work with these systems tend to be logically impaired if the trees or networks are changed in structure. A simple example follows.
Suppose the data bank contains information about parts and projects. For each part, the part number, part name, part description, quantity-on-hand, and quantity-on-order are recorded. For each project, the project number, project name, and project description are recorded. Whenever a project makes use of a certain part, the quantity of that part committed to the given project is also recorded. Suppose that the system requires the user or file designer to declare or define the data in terms of tree structures. Then, any one of the hierarchical structures may be adopted for the information mentioned above (see Structures 1-5).
Now, consider the problem of printing out the part number, part name, and quantity committed for every part used in the project whose project name is "alpha." The following observations may be made regardless of which available tree-oriented information system is selected to tackle this problem. If a program P is developed for this problem assuming one of the five structures above (that is, P makes no test to determine which structure is in effect), then P will fail on at least three of the remaining structures. More specifically, if P succeeds with structure 5, it will fail with all the others; if P succeeds with structure 3 or 4, it will fail with at least 1, 2, and 5; if P succeeds with 1 or 2, it will fail with at least 3, 4, and 5. The reason is simple in each case. In the absence of a test to determine which structure is in effect, P fails because an attempt is made to execute a reference to a nonexistent file (available systems treat this as an error) or no attempt is made to execute a reference to a file containing needed information. The reader who is not convinced should develop sample programs for this simple problem.
Since, in general, it is not practical to develop application programs which test for all tree structurings permitted by the system, these programs fail when a change in structure becomes necessary.
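A reader who wants to try this can start from the following minimal Python sketch. The two nested layouts are hypothetical stand-ins for tree structures of this kind (parts under projects, and projects under parts); they are assumptions made for illustration, as are the field names and sample quantities.

```python
# Layout A (hypothetical): parts nested under each project.
projects_over_parts = [
    {
        "project_name": "alpha",
        "parts": [
            {"part_no": 1, "part_name": "bolt", "qty_committed": 40},
            {"part_no": 2, "part_name": "nut", "qty_committed": 25},
        ],
    },
]

# Layout B (hypothetical): the same facts, projects nested under each part.
parts_over_projects = [
    {
        "part_no": 1,
        "part_name": "bolt",
        "projects": [{"project_name": "alpha", "qty_committed": 40}],
    },
    {
        "part_no": 2,
        "part_name": "nut",
        "projects": [{"project_name": "alpha", "qty_committed": 25}],
    },
]


def report_for_alpha(data: list[dict]) -> list[tuple[int, str, int]]:
    # Program P, written against layout A only: it walks
    # project -> parts and never tests which layout is in effect.
    rows = []
    for project in data:
        if project["project_name"] == "alpha":
            for part in project["parts"]:
                rows.append(
                    (part["part_no"], part["part_name"], part["qty_committed"])
                )
    return rows


def fails_on(data: list[dict]) -> bool:
    # True when P refers to something the layout does not contain.
    try:
        report_for_alpha(data)
    except KeyError:
        return True
    return False
```

Both layouts hold the same information, yet `report_for_alpha` works only on the one it was written for. On the other layout it refers to a field that does not exist at that level, which loosely mirrors the reference to a nonexistent file described above.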
Systems which provide users with a network model of the data run into similar difficulties. In both the tree and network cases, the user (or his program) is required to exploit a collection of user access paths to the data. It does not matter whether these paths are in close correspondence with pointer-defined paths in the stored representation: in IDS the correspondence is extremely simple, in TDMS it is just the opposite. The consequence, regardless of the stored representation, is that terminal activities and programs become dependent on the continued existence of the user access paths.
One solution to this is to adopt the policy that once a user access path is defined it will not be made obsolete until all application programs using that path have become obsolete. Such a policy is not practical, because the number of access paths in the total model for the community of users of a data bank would eventually become excessively large.
If we want to store a list of people and their details, do we call the table "Person", "Persons", "People", or "Peoples"? Some will use "People", some will use "Person", and others will use "Peoples" or "Persons".
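The relevant standard says not to pluralize. A table stores a set of entities and is named after that entity, so whether we want to store one person or many in a single entity or table, they go in the "Person" table.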
Sticking to this rule keeps other cases simple and spares us from working out how to pluralize a word. For example, I have seen hierarchy pluralized as "hierarcys".
Need evidence to settle the argument with a colleague?
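If we look at the paper "A Relational Model of Data for Large Shared Data Banks" by E. F. Codd, who essentially invented the relational database, we find that his examples use the singular form (supplier and component).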
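If we then look at the ISO standard on naming (11179-5: Naming and identification principles), we find that it also calls for singular names: "Nouns are used in singular form only".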
Finally, if we look at Microsoft's samples, we can see where some of the confusion comes from!
The old "pubs" sample database (https://github.com/microsoft/sql-server-samples/blob/master/samples/databases/northwindpubs/instpubs.sql ) mixes singular and plural names.
"Northwind", another sample database, also uses a mix (https://raw.githubusercontent.com/Microsoft/sql-server-samples/master/samples/databases/northwind-pubs/instnwnd.sql ). Note that it contains both "Region" and "Territories"; I will never know where the consistency is in that!
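Wide World Importers (a Microsoft SQL Server sample database) also mixes plural and singular names. Interestingly, it uses the singular "Archive", whereas the dictionary lists "Archives" as valid for both singular and plural, so it could easily (accidentally?) have been consistent.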
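It is easy to see where the confusion comes from, and you will quite likely be working with older databases, so try to stay consistent with the existing code and with your team. If you must use plural names, you need to define when a name takes a trailing "s" and which words take "ies" or some other ending.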
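To give a feel for what defining such rules involves, here is a small Python sketch. The `pluralize` function and its `IRREGULAR` table are hypothetical and cover only a handful of English patterns; they are an assumption for illustration, not a rule set taken from any standard.

```python
# Hypothetical exception table; real English needs many more entries.
IRREGULAR = {"person": "people"}


def pluralize(singular: str) -> str:
    # A deliberately small set of English pluralization rules.
    word = singular.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    # Consonant + "y" becomes "ies" (hierarchy -> hierarchies).
    if len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    # Sibilant endings take "es" (box -> boxes).
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    return word + "s"
```

Even this short version needs an exception table and two suffix rules, and every team member would have to apply them the same way. Singular names avoid the question entirely.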
For new projects, or wherever you can easily rename entities, I recommend always using singular names. For older projects, you need to be more pragmatic!
References:
"A Relational Model of Data for Large Shared Data Banks":
https://cs.uwaterloo.ca/~david/cs848s14/codd-relational.pdf
"ISO 11179", Part 5:
http://metadata-standards.org/11179/#11179-5
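E. F. Codd is the founding father of the relational database. He was the first to propose the relational model for database systems, opening up research into relational database methods and relational data theory and laying the theoretical foundation for database technology. For these contributions he received the ACM Turing Award in 1981. The Turing Award is the highest honor in computing, comparable to the Nobel Prize in other disciplines.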
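In the history of database technology, 1970 was a great turning point. In June of that year Edgar Frank Codd, a senior researcher at the IBM San Jose Research Laboratory, published "A Relational Model of Data for Large Shared Data Banks" in Communications of the ACM. In 1983 the ACM named the paper one of the 25 most significant milestone papers of the 25 years since 1958, because it was the first to set out, clearly and explicitly, a brand-new model for database systems: the relational model. A "relation" is a basic concept in mathematics, represented by a collection of ordered pairs formed from elements of sets, and used to capture some relationship between things. Examples include the ordering of numbers by size, kinship between people, and buying and selling in the circulation of goods. Relations are everywhere in nature and society, and the concept is also of great importance in computer science. The logical design of computers, compiler design, algorithm analysis and program structure, and information retrieval all make use of it. Codd, however, was the first to use the concept of a relation to build a data model for describing, designing, and manipulating databases.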
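Because the relational model is both simple and built on a solid mathematical foundation, it drew wide attention from academia and industry as soon as it was proposed, and it had a strong impact on database technology in both theory and practice. After the relational model appeared, earlier database products based on the hierarchical and network models quickly declined and eventually disappeared, while a large number of commercial relational database systems were developed and rapidly took over the market. The speed of the changeover, and how thoroughly the new replaced the old, are rare in the history of software. In view of this striking development from the late 1970s to the early 1980s, the 1981 Turing Award went quite naturally to the "father of the relational database". On accepting the award he gave a lecture titled "Relational Database: A Practical Foundation for Productivity". (Published in Communications of the ACM, February 1982, pages 109 to 117; also in the collected ACM Turing Award lectures, pages 391 to 410.)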
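Codd was British by birth, born on August 19, 1923, in Portland, a port in Dorset on the south coast of England. After the outbreak of the Second World War the young Codd enlisted and served in the Royal Air Force. From 1942 to 1945 he flew as an aircraft captain and took part in many major air battles, making a distinguished contribution to the war against fascism. After the war Codd studied mathematics at Oxford University, and after taking his bachelor's degree in 1948 he went to the United States to pursue his career. He worked in the United States and Canada, took part in the logical design of the IBM 701, IBM's first scientific computer, and of STRETCH, its first large transistorized computer, and led the development of the first operating system with multiprogramming capability. Feeling that his hardware knowledge was lacking, he went to the University of Michigan in the early 1960s (when he was nearly 40) to study computer and communication sciences, earning a master's degree in 1963 and a doctorate in 1965. This gave him a firmer theoretical grounding and broader professional knowledge. Combined with well over a decade of practical experience, it led in 1970 to the insight that opened a new era for database technology.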
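Because databases underlie so many computer applications, the relational model not only laid the foundation for the development of database technology but also became a major force in spreading the use of computers. After Codd proposed the relational model, IBM invested heavily in research on relational database management systems. The results of its "System R" project greatly advanced relational database technology, and the products that grew out of it, such as DB2 and SQL, became mainstream IBM products. System R itself was a prototype and was never released, but in recognition of its influence the ACM gave its 1988 Software System Award to the System R development team (the six recipients included J. Gray, winner of the 1998 Turing Award). That year the Software System Award was, exceptionally, given to two pieces of software at once; the other winner was also a relational database management system, the well-known INGRES.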
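After 1970, Codd continued to refine and develop relational theory. In 1972 he introduced the concepts of relational algebra and relational calculus and defined the basic operations on relations, including union, intersection, projection, selection, and join, laying the groundwork for the Structured Query Language (SQL) that later became a standard.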
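These operations can be sketched in a few lines of Python by treating a relation as a set of tuples. This is a minimal illustration, not Codd's formal definitions: the helper names (`relation`, `select`, `project`, `natural_join`) and the supplier data are hypothetical, and each tuple is stored as a frozenset of (attribute, value) pairs so that neither row order nor column order matters.

```python
def relation(*rows: dict) -> frozenset:
    # A relation is a set of tuples; duplicates collapse automatically.
    return frozenset(frozenset(row.items()) for row in rows)


def select(rel: frozenset, predicate) -> frozenset:
    # Selection: keep the tuples that satisfy the predicate.
    return frozenset(t for t in rel if predicate(dict(t)))


def project(rel: frozenset, attributes: set) -> frozenset:
    # Projection: keep only the named attributes of each tuple.
    return frozenset(
        frozenset((a, v) for a, v in t if a in attributes) for t in rel
    )


def natural_join(left: frozenset, right: frozenset) -> frozenset:
    # Join: combine tuples that agree on every shared attribute.
    joined = set()
    for left_tuple in left:
        for right_tuple in right:
            l_row, r_row = dict(left_tuple), dict(right_tuple)
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                joined.add(frozenset({**l_row, **r_row}.items()))
    return frozenset(joined)


# Hypothetical sample data.
supplier = relation(
    {"supplier_no": 1, "city": "London"},
    {"supplier_no": 2, "city": "Paris"},
)
supply = relation(
    {"supplier_no": 1, "part_no": 10},
    {"supplier_no": 2, "part_no": 10},
    {"supplier_no": 2, "part_no": 20},
)

london = select(supplier, lambda row: row["city"] == "London")
paris = select(supplier, lambda row: row["city"] == "Paris")
```

Union and intersection need no helper, since Python's set operators `|` and `&` apply directly: `london | paris` gives back `supplier`, and `london & paris` is empty.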
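Codd also founded a research institute (the Relational Institute) and a company (Codd & Associations), and he served as a database technology consultant to many companies in the United States and abroad. In 1990 he published the monograph "The Relational Model for Database Management: Version 2", a comprehensive summary of his decades of theoretical work and practical experience.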
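Codd was also the first scientist to put forward the concept of OLAP for databases. When he did so, he stated that an OLAP system must satisfy the following 12 rules:
(1) A multidimensional view.
(2) Transparency to the user.
(3) Good accessibility.
(4) Consistent reporting performance that does not degrade as dimensions are added.
(5) A client/server architecture.
(6) All dimensions of the data treated equally.
(7) Dynamic optimization for sparse matrices.
(8) Multi-user support.
(9) No restrictions on cross-dimensional calculations.
(10) Intuitive data manipulation.
(11) Flexible reporting.
(12) Any number of dimensions and aggregation levels.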