thesis submitted to
Manas Sudhakar Hardas
I am most grateful to my advisor Dr. Javed I Khan for his encouragement and vision. Dr. Khan has the ability to make others envision ideas and inspire creativity. I would like to thank Dr. Khan for getting me interested in research and helping me at every stage.
I am also thankful to the members of the MediaNet Lab, Nouman, Asrar, and Oleg, for their support and advice. I would like to thank my project mates Yongbin Ma and Sujatha Gnyaneshwaran, who helped me in my research and contributed to my thesis; without them this effort would not have been possible. I would like to thank my close friends and roommates, Sajid, Siddharth,
I would like to thank my family, whom I dearly miss and without whose blessings none of this would have been possible. I thank my mom and dad for always believing in me and standing by me through everything. In particular, I thank my grand mom and grand dad, for whom my love and respect are beyond measure.
Semantic analysis of educational resources is a problem with numerous applications. Like any design process, the creation of educational resources has basic elements of design and reproduction. In the design of test problems, these elements take the form of information objects. Course knowledge can be represented as an ontology based on prerequisite relations, which makes assessment of, and information extraction from, test problems possible. We propose a language schema based on the Web Ontology Language (OWL) for formally describing course ontologies. Using this schema, course ontologies can be represented in a standard and sharable way. An evaluation system acts as the backend for the design and re-engineering system. This research aims at automating the intelligent evaluation of test-ware resources by providing qualitative assessment of test problems. Some synthetic parameters for the assessment of a test problem in its concept space are introduced. The parameters are tested in real-world scenarios, and intuitive inferences are drawn about their performance. It is observed that the difficulty of a question is often a function of the knowledge content and complexity of the concepts it tests.
Introduction and Background
Since the advent of the Web, there has been great optimism about online sharing of course material. Many educators worldwide today maintain course websites with teaching materials accessible online. The primary use of these websites is to dispense lecture materials to immediate students. There have also been many organized attempts to create large digital courseware libraries to promote sharing. Some of the significant efforts in this direction are the NIST Materials Digital Library Pathway [2, 3], the NSDL digital libraries in science, technology, engineering and mathematics (STEM), OhioLink, and the ACM Professional Development Centre with over 1000 computer science courses. Most universities, colleges and even schools now actively encourage dispensing course material online through portals. MIT’s Open Course Ware (OCW) project has more than 1000 course materials freely available, Universia maintains translated versions of OCW courses in 11 languages, China Open Resources for Education (
Most courseware today, on the web or otherwise, is not accompanied by a conceptual design. Any composition, whether an engineering design, courseware or an art work, takes place in the context of a conceptual design space. The conceptual context is the most important factor in any formative learning process. Consider a lecturer giving a presentation on some topic. If the lecturer simply talks about the topic without referring to a slide or a diagram, the talk is very difficult to understand. Conversely, if the lecturer just presents the slides without explaining them in some context, the presentation remains incomplete. There is no well-formed encoding principle for capturing and sharing this invisible design, without which course materials lose much of their reusability. In the worst case, a teacher has to manually reverse engineer the design from the courseware. Therefore, it is not surprising that instructors and educators find it easier to build course materials from scratch than to reuse resources available online. The background design of the course material is vital for the creation and reengineering of courseware. Without the design, it is very unlikely that finished courseware will ever be reused creatively.
Currently the web is a huge repository of assorted digital resources without much reusability. Most educational content is scattered, replicated and not linked by any kind of relationship. To make this digital content reusable, it is necessary to share the metadata associated with it. A clear distinction has to be made between knowledge and information. Knowledge is the means by which intelligent design and sharing of test-ware and other web resources is possible. The main problem with information on the web is that it is hardly machine usable. To make the data on the web reusable, it is necessary to have information about the data itself. Thus the
Traditionally, concept maps are used to represent the backend context for course knowledge. Many efforts [16, 17, 18, 19] have gone into representing course knowledge using concept maps. In the recent past, ontologies have increasingly been used to represent structured information in a hierarchical format. Concept maps offer a means to represent hierarchical knowledge; however, they are too expressive and consequently contain more information and semantic relationships than necessary for effective computation. Ontologies provide a means to map this knowledge effectively into concept hierarchies. A course ontology, in particular, can be roughly defined as a hierarchical representation of the topics involved in the course, connected by relationships with specific semantic significance. Ontologies are therefore a natural choice for representing course concept hierarchies in the domain of education. Currently the process of designing test problems is completely manual, based on human experience and cognition. The design of test problems also follows the basic principles of any engineering design process. The primary elements of design in this case are the information objects. Much effort has been put into the creation and reusability of these information objects, called learning objects. The Learning Object Metadata (LOM) standard for representing information about educational resources is the product of this effort. The recent standardization of semantic representation languages like RDF and OWL offers a strong technical platform for representing the concept knowledge space symbolized by ontologies. Representing metadata for educational resources greatly improves their machine usability.
These progressive steps taken in the field of knowledge and metadata representation now provide a great platform for researchers and theorists to create resourceful and innovative applications which effectively utilize the background knowledge in a particular domain to intelligently and automatically design, compose, evaluate, reengineer and share information rich resources like courseware, web resources, educational materials etc.
There have been numerous attempts to quantify the complexity of problems [14, 15, 16, 17, 21]. The approaches to problem difficulty assessment fall into two types: knowledge based approaches and cognition based approaches. Researchers who follow the knowledge based approach generally present mathematical models for calculating the difficulty of a problem based on the knowledge it tests. Cognitive researchers look at the problem from a learning point of view and try to find answers from the student and education perspective.
Li and Sambasivam experiment with a static knowledge structure of a computer architecture course to compute problem difficulty. The difficulty is calculated based on normalized weights of the concepts connected to and from the question. Kou et al. [14, 15] propose an innovative technique to represent concept maps using information objects. These objects act as input to a system which calculates difficulty. Difficulty is considered a function of numerous factors, such as the number of attributes, the learning sequence of concepts, concept depth, the number of unknown parameters, the number of given attributes, and mathematical complexity. However, the system does not calculate difficulty for complex problems, i.e. problems based on more than one concept. Palazzo et al. provide a strong representation for course knowledge. Though they do not consider the problem of difficulty assessment, they provide an excellent means for courseware authoring based on course ontologies linked with prerequisite relationships. The main problem with these approaches is that no solid course representation technique is used consistently. The representations used are often rigid, incomplete and not computable. Li and Sambasivam’s static knowledge structures are intuitively generated structures in which weights are allocated on the parent-child relationship without any external considerations. Kou et al. use a number of other factors, the values for which are calculated mostly empirically and are highly subjective.
The other group consists of cognitive and educational researchers. Lee, F-K and Heyworth, R. attempt to calculate the difficulty of a problem based on factors such as the perceived number of difficult steps, the steps required to finish the problem, the number of operations in the problem's expression and the student's degree of familiarity with the question. Studies by Croteau, Heffernan & Koedinger (2004), Koedinger & Nathan (1999), and [24, 25, 26, 27] try to determine why algebra word problems are difficult. They propose difficulty measures based on arithmetic and symbolization in a problem. The reasoning behind this is that the greater the number of symbols in an arithmetic problem, the greater the difficulty. Cognitive researchers also reason that much of the difficulty children experience with word problems can be attributed to difficulty in comprehending abstract or ambiguous language.
This thesis follows a purely knowledge based approach to assessment of problem difficulty. The main problem with previous works is that they fail to give a coherent representation of the knowledge domain. We present a novel approach to course ontology representation which is standard and coherent, and propose some assessment parameters for problem complexity computation.
In this thesis we present a novel approach to course knowledge representation using course ontologies, in an expressible and computable format using has-prerequisite relationships, where the concepts involved in teaching a course are arranged in hierarchical order of their importance. It differs from traditional ontologies most significantly in that it is not based on the IS-A relation and it is not a directed acyclic graph (DAG), as most traditional ontologies are. A schema language called Course Ontology Description Language (CODL) is developed for representing course ontologies; it can provide a framework for encoding and sharing courseware. It is based on OWL and provides a powerful framework for representing course ontologies in a standard and sharable way. Another original approach is given for identifying the areas of an ontology that are of maximum relevance. This approach allows only the relevant part of the ontology to be processed, which saves computation time and resources.
This thesis investigates the properties of test problems by following a purely knowledge based approach for assessment using course ontologies. Here assessment refers to evaluation of test problems for their knowledge content and complexity. We isolate the main pedagogical challenge as finding measurable quantities that can provide guidance in the process of automatic evaluation. We reason that the qualitative assessment of problems in their concept space is a very important step in making online testing, e-learning or web based pedagogy even remotely effective. Standardizing problems by evaluating the complexity can be a backend system with immense potential for test-ware composition and sharing applications. It has the potential to make the already available test-ware resources on the web reusable. These evaluation parameters are calculated by applying mathematical formulations to the course ontology. The parameter performance is also tested in real world test scenarios and it is shown that they are very good indicators of problem complexity. Interesting logical inferences are made from the observed behavior of evaluation parameters with respect to the knowledge a problem tests and the observed performance of the students. The semantic evaluation system can intuitively be applied in varied application areas like automatic test and question generation and solution grading. We present a few possible applications of this system.
Automated design and evaluation involve formal mathematical assessment models, unlike the cognition based models used by humans. These cognition based assessment models for design and evaluation are developed by the human mind over a period of time, by learning, collecting and assessing information from corpora of incoming knowledge. Recently a great deal of research has been done to make these corpora of knowledge available to machines. A machine understandable and computable assessment system is therefore essential. This body of knowledge is represented using techniques from knowledge representation such as semantic networks and ontologies. An ontology is a method for representing the elements in a domain or corpus of knowledge in a hierarchical fashion and linking these elements with semantic relationships.
The corpus of course knowledge is hypothetically divided into a two-tiered description framework, namely the concept space and the resource space. The course ontology is the conceptual representation of the concept space graph, wherein concepts are linked to each other using semantic relations. The resource space gives the description of the actual resources for the corresponding concepts from the concept space. The course concept space, symbolized by the course ontology, is built using a language variant of the Web Ontology Language, OWL. The language is designed to harness maximum computability at the cost of reduced expressive power. The types of relations and properties are kept to a minimum. The second tier of description, the resource space, requires more expressiveness. LOM, developed for learning object classification, is used to provide the base elementary description for the learning objects, the resources. In this section we discuss in detail the definition, specification, and constructs of the language used to represent course ontologies.
In computer science and artificial intelligence, knowledge representation is a technique by which knowledge about a particular domain is structured to increase its usability. Knowledge representation techniques are used in AI, cognitive science, and other fields for problem solving, logical reasoning, data mining, question answering, theorem proving, neural networks, expert systems, etc. Davis, Shrobe and Szolovits define knowledge representation as a “set of ontological commitments” and “a medium of pragmatically efficient computation”. That is, a knowledge representation is an agreed-upon vocabulary for representing knowledge that is practical and computable at the same time. It is important for the knowledge representation to be both expressible and computable. This in turn brings us to the problem of the granularity of information in a course ontology. Granularity is an important factor to consider while building the course ontology, which can range from fine-grained to coarse-grained. A finer grained ontology contains more concepts in detail, and more implicit relationships between unrepresented concepts can be discovered. The finer the ontology, the more knowledge the application has to work with, giving better results. But defining a finely grained, expressive ontology is costly in terms of computation: as more concepts and relationships are defined and represented, there is more information to be processed. At the same time, though coarse grained ontologies are computable, they do not carry enough information for good results. The depth of the knowledge to be represented is therefore an important question in representing any kind of knowledge. Most available finished materials today are coarse grained. Unfortunately, this is not suitable for semantic evaluation. Any design system requires the basic ability to move between multiple levels of granularity. In other words, mechanisms for both decomposition and recomposition are fundamental.
Ontologies derive their roots from philosophy. In philosophy, ontologies are used to represent the account of what exists. In computer science they are generally defined as “a specification of a conceptualization”. An ontology is a data model that represents a domain and is used to reason about the objects in the domain and the relations between them. In the context of this research the domain is that of a “course”, the objects are “concepts in the course” and the relations between the concepts are that of “has-prerequisite”. Simply put, an ontology is a group of concepts organized to reflect the relationships between the concepts. It is a method of specification and speculation about information. In the recent past, ontologies have increasingly been used to represent information in various domains such as biological sciences, accounting and banking, intelligence and military information, geographical systems, language based corpora, cognitive sciences, common sense systems, etc. The role of computer science lies in the efficient representation of these ontologies and their subsequent algorithmic processing. Most ontologies today are so extensive in the breadth of their knowledge that processing them becomes a gargantuan, almost impossible computational task. There needs to be a way to efficiently process the relevant information in these ontologies to give optimum results with minimum time and computational complexity. We present a method which identifies the portion of the ontology that is of maximum relevance and then processes only this portion. The size of this portion of the ontology, which we call the projection graph, can be changed according to the desired semantic significance.
Ontologies are made up of individuals, classes, attributes and relationships. Individuals, the instantiations of classes, form the basic elements of an ontology. Classes are abstract concepts which define and may contain other classes or individuals or both. Attributes are the properties of individuals or relationships. The name of the property is the attribute under consideration, while the values of attributes can take various data types such as integers, strings and booleans. An individual is also allowed to have multiple attributes in its definition. Relationships are the way the concepts in the ontology are structured with respect to each other. Relations can be thought of as attributes whose value is another object in the ontology, and they are used to define the relationship between two or more different objects. Semantic relations particularly important in the context of ontologies are meronymy (part-of), holonymy, hypernymy, and hyponymy. The has-prerequisite relationship is like the holonymy relationship, wherein the child node is a part of the parent node. However, in the context of a course ontology, the part-of semantics refers to the prerequisite understanding of the child node needed to understand the parent node. On the whole, the course ontology is constructed in such a hierarchical fashion that the children of a node represent the knowledge required to understand the parent node, their children represent the knowledge required to understand them, and so on. The ontology is created using the principle of “constructivism” borrowed from learning theory. The theory states that any new learning occurs in the context of, and on the basis of, already acquired knowledge. We use this theory to practically implement the has-prerequisite relationship based course ontology. In Figure 2.2, “Process Management” is the prerequisite of “OS”.
A node is characterized by two values, namely the self-weight and the prerequisite weight. The self-weight of a concept node is the value of the knowledge which is inherent to that node itself. That is, the self-weight is the numerical realization of the knowledge required to grasp the concept, not in its entirety, but in partiality with respect to itself. To understand the concept entirely, knowledge of the prerequisite concepts is also required, which is given by the prerequisite weight of the node. It gives the numerical realization of the importance of understanding the prerequisite concepts for the absolute understanding of a parent concept. Another value which characterizes the course ontology representation is the link weight. The link weight is the numerical realization of the semantic importance of a child concept to the parent concept. Child concepts that are imperative to the understanding of parent concepts will have greater link weights than the others. Thus the course ontology representation is a collection of concept nodes with self-weights and prerequisite weights, and has-prerequisite relationships linking these nodes with a value attribute given by the link weight.
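The node and link weights described above can be pictured as a small data structure. The following Python sketch is illustrative only: the class name `ConceptNode`, its field names and the helper `strongest_prerequisite` are assumptions introduced here and are not part of the course ontology language, and the numeric values are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConceptNode:
    """A concept carrying the two node-level weights described in the text."""
    name: str
    self_weight: float           # knowledge inherent to the concept itself
    prerequisite_weight: float   # importance of its prerequisites taken together
    # Maps each prerequisite (child) concept name to the weight of the
    # has-prerequisite link that leads to it.
    prerequisites: Dict[str, float] = field(default_factory=dict)


def strongest_prerequisite(node: ConceptNode) -> Optional[str]:
    """Return the child concept with the largest link weight, if any."""
    if not node.prerequisites:
        return None
    return max(node.prerequisites, key=node.prerequisites.get)


# Illustrative values only (assumptions, not measured data).
os_node = ConceptNode("OS", self_weight=0.39, prerequisite_weight=0.61)
os_node.prerequisites["ProcessManagement"] = 0.5
os_node.prerequisites["MemoryManagement"] = 0.2
```

Under these assumed values, `strongest_prerequisite(os_node)` picks out the child whose understanding matters most for the parent concept, which is the reading of the link weight given above.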
The recent advances in Semantic Web representation languages such as RDF, RDF Schema, and more recently OWL [29, 30] now provide a promising technology basis for metadata representation. The course ontology is represented using OWL. OWL offers a convenient platform for the representation of hierarchical concepts like those in the course ontology. There are three sublanguages of OWL, namely OWL Lite, OWL DL and OWL Full. Among these, OWL Full offers maximum expressiveness but does not guarantee computability. OWL Lite offers computability by restricting the expressive power of the language. OWL DL (OWL Description Logic) offers a balance between the expressiveness of OWL Full and the computability of OWL Lite. It has all the language constructs of OWL Full, but they can be used only under certain restrictions. The differences between the three categories are explained below.
CODL (Course Ontology Description Language) is the language we define to represent course ontologies. The schema for the course ontology description mostly adheres to OWL Lite, with a few extensions. OWL Lite is used because it supports a basic classification hierarchy and simple constraint features, and because of its computational advantages over the other sublanguages. However, representing the schema for our course ontology in OWL is far from trivial, as we will see in the explanation of the schema. The CODL schema is shown in Figure 2.3. The elements of a CODL-defined course ontology are header information, class definitions, property definitions and individuals.
The header is a collection of assertions about the course ontology. This section can contain comments, version information and imports for the inclusion of other ontologies. For example, the course ontology for a specific course, say “Operating Systems”, can include a separate ontology for a course on “Calculus” from Mathematics. The CODL schema provides a method for conformant exchange of course ontologies. Ontological information about individuals appearing in multiple documents can be linked in a principled way.
<rdfs:comment>A schema for CODL (Course Ontology Description Language)</rdfs:comment>
The owl:priorVersion element can be used to reference the previous version of the ontology. Versioning can effectively be done at different levels of granularity of the ontology. The owl:imports element, which takes an rdf:resource attribute identifying the ontology to be imported, is used to import another ontology into the current one.
The course ontology is structured in the form of individual concepts arranged in a hierarchy. All the individuals in the OWL representation are instantiations of the class Concept. The class Concept is the super class which defines all concepts in the course ontology, including the restrictions on the values of the properties they can take. The class Concept uses the rdfs construct rdfs:subClassOf. The sub class axiom is used to define the necessary conditions for belonging to a sub class or a property restriction. OWL Lite requires the subject of the rdfs:subClassOf statement to be a class identifier. The class Concept is also an instance of the universal class owl:Class, which comprises all the classes that can be legally defined in the vocabulary of the OWL language. The object of the sub class axiom is a property restriction. It describes an anonymous class, namely the class of all individuals which satisfy the restriction.
<rdfs:comment>Course ontology concept</rdfs:comment>
For example, in the above property restriction, the statement owl:Restriction defines an anonymous class, all of whose instances satisfy the restriction on the property hasPrerequisiteWeight. The property restriction states that, for all instances of class Concept, if they have a prerequisite (hasPrerequisite) then it must belong to the extension of Relation. Here the extension of a class, for example Concept, means the set of all the members of that class.
The class Relation is used to give values to the hasLinkWeight property. In our representation, two instances of class Concept are connected by the property hasPrerequisite, which has a link weight value. Accordingly, we want to be able to link an individual to another individual with both a value and a semantic relationship. These kinds of relationships are called n-ary relationships. There are two types of properties in OWL: object properties, which connect instances of classes to each other, and data type properties, which connect instances to data values. OWL does not offer a means to attach a data value to the link between two individuals. Therefore we make a very important abstraction in the schema and form a separate class for Relations. The main objective of the class Relation is to link two individuals of the class Concept with a data value. We first link an instance of the class Concept to an instance of Relation, and then link that instance of Relation to another instance of Concept.
Thus by defining relationships between individuals as another class, n-ary relations can be defined in the schema and property restrictions can be applied.
4. ObjectProperty: hasPrerequisite
In our representation, hasPrerequisite property links an instance of Concept and instance of Relation. The semantic relationship of hasPrerequisite between two individuals is defined as an ObjectProperty.
The rdfs:domain is a property feature which is used to limit the individuals to which the property applies. If a property relates an individual to another individual, and the property has a class as one of its domains, then the individual must belong to that class. Here the property is applicable in the domain of class Concept. It is possible to have more than one domain. The rdfs:range feature limits the individuals the property may have as its value. This means that if a property has a class as its range, only instances of that class can be values of the property. In other words, if a property relates one individual to another, and the property has a class as its range, then the other individual must belong to the range class. When an instance of the class Concept has the property hasPrerequisite, the other individual to which it relates must be from class Relation. Domain and range are both global restrictions.
The connectsTo property is used to link an instance of Relation to an instance of Concept. The property restriction on connectsTo implies that, for every member of class Relation which connectsTo another member, that other member must be an individual of class Concept.
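The domain and range statements for hasPrerequisite and connectsTo can be illustrated with a small check over subject-property-object triples. The sketch below is a simplification under stated assumptions: it treats domain and range as closed-world validation rules, whereas an OWL reasoner would instead infer class membership from them. The dictionaries `CLASS_OF` and `PROPERTY_SIGNATURE` and the function `violations` are hypothetical names introduced for this example.

```python
# Class membership of each individual (illustrative individuals).
CLASS_OF = {
    "OS": "Concept",
    "MemoryManagement": "Concept",
    "relation_1": "Relation",
}

# (domain, range) of each object property, as described in the text.
PROPERTY_SIGNATURE = {
    "hasPrerequisite": ("Concept", "Relation"),
    "connectsTo": ("Relation", "Concept"),
}


def violations(triples):
    """Return the triples whose subject or object falls outside domain/range."""
    bad = []
    for subject, prop, obj in triples:
        domain, range_ = PROPERTY_SIGNATURE[prop]
        if CLASS_OF.get(subject) != domain or CLASS_OF.get(obj) != range_:
            bad.append((subject, prop, obj))
    return bad


# A Concept reaches another Concept only through a Relation instance.
well_formed = [
    ("OS", "hasPrerequisite", "relation_1"),
    ("relation_1", "connectsTo", "MemoryManagement"),
]
# Linking two Concepts directly with hasPrerequisite breaks the range.
ill_formed = [("OS", "hasPrerequisite", "MemoryManagement")]
```

In this reading, `violations(well_formed)` is empty, while the direct Concept-to-Concept link in `ill_formed` is reported because its object is not a member of class Relation.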
6. DatatypeProperty: hasLinkWeight
A data type property links individuals to data values. Link weight is a characteristic of a relation; therefore hasLinkWeight applies to instances of class Relation. The range of the property is set by the resource xsd:float. For computational ease we set the values of all the concept and link properties between 0 and 1. In OWL Lite the range of a property must be a class identifier. ObjectProperty and DatatypeProperty are not disjoint in OWL Full, unlike in OWL Lite and OWL DL, and both are sub classes of the rdf:Property class. The hasLinkWeight property links an instance of class Relation to a data value.
7. DatatypeProperty: hasSelfWeight
hasSelfWeight is used to define the self weight of a node. It too is applicable in the domain of the class Concept and range of values can be between 0 and 1.
8. DatatypeProperty: hasPrerequisiteWeight
This property is used to relate an individual of the class Concept to its prerequisite weight value. By definition, the sum of the self weight and the prerequisite weight of a node is 1. Strictly speaking this property is therefore redundant, as the prerequisite weight does not need to be explicitly specified and can be calculated from the self weight. However, the property is included in the definition language to accommodate the structural changes needed in an ever growing ontology. Nodes can be added to and removed from the ontology, which may affect the prerequisite weight. The property is included so that the values can be specified explicitly in such cases.
This is an instance of a typical individual in course ontology. Here the concept instance “MemoryManagement” is a prerequisite for “OS”. Individuals are generally described by facts about their class membership and their property values. Individual member “OS” is a member of class Concept and has the property values for hasLinkWeight as 0.2, hasSelfWeight as 0.39 and hasPrerequisiteWeight as 0.61.
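The relationship between the self weight and the prerequisite weight can be checked with a few lines of Python. This is a minimal sketch assuming the sum-to-one definition stated for hasPrerequisiteWeight; the function names `prerequisite_weight` and `is_consistent` and the tolerance value are assumptions made for illustration.

```python
import math


def prerequisite_weight(self_weight: float) -> float:
    """Derive the prerequisite weight from the self weight (they sum to 1)."""
    if not 0.0 <= self_weight <= 1.0:
        raise ValueError("self weight must lie between 0 and 1")
    return 1.0 - self_weight


def is_consistent(self_weight: float, prereq_weight: float) -> bool:
    """Check an explicitly stored pair against the sum-to-one definition."""
    # A small tolerance absorbs floating point rounding (assumed value).
    return math.isclose(self_weight + prereq_weight, 1.0, abs_tol=1e-9)
```

For the example individual “OS” above, with a self weight of 0.39 and a prerequisite weight of 0.61, `is_consistent(0.39, 0.61)` holds. A tool could run such a check after nodes are added or removed to detect explicitly stored values that have become stale.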
The most important part of the course ontology structure is the semantics between parent and child concepts. The representation should be able to not only define prerequisite relationship between them, but also define the value strength of this relationship. OWL does not have provision to relate two individuals using data values. In CODL, we define these kinds of n-ary relationships by defining a separate class of relations. Therefore the tool which uses CODL defined course ontology should be able to infer that, since connectsTo links relation_1 and MemoryManagement and hasPrerequisite links OS to relation_1, MemoryManagement is prerequisite of OS.
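The inference expected of a CODL-aware tool can be sketched as follows. The code collapses each Concept to Relation to Concept chain into a single weighted prerequisite edge. It is an illustrative sketch, not the tool used in this work: the function name `infer_prerequisites` and the list and dictionary layout of its inputs are assumptions.

```python
def infer_prerequisites(has_prerequisite, connects_to, link_weight):
    """Collapse Concept -> Relation -> Concept chains into weighted edges.

    has_prerequisite: list of (concept, relation) pairs
    connects_to:      list of (relation, concept) pairs
    link_weight:      dict mapping relation -> link weight
    Returns a list of (parent, prerequisite, weight) tuples.
    """
    # Index the concepts each relation instance connects to.
    targets = {}
    for relation, concept in connects_to:
        targets.setdefault(relation, []).append(concept)

    edges = []
    for parent, relation in has_prerequisite:
        # A relation without a connectsTo target yields no inference.
        for child in targets.get(relation, []):
            edges.append((parent, child, link_weight.get(relation)))
    return edges


edges = infer_prerequisites(
    has_prerequisite=[("OS", "relation_1")],
    connects_to=[("relation_1", "MemoryManagement")],
    link_weight={"relation_1": 0.2},
)
```

With the individuals named in the text, the single inferred edge states that MemoryManagement is a prerequisite of OS with link weight 0.2.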
Characteristics of the hasPrerequisite and connectsTo properties are as follows:
1. Transitive Property:
hasPrerequisite(a, r) and hasPrerequisite(r, c) implies hasPrerequisite(a, c)
connectsTo(a, b) and connectsTo(b, c) implies connectsTo(a, c)
Both hasPrerequisite and connectsTo are transitive.
2. Symmetric Property:
hasPrerequisite(a, b) does not imply hasPrerequisite(b, a)
connectsTo(a, b) does not imply connectsTo(b, a)
Both hasPrerequisite and connectsTo are not symmetric.
3. Functional Property:
hasPrerequisite (a, b) and hasPrerequisite(a, c) does not imply b=c
connectsTo(a, b) and connectsTo(a, c) does not imply b=c.
Both hasPrerequisite and connectsTo are not functional.
4. Inverse of: The inverse properties of hasPrerequisite and connectsTo are isPrerequisiteTo and connectsFrom respectively.
hasPrerequisite(a, b) iff isPrerequisiteTo(b, a) and;
connectsTo(a, b) iff connectsFrom(b, a)
5. Inverse Functional:
hasPrerequisite (b, a) and hasPrerequisite(c, a) does not imply b=c
connectsTo(b, a) and connectsTo(c, a) does not imply b=c.
Both hasPrerequisite and connectsTo are not inverse functional.
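The transitive and inverse characteristics listed above can be illustrated on a small prerequisite graph. The sketch below works on already-collapsed concept-to-concept edges and is only an illustration: the function names, the dictionary layout and the concept “Paging” are assumptions introduced here. A visited set is used because the course ontology is not required to be acyclic.

```python
def transitive_prerequisites(direct, concept):
    """All concepts reachable from `concept` via has-prerequisite links."""
    seen = set()
    stack = list(direct.get(concept, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; this also guards against cycles
        seen.add(current)
        stack.extend(direct.get(current, ()))
    return seen


def inverse(direct):
    """Build isPrerequisiteTo from hasPrerequisite by flipping each pair."""
    flipped = {}
    for parent, children in direct.items():
        for child in children:
            flipped.setdefault(child, set()).add(parent)
    return flipped


# Hypothetical fragment of a course ontology.
direct = {
    "OS": {"ProcessManagement", "MemoryManagement"},
    "MemoryManagement": {"Paging"},
}
```

Here “Paging” is reached from “OS” only through transitivity, while nothing is reachable from “Paging”, which reflects the lack of symmetry. The property is also not functional, since “OS” has two direct prerequisites.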
In this section we define some more properties to make extensions which can be incorporated in to the CODL schema for making some powerful inferences from the language.
The namespace declarations in an OWL ontology provide a means to reference names defined in other OWL ontologies. The owl:imports element can be used to import the entire set of assertions made by the imported ontology into the current ontology. However, no current definition of import allows us to specify a node as an entirely different ontology. The rootEquivalentTo property provides a mechanism to expand a node in a course ontology into a completely new ontology. That is, a node in a course ontology is allowed to be the root node of another ontology.
This means that “OS”, which is an instance of the class Concept, is rootEquivalentTo the individual “OperatingSystem”, which is a member of the class Concept specified by the range. The equivalence property for individuals, owl:sameAs, can be used to the same effect. However, defining the property as a restriction on relations rather than concepts allows for more freedom of expression in the schema.
The equivalentTo property provides a mechanism to equate all the nodes within an ontology, so that ultimately the whole ontology is one node. It is important to note that relating all nodes by the equivalentTo property does not actually mean that they are semantically equal. The purpose of the equivalentTo property is only to unify the whole ontology as a single instance of the class Concept. This is useful for importing and sharing ontologies with different schemas. More power can be attributed to the representation by interspersing different kinds of relationships within an ontology. Thus the ontology need not be based solely on the hasPrerequisite relationship, but can also have other relationships like those stated above.
The course ontology is mathematically defined in the form of a concept space graph (CSG). A CSG is a view of the concept space distribution in the domain of a particular course.
A concept space graph T(C, L) is a projection of a semantic net with vertices C and links L, where each vertex represents a concept and each link with weight l(i, j) represents the semantics that concept cj is a prerequisite for learning ci, where ci, cj ∈ C, and the relative importance of learning cj for learning ci is given by the weight. Each vertex in T is further labeled with a self-weight value and a cumulative prerequisite set weight.
The self-weight represents the relative semantic importance of the root topic itself with respect to all other prerequisites. The prerequisite weight represents the cumulative, relative semantic importance of the prerequisite topics to the root node. Link weight is the strength of the prerequisite relationship between the parent and the child.
Node path weight is the propagated prerequisite effect of a subject node along a particular path to a root node. The notion of node path weight is introduced to compute the effect a prerequisite node has on a root node through a specific path. A single node, therefore, can have different prerequisite effects on a root through different paths.
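When two concepts x0 and xt are connected through a path p consisting of nodes given by the set [x0, x1, …, xt], the node path weight between these two nodes is given by:
$\eta(x_0, x_t) = W_s(x_t) \prod_{m=1}^{t} l(x_{m-1}, x_m)\, W_p(x_{m-1})$
where $W_s(x_t)$ is the self weight of the subject node, $l(x_{m-1}, x_m)$ is the weight of the link from $x_{m-1}$ to $x_m$, and $W_p(x_{m-1})$ is the prerequisite weight of the parent node on that link.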
The node path weight for a node to itself is its self weight.
In Figure 2.6, concept L is connected to concept B through E and F. The prerequisite effect it has on B therefore depends on the prerequisite effect that E and F each have on B. Node path weight calculates the prerequisite effect a node has on another node, so the formula includes the self weight of the subject node and the prerequisite weights of all the nodes on the path.
From the node path weight calculations we can see that L has a stronger prerequisite effect on B through F than through E. This is because L is more important to F (0.5) than to E (0.15), the prerequisite importance of L is greater for F (0.8) than for E (0.6), and F (0.55) is in turn more important to B than E (0.4). Thus node path weight takes into consideration not only the singular effect a node has on its immediate parent but also the combined prerequisite effect a node has on a root, B in this case, along a certain path.
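The comparison of the two paths can be reproduced with a short Python sketch. It assumes that node path weight is the self weight of the subject node multiplied, for every link on the path, by the link weight and the prerequisite weight of the parent on that link, which is how the comparison above reads. The link weights and the prerequisite weights of E and F are the values quoted in the text; the self weight of L and the prerequisite weight of B are not quoted, so the values used for them are purely hypothetical.

```python
from math import prod

# Values quoted in the text: link weights, W_P["E"], W_P["F"].
# Hypothetical values chosen for illustration: W_S["L"], W_P["B"].
W_S = {"L": 0.3}
W_P = {"B": 0.7, "E": 0.6, "F": 0.8}
LINK = {("B", "E"): 0.4, ("B", "F"): 0.55, ("E", "L"): 0.15, ("F", "L"): 0.5}


def incident_path_weight(path):
    """Product over each link of (link weight * prerequisite weight of the parent)."""
    return prod(LINK[(p, c)] * W_P[p] for p, c in zip(path, path[1:]))


def node_path_weight(path):
    """Incident path weight scaled by the self weight of the last (subject) node."""
    return W_S[path[-1]] * incident_path_weight(path)


via_e = node_path_weight(["B", "E", "L"])
via_f = node_path_weight(["B", "F", "L"])
print(f"L -> B through E: {via_e:.5f}")
print(f"L -> B through F: {via_f:.5f}")
```

Whatever positive values are chosen for the two assumed weights, the path through F comes out stronger, because both assumed values appear as common factors on the two paths.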
Incident path weight is the same as node path weight except that it does not include the self weight of the subject node. This gives the prerequisite effect the node may have on a root node, excluding the knowledge of the subject node itself. It is defined as the absolute prerequisite cost required to reach the root node from a subject node.
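Read literally, this definition relates the two path weights as follows. The block is a sketch under assumed notation: γ is the incident path weight symbol used later in the coverage definition, while η for node path weight, W_s and W_p for self and prerequisite weight, and l for link weight are symbol names adopted here for illustration.

```latex
\gamma(x_0, x_t) \;=\; \frac{\eta(x_0, x_t)}{W_s(x_t)}
               \;=\; \prod_{m=1}^{t} l(x_{m-1}, x_m)\, W_p(x_{m-1})
```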
From Figure 2.6, the incident path weight calculations for paths between B and L are given by,
Most educational resources today are not accompanied by metadata, which makes machine processing very difficult. For educational resources to be machine processable, they have to be presented in the proper context. In the last chapter we described semantic representation standards and a formal mechanism to represent the concept space in detail. In this chapter we look at the resource space, which is made up of educational resources, and at the mapping between the resource space and the concept space. One aspect of research in educational technology is the development of standards and practices for educational material research, design, reuse, development, and reengineering. Though educational resources can hardly replace the instructor, they can be very helpful in providing the context of the subject matter. Therefore educational resources need to be precise and intelligible.
A problem or question is one type of educational resource. The commonly observed properties of testware are difficulty or simplicity, breadth and depth of knowledge required to answer, relevance of the question to the root topic, the semantic distance between the concepts tested, ability of the question to test varying populations of students, applicability of the topics taught to a problem, etc. While designing a test, an educator always tries to come up with questions which have maximum coverage of the desired topics, diversity among the topics, good testing capabilities with respect to student knowledge, relevance to the material taught, and overall generality or specificity. There are many other factors too; however, the subdivision of those factors almost always leads back to the basic properties of a good question given above. It is important to understand these properties for better design and reengineering of test problems. In this thesis, we attempt to visualize and understand these properties of test problems by qualitative evaluation.
The mapping between the resource space and the concept space is called the problem-concept mapping. All educational resources, including test problems, are based on a few selected concepts from the ontology. When an educator creates test problems, there is a complex, if instinctive, cognitive design process behind the task. The educator has a mental map of the concepts taught in the course, and the problems are composed according to this map. We define a rudimentary version of this mental map in the form of the course ontology. The problem points to the concepts from the ontology on which it is based. This is a highly cognitive process which takes place in the human brain, and explaining it properly and formally needs a lot of research work. Automatically mapping a problem to concepts from an ontology is a non-trivial research problem which involves natural language processing, knowledge representation, etc. We limit our research to using the problem-concept mapping in semantic evaluation. A mapping signifies the concepts which are used to form the question, which are also the concepts required to answer it.
The connection of the concepts to the testware resource entity can be in the form of an “and” relationship or an “or” relationship. An “and” relationship is used to define concepts which are all imperative to answer the particular problem, while an “or” relationship defines an alternative between concepts to answer the problem. An example of problem-concept mapping is shown in Figure 3.1. The dotted lines represent the concepts which the problem maps to. The angle between the dotted lines is used to represent “and” or “or” relations. An “or” relation between two or more mapped concepts means that knowledge of any one of the concepts is enough to answer the problem. An “and” relationship between two or more concepts means that the problem cannot be answered without knowledge of all the and’ed concepts. In Figure 3.1, concepts B and F are and’ed while concepts T, K and F are or’ed.
These implicit relationships between concepts can be effectively used in evaluating mathematics related problems. It is observed that math problems generally involve a very well defined ontology, and numerous ways can be formulated to solve them. These solutions can be conceptualized and mapped using “or” mapping. A more prose based problem, by contrast, may require combined knowledge from various concepts to form a single solution. These can be mapped as “and” concepts.
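One possible way to encode such a mapping for machine processing is a nested ("and" | "or", [items]) structure. The Python sketch below is an illustration only; the encoding and the function name are assumptions and do not come from the assessment system described here. It checks whether a given set of known concepts is sufficient under the mapping.

```python
def can_answer(mapping, known):
    """Evaluate a nested ("and" | "or", [items]) mapping against known concepts.

    Each item is either a concept name or another (operator, items) tuple.
    """
    op, items = mapping
    results = (can_answer(i, known) if isinstance(i, tuple) else i in known
               for i in items)
    return all(results) if op == "and" else any(results)


# The two groupings named for Figure 3.1, treated here as separate mappings.
anded = ("and", ["B", "F"])        # every listed concept is required
ored = ("or", ["T", "K", "F"])     # any one listed concept suffices

print(can_answer(anded, {"B"}))        # only one of the two required concepts
print(can_answer(anded, {"B", "F"}))
print(can_answer(ored, {"K"}))
```

Nesting allows mixed cases, for example a problem that needs B together with any one of T or K.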
Educational resources must be accessible and intelligible to varied groups of populations for consumption and reuse. Currently there is no formal method for evaluating the utility of an educational resource. We propose an assessment system which attempts to evaluate an educational resource such as a test problem for its knowledge content and complexity. The system is a framework based on assessment parameters, which can give guidelines for setting up a standard for test problem assessment. This chapter describes the assessment approach and assessment parameters in detail.
The assessment process is essentially a two-step approach. The first step is the extraction of the relevant concepts from the CSG, called “CSG extraction”. As seen in the previous chapter, every problem maps to some concepts from the course ontology. The set of mapped concepts acts as the input to the assessment system, hence the concept set has to be precise and methodically selected. The mapping signifies that the set of mapped concepts is required to answer that particular question. However, because the course ontology is based on prerequisite relations, the knowledge required for understanding a concept, and consequently answering a question, is represented by its prerequisite child concepts. Thus to comprehend a concept, say A, all child concepts of A have to be understood first, and to understand those, their child concepts have to be understood, and so on. For a better understanding of a concept, we therefore have to go as far down the ontology as possible. However, most ontologies are vast and there is virtually no limit to how deep one can go, so a limit is needed to control the propagation. This limit is set by a variable called the threshold coefficient. The relevant sub graph extracted under this limit is called the projection graph, and the process of extracting it is called CSG extraction. These concepts are further explained in the later sections.
The second step in the assessment process is applying algorithms to the individual projection graphs of each of the mapped concepts to calculate the assessment parameters. In the subsequent sections we define some assessment parameters which help in understanding the relationships concepts have with a test problem, the knowledge content required to answer a problem, and the properties of the associations which concepts have with each other and with the ontology root. Figure 4.1 shows the assessment process. In the first step the CSG extraction module is given its inputs, i.e. the course ontology, the problem-concept mapping and the threshold coefficient value. Using these inputs the CSG extraction process outputs individual concept projection graphs. In the next step the projection graphs and the course ontology act as input to the assessment module, which calculates the values of the assessment parameters.
The concept space graph gives the layout of the course in the concept space with a view of course organization, involved concepts and the relations between the concepts. Examples of large CSGs include WordNet (150,000), an English language ontology; CYC (47,000 concepts, 30,000 assertions), a well known common sense knowledge mapping project using ontology; LinKBase (1 million in English, 3 million in other languages), a comprehensive medical/clinical ontology; Gene Ontology (now known as GO, over 19,000 concepts), the genome mapping project; ThoughtTreasure (27,000 concepts, 51,000 assertions), another common sense mapping project; and so on. Thus defining a workable area of the ontology is of the utmost importance from the perspective of semantic relevance and computability. The pruning is achieved by introducing a variable called the projection threshold coefficient (λ).
By varying the threshold coefficient the size of the computable projection graph can be varied and thus the semantic significance. Since the projection graph is a sub-graph of the concept space graph, it is necessary to have pre-requisite weights for the leaf nodes too, although most times the pre-requisite weight for the leaf nodes is zero. Flexibility for optional pre-requisite weights for the leaf nodes allows the CSG to be extensible and easily extractable for the projection.
The threshold coefficient is a kind of virtual limit by which the size of the projection can be controlled. The greater the coefficient, the stricter the screening of nodes to be added to the projection, and thus the smaller the graph. A smaller coefficient value means more concepts will be included in the projection. The threshold coefficient can be thought of as a parameter which sets the depth to which the topic has been taught. If a topic is not taught in its entirety, a greater coefficient is assigned so that the depth of the projection graph will be less. Conversely, if a topic is well covered, the value assigned to the threshold coefficient is low, so that the projection graph for the concept is large, encompassing more prerequisite concepts. By varying the threshold coefficient the exact semantic relevance of the question to the whole graph can be computed, the result of which is the projection graph on which we operate. The threshold coefficient sets the limit to how far one should go down the ontology.
Given a CSG T(C, L) with local root concept x0 and projection threshold coefficient λ, a projection graph P(x0, λ) is defined as a sub graph of T with root x0 and all nodes xt for which there is at least one path from x0 to xt in T such that the node path weight, written here as η(x0, xt), satisfies the condition $\eta(x_0, x_t) > \lambda$.
The projection set consisting of nodes [x0, x1, …, xn] for a root concept x0 is represented as $P(x_0, \lambda) = [x_0^{x_0}, x_1^{x_0}, \ldots, x_n^{x_0}]$, where $x_i^{j}$ represents the ith element of the projection set of node j.
The projection graph points to the area of the ontology of maximum semantic relevance. Consider an example CSG as in Figure 2.5. We find the projection of the local root concepts B and D given the threshold coefficient λ = 0.001. The projections and calculations are shown in Figure 4.2 (a) and (b) and Tables 1 and 2. All nodes whose node path weights are greater than the threshold coefficient are included in the projection. Nodes can have multiple paths to the root (J, L, and O). For nodes J and L, both paths satisfy the condition, whereas for O only one path satisfies the condition (O-I-D-A). O is nevertheless included in the projection of D, because it still wields some prerequisite effect on D through one of the paths. If the condition for the threshold coefficient is satisfied then the node is included in the projection.
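The extraction step can be sketched as a depth-first walk that keeps every node reachable by at least one path whose node path weight exceeds λ. The graph below is a small hypothetical example, not the CSG of Figure 2.5, and the sketch assumes an acyclic graph with all weights in [0, 1], a node path weight equal to the self weight times the product of link weight and parent prerequisite weight along the path, and a strict comparison against λ.

```python
def project(graph, root, lam):
    """Return {node: [node path weights above lam]} for the projection of root.

    graph[name] = (self_weight, prereq_weight, {child: link_weight}).
    """
    kept = {}

    def walk(node, incident):
        # With weights in [0, 1], nothing at or below this node can exceed lam
        # once the incident path weight itself has dropped to lam or less.
        if incident <= lam:
            return
        self_weight, prereq_weight, children = graph[node]
        weight = self_weight * incident            # node path weight on this path
        if weight > lam:
            kept.setdefault(node, []).append(weight)
        for child, link in children.items():
            walk(child, incident * link * prereq_weight)

    walk(root, 1.0)
    return kept


# Hypothetical CSG fragment: O is reachable from D through both I and J.
graph = {
    "D": (0.4, 0.6, {"I": 0.7, "J": 0.3}),
    "I": (0.5, 0.5, {"O": 1.0}),
    "J": (0.9, 0.1, {"O": 1.0}),
    "O": (1.0, 0.0, {}),
}

for lam in (0.02, 0.25):
    print(lam, sorted(project(graph, "D", lam)))
```

In this example O is kept at λ = 0.02 through only one of its two paths, and raising λ to 0.25 shrinks the projection to the root alone.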
The main objective of the assessment parameters is to assess the overall knowledge content and the perceived complexity of a test problem. In this section we describe three such assessment parameters namely, coverage, diversity and conceptual distance.
4.3.1 Coverage (α)
The coverage of a question gives a quantitative measure of the effect of the selected projection set on the knowledge required to answer a particular question. Coverage of a concept is a direct indicator of the scope of the question in the context of the concept space of the course. Formally, “coverage of a node x0 with respect to the root node r is defined as the product of the sum of the node path weights of all nodes in the projection set P(x0, λ) for the concept x0 and the incident path weight γ(r, x0) from the root r”.
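If the projection set for concept node x0, P(x0, λ), is given by [x0, x1, …, xn], then the coverage for node x0 about the ontology root r is defined as
$\alpha(x_0) = \gamma(r, x_0) \sum_{m=0}^{n} \eta(x_0, x_m) \qquad (6)$
where $\gamma(r, x_0)$ is the incident path weight from r to x0 and $\eta(x_0, x_m)$ is the node path weight of xm with respect to x0.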
Total coverage of multiple concepts in a problem given by the set [C1, C2 … Cn] is
$\alpha(T) = \sum_{i=1}^{n} \alpha(C_i)$
In eq. 6, the main factor contributing to the coverage is the summation of the node path weights of all the nodes in the projection of a concept. From the definition of node path weight, we know that it defines the semantic importance of a node to its designated root. Therefore the summation of the node path weights of all the nodes in the projection set gives the cumulative semantic importance of the nodes in the projection graph to their respective mapped concept roots. The concepts in the projection graph are in turn the concepts required to understand a particular concept, as controlled by the threshold coefficient. The summation of the node path weights is thus the amount of knowledge required to understand a particular concept. This summation is propagated to the ontology root using the incident path weight because questions are asked about the ontology root even though they do not point to it directly.
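A minimal Python sketch of this calculation follows, using a small hypothetical graph rather than the CSG of Figure 4.2. It assumes the verbal definition of coverage given above, a strict comparison against λ, and that the total for several mapped concepts is the sum of their individual coverages. In the example every node has a single path from its local root, which sidesteps the question of how multiple paths to the same node should be combined.

```python
from math import prod

# graph[name] = (self_weight, prereq_weight, {child: link_weight}); all values hypothetical.
GRAPH = {
    "A": (0.2, 0.8, {"B": 0.6, "D": 0.4}),
    "B": (0.5, 0.5, {"E": 1.0}),
    "D": (0.4, 0.6, {"E": 0.5, "I": 0.5}),
    "E": (1.0, 0.0, {}),
    "I": (1.0, 0.0, {}),
}


def incident(path):
    """Incident path weight: product of link weight * parent prerequisite weight."""
    return prod(GRAPH[p][2][c] * GRAPH[p][1] for p, c in zip(path, path[1:]))


def paths_from(root):
    """Yield every downward path starting at root (graph assumed acyclic)."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        yield path
        stack.extend(path + [child] for child in GRAPH[path[-1]][2])


def coverage(concept, root_path, lam):
    """Sum of node path weights above lam under concept, scaled by the
    incident path weight of root_path (ontology root down to concept)."""
    total = 0.0
    for path in paths_from(concept):
        weight = GRAPH[path[-1]][0] * incident(path)   # node path weight
        if weight > lam:
            total += weight
    return incident(root_path) * total


cov_b = coverage("B", ["A", "B"], lam=0.001)
cov_d = coverage("D", ["A", "D"], lam=0.001)
print(cov_b, cov_d, cov_b + cov_d)
```

With self and prerequisite weights that sum to one at each node, the inner sum approaches one for a full projection, so the coverage is then driven mainly by how strongly the mapped concept is tied to the ontology root.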
Suppose a question tests concepts B and D (Figure 4.2), and we want to calculate the coverage of the question given threshold coefficient λ = 0.001. The first step is to calculate the individual projections of the concepts, as seen in the projection calculation example. The coverage of a concept is then the summation of the node path weights of all the concepts in its projection, propagated to the ontology root. According to the formula,
4.3.2 Diversity (∆)
Diversity tests the extent of the knowledge domain required to answer a particular question. If the projections of some of the mapped concepts overlap with each other, i.e. they have some concepts in common, they are less diverse, as both indirectly depend upon some common ground for their complete understanding. When the projections have no concepts in common, the question has high diversity. Diversity is calculated by measuring the effect of common and uncommon prerequisite concepts from the projections of the mapped concepts. It depends on the uncommon concepts rather than the common concepts, because it is the disparate concepts that give a question its diversity. Prerequisite concepts that appear in the projection sets of two or more of the mapped concepts, i.e. the common concepts, only reinforce the requirement for those concepts rather than contributing towards the diversity. A question has a high diversity value if the concepts it tests are distinct in the context of the knowledge space.
Alternatively, the diversity measure can be thought of as the inverse of a similarity measure. There have been numerous attempts to quantify the similarity between two concepts in an ontology. Different measures based on information content [36, 40, 42], distance, mutual information, etc. have been studied. Our concept of diversity between two concepts can give some insight into these similarity measures. We present a definition of diversity which is not node based, link based or information based, but rather knowledge based, which makes it unique.
Diversity is formally defined as “the ratio of summation of node path weights of all nodes in the non-overlapping set to their respective roots, and the sum of the summation of node path weights of all nodes in the overlap set and summation of node path weights of all nodes in the non-overlap set.”
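Consider a question that tests a set of concepts [C1, C2, …, Cn]. The respective projection sets are given by P(C1, λ), P(C2, λ), …, P(Cn, λ).
The non-overlapping and overlapping sets are N and O, where N contains the nodes that appear in exactly one projection set and O contains the nodes that appear in two or more, and where i and j are the local root parents of any element from N and O respectively, with i, j ∈ [C1, C2, …, Cn].
Diversity is given by
$\Delta = \frac{\sum_{n \in N} \eta(i, n)}{\sum_{o \in O} \eta(j, o) + \sum_{n \in N} \eta(i, n)}$
where $\eta(i, n)$ is the node path weight of node n with respect to its local root i.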
Figure 4.3 shows the nodes in the projections of B and D, and the shaded area shows the nodes in the overlapping region. Diversity can be calculated by means of the formula as,
This means that the diversity between concepts B and D is 97%. The concepts have high diversity.
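The ratio in the formal definition can be sketched in Python as follows. The projection sets and weights are hypothetical and do not reproduce the values behind Figure 4.3, and the sketch assumes that a node in the overlap contributes its node path weight once for every local root whose projection contains it.

```python
def diversity(projections):
    """Ratio of non-overlapping node path weight to total node path weight.

    projections: {local_root: {node: node_path_weight_to_that_root}}.
    """
    # Count in how many projection sets each node appears.
    counts = {}
    for nodes in projections.values():
        for node in nodes:
            counts[node] = counts.get(node, 0) + 1

    overlap = non_overlap = 0.0
    for nodes in projections.values():
        for node, weight in nodes.items():
            if counts[node] > 1:
                overlap += weight      # shared prerequisite concept
            else:
                non_overlap += weight  # concept unique to one projection
    return non_overlap / (overlap + non_overlap)


# Hypothetical projections of two mapped concepts that share prerequisite E.
projections = {
    "B": {"B": 0.5, "E": 0.5},
    "D": {"D": 0.4, "E": 0.3, "I": 0.3},
}
print(diversity(projections))
```

A value close to 1 means the mapped concepts draw on almost entirely separate prerequisite knowledge, and a value close to 0 means they rest on the same prerequisites.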