Semantic gap

The semantic gap characterizes the difference between two descriptions of an object by different linguistic representations, for instance languages or symbols. According to Hein, the semantic gap can be defined as "the difference in meaning between constructs formed within different representation systems".[1] In computer science, the concept is relevant whenever ordinary human activities, observations, and tasks are transferred into a computational representation.[2][3][1]

More precisely, the gap is the difference between the ambiguous formulation of contextual knowledge in an expressive language (e.g. natural language) and its sound, reproducible, computational representation in a formal language (e.g. a programming language). The semantics of an object depends on the context in which it is regarded. In practice, this means that any formal representation of a real-world task requires translating the contextual expert knowledge of an application (high-level) into the elementary, reproducible operations of a computing machine (low-level). Since natural language allows the expression of tasks that cannot be computed in any formal language, this translation cannot be automated in a general way. Moreover, the study of languages within the Chomsky hierarchy indicates that, above a certain level of expressive power, there is no formal, and consequently no automated, way of translating from one language into another.
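Even a simple everyday instruction shows why this translation needs human decisions. The informal task "sort these names alphabetically" does not say how letter case should be treated, so whoever formalizes it must pick a reading. The following minimal Python sketch, with a hypothetical list of names, shows two plausible formal readings that give different results.

```python
# Two formal readings of the informal instruction "sort these names alphabetically".
# Neither reading is dictated by the natural-language task; both are illustrative choices.

names = ["banana", "apple", "Cherry"]  # hypothetical input

# Reading 1: compare Unicode code points (Python's default string ordering).
# Uppercase letters have smaller code points than lowercase ones, so "Cherry" sorts first.
by_code_point = sorted(names)

# Reading 2: ignore letter case, which is what a human reader would usually expect.
by_casefold = sorted(names, key=str.casefold)

print(by_code_point)  # ['Cherry', 'apple', 'banana']
print(by_casefold)    # ['apple', 'banana', 'Cherry']
```

Both programs are correct implementations of *some* reading of the instruction. The choice between them depends on context that the instruction itself does not carry.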

Theoretical background

The Church–Turing thesis is commonly accepted but, because it relates an informal notion to a formal one, cannot be formally proven. It states that a Turing machine, and every equivalent formalism such as the lambda calculus, can carry out any formal operation that a human performing a computation by rule could carry out. However, the selection of adequate operations for a correct computation is not itself formally deducible; moreover, it depends on the computability of the underlying problem. Tasks such as the halting problem can be stated completely in natural language, yet no computational representation solves them: any program attempting to decide them will, for some inputs, either fail to terminate or give a wrong answer. Turing proved this for the halting problem, and Rice's theorem generalizes it to every non-trivial semantic property of programs. Gödel's incompleteness theorems, which express general limitations of rule-based deduction, indicate that the semantic gap can never be fully closed. These are general statements about the limits of computation at the highest level of abstraction, where the semantic gap manifests itself. There are, however, many subsets of problems that can be translated automatically, especially at the higher-numbered, less expressive levels of the Chomsky hierarchy, such as the regular languages.
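The undecidability of the halting problem rests on a diagonal argument. That argument can be sketched in Python under a simplifying assumption: programs are modelled as generator functions that take one step per `yield` and halt when exhausted. The helpers `run_bounded` and `make_diagonal` and the two candidate deciders are hypothetical names introduced for this illustration. For any candidate decider, the constructed program does the opposite of what the decider predicts about it. The sketch only shows how concrete candidates fail. It is not a proof of the theorem.

```python
# Illustration of the diagonal argument behind the halting problem.
# Assumption for this sketch: a "program" is a generator function; each `yield`
# is one step, and the program halts when its generator is exhausted.
from typing import Callable, Dict, Iterator

Program = Callable[[], Iterator[None]]
Decider = Callable[[Program], bool]


def run_bounded(program: Program, max_steps: int) -> bool:
    """Return True if `program` halts within `max_steps` steps, else False.

    False only means "did not halt within the budget"; bounded simulation
    alone can never prove that a program runs forever.
    """
    steps = program()
    for _ in range(max_steps):
        try:
            next(steps)
        except StopIteration:
            return True
    return False


def make_diagonal(decider: Decider) -> Program:
    """Build a program that does the opposite of what `decider` predicts for it."""

    def diagonal() -> Iterator[None]:
        if decider(diagonal):
            # Predicted to halt, so loop forever.
            while True:
                yield None
        # Predicted to run forever, so halt immediately.

    return diagonal


# Hypothetical candidate deciders; any total function Program -> bool could be tried.
candidates: Dict[str, Decider] = {
    "always_halts": lambda program: True,
    "never_halts": lambda program: False,
}

for name, decider in candidates.items():
    diagonal = make_diagonal(decider)
    predicted = decider(diagonal)
    observed = run_bounded(diagonal, max_steps=10_000)
    print(f"{name}: predicted halts={predicted}, observed halts={observed}")
```

In each case the prediction and the observed behaviour disagree. For `always_halts`, the bounded run alone cannot tell "runs forever" apart from "runs longer than the budget". The sketch relies on knowing from the code that the `while True` loop never ends.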
