How are operators implemented in programming languages

Design and implementation of an ADT-oriented programming language

Transcript

1 Department of Mathematics and Computer Science Design and implementation of an ADT-oriented programming language Authors Michael Burzan Degree: M. Sc. Computer Science and Michael Kußmann Degree: Computer Science (Diploma) Reviewer: Prof. Dr. Achim Clausing, Prof. Dr. Markus Müller-Olm Münster, February 2014

2

3 Table of contents Preface VII 1. Introduction Motivation and objectives Chapter overview Basics Programming paradigms Types and type systems Abstract data types Programming languages ​​in teaching Presentation of various programming languages ​​The programming language C The programming language Java The programming language Pascal The programming language Lisp TeaJay Basic design Hello World Types built into TeaJay TeaJays Language elements Lexicographic structure Keywords Operators Separators Numbers Strings Letter-Literal Truth Values ​​The Null-literal Identifier Comments Packages Types and Implementations Type Definitions Enumeration Types Generic Types Parameterized Types III

4 Table of contents Inheritance Assignment compatibility of types Type deletion Implementations Interfaces Closures Operators Non-overloadable operators Overloadable operators Operator precedence Considerations of operators and operator overloading Blocks and statements Block expression statements Local variable declarations assert-statement Conditional statements Loop statements break- and continue-statement throw-statement try-statement return-statement Unattainable statements Expressions Compatibility with Java Excursus: Using TeaJay TeaJay in teaching First steps in TeaJay The first TeaJay program FancyAdder: A further development of the SimpleAdder The ADT stack Using self-defined ADTs in TeaJay Designing types with TeaJay The type Complex Der Type Map The type NumberStream TeaJays functional elements Closures as method pointers The type MapReducer The type Selector TeaJays Type system List vs. Array TeaJay in practice ADTs in TeaJay Development of a simple input / output sgabe library for TeaJay 89 IV

5 Table of contents Writing implementations for TeaJay's standard library GUI programs with TeaJay Retrofitting numerical calculations with Number counting loop Implementation ByteCode framework Structure of a class file Overview of the ByteCode framework ClassFile pool of constants Attributes The signature attribute Fields and methods Annotations The code attribute The instruction list The stack map table Special instruction classes Dynamic address binding HalloWelt program in bytecode assembler TeaJays runtime environment TeaJays language library Binding implementations to types Searching for implementations invokedynamic Inheriting or overwriting methods Writing TeaJay types in Java teajay.runtime.tjtypes MiniCompiler The compilation process The Lexer The parser Abstract syntax tree Traversing the syntax tree Description of TeaJayUtils Description of TypeUtils Management of names Local variable table Calculation of stack map frames The derivation tree Methods search compilation of type definitions compilation of type implementation compilation of closures compilation of enumerated types V

6 Contents Compilation of Expressions Primary Expressions Arithmetic Expressions Assignments Prefix Expressions Method and Constructor Calls Compilation of Statements typeswitch-statement while- and do-statement break-, continue-, throw-statement Digression: Extending TeaJay Outlook Conclusion 177 A. The TeaJay grammar 179 B TeaJays Standard Library 191 B.1. Implicitly imported types B.2. TeaJays Language Library B.2.1. List B.2.2. Number B.2.3. String B.2.4. CharSequence B.2.5. Boolean B.2.6. TJObject B.2.7. System B.2.8. TJThrowable and TJException B.2.9. RandomAccess B TJComparable B TJIterable B TJIterator B Collection B IO B.3. teajay.util B.4. java.io.file C. Data carrier 199 Source code list 201 List of figures 203 List of tables 205 Bibliography 207 VI

7 Foreword The present work was conceived by us Michael Burzan and Michael Kußmann as a joint project. This means that both authors have an equal share in the design of the language and in its implementation. Due to the joint work, a clear division of the work is not meaningful. Therefore, a coherent text was written in which each author writes individual sections and is responsible for them. This is done by naming the initials of the respective author (MB or MK) at the beginning of the text. The initials are only used when the authorship changes. In addition, the page numbers are marked with the respective initials. However, on pages on which the authorship changes, only the initials are decisive. For some sections, a division of responsibility is not possible or useful. These are identified by naming the initials of both authors and have no initials on their page number. Due to the requirements of the diploma and master’s examination regulations, which provide for a separate submission of the final theses, the common text for submission is separated according to the authorship. Sections for which both authors are responsible appear in both papers, marked accordingly. The authorship of the program code was indicated in the source text by naming the respective author. The individual classes were assigned to the author who had the greatest share in their implementation. Both authors are named in individual cases. [SR03, p. 2f] served as a model for this preface. VII

8

9 1. Introduction In this work the development of the abstract data type (ADT) oriented programming language called TeaJay is documented and the reader is given basic ideas on how TeaJay can be used in teaching and practice. For this purpose, a prototype compiler for TeaJay was developed in the course of this work. MB MK 1.1. Motivation and goal setting Programming languages ​​are languages ​​and as such are media of thought [Cla11, p. 2]. Therefore, the design of a programming language should always aim to support the programmer in solving problems by the language helping him to structure the problem. 1 This is especially true for teaching languages, since novice programmers usually do not have much experience in structuring problems formally. Therefore, a teaching language should have powerful abstraction mechanisms that are oriented towards the real world at the same time. One approach to combine these two requirements is object-oriented programming, which enables the programmer to structure problems by mapping them onto a class hierarchy. Each class also defines a new type. Many programming languages ​​support the creation of user-defined types, but most of the time they allow the definition of a type to be mixed with its implementation or even enforce it. At the latest when translating, the type and implementation are firmly linked. There are libraries in many languages ​​that allow types to be loaded dynamically, but these are often awkward to use and cannot benefit from a static type test that may be present. In addition, many languages, such as Java or C ++, the secret principle does not consistently and allow e.g. direct access to internal states of types, which makes it even more difficult to replace an implementation. TeaJay's concept of type should therefore be based strongly on the concept of the abstract data type (ADT) and only represent the description of an interface with types and operations. Communication with the type should only take place via this interface and it should be possible to change the implementation of the type without any problems even after the compilation time. The link between type and implementation should therefore be as dynamic as possible. Also built-in and user-defined types in TeaJay's type system should be given equal rights as far as possible. 1 In the case of programming languages, this means developing a model that can be translated onto the computer. 1

10 Chapter 1: Introduction Anyone who knows an object-oriented language knows that to create an instance of a class (or a type) the specific name of the class must be known. TeaJay will also change this in the course of separating type and implementation: In order to be able to create a type, only the name of the type should be known. An implementation does not have to be available until the program is executed and can then be searched dynamically or announced by other means. This should lead to the fact that it seems natural in TeaJay to write modular programs. The aim of this work is to design and implement an object-oriented programming language that puts the concept of type (in the sense of an ADT) in the foreground and thereby encourages thinking about the interface design. This is intended to help a developer structure problems by making the language seem natural to break programs down into weakly coupled modules. A developer inevitably has to think about the limits of the individual modules (interfaces). TeaJay is primarily intended to be a teaching language, but not so far removed from pragmatics that it can only become an island language of teaching. It should also be possible to teach different programming paradigms in TeaJay. TeaJay should be strongly and statically typed and the type system should have as few irregularities as possible. This aspect is primarily intended to benefit teaching. However, a powerful type system also supports experienced developers in avoiding errors. Furthermore, TeaJay should support automatic garbage collection as an integral part of the language Chapter overview Chapter 1 contains an introduction and this chapter overview. Chapter 2 discusses some basics about programming languages ​​and abstract data types. In addition, some criteria are set out in this chapter which, in the opinion of the authors, a teaching language should meet. Chapter 3 describes the TeaJay language and discusses some design decisions with a focus on teaching. Chapter 4 is intended to give the reader a deeper insight into TeaJay and shows, mainly using examples, how TeaJay can be used. The teaching aspect of TeaJay is deepened in this chapter. Chapter 5 describes the implementation of a compiler and a runtime environment for TeaJay. Chapter 6 shows possibilities for a further development of TeaJay. Chapter 7 draws a conclusion. 2

11 2. Basics Today there is a large number of different programming languages. With a few exceptions, most of them were not widely used. However, with the diverse creation of programming languages, knowledge of general design concepts has increased. This chapter briefly introduces the most important concepts and properties that well-known programming languages ​​have. Since programming languages ​​have to be learned before they can be used, this chapter lists some of the teaching properties that a teaching language should have. Furthermore, the term abstract data type is explained and four different programming languages ​​are presented in summary. MB MK 2.1. Programming paradigms Programming languages ​​can be classified according to which programming paradigms they support. A programming paradigm describes the way in which a problem can be solved in a programming language and thus also specifies parts of it. In the literature, four different main paradigms are usually distinguished: These are the imperative, the object-oriented, the functional and the logical programming paradigm. Nowadays it is quite common for a programming language to fulfill several paradigms, for example languages ​​exist that are object-oriented and imperative and also have functional elements. MB The imperative paradigm: Programs are described as a sequence of instructions that are processed sequentially. The sequential execution of instructions can be changed in imperative languages ​​by conditional and unconditional jumps. Closely connected to the imperative paradigm are program variables that can experience state changes during the execution of a program through instructions. This is basically the core of the imperative paradigm: programs change states. In addition to jumps as control structures, modern imperative languages ​​also offer loops and procedures or functions to control the program flow. The functional paradigm: Programs are understood as functions that map inputs into outputs. The functional paradigm allows very compact programs to be implemented. A function itself is implemented by calling other functions. This is how the functional paradigm coined the term recursive programming. Additional functions can be used by the user 3 MB

12 Chapter 2: Basics can be generated functions or operators that are applied to the input of the function. In most functional programming languages, operators are also understood as functions. The functional paradigm does not recognize variables because changes of state are not required. The object-oriented paradigm: Most object-oriented programming languages ​​are state-based and thus related to [the] imperative [paradigm] [BH98]. While in imperative languages ​​variables and instructions that change the state of the variable are viewed as separate, in object-oriented languages ​​these are merged with one another under the term object. An object is seen as a structure that manages states and has methods for changing them. The manipulation of an object is understood as the exchange of messages between objects. A class is understood as an abstraction of an object. The class defines the necessary attributes and interfaces that an instance (object) of the class has. In object-oriented languages ​​it is also possible to arrange classes in hierarchies and thus to model relationships. This enables a high degree of problem abstraction to be carried out. The logical paradigm: In languages ​​that follow the logical paradigm, programs are written that represent questions to the computer (cf. [Cla11]). This gives an answer that he finds independently with the help of the predicate logic [ibid.]. Other paradigms that deserve a brief introduction are listed below. With a few exceptions, these paradigms are often found as supplementary paradigms in programming languages. Paradigm Parallel Programming Constraint Programming Data Flow Programming Distributed Programming Description The language has properties that make it easier to multithread. Here the language offers e.g. Monitors on to allow thread synchronization. The language enables logical programming to be expanded to include expressions such as mathematical equations. In data flow programming, programs are defined as a network of black box processes that manipulate and exchange their data through predefined communication [Mor]. The language has properties that make it easier to communicate with and exchange data with several physically separate systems. 4 MB

13 2.2. Types and type systems Generic programming Metaprogramming Visual programming Developers have the option of implementing algorithms and data structures in the language independently of specific types, so that they do not have to be known at the time of implementation. The language enables [programs to collect information about themselves] and to modify themselves according to new requirements [MK06]. These languages ​​allow programs to be expressed graphically. This can e.g. look like program flowcharts that are known from the UML. Table 2.1 .: further programming paradigms 2.2. Types and type systems Types and type systems form the foundation of every modern programming language. However, there are also languages ​​that have neither types nor a type system, e.g. Assembly languages. Types specify which operations may and may not be used. The example code in Figure 2.1 shows an incorrect use of operations in a typed and an untyped language. While in the typed language the last expression as meaningless integer j = 0 string c = "HELLO" j = j - 1 j = c - jj = 0 c = "HELLO" j = j - 1 j = c - j Figure 2.1. : left: typed version; right: untyped version or recognized as incorrect, it is evaluated in the untyped language. The question arises as to what value j has after the expression has been evaluated. The most likely answer is that the bit pattern is calculated from c minus j. But it is definitely not what the developer intended. Typed languages ​​are criticized because they supposedly patronize the developers and lead to inefficient programs. Strom invalidates these arguments and points out that modern compilers can generate better and more effective code than most developers and that paternalism is a small price to pay for the time savings that can be achieved through typed languages ​​(cf. [Str96]). High-level languages ​​generally allow complex types to be built from primitive types. Primitive types are those that are built into the language and allow basic operations.The most common primitive types are integers for integer expressions, double for floating point operations, characters for ASCII character manipulation, boolean for truth values ​​and void for empty 5 MB

14 Chapter 2: Printing Basics. The primitive types are the atoms of their language and cannot be broken down further. So that complex types can be built, the programming language must have containers for aggregation. The simplest are: Structures or groups (records): They combine a specific number of types and give them a new type name. Fields: They define a fixed index structure using a single type and allow access to the elements of the field using an index. Lists: They manage any number of types, but do not allow access via an index, but by iterating over the beginning of the list. Another type of primitive type are pointers or references. They serve as references to addresses or to the content of an address in memory. This type of type can also be used to construct complex types. It is e.g. possible to implement a list by concatenating pointers or references. In order for types to be used in programming languages ​​and incorrect operations to be discovered, a type system must be designed and developed. A type system is the set of rules according to which it is decided whether operations on a certain type are permitted or not. This set of rules is applied at runtime or compile time of a program. If the set of rules is applied at compile time, static typing is used and, in the case of runtime, dynamic typing. The difference between the two typifications lies in the time at which the error occurred. In the case of statically typed languages ​​this takes place during compile time and in dynamically typed languages ​​during runtime. Typed languages ​​are also divided into strongly or weakly typed. Weakly typed languages ​​are able to apply non-conforming operations to types by implicitly or explicitly assuming a different type for the expression. In the case of strongly typed languages, such possibilities only exist to a limited extent. This definition should not be considered final. Since most programming languages ​​are also subject to pragmatic design decisions, languages ​​that are considered strongly typed allow exceptions under certain conditions. In the case of object-oriented programming languages ​​in which relationships are taken into account, a strong type system must take this into account, since otherwise it is not possible to replace more general objects with specific ones. Such a type system is called polymorphic. If a type system is polymorphic, a variable usually has three different types, these are (cf. [Pun07, p.18]): Declared type: Insofar as variables are declared in a language, it is the type of the variable that is used during the declaration was specified. Static type: The type is determined statically by the compiler and can be more specific than the declared type. 6 MB

15 2.3. Abstract data types Dynamic type: This is the runtime type of a variable and can change with each assignment. Some programming languages ​​allow generic types to be designed and used. The genericity in a polymorphic type system introduces the possibility of designing types and thus their interfaces independently of specific types. The generic type parameters are used for this. These are substitutes for specific types. The generic type parameters describe the required interface that a specific type must have so that the interface of the generic type can handle it. In the implementation of an interface, only the deputy is used at compile time. In addition, generic types are subjected to special treatment in the type system, because if a generic type is created with a specific type, the interface of the generic type is changed to the specified specific type. This means that no type conversions have to be carried out by the developer. This achieves greater flexibility than the polymorphic type system allows. The above description explains the homogeneous translation of genericity as it exists in Java (cf. [Pun07, p.111f.]). Another form is heterogeneous translation, which is similar to the concept of copy and paste. Since every creation of an instance of a generic type leads to the compiler creating a new type in which every occurrence of the generic parameter is replaced by the type specified when creating the instance (cf. [ibid. P.112]). which have associations and procedures or functions usually allow abstract data types to be defined. ADTs are also called user-defined data types because they make it possible to define types that are similar to the built-in types of the programming language (cf. [Coo]). This means that ADTs are an extension of the type system, so that the same control system is used for them. So that this can be done, an ADT is only specified via its permitted operations. So all implementation details are hidden from the client. This fact is known as information hiding or the principle of secrecy. An advantage of a consistent separation of implementation and interface is the easier exchange of implementations without further changes by the client. An example of an ADT is shown in Figure 2.2. If clients want to use the interface of the ADT IntStack, the developers must be told how it is used correctly. Furthermore, it is not always the case that the designer of the interface is also its implementer. This means that it must be made clear what is meant by correct behavior of the ADT. There are two different concepts for this. One concept is the commenting of the interface and the other the formal specification as shown by Guttag (cf. [Gut77]). When commenting, it should be noted that it describes the behavior of the interface, but misunderstanding MB MK 7

16 Chapter 2: Basics Struct IntStack {List_of_Integer stack; } IntStack create () {...} Void push (intstack s) {...} Integer top (intstack s) {...} Void pop (intstack s) {...} Boolean isempty (intstack s) { ...} Struct IntStack; IntStack create (); Void push (intstack s); Integer top (intstack s); Void pop (intstack s); Boolean isempty (intstack s); Figure 2.2 .: left: implementation; right: interface failures or the complete ignoring of the implementer cannot be ruled out. However, it is conceivable that the designer of the ADT provides a test case. This can test whether the implementation meets the required specification. Thanks to the formal specification, modern compilers can, under certain conditions, give the implementer feedback on whether he or she has met the specification. ADTs can be defined by classes because a class can hide the implementation details and define a public interface. This interface then defines an ADT. However, most object-oriented languages ​​do not enforce the secret principle. So the developer has to do this through disciplined behavior. In most cases, classes are also members of an inheritance hierarchy and can therefore access internal information of the class from which they inherit and change it as required. As a result, most object-oriented languages ​​give way to the principle of secrecy. Programming languages ​​in teaching Programming languages ​​are tools with which problems are to be solved. But they are also a means of communicating the thoughts of developers. Teaching programming languages ​​in school and university is fundamentally different from teaching a natural language. The aim here is not to teach one or more programming languages ​​per se, but to teach skills and knowledge that are suitable for solving a problem in any programming language (cf. [Goe97]). The challenge for the teacher is to decide which skills and knowledge are important and which are not. A single programming language is unlikely to serve as the teaching language for all of the competencies that the teacher considers important. It follows from this that the language of instruction cannot exist. For example, if the learner is to learn recursive programming, a functional language is better suited than a multi-paradigm language. If, on the other hand, the teacher wants to promote the recognition and application of abstraction in a meaningful way, object-oriented languages ​​are probably more suitable than functional ones. From the objectives of the state of North Rhine-Westphalia for computer science lessons in the upper secondary school it follows that the students should learn imperative, functional, logical and object-oriented programming (cf. [Cur99]). This 8

17 2.4. Providing programming languages ​​in teaching is an ambitious goal and therefore the selection of programming languages ​​should be carefully considered by the teacher. For a programming language that is used in teaching, it should be noted that it should facilitate the modeling of phenomena in the real world and that this should not be hindered by a complex notation (cf. [Goe97]). A requirement of a teaching language can be that it is widely used in business and that it has many tools that support development. But if a language is widespread, many standard problems that could serve as learning objects have already been solved. This creates a motivation problem for teachers and learners, which it is difficult to counteract. Development aids are generally good, but care should be taken not to use them as a guideline for teaching (cf. [Cur99]). Furthermore, according to Goedicke and Strom, a programming language is more suitable as a teaching language, the earlier errors are found (cf. [Goe97] [Str96]). Thus, an important requirement of a teaching language is that it is static and strongly typed. Ketz and Hug summarize that a teaching language should have the following properties: lean, clean, secure, statically typed, high-level programming language that is suitable for teaching and learning modular and object-oriented programming [H.K98]. However, no information is given about what the individual points mean exactly. In this thesis, the following properties are assumed that characterize a programming language as a teaching language: Static typing: This is used to increase the focus on the problem and to enable potential problems to be excluded. Programs are implemented in an object-oriented and modular manner: Beginners should learn to understand and apply object-oriented concepts because they have established themselves. The possibility of building programs up modularly in a teaching language primarily serves the purpose of understanding how source code is organized in order to e.g. to carry out a software project with a learning group. imperative programming with functional elements: This enables beginners to learn the two most common ways of formulating programs. Complete preservation of the principle of secrecy: By preserving the principle of secrecy, it should be ensured that beginners think about the interfaces in order to increase the focus on the design of modules and types. Least Surprise Principle: This principle is designed to help beginners avoid errors and help them make correct assumptions about the semantics of the language. 9

18 Chapter 2: Basics Lean set of existing resources: This enables tasks that are set to appear meaningful to the learner. Syntactic uniformity before ambiguity: Beginners should be able to generate their solutions in a simple and unambiguous way. In this way, they are not distracted by various syntactic but equivalent semantic elements in the language. Automatic memory management: As with static type safety, memory management should be relieved of the user so that he can concentrate on solving the problem. Presentation of different programming languages ​​The development of a programming language cannot be viewed in isolation, as earlier programming languages ​​often influenced the newly developed . The developers look at the syntax of programming languages ​​and use this to design their own. There are also universal concepts that have established themselves and are being expanded to include new ones of their own. Most programming languages ​​were not developed to be used specifically in teaching, but to solve problems. Therefore, one of the programming languages ​​presented here cannot be expected to meet the criteria mentioned above. However, a language that fulfills as many properties as possible can be a good starting point for development. MB The programming language C The programming language that had a very strong influence on the design of many programming languages ​​is C. It was primarily designed for system-level development. Furthermore, it gives the developer all the freedom that is possible with its syntax. If e.g. other programming languages ​​report the indexing by a negative constant as an error, the programming language C is in the best case, depending on the compiler manufacturer, worth a warning. Another feature that will cause problems for beginners is memory management. It has to be completely taken over by the developer. Beginners spend a lot of time examining their source code for memory problems. The programming language C offers macros based on pure text replacement without any logic behind it. This leads to effects and errors which the developer did not intend and which are difficult to localize. This happens e.g. when determining the minimum of two numbers using a macro: 1 #define min (a, b) a

19 2.5. Presentation of different programming languages ​​So by the pure text replacement the effect occurs that a or b is increased by two, not by one. The language offers the possibility of combining data types in records or structures. Modular programming is made possible by the macro concept in C. This means that data can initially be hidden from the user. Generic programming is achieved in C with the special data type void *. By using structures and modular programming, it is possible to design and develop ADTs in C. The example in Figure 2.3 shows one possibility. As you can see there, ADTs in C can be 1 #ifndef STACK_H 2 #define STACK_H 3 4 struct Stack; 5 6 // Create a l e r e r stacks 7 struct Stack c r e a t e (); 8 // Add element to the stack 9 void push (struct Stack s, void element, s i z e _ t s i z e); 10 // Top element of the stack e n t f e r n e n 11 // F a l l s Stack l e e r i s t, NULL is returned 12 void pop (struct Stack s); 13 // Check whether the stack can be read 14 int isempty (struct Stack s); #endif 1 #include "stack. h" 2 3 struct Stack {4 int elemcnt; 5 int s i z e; 6 void array; 7}; 8 Source code 2.1: stack.h 9 struct Stack create () {} 10 void push (struct Stack s, void element, size _ tsize) {} 11 void pop (struct Stack s) {} 12 int isempty (struct Stack s) {} void resize (struct Stack s) {} Source code 2.2: stack.c Figure 2.3 .: ADT stack in C: above.h file, below.c file mentare specify. The principle of secrecy is achieved through the division into header and C files. A user can use the header file to create a stack and to communicate with it. The internal structure of the stack is hidden from the user. 11 MB

20 Chapter 2: Basics The C programming language is too complex for beginners because its semantics allow for too many exceptions. There is no automatic memory management and the syntax of C is ambiguous. In addition, C is weakly typed, which means that queries are necessary to guarantee type safety. In Figure 2.3, the push function expects a void * and the size of the element to be added to the stack. This type of generic programming is cumbersome and error-prone, since in this method the memory must be allocated in the correct size and copied in the correct way. Furthermore, object-oriented concepts are not supported in the language. These properties make it difficult to teach and learn C. The Java programming language Java has been widely used in teaching and business over the past 15 years. This is due to the fact that this language is considered powerful and easy to learn. It supports a variety of paradigms, including the imperative and the object-oriented. It also has elements of the functional paradigm. It also supports generic, parallel and distributed programming. The Java programming language has a strong type system and the syntactic scope of the language is kept slim. The aim of this language is to be as platform-independent as possible. This means that the source code is compiled and then runs on all platforms. The Java compiler usually does not generate executable machine code but bytecode. The generated bytecode runs in a virtual machine that is supplied with Java. For the bytecode, the virtual machine can be understood as an abstraction of the hardware. It follows from this property that Java only runs on systems for which the Java Virtual Machine has been implemented.In Java, ADTs can be created using the interface and object-oriented concept. The example in Figure 2.4 shows this possibility. 1 interface IStack {2 // If the stack is l e e r i s t, g i b t r u e back 3 // Otherwise if it is 4 boolean isempty (); 5 // Adds an element to the stack 6 void push (T element); 7 // E n t f e r n t the top element from the stack 8 // F a l l s Stack l e e r throws an exception 9 T pop () throws exception; 10} Source code 2.3: istack.java 1 class Stack implements IStack {2 3 public Stack () {} 4 public void push (T element) {} 5 public T pop () throws Exception {} 6 public boolean isempty () {} 7 8} Source code 2.4: stack.java Figure 2.4 .: ADT stack in Java: left interface, right class 12 MB

21 2.5. Presentation of different programming languages ​​As can be seen in Figure 2.4, ADTs can be specified by comments, as in C. By dividing it into interface and class, the internal structure is hidden from the user. However, it is not possible to specify in an interface how a type is created [Jia]. Java allows learners to focus heavily on solving a problem. This is made possible by the strong type system, the automatic memory management and the good error messages of the compiler. Another strength for teaching is that Java has an active community that develops many tools that make development easier. Thanks to Java's widespread use, a learner can also find help quickly. Despite the advantages mentioned, Java has problems in the type system. Among other things, there is an error in the static typing, which is shown in the following example: 1 Object [] array = new S t r i n g [2]; // This assignment can be l u b t e n i c h t e r 2 array [0] = new I n t e g e r (1); 3 array [1] = new double (1. 5); This source code should not be accepted by the Java compiler, but it does. As a result, the problem only occurs at runtime. By consistently using the generic concept and the supplied libraries, it is possible to circumvent this problem. The user-defined types, however, seem a bit foreign to the Java types, since it is similar to the String type, e.g. is allowed to perform the concatenation with the + operator. All arithmetic operators can be executed with the types integer, float, long and double. Such capabilities do not exist for user-defined types. The Pascal programming language The Pascal programming language was designed by Niklaus Wirth in 1970 as a minimalist imperative teaching language (cf. [Cla11, p. 129]). Pascal is strongly and statically typed and the Pascal type system offers some special features (cf. [Cat13]): MK Pascal allows you to define your own data types by restricting the range of values ​​of built-in types (especially whole numbers): For example, type byte =; the byte type is defined as an unsigned 8-bit integer. You can also specify the value range directly for variables: var i: 0..42 ;. Here i then has an anonymous type whose range of values ​​corresponds to [0, 42] Z. Unfortunately, the widespread Free Pascal Compiler [www13] does not enforce many value range restrictions by default: 1 program fpc; 2 3 var 4 s m a l l:; 5 b i g:; 6 begin 7 s m a l l: =; {warning only} 8 writeln (s m a l l); {Output: 111} 9 b i g: =; 10 small: = big; {no compiler error, no warning} 11 writeln (s m a l l); {Output: 0} 12 end. 13 MK

22 Chapter 2: Basics Source Code 2.5: fpc.pas Pascal has the option of defining your own sets of whole numbers as types: type digits = set of 0..9; var MyDigits: digit; With these definitions, the following source code is now valid: 1 MyDigits: = [2, 3, 8]; 2 i f 4 in MyDigits then 3 writeln (can not p a s s i e r e n); 4 i f 6 in [0 .. 4,] then 5 writeln (k l a p p t); The size of an array is determined by Pascal's type system: 1 var a r r: array [] of integer; 2 begin 3 a r r [0]: = 1; a r r [1]: = 2; 4 writeln (a r r [4]); {A rew a rt} 5 end. There is a built-in type file, which allows the type-safe handling of files. For example, you can declare a file so that its contents are interpreted as a list of integers: 1 type 2 i n t l i s t: array [] of integer; 3 var 4 i n t f i l e: f i l e of i n t l i s t; Pointers are supported by Pascal, but there is no pointer arithmetic such as in C. Pascal also offers enumeration and array types, as well as composite types (record) as a means of aggregation. An example of compound types in Pascal looks like this: 1 program _record; 2 3 type p e r s o n = record 4 name: s t r i n g; 5 age: end; 7 var Andy: person; 8 begin 9 with (Andy) do 10 begin 11 name: = Andy; 12 age: = 2 3; 13 end; 14 writeln (Andy. Name); {Edition: Andy} 15 end. Source code 2.6: record.pas The original Pascal hardly supported the developer in defining his own ADTs. Although it is possible to define new structures to a large extent, there is no separate namespace for its operations. This deficiency was later remedied by units (see [Ola13]). Units form their own namespace and allow the developer to define a public interface. Source code 2.7 shows this using the example of a stack. 14 MK

23 2.5. Presentation of different programming languages ​​1 unit stack; 2 {a st a p e l for storing characters} 3 i n t e r f a c e {o e f f e n t l i c h e S c h n i t t s t e l l e} 4 type 5 tnatzahlnull = 0 ..) 42 e l s e 43 outelem: = start ^. i n f o 44 end; {to p} 45 procedure push (inelem: char); 46 {s t a p e l t e i n n e u e s element} 47 var 48 new: trefstapelelem; 49 begin 50 new (new); 51 new ^. next: = start; 52 new ^. i n f o: = inelem; 53 start: = new 54 end; {push} 55 procedure pop; 56 {e n t f e r n t d a s o b e r s t e S t a p e l e l e m e n t} 57 var 58 h i l f: trefstapelelem; 59 begin 60 i f isempty then 61 writeln (S t a p e l i s t l e e r.) 62 e l s e 63 begin 64 h i l f: = start; 65 beginning: = beginning ^. next; 66 d i s p o s e (h i l f) {No a u t o m a ti s c h e S p e i c h e r v e r w a l t u n g u e r R e f e r e n z e n} 67 end 68 end; {pop} 69 begin {I n i t i a l i s i e r u n g} 70 c r e a t e 71 end. {S t a p e l} Source code 2.7: Implementation of a stack in Pascal. The template for the stack comes from an exercise at the Fernuniversität Hagen. Pascal's powerful type system has its advantages when it comes to writing secure (and verifiable) code. However, it is also very extensive and difficult to learn at first. In addition, Pascal's syntax, which is unusual by today's standards, might put off beginners at first. 15 MK

24 Chapter 2: Basics The Lisp Programming Language Lisp is a list-based and functional language. The name Lisp (List processing) is based on the property that the language for aggregation offers an unchangeable 1 list. Most of the languages ​​mentioned above, on the other hand, offer a rather weak abstraction of the von neumann storage model (array). A special feature of [t.] Lisp (cf. [Cla11, p. 45]) is that functions and even built-in operators are naturally types of language (first-class citizen). Like many interpreted languages, t.lisp also has a purely dynamic typing 2. This means that type errors only become visible at runtime and are then possibly more difficult to identify. But it also simplifies the development of small programs. One advantage of functional languages ​​such as Lisp is that an ADT specification is usually the same as its implementation. On the other hand, Lisp does not offer any namespaces and the actual internal state of an ADT has to be managed by the client himself. This becomes clear in source code 2.8: Each push creates a new list (which represents the stack) and must be passed by the client on the next push. Source code 2.8 is only given for the sake of completeness, since a stack data type is not very useful in Lisp, because each list can be treated directly like a stack with the list operators car and cdr. Due to the lack of namespaces, Lisp is rather unsuitable for teaching ADTs. In addition, functional programming languages ​​have only established themselves in niches in the economy (e.g. in computer algebra systems). However, this is not intended to mean that functional programming should not play a role in teaching. Lisp's strong orientation towards the functional programming paradigm makes it an ideal language to convey the advantages and disadvantages of this type of programming. There are also dialects of Lisp that correct many of the shortcomings (especially the lack of namespaces) of the original Lisp. 1 (define (stack _ create) 2 nil 3) 4 5 (define (stack_push stack elem) 6 (if (list? Stack) 7 (cons elem stack) 8 else 9 NOT_A_STACK 10) 11) (define (stack_pop stack) 14 (if (empty? stack) 15 EMPTY_STACK 16 else 17 (cdr stack) 18) 19) (define (top stack) 22 (if (empty? stack) 23 EMPTY_STACK 24 else 25 (car stack) 26) 27) (define ( empty? stack) 1 The pragmatics did not stop at Lisp either and therefore there are the commands setcar and setcdr, which can change the status of a list php), strongly typed. 16 MK

25 2.5. Presentation of various programming languages ​​30 (i f (l i s t? S t a c k) 31 (n i l? S t a c k) 32 e l s e 33 NOT_A_STACK 34) 35) Source code 2.8: Implementation of a stack in t.lisp. With the help of the operators setcar and setcdr it would also be possible to develop a state-changing stack in Lisp. But this would be an unnatural approach in a highly functional programming language like Lisp. Source code 2.9 shows the sorter type which can be elegantly implemented in t.lisp. This is mainly due to the fact that this type does not have to manage any states. 1 (define (> = ab) (or (> ab) (= ab))) 2 (define (joinab) 3; chain two L ists: a. B 4 (cond 5 (nil? A) b 6 else 7 ( cons (car a) (join (cdr a) b)) 8) 9) 10 (define (select op pl) 11; S selects all elements inl which satisfy the binary predicate op. 12 (cond 13 ( nil? l) l 14 (op (car l) p) 15 (cons (car l) (select op p (cdr l))) 16 else 17 (select op p (cdr l)) 18) 19) 20 (define (qsortl) 21; Creates a new sorted version of l 22 (cond 23 (nil? l) l; The empty L isteistreadyssortier t 24 (nil? (cdr l)) l; The single element L is ready sorted 25 else 26; L inks <(carl) <= right 27 (join 28 (qsort (select <(car l) l)) 29 (cons (car l) (qsort (select> = (car l) (cdr l)))) 30) 31) 32 ) 33 (q s o r t ()) Source code 2.9: Implementation of QuickSort in t.lisp. 17 MK

26

27 3. TeaJay In this chapter the design process of TeaJay is documented and at the same time the resulting language is described. With a special focus on teaching, the effects of design decisions and how these relate to the design goals are discussed again and again. The design of programming languages ​​is a subjective topic and also depends heavily on the goals set. An objective assessment of whether a language is good or bad seems practically impossible. This begins with the discussion of whether a language should allow the user as much freedom as possible and thus allow the same subject matter to be formulated in many different ways, or whether a language should try to impose certain patterns on the user in order to thereby help him in his formulations to discipline. The authors try to strike a middle ground for TeaJay. All design decisions made for TeaJay are the result of discussion and consensus building between the authors of this work. At particularly controversial points, an attempt is made to give the reader an insight into this discussion. MB MK 3.1. Basic Design This section is intended to provide an overview of TeaJay's basic design. The following sections of this chapter then go into more detail. TeaJay's design is based primarily on the ADT concept and is intended to embed this in an object-oriented environment. In order not to be able to teach just a single programming paradigm with TeaJay, TeaJay is designed as a multi-paradigm language. TeaJay is mainly object-oriented, imperative and structured, but also supports functional and generic programming. This has the advantage that it is possible to teach the fundamentals of several paradigms without changing language. The decision to design TeaJay as a mainly object-oriented and imperative language is well considered, because on the one hand the imperative style has a special relationship to the everyday world and the everyday thinking of the students, and [on the other hand it supports] in its expansion to include object-oriented concepts that elementary cognitive processes of thinking, recognizing and problem-solving [Sch95, p. 3]. In addition, imperative languages ​​are very widespread in business, research and teaching. The Java language in particular served as a model for TeaJay. From the authors' point of view, Java, as a model for a teaching language, offers the following advantages: Java is a programming language that is very widespread, also in a business environment, and is in 2nd place on the TIOBE index (cf.

28 Chapter 3: TeaJay Java is syntactically related to the widely used languages ​​C (TIOBE index position 1) and C ++ (TIOBE index position 4). Java is one of the two languages ​​proposed in the specifications for the teaching requirements for the written examinations in the Abitur in the grammar school upper level in 2014 [Zen14] and in the curriculum for computer science of the State of Hesse [Leh10]. That is why Java has been used as a teaching language in many schools and universities for years and there are many beginners with prior knowledge of this language. An orientation from TeaJay to Java makes it easier for you to switch. Java is a multi-paradigm language and has automatic memory management. Java (Version 7) supports, except for the functional, all paradigms that TeaJay should also support 1. TeaJay has adopted or extended many syntactic properties from Java. Therefore, the Java grammar [GJS + 12, Chapter 18] serves as the starting point for the implementation of the TeaJay grammar. The biggest changes have been made to Java's type system: TeaJay does not support classes as they are known from Java, instead types and implementations. A TeaJay type behaves similarly to a Java class, but only contains the definition of its interface: This contains instance methods, static methods and constructors, but does not contain any implementation details. TeaJay thus implements the concept of the abstract data type very uncompromisingly. TeaJay makes it possible to create instances of types without the client having to know an implementation. In fact, in TeaJay it is not possible to instantiate implementations directly (with one exception, see section). TeaJay's type system has other special features: There are no primitive types (such as int in Java). In TeaJay there is only one class of types, namely reference types. There is no such thing as a classic type of array. TeaJay supports operator overload. TeaJay only knows one type of number display. This type can theoretically represent all rational numbers. In practice, the accuracy is limited by the available or addressable memory space. Through the above design decisions, the visible difference between built-in and user-defined types should be as small as possible. The decision to include only a single type of number representation in the language is primarily intended to benefit beginners. Furthermore, TeaJay's type system is generic and most of the versions of Java's generic type system applicable to TeaJay have been adopted. In addition, TeaJay does not have any unsafe automatic 1 It is also possible to program functionally in Java, but in a more indirect way. 20

29 3.1. Basic design type conversions. Due to the type system, there is not even a need for such. Since TeaJay is already semantically and syntactically oriented towards Java, it makes sense to translate TeaJay, just like Java, to the Java Virtual Machine [LYBB12], or JVM for short. The JVM also offers a number of advantages for this project: It is easy to load and dynamically link program code on it using the invokedynamic instruction and the Reflection API. Since generic and parameterized types are unknown to her, the decision to use the JVM has led to TeaJay's generic type system, just like that of Java, having to perform type deletion (see section). This is a purely technical admission and not a conscious design decision. The JVM also offers the advantage of taking care of an automatic garbage collection. In addition, the translation to a JVM also offers the advantage that TeaJay programs are platform-independent and can potentially be executed on all systems for which a JVM (version 7 or higher) exists. This is advantageous for teaching, as it is likely that learners will want to use TeaJay on their familiar system. Due to the strong separation of interface and implementation, TeaJay is supposed to promote the programming for the interface, according to the principle [p] programming towards an interface, not towards an implementation [EG96, p. 24].It is possible in different languages ​​to use libraries and reflection to achieve a similar separation of interface definition and implementation. However, this is always cumbersome and therefore requires a high level of discipline. TeaJay simplifies this type of programming considerably with its language resources. TeaJay's type system offers, in addition to imperative or object-oriented language elements, a functional language element, the closures (see Section 3.4.5). These make it easier to write functional programs, but they also have a pragmatic side: In TeaJay it is not possible to create anonymous type implementations, as this would undermine the strong separation of type and implementation. Closures are a simple way of encapsulating and passing on functionality, e.g. to respond to a callback. Just like Java, TeaJay does not support multiple inheritance, but takes over the interface concept of Java. In TeaJay, interfaces represent a partial type definition and should serve to extract very general functionality from types (as e.g. Iterable in Java does). The question of whether TeaJay should support operator overloading led to a controversial discussion between the authors during the design phase. It finally led to the concept of operator overload for TeaJay presented in Section. The main argument for this was given by the design goal of treating built-in and user-defined types as equally as possible. In order to do justice to TeaJay's strong orientation towards the concept of type, a language element called typeswitch (see section) was introduced in TeaJay. This not only simplifies the runtime check of types, but also makes them safer to handle, since it helps to avoid unchecked explicit type casts. However, TeaJay supports type casts, since typeswitch can only handle parameterized types to a limited extent. 21

30 Chapter 3: TeaJay MB 3.2. Hello World in TeaJay As an introduction, the classic example program HalloWelt is presented in TeaJay. An executable TeaJay program consists of at least a type definition and an implementation that implements the type definition. The type definition describes the interface that a type possesses. For the type HalloWeltProgramm, a folder hallowelt and in this a file with the name HalloWeltProgramm.tj must be created. A type is always a member of a package and this can be specified using the package keyword: 1 package h a l lo w e l t; 2 ... Source code 3.1: HalloWeltProgrammImpl.tj In TeaJay, the description of the interface of a type is introduced via the typedef keyword. It is optionally possible to mark this type as public: 1 package h a l l o w e l t; 2/3 D e f i n i e r t e i n H a l l o Welt Programm 4/5 public typedef HalloWeltProgram {} source code 3.2: HalloWeltProgrammImpl.tj Please note that the name of the file and the type definition must be identical. Since the type is to be executed, it requires a constructor and a method definition without parameters. The method definition can have any return type. The behavior of the method is specified via a comment: 1 package h a l l o w e l t; 2/3 D e f i n i e r t e i n H a l l o world program. 4/5 public typedef HalloWeltProgramm {6/7 A k z e p t i e r t no i n e command line parameters. 8/9 HalloWeltProgramm (); 10/11 Outputs the string "Hello World" on the console. 12/13 void run (); 14} Source code 3.3: HalloWeltProgramm.tj An attempt can now be made to execute the type. The type definition is first compiled with the TeaJay compiler tjc: 1 $ t j c h a l l o w e l t / HalloWeltProgramm. t j An attempt can then be made to execute the type with the TeaJay program starter tj. 1 $ t j h a l l o w o r l d. HalloWeltProgramm run 2 Please s t a t e an implementation f o r h a l l o w e l t. HalloWeltProgramm: 22 MB

31 3.2. Hello world in TeaJay tj expects to receive the name of a type, a method and optionally command line parameters. Since there is not yet an implementation for the type, the program launcher asks the user to give him the name of a specific implementation. A new file is created for the implementation of the type and named HalloWeltProgrammImpl.tj. So that a meaningful division of the type definition and implementation in TeaJay is possible, the implementation can belong to a different package. The implementation of the type is a member of the package implementations: 1 package implementations; 2 ... Source code 3.4: HalloWeltProgrammImpl.tj The file is located in the implementations folder and this is in the same directory as the Hello World folder. Because the implementation is in a different package than the type it implements, the type of implementation must be made known. This must be done via the import statement or, alternatively, the fully qualified name of the type can be used: 1 package implementations; 2 import h a l l o w e l t. HalloWeltProgramm; 3 ... Source code 3.5: HalloWeltProgrammImpl.tj An implementation is identified by the keyword typeimpl followed by its name. Then you have to use the keyword implements to specify which type the implementation implements: 1 package implementations; 2 import h a l l o w e l t. HalloWeltProgramm; 3 / S t e l l t e i n e I m p l e m e n t i e r u n g of the type h a l l o w e l t. H a l l o w e l t p r o g r a m m b e r e t / 4 typeimpl HalloWeltProgrammImpl implements HalloWeltProgramm {} source code 3.6: HalloWeltProgrammImpl.tj The interface of HalloWeltProgramm defines a constructor and the method run. This interface must be implemented by the implementation. The constructor can be implemented empty, since no initializations are carried out: 1 package implementations; 2 import h a l l o w e l t. HalloWeltProgramm; 3 / S t e l l t e i n e I m p l e m e n t i e r u n g of the type h a l l o w e l t. H a l l o w e l t p r o g r a m m b e r e i t / 4 typeimpl HalloWeltProgrammImpl implements HalloWeltProgramm {5 HalloWeltProgrammImpl () {6}} Source code 3.7: HalloWeltProgrammImpl.tj The method run is responsible for the output of Hello World. The IO.println (Object) method is called for this. This is known by default and therefore does not have to be imported or used via its fully qualified name. 1 package implementations; 2 import h a l l o w e l t. HalloWeltProgramm; 3 / S t e l l t e i n e I m p l e m e n t i e r u n g of the type h a l l o w e l t. H a l l o w e l t p r o g r a m m b e r e i t / 4 typeimpl HalloWeltProgrammImpl implements HalloWeltProgramm {5 HalloWeltProgrammImpl () {23 MB