Type Analysis: Basic Concepts
In most programming languages, objects are declared in one part of the program and used in others. We’ve already mentioned that scope analysis is the calculation of the function that assigns to each use of an object the definition that applies to it.
Type analysis aims to ensure that the use of defined objects is correct: that is, that their use adheres to the semantics of their definition; for example:
- that an array of integers is not called as a function or
- that a function is not attempted to be incremented or
- that the value returned by a function is of the nature described in its definition.
Type Expressions
An appropriate way to represent types within a compiler is by using a language of type expressions.
A language of type expressions should describe the types of the source language in a clear and simple manner. Do not confuse this language with the sub-language of the source language that consists of declarations or definitions.
They don’t have to be the same. The compiler translates type declarations into type expressions.
The language of type expressions is the internal representation that the compiler has of these declarations and depends on the compiler.
Type System
A Type System of a language/compiler is the set of language rules that allows assigning type expressions to instances of use of program objects.
While the type system is a property of the language, it is not uncommon for compilers to introduce modifications to the language’s type system.
For example, in Pascal, the type of an array includes the array indices. This and Pascal’s type equivalence rules severely limited the genericity of functions in Pascal. For this reason, some Pascal compilers allowed compatibility of types between arrays of different sizes and different index sets in a function call. Unfortunately, the way they did this differed from compiler to compiler.
Type Checker
A Type Checker verifies that the use of objects in usage constructs adheres to what is specified in their declarations or definitions according to the rules specified by the type system.
Static Typing and Dynamic Typing
A programming language has static typing if its type checking occurs at compile time without having to check equivalences at runtime.
A programming language has dynamic typing if the language performs type checks at runtime.
In a dynamic type system, types are usually associated with values not with variables.
Strong Typing and Weak Typing
Although the meaning of the terms Strongly Typed and its opposite Weakly Typed vary with authors, there seems to be consensus that languages with strong typing usually have some of these characteristics:
- Compile-time checking of violations of restrictions imposed by the type system. The compiler ensures that for any operations the operands have valid types.
- Any operation on invalid types is rejected either at compile time or runtime.
- Some authors consider that the term implies disabling any implicit type conversion. If the programmer wants a conversion, they must explicitly specify it.
- The absence of ways to bypass the type system.
- That the type of a data object does not vary during the life of the object. For example, an instance of a class cannot have its class altered during execution.
Sobrecarga, Polimorfismo e Inferencia de Tipos
Sobrecarga
Un símbolo se dice sobrecargado si su significado varía dependiendo del contexto. En la mayoría de los lenguajes Los operadores aritméticos suelen estar sobrecargados, dado que se sustancian en diferentes algoritmos según sus operandos sean enteros, flotantes, etc.
En algunos lenguajes se permite la sobrecarga de funciones. Así es
posible tener dos funciones llamadas min
:
int min(int a, int b) { string min(string a, string b) {
if (a < b) return a; if (strcmp(a, b) < 0) return a;
return b; return b;
} }
A la hora de evaluar el tipo de las expresiones es el contexto de la llamada el que determina el tipo de la expresión:
float x,y;
int a,b;
string c,d;
u = min(x,y); /* Puede que correcto: x e y seran truncados a enteros. Tipo entero */
v = min(a,b); /* Correcto: Tipo devuelto es entero */
w = min(c,d); /* Correcto: Tipo devuelto es string */
t = min(x,c); /* Error */
¿Como afecta al análisis de ámbito la sobrecarga de operadores?
Overloading, Polymorphism, and Type Inference
Overloading
A symbol is said to be overloaded if its meaning varies depending on the context. In most programming languages, arithmetic operators are often overloaded, as they are implemented with different algorithms depending on whether their operands are integers, floats, etc.
In some languages, function overloading is allowed. Thus, it is possible to have two functions named min
:
int min(int a, int b) { string min(string a, string b) {
if (a < b) return a; if (strcmp(a, b) < 0) return a;
return b; return b;
} }
When evaluating the type of expressions, it is the context of the call that determines the type of the expression:
float x,y;
int a,b;
string c,d;
u = min(x,y); /* May be correct: x and y will be truncated to integers. Integer type */
v = min(a,b); /* Correct: Returned type is integer */
w = min(c,d); /* Correct: Returned type is string */
t = min(x,c); /* Error */
Think 🤔: How does function overloading affect scope analysis?
Type Inference
Type inference refers to algorithms that automatically deduce at compile time - without additional information from the programmer, or with partial annotations from the programmer - the type associated with a use of an object in the program.
A good number of functional programming languages allow for type inference (Haskell, OCaml, ML, etc.).
Type inference refers to the process of determining the appropriate types for expressions based on how they are used.
For example, in the expression y(4)
, OCaml
knows that
y
must be a function, because it is applied to something and that s.t. takes anint
as input.- It knows nothing about the output type.
Therefore the type inference mechanism of OCaml would assign
y
the type int -> 'a
where 'a
is a type variable.
➜ apuntes-pl git:(main) ✗ ocaml
# fun y -> y(4) ;;
- : (int -> 'a) -> 'a = <fun>
# let g = fun x -> x+1;;
val g : int -> int = <fun>
# (fun y -> y(4)) g ;;
- : int = 5
# let rec rh n = if n <= 0 then "" else "hello" ^ rh (n - 1);;
val rh : int -> string = <fun>
# (fun y -> y(4)) rh ;;
- : string = "hellohellohellohello"
As an example of type inference, consider the following session in OCaml:
pl@nereida:~/src/perl/attributegrammar/Language-AttributeGrammar-0.08/examples$ ocaml
Objective Caml version 3.09.2
# let minimo = fun i j -> if i<j then i else j;;
val minimo : 'a -> 'a -> 'a = <fun>
# minimo 2 3;;
- : int = 2
# minimo 4.9 5.3;;
- : float = 4.9
# minimo "hola" "mundo";;
- : string = "hola"
The OCaml
compiler infers the type of the expressions.
Thus, the type associated with the function min
is 'a -> 'a -> 'a
,
which is a type expression that contains type variables.
The operator ->
is right associative, so the expression should be read as 'a -> ('a -> 'a)
and not as ('a -> 'a) -> 'a
.
Basically, it says:
minimo
is a function that takes an argument of type 'a
(where 'a
is a type variable that will be instantiated at the time of the function’s use)
and returns a function that takes elements of type 'a
and returns elements of type 'a
.
Functions of Several Variables versus Functions that Return Functions
Although it might be thought that a more appropriate description of the type of the function minimo
would be 'a x 'a -> 'a
, the truth is that in some functional languages it is common for all functions to be considered as functions of a single variable.
The function of two variables 'a x 'a -> 'a
can be seen as a function 'a -> ('a -> 'a)
.
Indeed, the function minimo
when it receives an argument returns a function:
# let min_mundo = minimo "mundo";;
val min_mundo : string -> string = <fun>
# min_mundo "pedro";;
- : string = "mundo"
# min_mundo "antonio";;
- : string = "antonio"
# min_mundo 4;;
This expression has type int but is here used with type string
# min_mundo(string_of_int(4));;
- : string = "4"
This strategy of reducing functions of several variables to functions of a single variable that return functions of a single variable is known as currying or partial application.
Polimorfismo
El polimorfismo es una propiedad de ciertos lenguajes que permite una interfaz uniforme a diferentes tipos de datos.
Se conoce como función polimorfa a una función que puede ser aplicada o evaluada sobre diferentes tipos de datos.
Un tipo de datos se dice polimorfo si es un tipo de datos generalizado o no completamente especificado. Por ejemplo, una lista cuyos elementos son de cualquier tipo.
Polimorfismo Ad-hoc y polimorfismo paramétrico
Se llama Polimorfismo Ad-hoc a aquel en el que el número de combinaciones que pueden usarse es finito y las combinaciones deben ser definidas antes de su uso.
Se habla de polimorfismo paramétrico si es posible escribir el código sin mención específica de los tipos, de manera que el código puede ser usado con un número arbitrario de tipos.
Por ejemplo, la herencia y la sobrecarga de funciones y métodos son mecanismos que proveen polimorfismo ad-hoc.
Los lenguajes funcionales, como OCaml suelen proveer polimorfismo paramétrico.
En OOP el polimorfismo paramétrico suele denominarse programación genérica
En el siguiente ejemplo en OCaml construimos una función similar al map
de Perl.
La función mymap
ilustra el polimorfismo paramétrico:
la función puede ser usada con un número arbitrario de tipos, no hemos tenido que hacer ningún tipo de declaración explícita y sin embargo el uso incorrecto de los tipos es señalado como un error:
# let rec mymap f list =
match list with
[] -> []
| hd :: tail -> f hd :: mymap f tail;;
val mymap : ('a -> 'b) -> 'a list -> 'b list = <fun>
# mymap (function n -> n*2) [1;3;5];;
- : int list = [2; 6; 10]
# mymap (function n -> n.[0]) ["hola"; "mundo"];;
- : char list = ['h'; 'm']
# mymap (function n -> n*2) ["hola"; "mundo"];;
This expression has type string but is here used with type int
En los apuntes del profesor en Análisis de Tipos de Funciones Polimorfas se muestra un pequeño lenguaje que permite el polimorfismo paramétrico. Está escrito en Perl usando Eyapp. Esta es la gramática Eyapp basada en la sección A language with Polymorphic functions del libro de Aho, Sethi y Ullman, 1986, p. 367:
p: d <+ ';'> '{' e <%name EXPS + ';'> '}'
;
d: id ':' t
;
t: INT
| STRING
| TYPEVAR
| LIST '(' t ')'
| t '*' t
| t '->' t
| '(' t ')'
;
e: id '(' optargs ')'
| id
;
optargs:
/* empty */
| args
;
args: e
| args ',' e
;
id: ID
;
Y este es un ejemplo de programa conforme a la gramática Eyapp anterior:
pl@nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Aho-Polymorphism/lib/Aho/script$ \
cat -n prueba01.ply
1 first : list(ALPHA) -> ALPHA;
2 q : list(list(int))
3 {
4 first(first(q))
5 }
Equivalencia de Expresiones de Tipo
La introducción de nombres para las expresiones de tipo introduce una ambiguedad en la interpretación de la equivalencia de tipos. Por ejemplo, dado el código:
typedef int v10[10];
v10 a;
int b[10];
¿Se considera que a
y b
tienen tipos compatibles?
Equivalencia de tipos estructural y equivalencia de tipos nominal
Equivalencia de tipos estructural
Se habla de equivalencia de tipos estructural cuando los nombres de tipo son sustituidos por sus definiciones y la equivalencia de las expresiones de tipo se traduce en la equivalencia de sus árboles sintácticos o DAGs.
Equivalencia de tipos nominal
Si los nombres no son sustituidos se habla de equivalencia por nombres o de equivalencia de tipos nominal.
Si utilizamos la opción de sustituir los nombres por sus definiciones y permitimos en la definición de tipo el uso de nombres de tipo no declarados se pueden producir ciclos en el grafo de tipos.
El lenguaje C impide la presencia de ciclos en el grafo de tipos usando dos reglas:
-
Todos los identificadores de tipo han de estar definidos antes de su uso, con la excepción de los punteros a registros no declarados
-
Se usa equivalencia estructural para todos los tipos con la excepción de las
struct
para las cuales se usa equivalencia nominal
Por ejemplo, el siguiente programa:
nereida:~/src/perl/testing> cat -n typeequiv.c
1 #include <stdio.h>
2
3 typedef struct {
4 int x, y;
5 struct record *next;
6 } record;
7
8 record z,w;
9
10 struct recordcopy {
11 int x, y;
12 struct recordcopy *next;
13 } r,k;
14
15
16 main() {
17 k = r; /* no produce error */
18 z = w; /* no produce error */
19 r = z;
20 }
Produce el siguiente mensaje de error:
nereida:~/src/perl/testing> gcc -fsyntax-only typeequiv.c
typeequiv.c: En la función 'main':
typeequiv.c:19: error: tipos incompatibles en la asignación
En lenguajes dinámicos una forma habitual de equivalencia de tipos es el tipado pato:
Duck typing
Duck typing is a form of dynamic typing where the set of methods and properties of an object determine the validity of its use. This means:
Two objects belong to the same duck type if they implement/support the same interface, regardless of whether or not they have a relationship in the inheritance hierarchy.
The term refers to the so-called duck test: If it waddles like a duck, and quacks like a duck, it’s a duck!.
Type Conversion
The type checker of a C compiler can modify the AST to introduce conversions where necessary. For example, if we have the following C program:
int i;
float x;
x+i;
Given the expression tree
PLUS(VAR, VAR)
,
the type checker will introduce an intermediate node INT2FLOAT
to indicate the need for the conversion and specify the type of PLUS
being used:
PLUSFLOAT(VAR, INT2FLOAT(VAR)).
An optimization tree transformation that comes into play here is the compile-time type conversion of constants. For example, given the two programs:
float X[N]; float X[N];
int i; int i;
for(i=0; i<N; i++) { for(i=0; i<N; i++) {
X[i] = 1; X[i] = 1.0;
} }
The performance effects will be dire
if the compiler does not perform the conversion of the integer constant
1
in the program on the left at compile time but instead leaves the
conversion to a conversion subroutine that is called at runtime.
In such a case, the performance of the programs on the left and right
would be completely different.
References
- Apuntes del Profesor sobre Análisis de Tipos. 2000-2012.