Saturday, September 3, 2016

Custom unstructured data types and types' reuse


In 1972 three well-known computer scientists wrote a book Structured Programming, where they mentioned about custom unstructured types:
All structured data must in the last analysis be built up from unstructured components, belonging to a primitive or unstructured types. Some of these unstructured types (for example, reals and integers) may be taken as given by programming language or the hardware of the computer. Although these primitive types are theoretically adequate for all purposes, here are strong practical reasons for encouraging a programmer to define his own unstructured types, both to clarify his intentions about the potential range of values of a variable, and the interpretation of each such value; and to permit subsequent design of an efficient representation.
(...) Such a type is said to be an enumeration, and we suggest a standard notation for the name of the type and associating a name with each of its alternative values.
type suit = (club, diamond, heart, spade);
          (...)
type year = 1900 .. 1960; 
type coordinate = 0 .. 1023;
(...) We therefore introduce the convention that a .. b stands for the inline range of values between a and b inclusive. This is known as subrange of the type to which a and b belong, (...)
Ole-Johan Dahl, Edsger W. Dijkstra, C.A.R. Hoare, Structured Programming, A.P.I.C. Studies in Data Processing, No. 8, 1972, p.97

Some of the languages indeed support subrange types, e.g. Ada and Dephi. Nevertheless, modern popular languages like Java and C# do not support unstructured custom subrange types.

In addition to this there is a huge misconception about a type declaration and its actual meaning. For instance, an Array accepts an Integer as index argument with a range [-2^31, 2^31], so we declare support for -1,-2, ... index values. At the same time only non-negative integers are accepted, i.e. [0, 2^31]. For an honest and clear code it should be an Array[NonnegativeInteger].

Moreover, commonly used structured and unstructured types does not limit to numbers. Nowadays an email is always presented as String:
function SendEmail(String email, String emailBody){ ... }

but we actually mean
function SendEmail(EmailAddress email, String emailBody){ ... }

Making an assertion that an instance of the class/structure EmailAddress is always valid we do not need to check the string every time with IsEmail() helper or an attribute of DataAnnotations. It can have a function EmailAddress TryConvert(String email) for convenient conversion from string.

Other common places for apparent improvements:

  • Name checking of a String for a Firstname and a Lastname across whole solution with the same restrictions, probably with no more than 256 special characters and at least 1 character, can be changed to what a programmer really means: class/structure Name.
  • class Description with common rules for contact messages, order's comments, feedback messages, etc.
  • Measures: degrees, coordinates, weight, etc.
  • Date and time Integers up to 2147483648 (2^31) for a year, a month, a day, an hour and a minute look inconvenient for real use and most of the systems after all will throw an exception by taking large numbers for these types.
  • Money There is no a banknote with minus $100 value, but there can be a notion of 100$ debt.

With the help of Design by Contracts most of the checks can be done during compilation time.
For interoperability with other systems custom types should be built up from primitive and simple types, like integers and strings, and be convertible from these types.