Monday, August 28, 2006

How can I tell the difference between a text file and a binary file?

Short answer: You can't. The question does not make sense. The "text" file type is a subset of all binary file types. Let's make the question more generic: how do I tell the difference between file type X and a binary file (where X is any chosen file type: text, MP3, WAV, etc.)? Or, to put it in a different domain, how do I tell the difference between oak and wood? Now does it make a bit more sense why you can't ask this question? If not, read on...

A binary file contains binary data. Binary data is composed of bytes with values ranging from 0 to 255. Therefore, every file is a binary file. A file of any given type is a binary file which has a specific structure. What differentiates one type of file from another is simply a matter of convention and of how the structure of the data is interpreted. Many file types have specific values which are expected at particular byte offsets within the file. If these values are not found, then the file is not of that type. Of course, finding those bytes at the expected locations does not ensure the file is of that type; it only indicates that it might be.

The text file type typically does not have this kind of structure. One exception is some of the Unicode encodings, which may begin with an expected sequence of bytes (a byte order mark) in the first two or three bytes of the file. If these bytes exist, then it's assumed the file is a Unicode text file.
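As a rough illustration of that kind of check, here is a minimal sketch that looks for the common byte order marks at the start of a buffer. The function name DetectBom and the returned labels are made up for this example, and only the UTF-8 and UTF-16 marks are handled. Note that FF FE is also the start of the UTF-32 little-endian mark, so a match is a hint rather than proof.

```pascal
program BomCheck;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: names the encoding suggested by a leading
// byte order mark, or returns '' when no known mark is present.
function DetectBom(const aBytes: array of Byte): string;
begin
  Result := '';
  if (Length(aBytes) >= 3) and (aBytes[0] = $EF) and (aBytes[1] = $BB)
    and (aBytes[2] = $BF) then
    Result := 'UTF-8'
  else if (Length(aBytes) >= 2) and (aBytes[0] = $FF) and (aBytes[1] = $FE) then
    Result := 'UTF-16 little-endian'
  else if (Length(aBytes) >= 2) and (aBytes[0] = $FE) and (aBytes[1] = $FF) then
    Result := 'UTF-16 big-endian';
end;

begin
  // FF FE followed by 'A' encoded as UTF-16 little-endian.
  WriteLn(DetectBom([$FF, $FE, $41, $00]));
  // Plain ASCII bytes: no mark, so the result is an empty string.
  WriteLn('[', DetectBom([$48, $69]), ']');
end.
```

A file without a mark can still be Unicode text, so the absence of a match says nothing either way.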

All this said, for some uses, it might be possible to define, in a limited way, what it means to be a text file. One definition would be: if the file contains any byte value other than 9 (tab), 10 (new line), 13 (carriage return) or a value within the range of 32 to 126 (127 is the DEL control character), then it is not a text file. The downside to this is that it eliminates the use of accented characters and does not include other control characters which some applications might use. The definition could be expanded to include the range 128 through 255, which is where single-byte code pages place their accented characters. However, this now includes most of the range of bytes and might cause some false positives.
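The narrow definition above can be sketched as follows. This is only an illustration under stated assumptions: the names IsTextByte and LooksLikeAsciiText are invented for the example, the printable range is taken as 32 to 126 (treating 127, DEL, as a control character), and the file to test is passed as the first command-line argument.

```pascal
program TextHeuristic;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: True for tab, line feed, carriage return
// and the printable ASCII range 32..126.
function IsTextByte(aByte: Byte): Boolean;
begin
  Result := aByte in [9, 10, 13, 32..126];
end;

// Hypothetical helper: reads the stream in chunks and stops at the
// first byte that falls outside the definition.
function LooksLikeAsciiText(aStream: TStream): Boolean;
var
  lBuffer: array[0..4095] of Byte;
  lRead, i: Integer;
begin
  Result := True;
  repeat
    lRead := aStream.Read(lBuffer, SizeOf(lBuffer));
    for i := 0 to lRead - 1 do
      if not IsTextByte(lBuffer[i]) then
      begin
        Result := False;
        Exit;
      end;
  until lRead = 0;
end;

var
  lStream: TFileStream;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: TextHeuristic <file name>');
    Halt(1);
  end;
  lStream := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    if LooksLikeAsciiText(lStream) then
      WriteLn('Text, by this limited definition')
    else
      WriteLn('Not text, by this limited definition');
  finally
    lStream.Free;
  end;
end.
```

Widening the set in IsTextByte to include 128..255 gives the expanded definition, with the false positives that come with it.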

The bottom line is every file is a binary file. Every other type of file is a matter of interpretation of the binary data.

Friday, August 25, 2006

Interfaces 101 - Basics

This article discusses the basics of creating an interface, writing a class that implements the interface and some simple uses of the interface. For demonstration purposes, we'll define an interface that simply returns objects from some unknown data structure and has a boolean function to indicate if the end of the structure has been reached. This is commonly called an Iterator. In this article, we'll implement this interface with a class that takes a TList as a parameter in its constructor and iterates over the items in the list. In future articles, we'll probably use this same interface to demonstrate alternate and more advanced ways of implementing it.

Defining an interface

The first step in using interfaces is to define an interface. This is typically done in the interface section of a unit, and it is similar to defining a class. The primary differences are that the interface keyword is used instead of the class keyword, and that only functions, procedures and properties can be declared. No variable or constant declarations are allowed.

One thing unique about interfaces is they each should have a unique GUID. This identifier is used to find the interface when given an object or interface reference of a different type. While the GUID is optional, it is strongly recommended that all interfaces have one since its absence will make certain functions quietly fail. This can lead to a bit of head scratching. A GUID can be automatically generated in the IDE by pressing Control-Shift-G.

For example:
type
  IIterator = interface
    ['{DFA2FE47-053A-4F5C-BB30-8F2A8C6936EE}']
    function AtEnd: Boolean;
    function NextItem: TObject;
  end;
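To show the GUID at work, here is a small sketch that starts with a plain TObject reference and asks for the IIterator interface by its identifier. It uses the Supports function from SysUtils, which is one of the routines that relies on the GUID. TEmptyIterator is a throwaway class invented for this example; it only exists so there is something to query.

```pascal
program GuidLookup;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  IIterator = interface
    ['{DFA2FE47-053A-4F5C-BB30-8F2A8C6936EE}']
    function AtEnd: Boolean;
    function NextItem: TObject;
  end;

  // Hypothetical class for this sketch: an iterator over nothing.
  TEmptyIterator = class(TInterfacedObject, IIterator)
  public
    function AtEnd: Boolean;
    function NextItem: TObject;
  end;

function TEmptyIterator.AtEnd: Boolean;
begin
  Result := True;
end;

function TEmptyIterator.NextItem: TObject;
begin
  Result := nil;
end;

var
  lObject: TObject;
  lIterator: IIterator;
begin
  lObject := TEmptyIterator.Create;
  // The interface type name stands in for its GUID here.
  if Supports(lObject, IIterator, lIterator) then
    WriteLn('IIterator found')
  else
  begin
    WriteLn('IIterator not found');
    lObject.Free; // no interface reference took ownership
  end;
end.
```

Once the lookup succeeds, the interface reference owns the object, so the success branch does not call Free.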

Creating an implementation

An interface by itself doesn't do anything. There needs to be a class that implements it. This is done most simply by creating a class that descends from TInterfacedObject and is flagged as implementing the interface. There are additional ways of doing this that will be discussed in another article, but for now we'll use this simple way.
type
  TListIterator = class(TInterfacedObject, IIterator)
  private
    fList: TList;
    fPosition: Integer;

  public
    constructor Create(const aList: TList);
    function AtEnd: Boolean;
    function NextItem: TObject;
  end;

constructor TListIterator.Create(const aList: TList);
begin
  inherited Create;
  fList := aList;
  fPosition := 0;
end;

function TListIterator.AtEnd: Boolean;
begin
  result := fPosition >= fList.Count;
end;

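function TListIterator.NextItem: TObject;
begin
  // Return nil once the end of the list has been reached.
  result := nil;
  if fPosition < fList.Count then
  begin
    // TList stores untyped pointers, so cast the item back to TObject.
    result := TObject(fList[fPosition]);
    Inc(fPosition);
  end;
end;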
The first line of the declaration says we're creating a class of type TListIterator that descends from TInterfacedObject and implements IIterator. It then says we've got two private variables: one will be used to store the current position and the other will store the list object we're iterating over. We have three methods: a constructor that takes the list as a parameter and initializes the two private variables, and implementations of the AtEnd and NextItem functions declared in the interface. All methods declared in an interface need to be implemented somewhere in the class hierarchy at or above the implementing class. Typically, they are implemented in the same class as the one implementing the interface, but they could be implemented in ancestor classes.

Using the implementation

Here are some methods from a test form. First a method to create a list for testing.
procedure TfrmInterfaces101.AfterConstruction;
var
  i: Integer;
begin
  inherited;
  cList := TObjectList.Create;
  cList.OwnsObjects := True;
  for i := 1 to 10 do
    cList.Add(TObject.Create);
end;
Next, some simple clean-up housekeeping.
procedure TfrmInterfaces101.BeforeDestruction;
begin
  inherited;
  cList.Free;
end;
Now the use of the iterator.
procedure TfrmInterfaces101.btnTestClick(Sender: TObject);
var
  lIterator: IIterator;
begin
  lIterator := TListIterator.Create(cList);
  while not lIterator.AtEnd do
    memoResults.Lines.Add(Format('%p', [Pointer(lIterator.NextItem)]));
end;
Just as with normal class types, this method first declares a variable, here of the IIterator type. Next, the variable is assigned a new instance of a class. This is where the first difference can be noticed: the object is instantiated with the TListIterator class, but it's assigned to a variable of a different type. Objects that implement an interface, either directly or in an ancestor, can be directly assigned to variables of that interface type. Finally, there's a loop that uses the two methods of the interface to get each item in the list.

Also notice there's no freeing of the iterator. In general, when objects are assigned to an interface variable there's no need to free the implementing object. The object is freed when the last interface reference goes out of scope. So, in this case, the TListIterator is freed in the compiler-generated code related to the "end" statement.
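The automatic clean-up is easy to observe with a destructor that announces itself. The following is a minimal console sketch, not part of the test form above: IGreeter, TGreeter, the gDestroyed counter and the GUID are all invented for the illustration.

```pascal
program RefCountDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical interface for this sketch; the GUID is arbitrary.
  IGreeter = interface
    ['{6F1A2B3C-4D5E-4F60-8A9B-0C1D2E3F4A5B}']
    procedure Greet;
  end;

  TGreeter = class(TInterfacedObject, IGreeter)
  public
    destructor Destroy; override;
    procedure Greet;
  end;

var
  gDestroyed: Integer = 0; // counts how many TGreeter instances were freed

destructor TGreeter.Destroy;
begin
  Inc(gDestroyed);
  WriteLn('TGreeter destroyed');
  inherited;
end;

procedure TGreeter.Greet;
begin
  WriteLn('Hello');
end;

procedure UseGreeter;
var
  lGreeter: IGreeter;
begin
  lGreeter := TGreeter.Create;
  lGreeter.Greet;
  // No Free call: the last reference is released when this routine ends.
end;

begin
  UseGreeter;
  WriteLn('After UseGreeter');
end.
```

Run as written, the destructor message should appear between "Hello" and "After UseGreeter", which shows the object going away at the end of UseGreeter rather than at program exit.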

Thursday, August 3, 2006

What are interfaces?

Interfaces are a means of describing a unit of functionality without regard to how it is implemented. They provide a means of decoupling what an object does from how it does it. In some ways, they are similar to a pure abstract class definition.

A great example of a good use of interfaces is the iterator concept in .NET, where it is exposed through the IEnumerable and IEnumerator interfaces. It is small and does one thing well: it provides the concept of a list of items without giving any detail about how that list is stored. Various types of container classes implement the interface in different ways, each specific to how that container stores its data. Users of the interface can iterate through a collection of items without knowing if the collection is a linked list, a binary tree or coming from a remote machine through a TCP/IP socket (assuming there's some object which communicates through a socket and implements the interface).

In Delphi, they can also do two helpful things.

First, since an object can implement multiple interfaces, they can provide some of the useful features provided by multiple inheritance. For example, an object can implement an IStreamable interface, indicating that it knows how to stream itself to some streaming mechanism. It can also implement an IPerson interface, indicating that it has Name, HomeAddress and Birthdate properties. Interface implementations can also be delegated to properties. This allows multiple classes in different object hierarchies to implement an interface while keeping the implementation in a single helper class.
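Here is a minimal sketch of one class implementing two interfaces. Only the interface names IStreamable and IPerson come from the text above. Their members, the GUIDs, the TPerson class and the Serialize format are assumptions made up for the illustration, and IPerson is cut down to a single Name property to keep it short. Delegation to a property is not shown.

```pascal
program MultipleInterfaces;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical members; the GUIDs are arbitrary.
  IPerson = interface
    ['{8B1E6C52-3F0A-4D7B-9C41-2A5D7E9F1B03}']
    function GetName: string;
    property Name: string read GetName;
  end;

  IStreamable = interface
    ['{C47D2A19-6E8B-4F35-A1D0-5B9C3E7F2468}']
    function Serialize: string;
  end;

  // One class, two unrelated interfaces.
  TPerson = class(TInterfacedObject, IPerson, IStreamable)
  private
    fName: string;
  public
    constructor Create(const aName: string);
    function GetName: string;
    function Serialize: string;
  end;

constructor TPerson.Create(const aName: string);
begin
  inherited Create;
  fName := aName;
end;

function TPerson.GetName: string;
begin
  Result := fName;
end;

function TPerson.Serialize: string;
begin
  Result := 'Person:' + fName;
end;

var
  lPerson: IPerson;
  lStreamable: IStreamable;
begin
  lPerson := TPerson.Create('Ada');
  WriteLn(lPerson.Name);
  // Ask the same object for its other interface.
  if Supports(lPerson, IStreamable, lStreamable) then
    WriteLn(lStreamable.Serialize);
end.
```

Code that only needs to stream something can accept an IStreamable and never learn that a person is behind it.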

Second, they can provide a means of automatic lifetime management of objects. There's always the ongoing question of who's responsible for freeing created objects. Some environments, such as Java and .NET, use garbage collection, which uses various means of determining when an object is no longer used and then gets rid of it. Historically, Delphi has said it's the programmer's responsibility to call .Free when they are done with the object. Based on this, one best practice says the object which creates another object is responsible for freeing it. Another is the owner pattern, where an owner is assigned who's responsible for freeing the object; the most familiar example is TComponent.

Reference counted interfaces are another means. Any object which descends from TInterfacedObject, or which implements IUnknown and incorporates reference counting, can use this method. Basically, it leaves the compiler and run-time system responsible for keeping track of how many references there are to an interface and destroying the implementing object when the last reference goes away.

This can be very handy in many situations. One example is the Factory pattern, where the whole purpose is to decouple the creation of an object from the code that uses the created object. Another example is in threads, where objects may be passed between threads with no clear concept of owner. And finally, in lifetime management of normal objects, it can eliminate some busy-work housecleaning.
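The factory case can be sketched with the IIterator interface and TListIterator class from the Interfaces 101 article. The CreateIterator and VisitAll routines are hypothetical names added for this example. The point is that the calling code only ever sees IIterator, and nobody has to decide who frees the iterator.

```pascal
program FactoryDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes, Contnrs;

type
  IIterator = interface
    ['{DFA2FE47-053A-4F5C-BB30-8F2A8C6936EE}']
    function AtEnd: Boolean;
    function NextItem: TObject;
  end;

  TListIterator = class(TInterfacedObject, IIterator)
  private
    fList: TList;
    fPosition: Integer;
  public
    constructor Create(const aList: TList);
    function AtEnd: Boolean;
    function NextItem: TObject;
  end;

constructor TListIterator.Create(const aList: TList);
begin
  inherited Create;
  fList := aList;
  fPosition := 0;
end;

function TListIterator.AtEnd: Boolean;
begin
  Result := fPosition >= fList.Count;
end;

function TListIterator.NextItem: TObject;
begin
  Result := nil;
  if fPosition < fList.Count then
  begin
    Result := TObject(fList[fPosition]);
    Inc(fPosition);
  end;
end;

// Hypothetical factory: callers never name the implementing class.
function CreateIterator(const aList: TList): IIterator;
begin
  Result := TListIterator.Create(aList);
end;

// Hypothetical consumer: counts the items an iterator yields.
function VisitAll(const aIterator: IIterator): Integer;
begin
  Result := 0;
  while not aIterator.AtEnd do
  begin
    aIterator.NextItem;
    Inc(Result);
  end;
end;

var
  lList: TObjectList;
begin
  lList := TObjectList.Create(True); // the list owns and frees its items
  try
    lList.Add(TObject.Create);
    lList.Add(TObject.Create);
    lList.Add(TObject.Create);
    // The iterator is freed automatically once VisitAll returns.
    WriteLn('Items visited: ', VisitAll(CreateIterator(lList)));
  finally
    lList.Free;
  end;
end.
```

Swapping in a different implementing class would only touch CreateIterator; VisitAll and the main block would stay as they are.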