Short answer: You can't. The question does not make sense, because text files are a subset of binary files. Let's make the question more generic: how do I tell the difference between file type X and a binary file, where X is any chosen file type (text, MP3, WAV, etc.)? Or, to put it in a different domain, how do I tell the difference between oak and wood? Now does it make a bit more sense why you can't ask this question? If not, read on...
A binary file contains binary data. Binary data is composed of bytes with values ranging from 0 to 255. Therefore, every file is a binary file. A file of any given type is a binary file which has a specific structure. What differentiates one type of file from another is simply convention and how the structure of the data is interpreted. Many file types have specific values which are expected at particular byte offsets within the file. If these values are not found, then the file is not of that type. Of course, finding those bytes at the expected locations does not ensure the file is of that type. It only indicates that it might be.
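As a rough illustration of checking for expected values at particular byte offsets, here is a minimal Python sketch. The `SIGNATURES` table and the `candidate_types` function are hypothetical names made up for this example, and the table lists only a handful of well-known signatures. A match only says the data might be of that type.

```python
# Hypothetical table for illustration: type name -> (byte offset, expected bytes).
SIGNATURES = {
    "PNG": (0, b"\x89PNG\r\n\x1a\n"),
    "PDF": (0, b"%PDF-"),
    "ZIP": (0, b"PK\x03\x04"),
    "WAV": (8, b"WAVE"),  # a WAV file starts with "RIFF", a 4-byte size, then "WAVE"
}


def candidate_types(data: bytes) -> list:
    """Return the names of all types whose signature appears at its expected offset.

    An empty list means none of the known signatures matched. A non-empty
    list is only a hint: arbitrary data can contain the same bytes by chance.
    """
    matches = []
    for name, (offset, magic) in SIGNATURES.items():
        if data[offset:offset + len(magic)] == magic:
            matches.append(name)
    return matches


if __name__ == "__main__":
    print(candidate_types(b"%PDF-1.4\n"))
    print(candidate_types(b"just some text"))
```

In practice you would pass in only the first few dozen bytes read from the file, since all of these signatures sit near the start.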
The text file type typically does not have this kind of structure, an exception being some of the Unicode encodings. These may begin with a byte order mark (BOM): two bytes for UTF-16, three for UTF-8. If these bytes are present, the file is assumed to be Unicode text.
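A minimal sketch of that check in Python, using the byte order mark constants from the standard `codecs` module. The `detect_bom` function name and the returned labels are assumptions made for this example. UTF-32 is tested first because its little-endian mark begins with the same two bytes as the UTF-16 one.

```python
import codecs

# Longer marks first: BOM_UTF32_LE (FF FE 00 00) starts with BOM_UTF16_LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return a label for the byte order mark at the start of data, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    print(detect_bom(codecs.BOM_UTF8 + b"hello"))
    print(detect_bom(b"hello"))
```

A `None` result says nothing either way: plenty of Unicode text files are written without a mark, and a non-text file could happen to start with the same bytes.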
All this said, for some uses, it might be possible to define, in a limited way, what it means to be a text file. One definition: if the file contains any byte value other than 9 (tab), 10 (new line), 13 (carriage return), or a value in the printable ASCII range of 32 to 126, then it is not a text file. (Byte 127 is the DEL control character, so it is left out.) The downside is that this rules out accented characters and excludes other control characters which some applications use. The definition could be expanded to include the range 128 through 255, where 8-bit encodings such as Latin-1 place their accented characters. However, this now admits most of the possible byte values and might cause false positives.
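The limited definition above could be sketched in Python roughly as follows. The `looks_like_text` function and its `allow_high_bytes` flag are hypothetical names for this example. The sketch also assumes that the printable range stops at 126, leaving out 127 (DEL) because it is a control character, and that the expanded definition covers 128 through 255.

```python
# Tab, new line, carriage return, plus printable ASCII (32..126).
ASCII_TEXT = frozenset({9, 10, 13} | set(range(32, 127)))

# Expanded definition: also accept every byte from 128 to 255.
EXTENDED_TEXT = ASCII_TEXT | frozenset(range(128, 256))


def looks_like_text(data: bytes, allow_high_bytes: bool = False) -> bool:
    """Return True if every byte in data falls inside the chosen 'text' set.

    This is a heuristic built on an arbitrary definition, not a real test:
    with allow_high_bytes=True only a few control bytes are rejected, so
    false positives become more likely. Empty data is reported as text.
    """
    allowed = EXTENDED_TEXT if allow_high_bytes else ASCII_TEXT
    return all(byte in allowed for byte in data)


if __name__ == "__main__":
    print(looks_like_text(b"plain ASCII\r\n"))
    print(looks_like_text("caf\u00e9".encode("latin-1")))
    print(looks_like_text("caf\u00e9".encode("latin-1"), allow_high_bytes=True))
```

Checking only a sample from the start of a large file keeps this cheap, at the cost of missing non-text bytes further in.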
The bottom line is every file is a binary file. Every other type of file is a matter of interpretation of the binary data.
Monday, August 28, 2006
How can I tell the difference between a text file and a binary file?