Fast Reading and Writing of Binary Data in C

Scientific programming often requires reading and writing large chunks of data. This can often be one of the most time-consuming parts of the code, especially when data files break the GB mark.

Instead of using nested loops with one scanf statement per data point, try to use unformatted files and the low-level read function. You can read entire data files an order of magnitude or two faster this way. Allocate your array storage with malloc and then pass a pointer to that memory and the total number of bytes to be read to read. Then call read once to read the entire data file into memory.

This method may seem messy, as it removes the human readable step in data handling, but the operating system is good at deciding how much data to read at a time and how to best optimize the transfer from disk to RAM. When you pass one, big read call for the whole data file to the operating system, the entire file is poured into memory almost as fast as your hardware will allow. This can be north of 1 GB/s in many machines. Calling scanf for each data point or array value is slow because scanf then calls read anyway, and you are forcing the computer to get one, store one, instead of get many, store many.

The same principle applies to writing data, except you call write for the whole array at one time, instead of looping through and calling printf each time.

Here is a small example to demonstrate a fast file read:

#include <stdio.h>
#include <stdlib.h>
#include <err.h>
#include <fcntl.h>
#include <sysexits.h>
#include <unistd.h>

   int fd; 
   size_t bytes_read, bytes_expected = 100000000*sizeof(double); 
   double *data;
   char *infile = "file.dat";

   if ((fd = open(infile,O_RDONLY)) < 0) 
      err(EX_NOINPUT, "%s", infile);

   if ((data = malloc(bytes_expected)) == NULL)
      err(EX_OSERR, "data malloc");

   bytes_read = read(fd, data, bytes_expected);

   if (bytes_read != bytes_expected) 
      err(EX_DATAERR, "Read only %d of %d bytes", 
         bytes_read, bytes_expected);

   /* ... operate on data ... */



Update: I've been told that on Linux this only works for files smaller than 2.1 GB, even on 64-bit platforms, due to a "feature" of the kernel read function. If you need to read files larger than this, you'll have to break them into chunks this size or less.