
comp.lang.c FAQ list · Question 12.16

Q: How can I read data from data files with particular formats?
How can I read ten floats without having to use a jawbreaker scanf format
like "%f %f %f %f %f %f %f %f %f %f"?
How can I read an arbitrary number of fields from a line into an array?


A: In general, there are three main ways of parsing data lines:

  1. Use fscanf or sscanf, with an appropriate format string. Despite the limitations mentioned in this section (see question 12.20), the scanf family is quite powerful. Though whitespace-separated fields are always the easiest to deal with, scanf format strings can also be used with more compact, column-oriented, FORTRAN-style data. For instance, the line
    	1234ABC5.678
    
    could be read with "%d%3s%f". (See also the last example in question 12.19.)
  2. Break the line into fields separated by whitespace (or some other delimiter), using strtok or the equivalent (see question 13.6), then deal with each field individually, perhaps with functions like atoi and atof. (Once the line is broken up, the code for handling the fields is much like the traditional code in main() for handling the argv array; see question 20.3.) This method is particularly useful for reading an arbitrary (i.e. not known in advance) number of fields from a line into an array.

    Here is a simple example which copies a line of up to 10 floating-point numbers (separated by whitespace) into an array:

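    #include <stdlib.h>	/* for atof */

    #define MAXARGS 10
    
    char line[] = "1 2.3 4.5e6 789e10";
    char *av[MAXARGS];	/* pointers to the fields within line */
    int ac, i;
    double array[MAXARGS];
    
    /* makeargv splits line in place and returns the number of fields */
    ac = makeargv(line, av, MAXARGS);
    for(i = 0; i < ac; i++)
    	array[i] = atof(av[i]);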
    
    (See question 13.6 for the definition of makeargv.)

  3. Use whatever pointer manipulations and library routines are handy to parse the line in an ad-hoc way. (The ANSI strtol and strtod functions are particularly useful for this style of parsing, because they can return a pointer indicating where they stopped reading.) This is obviously the most general way, but it's also the most difficult and error-prone: the thorniest parts of many C programs are those which use lots of tricky little pointers to pick apart strings.
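
As a rough sketch of technique 3, the end pointer that strtod hands back can drive a loop that reads an arbitrary number of whitespace-separated numbers, with no long scanf format required. The helper name parse_doubles and its signature are assumptions made for illustration; they do not appear elsewhere in this list.

```c
#include <stdlib.h>

/*
 * parse_doubles is a hypothetical helper, not part of the FAQ.
 * It reads up to max whitespace-separated numbers from line into
 * array and returns how many it stored.
 */
size_t parse_doubles(const char *line, double *array, size_t max)
{
	size_t n = 0;
	const char *p = line;

	while (n < max) {
		char *end;
		double val = strtod(p, &end);	/* skips leading whitespace */

		if (end == p)	/* nothing converted: end of line or junk */
			break;
		array[n++] = val;
		p = end;	/* resume where strtod stopped */
	}
	return n;
}
```

A call such as parse_doubles(line, array, MAXARGS) would stop quietly at the first field that is not a number; a stricter caller might instead inspect what follows the last converted field and report an error.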

When possible, design data files and input formats so that they don't require arcane manipulations, but can instead be parsed with easier techniques such as 1 and 2: dealing with the files will then be much more pleasant all around.
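
To show how short technique 1 can stay when the format cooperates, here is a minimal sketch that reads the column-oriented line 1234ABC5.678 with the "%d%3s%f" format and checks sscanf's return value. The helper name parse_record and its parameters are hypothetical, chosen only for this example.

```c
#include <stdio.h>

/*
 * parse_record is a hypothetical helper, not part of the FAQ.
 * It splits a line such as "1234ABC5.678" into an int, a
 * three-character code, and a float.
 * Returns 1 if all three fields were converted, 0 otherwise.
 */
int parse_record(const char *line, int *num, char code[4], float *val)
{
	/* %3s stores at most 3 characters plus the terminating '\0' */
	return sscanf(line, "%d%3s%f", num, code, val) == 3;
}
```

Note that %d carries no field width here, so it consumes every leading digit; the split works for this line only because the second field begins with a non-digit. Data whose adjacent columns are both numeric would likely need explicit widths such as %4d.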

