Given a particular programming problem, what language should you use to realize the solution? Your choice could very well affect the success or failure of the project. So you'd better choose wisely.

Have you chosen the best programming language for your current embedded project? Your answer may very well depend on how the phrase "best programming language" is defined. Perhaps you've chosen the language that produces the most efficient code for your particular processor; or maybe you've selected the one that will allow you to finish the coding most quickly or with fewer bugs. Then again, maybe you--like so many of us--are just using the same language you always have.

Best fit

Even considered within the narrow scope of embedded systems, the decision of what language to use to implement the solution to a given programming problem is a difficult one. Many factors must be considered and different weights given to each of them.

The factors relevant to a language decision probably include at least:

  • Efficiency of compiled code
  • Source code portability
  • Program maintainability
  • Typical bug rates (say, per thousand lines of code)
  • The amount of time it will take to develop the solution
  • The availability and cost of the compilers and other development tools
  • Your personal experience (or that of the developers on your team) with specific languages or tools

All of these factors are important. Yet, in all of my years as a software developer, I have never once worked with any engineer or manager who gave serious discussion or thought to the matter of language selection. Nor have I ever seen a formal "language selection process" documented at any company. The majority of embedded programmers just seem to assume that their favorite language (almost invariably assembly or C, when you're talking to embedded folks) is always the right choice. The fact that some of them always pick assembly and others always pick C is one clue that the language decision is more complicated than they would have the rest of us believe.

During my years as an embedded developer, I've broken many of the conventional language assumptions in this field--with great success. I've used assembly language on occasion, but only once to improve the efficiency of a routine beyond what could have been done by an off-the-shelf C compiler. I've used C on memory-constrained eight-bit processors. Even more surprisingly, perhaps, I've made use of many parts of the C++ language without ever once paying a performance penalty. Heck, I've even written a piece of embedded software in Java (complete with one of those "too big and too slow for use in any embedded system" Java virtual machines). All of these things have been done in the process of creating real products that worked and were sold, and were mostly completed on schedule.

I'm absolutely convinced that matching your choice of programming language to the computing task at hand is essential. If you choose correctly, you will be rewarded with a straightforward design and an even easier implementation. If you choose wrong, you may run into problems at one of those points or, worse, during the subsequent integration and/or testing phase. A bad language choice may not keep you from finishing, but it could cause a lot of headaches along the way.

I think a big part of my success on past projects has been a result of good language choices. Unfortunately, I must admit to feeling that this may be a combination of experience, talent, and luck that can't be translated into a formal decision-making process. Too many variables and too many project-specific issues must be considered.

So, when asked to moderate a birds-of-a-feather session at last year's Embedded Systems Conference in San Jose, I chose language selection as the topic for discussion. I wanted to see if there was knowledge in my brain and in the brains of others that could be used to instruct people in the ways of language selection. This open forum provided an excellent opportunity to bring such knowledge out into the open. The discussion was lively and interesting, but did not lead to any major revelations. Since then, I've continued to give this issue thought from time to time.

I haven't yet been able to formalize my thinking on language selection. But I'm hoping that someday soon I will be able to. In the meantime, I'd like to tell you about a project that gave me reason to think as carefully as ever about language selection and the results of a related experiment.

Mission impossible

Your mission, should you choose to accept it...

While working as a consultant, you receive a call from an old colleague. It seems that a small local company, XYZ, Inc., has gotten itself into serious trouble with a new product development. They are in desperate need of someone to straighten out their embedded software. You express interest in the work, thank your friend, and make a quick call to the engineering manager at XYZ.

It turns out that XYZ has not traditionally been in the product development business. They've been manufacturing, selling, and supporting the same product designs for more than a decade. However, about two years back they realized that they would not be able to get all of the parts to build their flagship product for much longer (including one arcane eight-bit processor). If they were to stay in business, XYZ's owners realized, they would have to redesign this product and start manufacturing the new design by a certain date.

Shortly thereafter, an electrical engineer (fresh out of a local university) was hired to design the embedded hardware and software for the new design. He spent about a year learning his way around their old system, researching their options, and designing the new hardware. Then he spent another year writing 2,000 lines of assembly code to control the system. (An abominable one instruction per hour.) A few weeks before the first shipment of this new product was supposed to take place, the engineer (now with two years of "experience") accepted a new job and left the company.

In the wake of their loss, XYZ's management realized that the new product was far from shippable. Despite the incredible amount of time spent on the software development, a fair number of bugs remained to be worked out of the firmware. The list of bugs even included problems like occasional system lock-ups that were indicative of major design flaws or implementation errors. In addition to that, several issues related to product performance (not bugs, per se) still needed to be addressed. No one else at the company knew anything about the assembly code and few notes or comments could be found in the engineer's working areas.

Despite a backlog of half a million dollars, XYZ may be on the verge of going out of business. The new design is not working and no parts are available to build the old one. Your new client wants you to attempt either a fix of the existing assembly code (the engineering version of a Hail Mary play, given their list of bugs) or a complete software redesign and implementation. Which would ultimately be faster? And, if you choose the latter route, what language should you use? In short, your client's existence as a company and the fate of its 40 employees may well be riding on your decision. So you'd better make a good one.

Decisions, decisions

Not so long ago, I was put in this very situation. The pressure to make the correct decision was, obviously, quite palpable. Because of the risks, I sought to be as logical and formal as I ever had been and to include the client in the entire process.

The options at our disposal seemed to be:

  • Fix the existing firmware. The biggest risk here was that this might not even be possible. The fact that the original author had gotten 80% of the functionality mostly working did not imply that the bug fixes and remaining 20% could be built around his infrastructure. I had encountered situations in the past in which software was so poorly designed that it was essentially beyond repair. I thought this might be another of them.
  • Redesign and implement the firmware. This was more of a sure thing (I could definitely make the system work this way), but it might take longer than fixing the existing code. And I'd probably have some bugs of my own. Who knew what would take longer? And, if I were going to rewrite the software, what language should I choose?
    • Assembly. I've found that assembly language is rarely the correct choice. It suffers from a lack of portability and maintainability. Also, I wasn't familiar with this particular assembly language myself, so the learning curve might slow down my work.
    • C. The problem here was that the target processor was an eight-bit Microchip PIC16Cxx with just 8K words of ROM and 368 bytes of RAM. Before I could make a language decision, I'd have to figure out how much of an impact using C might have on the image size and program efficiency. Were there enough memory and instruction cycles to spare? And, if I chose C, whose compiler should I use?
    • Others. I didn't think any other languages were worth considering for this project.

To get a feel for the existing code, I decided to fix one of their minor bugs before making any decisions. This took 22 hours in all. A large chunk of time for one minor bug, but I'd learned a lot in the process. I'd learned what their existing development environment was like (and installed the tools on my computer), and I'd learned that the original author of the software didn't understand PIC assembly much better than I did. I also inferred--from the structure of his code--that he had no master plan at all.

By this point, I was leaning toward a complete redesign of the firmware. Judging from their list of problems and the state of the code, I wasn't sure if I'd ever be able to make it do what they needed. However, I thought I understood what the system had to do now and was pretty certain that the man-year the original programmer had spent on this was way too long. My gut told me this was a two-month project--max. I didn't want to waste time trying to fix the existing code if I was just going to end up rewriting it after all. A couple of wasted weeks might cause serious financial difficulty at XYZ. So I was going to play it safe and start over from scratch. Now I just had to figure out what language to use.

Assembly or C?

It turned out that one discussion I'd had with the client was the ultimate decision-maker. XYZ's management didn't like that the original programmer had spent a year developing a program that no one else at the company could understand. From what they'd heard, having the program written in C would make it both more portable (should the processor ever need to be changed again) and maintainable (far more C programmers exist than PIC assembly programmers, by any measure). They were right on both counts.[1]

I was also quite sure that writing the program in C would be much faster than writing it in assembly. However, I needed to give several issues more thought before fully committing to C. I needed to be sure that I wouldn't encounter any actual size or efficiency surprises. My experience has been that these are not limiting factors in 99% of all cases, but an 8-bit processor with 368 bytes of RAM and an odd-ball call stack was outside the scope of my experience base.

A little Web-based research showed that at least three C compilers were available for the PIC16 family. I downloaded demo versions for two of them and ran some experiments. I wouldn't need to make any library calls, so the extent of the impact of using C would be:

  • Extra RAM usage associated with more frequent function calls (my program would be more structured than the assembly version), if any.
  • Extra ROM usage associated with the C startup code, compiler-supplied routines (for example, the PIC16 family doesn't have a multiply instruction), and overhead introduced by the compiler, if any.
  • Any performance impact associated with extra instructions that can't be avoided when generalizing at the compiler level.

It turned out that the extent of these three factors was relatively minor. The compiler would create overlays to minimize RAM usage. The C startup code was very small. Multiply routines would indeed add a few hundred instructions--but, then, the current program had to implement a primitive multiply function too. The existing program contained about 2K words of code and 4K of look-up tables, in an 8K ROM. Assuming I'd need those look-up tables (it turned out that I didn't), that still left me room to make my program twice as big as the assembly version. I deemed that the worst case, given the requirements and my experiments, and selected C as my language.
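Since the PIC16 family has no multiply instruction, the compiler must supply a routine to do the job in software. As a rough illustration of what such a routine computes (this is my own shift-and-add sketch, not the actual code from any PIC compiler's library), it might look like:

```c
#include <stdint.h>

/* Shift-and-add multiply, in the spirit of a compiler-supplied
 * multiply routine for a processor with no multiply instruction.
 * (Hypothetical sketch; the real library routine is
 * compiler-specific.) */
uint16_t soft_mul16(uint16_t a, uint16_t b)
{
    uint16_t product = 0;

    while (b != 0)
    {
        if (b & 1)        /* Add the shifted multiplicand for each 1 bit. */
        {
            product += a;
        }
        a <<= 1;          /* Shift multiplicand left... */
        b >>= 1;          /* ...and consume one multiplier bit. */
    }

    return product;
}
```

A loop like this costs a few dozen instructions per call, which is why a compiler vendor tucks it away in a library rather than expanding it inline.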

Tortoise and hare

This is where things got really interesting. After better understanding the trade-offs between fixing their existing program and developing a new one (and having been burned once), management at XYZ didn't want to take any chances with this. We agreed that a second consultant would be sought to fix the existing code in parallel with my efforts to create a new program. For the first time in my career, I would have a chance to test an engineering decision in a real-world competition. Had I made the right choice in deciding not to fix the existing code? Would I be praised as a hero or burned in effigy?

Sure enough, another consultant (a PIC assembly expert, no less) was located within a few weeks. He didn't have as much time to devote to the project as I did, but the results were still instructive. It turned out that my new C program was ready before the assembly program had even been fully repaired. Hours worked were tracked on both sides and, at the point that we shipped the C code, here's how the numbers broke down:

  • Fixing the existing code (including my minor bug fix): 86 hours
  • Rewriting the code in C (including design and performance improvements): 185 hours

After 86 hours of repair, the assembly program was growing short on bugs. The other consultant had done an amazing job of eliminating the system lock-ups and almost all of the other flaws. I was impressed. However, none of that software's performance problems had yet been addressed. From my experience with the C code, these performance "tweaks" required a major rethinking of the design. I'm still not sure if the existing program could have been restructured to handle this paradigm shift or not.

After just 185 hours of designing, coding, testing, and debugging, the new firmware was ready to be shipped. It included all of the needed performance enhancements and pieces of it were later easily ported to another, similar piece of hardware. The C implementation was also modular, commented to be easily understood by others, and had even passed a thorough code inspection. All of that in about 10% of the time it took to write the original (bug-riddled) assembly program. Clearly, this program should have been written in C in the first place.[2]

More surprises

A major chunk of time I didn't include in the above figure was 26 hours I spent working with an XYZ engineer to debug a race condition in his software. This race condition was exposed when my C code was unable to communicate with a remote system with which the assembly code had no trouble exchanging information. After two long days in the lab, we finally determined that the true source of the problem was that the C code was twice as fast to respond to a sync command as the assembly code had ever been. With a shorter latency in the communications channel, a race condition in the remote software was exposed. Surprise, surprise, surprise... C had beaten assembly in terms of program efficiency.[3]

But that wasn't the only surprise result on this project. Another was that the new firmware image was actually smaller than the old one. Recall that the original assembly program consisted of about 2K words of code and 4K of look-up table data. The C program contained about twice as much code (4K words), but the lookup table data was eliminated in its entirety. Compiler-supplied mathematical routines made it much more natural to use equations to fit the necessary curves. These calculations did not need to be performed especially quickly, and the use of equations made calibration of the system significantly more flexible. The system's performance was thus improved at the same time that its memory utilization was reduced and the code made more readable and portable!

Choosing a language more tuned to the problem at hand made the system design easier, the implementation faster (by a factor of 10), and the whole project run more smoothly. The end result was a program that could be easily maintained and/or ported to new hardware.

I would love to hear your thoughts, good and bad experiences, and other comments in the area of language selection. So, if you have two cents to offer on the subject, please comment below. Perhaps together, we can nail down some objective criteria for selecting from various languages.

Footnotes

[1] In fact, they'd wanted it written in C in the first place. But the engineer they'd hired decided on assembly, apparently because C scared him. He convinced them to go along by saying vague things like "C is too big" and "C is too slow." [back]

[2] To be fair, I did benefit from the work of the original programmer. In certain instances, I was able to borrow the algorithm for an essential peripheral interaction from his code, rather than reconstruct the sequence from a databook. If I hadn't had access to these working parts of the code, I probably would have spent about 100 more hours on the project (50 of them staring at a logic analyzer). [back]

[3] The speedup was actually the result of a better design. Assembly will always be at least as fast as C when executing the same algorithm. [back]

The Internet Protocol (IP) is the glue that holds an internet together. Here's a compact implementation of the IP layer for embedded C programmers.

As we've seen, the presence of a network interface like an Ethernet controller in a product makes it possible to send packets of data from one computer to another. To send the data, however, a destination hardware address must be attached to the packet before it is streamed out onto the network. ARP provides a mapping service between the hardware address of the destination computer and its IP address. So despite the fact that they are not part of the IP layer, both a network interface and ARP are required for network communications.

IP freely

The IP layer's main role is to provide for the routing of packets from one IP address to another. In other words, it can connect any machine on the Internet with any other, regardless of physical location. One of the communicating machines may be connected to an Ethernet LAN at a company in China, while the other could be a laptop in Maine that has just connected with a modem and a PPP link. Because of IP, their physical locations and underlying network interfaces don't matter. The IP layer software on each machine and similar software within routers and switches on the Internet backbone make this end-to-end communication possible.

The result of all this is a logical connection between two computers. This connection can be used to send and receive all kinds of data. For example, the two computers could exchange Web pages via HTTP over TCP, SNMP packets over UDP, telnet sessions, voice over IP, or any other type of data that can be sent in a stream of one or more digital packets.

Because so many types of data can be exchanged over IP, the Internet Protocol is full of features that are only used by a subset of connected hosts. In fact, the protocol itself is overdue for an upgrade from the current version, called IPv4, to the more capable IPv6. Since IPv6 is not yet widely deployed and many of the IPv4 features are unnecessary for a lot of applications, my implementation of the IP layer includes only the bare minimum of functionality needed for IPv4-style communications. Most commercial TCP/IP stacks for sale in the embedded systems marketplace are more complete.

To get a feel for what is and isn't supported, let's look at the data structures and constants defined in Listing 1. Of particular note here is that the NetIpHdr structure defines a header of a fixed size. In truth, the length of this header may be extended in 32-bit increments to support some less commonly used IP features. However, µC/Net never uses any of the special features that these extensions allow. So variable length IP headers are not generated by the stack and it will reject any incoming IP packets that have a header of length other than 20 bytes. The first field in any IP header, ver_hlen, gives the protocol version (4 for IPv4) in the first nibble and header length (five 32-bit words for the fixed size header) in the second nibble. All of the IP packets sent by µC/Net will, therefore, have this field set to the constant value 0x45. Similarly, it is a requirement that all packets received by the stack have that same value in ver_hlen.

/*
 * IP Header
 */
typedef struct
{
    uint8_t  ver_hlen;    /* Header version and length (dwords). */
    uint8_t  service;     /* Service type. */
    uint16_t length;      /* Length of datagram (bytes). */
    uint16_t ident;       /* Unique packet identification. */
    uint16_t fragment;    /* Flags; Fragment offset. */
    uint8_t  timetolive;  /* Packet time to live (in network). */
    uint8_t  protocol;    /* Upper level protocol (UDP, TCP). */
    uint16_t checksum;    /* IP header checksum. */
    uint32_t src_addr;    /* Source IP address. */
    uint32_t dest_addr;   /* Destination IP address. */

} NetIpHdr;

#define IP_VER_HLEN    0x45
#define IP_HEADER_LEN  sizeof(NetIpHdr)
#define IP_DATA_LEN    (MTU - IP_HEADER_LEN)

/*
 * IP Packet
 */
typedef struct
{
    NetIpHdr ipHdr;
    uint8_t  data[IP_DATA_LEN];

} NetIpPkt;

Listing 1. Data structures and constants for the IP layer
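The nibble-packing of ver_hlen described above can be made concrete with a pair of macros (my own illustration, not part of the µC/Net sources):

```c
#include <stdint.h>

/* Unpack the two nibbles of the IPv4 ver_hlen field (illustrative
 * sketch; these macros are not part of uC/Net). For a standard
 * 20-byte IPv4 header the field is 0x45: version 4 in the high
 * nibble, header length 5 (in 32-bit dwords) in the low nibble. */
#define IP_VERSION(vh)        ((uint8_t)((vh) >> 4))
#define IP_HDR_LEN_BYTES(vh)  ((uint8_t)(((vh) & 0x0F) * 4))
```

For the fixed value 0x45 these yield version 4 and a 20-byte header, which is exactly the combination µC/Net accepts and generates.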

In addition to this restriction, several other features of IP are either unsupported by µC/Net or the support is significantly scaled back. The only one of these that is likely to affect you is the lack of support for fragmentation and reassembly.

Fragmentation and Reassembly

Ethernet frames have a defined maximum size payload of 1,500 bytes each. This is termed the maximum transmission unit, or MTU, of the physical network. Included within that payload are the contents of the IP header, the UDP (or TCP) header, and the actual data. If all of the packets of data you want to send and receive fit within that maximum size--as is the case in the majority of embedded applications requiring only limited connectivity--µC/Net will do the job handily. If, however, you want to send even a single 1,501-byte frame, you'll run into trouble.
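The arithmetic behind that limit is worth spelling out. With the fixed 20-byte IP header and the standard 8-byte UDP header (the UDP size is a well-known constant, assumed here rather than stated in the text), the largest single-frame UDP payload works out as follows (macro names are my own):

```c
/* Largest UDP payload that fits in one Ethernet frame (sketch;
 * the 8-byte UDP header size is the standard value, not taken
 * from the article text). */
#define MTU_BYTES      1500    /* Ethernet payload limit.       */
#define IP_HDR_SIZE    20      /* Fixed-size IPv4 header.       */
#define UDP_HDR_SIZE   8       /* Standard UDP header.          */

#define MAX_UDP_DATA   (MTU_BYTES - IP_HDR_SIZE - UDP_HDR_SIZE)  /* 1,472 bytes */
```

Anything larger than 1,472 bytes of application data would force the IP layer to split the datagram across frames.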

A more complete IP layer implementation would include a feature called fragmentation and reassembly. The idea that underlies this feature is simple: payloads too large to be sent over the physical network all at once should be split up across multiple frames (fragmented) and reassembled at the receiving end. The IP header field fragment can be used by the stacks on the two ends to coordinate this process.

Unfortunately, the implementation of fragmentation and reassembly is extremely cumbersome. Adding just this one feature to the IP layer would approximately double the amount of code (and code complexity) of the entire µC/Net stack! For that reason, I've made the decision not to support fragmentation and reassembly at all at this time. If the UDP layer attempts to send a packet that is too large, the IP layer will simply truncate the data to IP_DATA_LEN bytes and send that instead. If a remote computer fragments and then sends a large packet to a computer running µC/Net, the incoming data will appear to the client or server application as though it were actually multiple unrelated packets of information, rather than one really large one. (A poorly coded application could get confused by this, so beware.)
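For reference, the 16-bit fragment field that a full implementation would use packs three flag bits and a 13-bit fragment offset, per the IPv4 specification. A sketch of that layout (µC/Net itself always sends this field as zero; these macros are my own illustration):

```c
#include <stdint.h>

/* Layout of the IPv4 fragment field, per the protocol spec
 * (illustrative; uC/Net never sets any of these bits). */
#define IP_FLAG_DF         0x4000   /* Don't Fragment.            */
#define IP_FLAG_MF         0x2000   /* More Fragments follow.     */
#define IP_FRAG_OFFSET(f)  (((f) & 0x1FFF) * 8)  /* Offset in bytes. */
```

Each fragment carries its offset in units of eight bytes, which is how the receiving stack knows where in the reassembled datagram the piece belongs.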

Sending and receiving

By now you must be wondering what IP features I have implemented. Ultimately, the IP layer is just responsible for sending and receiving IP packets. Listing 2 shows the code for NetIpSnd(), which does the sending. The IP packets resulting from this function call should make sense to any IPv4 system to which they are sent. That's the beauty of a more limited implementation of the protocol: it's small, easily coded and understood, and yet--because it implements a subset of the full protocol--can communicate with any system on the network.

int
NetIpSnd(NetIpPkt * pIpPkt, uint16_t len)
{
    uint16_t ident;


    /*
     * Assign a unique ID to the outgoing packet.
     */
    // Enter critical section here.
    ident = gNextPacketID++;
    // Leave critical section here.

    /*
     * Fill in the remaining IP header fields.
     * (Some fields were pre-filled by the UDP layer.)
     */
    pIpPkt->ipHdr.ver_hlen   = IP_VER_HLEN;
    pIpPkt->ipHdr.service    = 0x00;
    pIpPkt->ipHdr.length     = htons(len);
    pIpPkt->ipHdr.ident      = htons(ident);
    pIpPkt->ipHdr.fragment   = 0x00;
    pIpPkt->ipHdr.timetolive = 0x10;

    /*
     * Compute and fill in the IP header checksum.
     */
    pIpPkt->ipHdr.checksum = NetIpChecksum((uint16_t *) pIpPkt, IP_HEADER_LEN);

    /*
     * Forward the IP packet to the network interface driver.
     */
    return (NetPhySnd(htons(PROTO_IP), (uint8_t *) pIpPkt, len));

} /* NetIpSnd() */

Listing 2. A function to send an IP packet

Some fields of the IP header are filled in by the UDP layer above. Specifically, the UDP layer fills in the IP header's source and destination IP addresses, which it combines with the UDP header and payload to compute a UDP checksum. The UDP sending function then passes the entire packet to NetIpSnd(). After filling in the remaining fields of the IP header, NetIpSnd() computes its own checksum (of the IP header contents only) and passes the completed IP packet on to the network driver, to be sent out over the physical network.
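The source of NetIpChecksum() isn't shown in this article, but the standard Internet checksum algorithm it would implement (a one's-complement sum of 16-bit words, with the carries folded back in) can be sketched as follows. Treat this as an illustration of the algorithm rather than the actual µC/Net routine:

```c
#include <stdint.h>
#include <stddef.h>

/* Standard Internet checksum over a buffer, len given in bytes
 * (a sketch of what NetIpChecksum() would compute; the actual
 * uC/Net source is not shown in this article). */
uint16_t NetIpChecksum(const uint16_t *buf, size_t len)
{
    uint32_t sum = 0;

    /* Sum the buffer as 16-bit words. */
    while (len > 1)
    {
        sum += *buf++;
        len -= 2;
    }
    if (len == 1)                   /* Odd trailing byte, if any. */
    {
        sum += *(const uint8_t *) buf;
    }

    /* Fold any carries back into the low 16 bits. */
    while (sum >> 16)
    {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }

    return (uint16_t) ~sum;        /* One's complement of the sum. */
}
```

Note that NetIpRcv() zeroes the header's checksum field before recomputing; with the field zeroed, a correct header checksums to the same value the sender stored.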

The NetIpRcv() function, shown in Listing 3, handles incoming IP packets. This function is called from within the context of a task that's part of the network driver. The driver task becomes active whenever a new packet arrives over the network. It then sees what type of packet it is (ARP, IP, and so on) and routes it to the appropriate function. NetIpRcv() processes all incoming IP packets.

int
NetIpRcv(NetIpPkt * pIpPkt)
{
    uint16_t checksum;


    /*
     * Check IP header version and length.
     */
    if (pIpPkt->ipHdr.ver_hlen != IP_VER_HLEN)
    {
        /*
         * Unsupported header version or length.
         */
        return (NET_ERROR);
    }

    /*
     * Move the IP header checksum out of the header.
     */
    checksum = pIpPkt->ipHdr.checksum;
    pIpPkt->ipHdr.checksum = 0;

    /*
     * Compute checksum and compare with received value.
     */
    if (checksum != NetIpChecksum((uint16_t *) pIpPkt, IP_HEADER_LEN))
    {
        return (NET_ERROR);   /* Bad checksum. */
    }

    /*
     * Route the packet to the appropriate transport protocol.
     */
    switch (pIpPkt->ipHdr.protocol)
    {
        case PROTO_UDP:
            return (NetUdpRcv((NetUdpPkt *) pIpPkt,
                              ntohs(pIpPkt->ipHdr.length) - IP_HEADER_LEN));

#if defined (NET_TCP_EN)
        case PROTO_TCP:
            return (NetTcpRcv((NetTcpPkt *) pIpPkt,
                              ntohs(pIpPkt->ipHdr.length) - IP_HEADER_LEN));
#endif

        default:
            return (NET_ERROR);   /* Unsupported protocol. */
    }

} /* NetIpRcv() */

Listing 3. A function to receive an IP packet

After checking the IP version, header length, and checksum, each incoming IP packet is routed to the layer above. If it is a UDP packet, NetUdpRcv() is called. If it is a TCP packet and TCP support is included, NetTcpRcv() is called instead.

If ever there was a piece of embedded software ripe for reuse it's the memory test. This article shows how to test for the most common memory problems with a set of three efficient, portable, public-domain memory test functions.

One piece of software that nearly every embedded developer must write at some point in his career is a memory test. Often, once the prototype hardware is ready, the board's designer would like some reassurance that she has wired the address and data lines correctly, and that the various memory chips are working properly. Even if that's not the case, it is desirable to test any onboard RAM at least as often as the system is reset. It is up to the embedded software developer, then, to figure out what can go wrong and design a suite of tests that will uncover potential problems.

At first glance, writing a memory test may seem like a fairly simple endeavor. However, as you look at the problem more closely you will realize that it can be difficult to detect subtle memory problems with a simple test. In fact, as a result of programmer naïveté, many embedded systems include memory tests that would detect only the most catastrophic memory failures. Perhaps unbelievably, some of these may not even notice that the memory chips have been removed from the board!

The purpose of a memory test is to confirm that each storage location in a memory device is working. In other words, if you store the value 50 at a particular address, you expect to find that value stored there until another value is written to that same address. The basic idea behind any memory test, then, is to write some set of data to each address in the memory device and verify the data by reading it back. If all the values read back are the same as those that were written, then the memory device is said to pass the test. As you will see, it is only through careful selection of the set of data values that you can be sure that a passing result is meaningful.
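In its simplest form, that write-then-verify idea looks something like the following sketch (a single fixed pattern over the whole region; function and parameter names are my own, and as the rest of the article explains, one pattern alone is far from sufficient):

```c
#include <stdint.h>
#include <stddef.h>

/* Naive memory test: write one pattern everywhere, then read it all
 * back (sketch only; a meaningful test needs carefully chosen
 * patterns, as discussed in the article). Returns 0 on success. */
int memTestBasic(volatile uint8_t *base, size_t nBytes, uint8_t pattern)
{
    size_t offset;

    /* Write the pattern to every location. */
    for (offset = 0; offset < nBytes; offset++)
    {
        base[offset] = pattern;
    }

    /* Read it back and verify. */
    for (offset = 0; offset < nBytes; offset++)
    {
        if (base[offset] != pattern)
        {
            return -1;   /* Mismatch: test failed. */
        }
    }

    return 0;   /* All locations verified. */
}
```

The volatile qualifier matters here: without it, an optimizing compiler may elide the read-backs entirely and "verify" values that never touched the memory device.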

Of course, a memory test like the one just described is necessarily destructive. In the process of testing the memory, you must overwrite its prior contents. Since it is usually impractical to overwrite the contents of nonvolatile memories, the tests described in this article are generally used only for RAM testing. However, if the contents of a non-volatile memory device, like flash, are unimportant--as they are during the product development stage--these same algorithms can be used to test those devices as well.

Common memory problems

Before implementing any of the possible test algorithms, you should be familiar with the types of memory problems that are likely to occur. One common misconception among software engineers is that most memory problems occur within the chips themselves. Though a major issue at one time (a few decades ago), problems of this type are increasingly rare. These days, the manufacturers of memory devices perform a variety of post-production tests on each batch of chips. If there is a problem with a particular batch, it is extremely unlikely that one of the bad chips will make its way into your system.

The one type of memory chip problem you could encounter is a catastrophic failure. This is usually caused by some sort of physical or electrical damage to the chip after manufacture. Catastrophic failures are uncommon and usually affect large portions of the chip. Since a large area is affected, it is reasonable to assume that catastrophic failure will be detected by any decent test algorithm.

In my experience, the most common source of actual memory problems is the circuit board. Typical circuit board problems are problems with the wiring between the processor and memory device, missing memory chips, and improperly inserted memory chips. These are the problems that a good memory test algorithm should be able to detect. Such a test should also be able to detect catastrophic memory failures without specifically looking for them. So, let's discuss the circuit board problems in more detail.

Electrical wiring problems

An electrical wiring problem could be caused by an error in design or production of the board or as the result of damage received after manufacture. Each of the wires that connects the memory device to the processor is one of three types: an address line, a data line, or a control line. The address and data lines are used to select the memory location and to transfer the data, respectively. The control lines tell the memory device whether the processor wants to read or write the location and precisely when the data will be transferred. Unfortunately, one or more of these wires could be improperly routed or damaged in such a way that it is either shorted (for example, connected to another wire on the board) or open (not connected to anything). These problems are often caused by a bit of solder splash or a broken trace, respectively. Both cases are illustrated in Figure 1.


Figure 1. Possible wiring problems

Problems with the electrical connections to the processor will cause the memory device to behave incorrectly. Data may be stored incorrectly, stored at the wrong address, or not stored at all. Each of these symptoms can be explained by wiring problems on the data, address, and control lines, respectively.

If the problem is with a data line, several data bits may appear to be "stuck together" (for example, two or more bits always contain the same value, regardless of the data transmitted). Similarly, a data bit may be either "stuck high" (always 1) or "stuck low" (always 0). These problems can be detected by writing a sequence of data values designed to test that each data pin can be set to 0 and 1, independently of all the others.

If an address line has a wiring problem, the contents of two memory locations may appear to overlap. In other words, data written to one address will actually overwrite the contents of another address instead. This happens because an address bit that is shorted or open will cause the memory device to see an address different than the one selected by the processor.

Another possibility is that one of the control lines is shorted or open. Although it is theoretically possible to develop specific tests for control line problems, it is not possible to describe a general test for them. The operation of many control signals is specific to either the processor or memory architecture. Fortunately, if there is a problem with a control line, the memory will probably not work at all, and this will be detected by other memory tests. If you suspect a problem with a control line, it is best to seek the advice of the board's designer before constructing a specific test.

Missing memory chips

A missing memory chip is clearly a problem that should be detected. Unfortunately, due to the capacitive nature of unconnected electrical wires, some memory tests will not detect this problem. For example, suppose you decided to use the following test algorithm: write the value 1 to the first location in memory, verify the value by reading it back, write 2 to the second location, verify the value, write 3 to the third location, verify, and so on. Since each read occurs immediately after the corresponding write, it is possible that the data read back represents nothing more than the voltage remaining on the data bus from the previous write. If the data is read back too quickly, it will appear that the data has been correctly stored in memory, even though there is no memory chip at the other end of the bus!

To detect a missing memory chip the test must be altered. Instead of performing the verification read immediately after the corresponding write, it is desirable to perform several consecutive writes followed by the same number of consecutive reads. For example, write the value 1 to the first location, 2 to the second location, and 3 to the third location, then verify the data at the first location, the second location, and so on. If the data values are unique (as they are in the test just described), the missing chip will be detected: the first value read back will correspond to the last value written (3), rather than the first (1).
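This deferred-verification idea can be sketched in C (a minimal illustration of the technique just described, not one of the article's listings; the function name and the use of a plain byte buffer are my own):

```c
#include <stddef.h>

/*
 * Write a unique value to every location first, then go back and
 * verify them all. With a missing chip, a read would tend to return
 * the last value left on the floating data bus, so the very first
 * verification read would fail. (For simplicity, nLocations is
 * assumed to be small enough that the unique values do not wrap.)
 */
volatile unsigned char *memTestDeferred(volatile unsigned char *base,
                                        size_t nLocations)
{
    size_t i;

    for (i = 0; i < nLocations; i++)
    {
        base[i] = (unsigned char)(i + 1);    /* unique value per location */
    }

    for (i = 0; i < nLocations; i++)
    {
        if (base[i] != (unsigned char)(i + 1))
        {
            return &base[i];                 /* first failing address */
        }
    }

    return NULL;                             /* all locations verified */
}
```

On working memory the function returns NULL; with a missing chip it would return the first address whose read-back did not match.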

Improperly inserted chips

If a memory chip is present but improperly inserted in its socket, the system will usually behave as though there is a wiring problem or a missing chip. In other words, some number of the pins on the memory chip will either not be connected to the socket at all or will be connected at the wrong place. These pins will be part of the data bus, address bus, or control wiring. So as long as you test for wiring problems and missing chips, any improperly inserted chips will be detected automatically.

Developing a test strategy

Before going on, let's quickly review the types of memory problems we must be able to detect. Memory chips only rarely have internal errors, but, if they do, they are probably catastrophic in nature and will be detected by any test. A more common source of problems is the circuit board, where a wiring problem may occur or a memory chip may be missing or improperly inserted. Other memory problems can occur, but the ones described here are the most common.

By carefully selecting your test data and the order in which the addresses are tested, it is possible to detect all of the memory problems described above. It is usually best to break your memory test into small, single-minded pieces. This helps to improve the efficiency of the overall test and the readability of the code. More specific tests can also provide more detailed information about the source of the problem, if one is detected.

I have found it is best to have three individual memory tests: a data bus test, an address bus test, and a device test. The first two tests detect electrical wiring problems and improperly inserted chips, while the third is intended to detect missing chips and catastrophic failures. As a side benefit, the device test will also uncover problems with the control bus wiring, though it will not provide useful information about the source of such a problem.

The order in which you execute these three tests is important. The proper order is: data bus test first, followed by the address bus test, and then the device test. That's because the address bus test assumes a working data bus, and the device test results are meaningless unless both the address and data buses are known to be good. If any of the tests fail, you should work with the board's designer to locate the source of the problem. By looking at the data value or address at which the test failed, he or she should be able to quickly isolate the problem on the circuit board.

Data bus test

The first thing we want to test is the data bus wiring. We need to confirm that any value placed on the data bus by the processor is correctly received by the memory device at the other end. The most obvious way to test that is to write all possible data values and verify that the memory device stores each one successfully. However, that is not the most efficient test available. A faster method is to test the bus one bit at a time. The data bus passes the test if each data bit can be set to 0 and 1, independently of the other data bits.

00000001
00000010
00000100
00001000
00010000
00100000
01000000
10000000

Table 1. Consecutive data values for the walking 1's test

A good way to test each bit independently is to perform the so-called "walking 1's test." Table 1 shows the data patterns used in an 8-bit version of this test. The name of this test comes from the fact that a single data bit is set to 1 and "walked" through the entire data word. The number of data values to test is the same as the width of the data bus. This reduces the number of test patterns from 2^n to n, where n is the width of the data bus (for an 8-bit bus, 8 patterns instead of 256).

Since we are testing only the data bus at this point, all of the data values can be written to the same address. Any address within the memory device will do. However, if the data bus splits as it makes its way to more than one memory chip, you will need to perform the data bus test at multiple addresses, one within each chip.

To perform the walking 1's test, simply write the first data value in the table, verify it by reading it back, write the second value, verify, and so on. When you reach the end of the table, the test is complete. It is okay to do the read immediately after the corresponding write this time because we are not yet looking for missing chips. In fact, this test provides meaningful results even if the memory chips are not installed.

typedef unsigned char datum;    /* Set the data bus width to 8 bits.  */


/**********************************************************************
 *
 * Function:    memTestDataBus()
 *
 * Description: Test the data bus wiring in a memory region by
 *              performing a walking 1's test at a fixed address
 *              within that region.  The address (and hence the
 *              memory region) is selected by the caller.
 *
 * Notes:
 *
 * Returns:     0 if the test succeeds.
 *              A non-zero result is the first pattern that failed.
 *
 **********************************************************************/
datum
memTestDataBus(volatile datum * address)
{
    datum pattern;


    /*
     * Perform a walking 1's test at the given address.
     */
    for (pattern = 1; pattern != 0; pattern <<= 1)
    {
        /*
         * Write the test pattern.
         */
        *address = pattern;

        /*
         * Read it back (immediately is okay for this test).
         */
        if (*address != pattern)
        {
            return (pattern);
        }
    }

    return (0);

}   /* memTestDataBus() */

Listing 1. Data bus test

The function memTestDataBus(), in Listing 1, shows how to implement the walking 1's test in C. It assumes that the caller will select the test address, and tests the entire set of data values at that address. If the data bus is working properly, the function will return 0. Otherwise it will return the data value for which the test failed. The bit that is set in the returned value corresponds to the first faulty data line, if any.

Address bus test

After confirming that the data bus works properly, you should next test the address bus. Remember that address bus problems lead to overlapping memory locations. Many possible addresses could overlap. However, it is not necessary to check every possible combination. You should instead follow the example of the data bus test above and try to isolate each address bit during testing. You just need to confirm that each of the address pins can be set to 0 and 1 without affecting any of the others.

The smallest set of addresses that will cover all possible combinations is the set of "power-of-two" addresses. These addresses are analogous to the set of data values used in the walking 1's test. The corresponding memory locations are 0001h, 0002h, 0004h, 0008h, 0010h, 0020h, and so on. In addition, address 0000h must also be tested. The possibility of overlapping locations makes the address bus test harder to implement. After writing to one of the addresses, you must check that none of the others has been overwritten.

It is important to note that not all of the address lines can be tested in this way. Part of the address (the leftmost bits) selects the memory chip itself. Another part (the rightmost bits) may not be significant if the data bus width is greater than eight bits. These extra bits will remain constant throughout the test and reduce the number of test addresses. For example, if the processor has 32 address bits, it can address up to 4GB of memory. If you want to test a 128K block of memory, the 15 most-significant address bits will remain constant. In that case, only the 17 rightmost bits of the address bus can actually be tested. (128K is 1/32,768th of the total 4GB address space.)
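The arithmetic above can be sanity-checked with a small helper (a sketch of my own, not part of the test suite; it assumes the region size is a power of two and that a datum is one byte wide):

```c
/*
 * Count the address bits that actually vary when testing a region of
 * nBytes (nBytes assumed to be a power of two). The remaining,
 * higher-order address bits stay constant throughout the test.
 */
int testableAddressBits(unsigned long nBytes)
{
    int bits = 0;

    while ((1UL << bits) < nBytes)
    {
        bits++;
    }

    return bits;
}
```

For a 128K region this yields 17, matching the 17 rightmost bits noted above; the other 15 of a 32-bit address stay constant.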

To confirm that no two memory locations overlap, you should first write some initial data value at each power-of-two offset within the device. Then write a new value (an inverted copy of the initial value is a good choice) to the first test offset, and verify that the initial data value is still stored at every other power-of-two offset. If you find a location, other than the one just written, that contains the new data value, you have found a problem with the current address bit. If no overlapping is found, repeat the procedure for each of the remaining offsets.

/**********************************************************************
 *
 * Function:    memTestAddressBus()
 *
 * Description: Test the address bus wiring in a memory region by
 *              performing a walking 1's test on the relevant bits
 *              of the address and checking for aliasing.  This test
 *              will find single-bit address failures such as stuck-
 *              high, stuck-low, and shorted pins.  The base address
 *              and size of the region are selected by the caller.
 *
 * Notes:       For best results, the selected base address should
 *              have enough LSB 0's to guarantee single address bit
 *              changes.  For example, to test a 64-Kbyte region,
 *              select a base address on a 64-Kbyte boundary.  Also,
 *              select the region size as a power-of-two, if at all
 *              possible.
 *
 * Returns:     NULL if the test succeeds.
 *              A non-NULL result is the first address at which an
 *              aliasing problem was uncovered.  By examining the
 *              contents of memory, it may be possible to gather
 *              additional information about the problem.
 *
 **********************************************************************/
datum *
memTestAddressBus(volatile datum * baseAddress, unsigned long nBytes)
{
    unsigned long addressMask = (nBytes / sizeof(datum) - 1);
    unsigned long offset;
    unsigned long testOffset;

    datum pattern     = (datum) 0xAAAAAAAA;
    datum antipattern = (datum) 0x55555555;


    /*
     * Write the default pattern at each of the power-of-two offsets.
     */
    for (offset = 1; (offset & addressMask) != 0; offset <<= 1)
    {
        baseAddress[offset] = pattern;
    }

    /*
     * Check for address bits stuck high.
     */
    testOffset = 0;
    baseAddress[testOffset] = antipattern;

    for (offset = 1; (offset & addressMask) != 0; offset <<= 1)
    {
        if (baseAddress[offset] != pattern)
        {
            return ((datum *) &baseAddress[offset]);
        }
    }

    baseAddress[testOffset] = pattern;

    /*
     * Check for address bits stuck low or shorted.
     */
    for (testOffset = 1; (testOffset & addressMask) != 0; testOffset <<= 1)
    {
        baseAddress[testOffset] = antipattern;

        if (baseAddress[0] != pattern)
        {
            return ((datum *) &baseAddress[testOffset]);
        }

        for (offset = 1; (offset & addressMask) != 0; offset <<= 1)
        {
            if ((baseAddress[offset] != pattern) && (offset != testOffset))
            {
                return ((datum *) &baseAddress[testOffset]);
            }
        }

        baseAddress[testOffset] = pattern;
    }

    return (NULL);

}   /* memTestAddressBus() */

Listing 2. Address bus test

The function memTestAddressBus(), in Listing 2, shows how this can be done in practice. The function accepts two parameters. The first parameter is the base address of the memory block to be tested and the second is its size, in bytes. The size is used to determine which address bits should be tested. For best results, the base address should contain a 0 in each of those bits. If the address bus test fails, the address at which the first error was detected will be returned. Otherwise, this function returns NULL to indicate success.

Device test

Once you know that the address and data bus wiring are working, it is necessary to test the integrity of the memory device itself. The thing to test is that every bit in the device is capable of holding both 0 and 1. This is a fairly straightforward test to implement, but takes significantly longer to execute than the previous two.

For a complete device test, you must visit (write and verify) every memory location twice. You are free to choose any data value for the first pass, so long as you invert that value during the second. And since there is a possibility of missing memory chips, it is best to select a set of data that changes with (but is not equivalent to) the address. A simple example is an "increment test."

Offset   Value      Inverted Value
00h      00000001   11111110
01h      00000010   11111101
02h      00000011   11111100
03h      00000100   11111011
...      ...        ...
FEh      11111111   00000000
FFh      00000000   11111111

Table 2. Data values for an increment test

The offsets and corresponding data values for the increment test are shown in the first two columns of Table 2. The third column shows the inverted data values used during the second pass of this test. The latter represents a decrement test. There are many other possible choices of data, but the incrementing data pattern is adequate and easy to compute.

/**********************************************************************
 *
 * Function:    memTestDevice()
 *
 * Description: Test the integrity of a physical memory device by
 *              performing an increment/decrement test over the
 *              entire region.  In the process, every storage bit
 *              in the device is tested as a zero and a one.  The
 *              base address and the size of the region are
 *              selected by the caller.
 *
 * Notes:
 *
 * Returns:     NULL if the test succeeds.
 *              A non-NULL result is the first address at which an
 *              incorrect value was read back.  By examining the
 *              contents of memory, it may be possible to gather
 *              additional information about the problem.
 *
 **********************************************************************/
datum *
memTestDevice(volatile datum * baseAddress, unsigned long nBytes)
{
    unsigned long offset;
    unsigned long nWords = nBytes / sizeof(datum);

    datum pattern;
    datum antipattern;


    /*
     * Fill memory with a known pattern.
     */
    for (pattern = 1, offset = 0; offset < nWords; pattern++, offset++)
    {
        baseAddress[offset] = pattern;
    }

    /*
     * Check each location and invert it for the second pass.
     */
    for (pattern = 1, offset = 0; offset < nWords; pattern++, offset++)
    {
        if (baseAddress[offset] != pattern)
        {
            return ((datum *) &baseAddress[offset]);
        }

        antipattern = ~pattern;
        baseAddress[offset] = antipattern;
    }

    /*
     * Check each location for the inverted pattern.
     */
    for (pattern = 1, offset = 0; offset < nWords; pattern++, offset++)
    {
        antipattern = ~pattern;
        if (baseAddress[offset] != antipattern)
        {
            return ((datum *) &baseAddress[offset]);
        }
    }

    return (NULL);

}   /* memTestDevice() */

Listing 3. Device test

The function memTestDevice(), in Listing 3, implements just such a two-pass increment/decrement test. It accepts two parameters from the caller. The first parameter is the starting address and the second is the number of bytes to be tested. These parameters give the user maximum control over which areas of memory will be overwritten. The function will return NULL on success. Otherwise, the first address containing an incorrect data value is returned.

Putting it all together

To make our discussion more concrete, let's consider a practical example. Suppose that we wanted to test a 64K chunk of SRAM at address 00000000h. To do this, we call each of the three test routines in the proper order, as shown in Listing 4. In each case, the first parameter is the base address of the memory block. If the width of the data bus is greater than eight bits, a couple of modifications are required.

int
memTest(void)
{
#define BASE_ADDRESS  (volatile datum *) 0x00000000
#define NUM_BYTES     (64 * 1024)

    if ((memTestDataBus(BASE_ADDRESS) != 0) ||
        (memTestAddressBus(BASE_ADDRESS, NUM_BYTES) != NULL) ||
        (memTestDevice(BASE_ADDRESS, NUM_BYTES) != NULL))
    {
        return (-1);
    }
    else
    {
        return (0);
    }

}   /* memTest() */

Listing 4. Memory test engine

If any of the individual memory test routines returns a nonzero (or non-NULL) value, you might turn on a red LED to visually indicate the error. Otherwise, after all three tests have completed successfully, you might turn on a green LED. In the event of an error, the test routine that failed will return some information about the problem encountered. This information can be useful when communicating with the hardware designer or technician about the nature of the problem. However, it is visible only if we are running the test program in a debugger or emulator.

In most cases, you would simply download the entire test suite and let it run. Only if a memory problem is found would you need to step through the program in a debugger, examining the individual function return codes and the contents of the memory device to see which test failed and why.

Unfortunately, it is not always possible to write memory tests in a high-level language. For example, the C language requires the use of a stack, but a stack itself requires working memory. Writing the tests in C might still be reasonable in a system with more than one memory device: you might create a stack in an area of RAM that is already known to be working, while testing another memory device. In such a situation, a small SRAM could be tested from assembly, the stack could be created there afterward, and a larger block of DRAM could then be tested using a more thorough test suite, like the one shown here. If you cannot assume enough working RAM for the stack and data needs of the test program, then you will need to rewrite these memory test routines entirely in assembly language. Another option is to run the memory test program from an in-circuit emulator. In that case, you could place the stack in an area of the emulator's own internal memory; by moving the emulator's internal memory around in the target memory map, you could systematically test each memory device on the target.

The need for memory testing is most apparent during product development, when the reliability of the hardware and its design are still unproven. However, memory is one of the most critical resources in any embedded system, so it may also be desirable to include a memory test in the final release of your software. In that case, the memory test, and other hardware confidence tests, should be run each time the system is powered on or reset. Together, these initial tests form a set of hardware diagnostics. If one or more of the diagnostics fail, a repair technician can be called in to diagnose the problem and repair or replace the faulty hardware.

Jump tables, also called branch tables, are an efficient means of handling similar events in software. Here's a look at the use of arrays of function pointers in C/C++ as jump tables.

Examination of assembly language code that has been crafted by an expert will usually reveal extensive use of function "branch tables." Branch tables (a.k.a. jump tables) are used because they offer a unique blend of compactness and execution speed, particularly on microprocessors that support indexed addressing. When one examines typical C/C++ code, however, the branch table (i.e., an array of function pointers) is a much rarer beast. The purpose of this article is to examine why branch tables are not used by C/C++ programmers and to make the case for their extensive use. Real-world examples of their use are included.

Function pointers

In talking to C/C++ programmers about this topic, three reasons are usually cited for not using function pointers. They are:

  • They are dangerous
  • A good optimizing compiler will generate a jump table from a switch statement, so let the compiler do the work
  • They are too difficult to code and maintain

Are function pointers dangerous?

This school of thought comes about because code that indexes into a table and then calls a function based on the index can end up just about anywhere. For instance, consider the following code fragment:

void (*pf[])(void) = {fna, fnb, fnc, …, fnz};

void test(const INT jump_index)
{
    /* Call the function specified by jump_index */
    pf[jump_index]();
}

The above code declares pf[] to be an array of pointers to functions, each of which takes no arguments and returns void. The test() function simply calls the specified function via the array. As it stands, this code is dangerous for the following reasons.

  • pf[] is accessible by anyone
  • In test(), there is no bounds checking, such that an erroneous jump_index would spell disaster

A much better way to code this, which avoids these problems, is as follows:

void test(const uint8_t jump_index)
{
    static void (*pf[])(void) = {fna, fnb, fnc, …, fnz};

    if (jump_index < sizeof(pf) / sizeof(*pf))
    {
        /* Call the function specified by jump_index */
        pf[jump_index]();
    }
}

The changes are subtle, yet important.

  • By declaring the array static within the function, no one else can access the jump table
  • Forcing jump_index to be an unsigned quantity means that we only need to perform a one sided test for our bounds checking
  • Setting jump_index to the smallest data type possible that will meet the requirements provides a little more protection (most jump tables are smaller than 256 entries)
  • An explicit test is performed prior to making the call, thus ensuring that only valid function calls are made. (For performance critical applications, the if() statement could be replaced by an assert())

This approach to the use of a jump table is just as secure as an explicit switch statement, thus the idea that jump tables are dangerous may be rejected.

Leave it to the optimizer?

It is well known that many C compilers will attempt to convert a switch statement into a jump table. Thus, rather than use a function pointer array, many programmers prefer to use a switch statement of the form:

void test(const uint8_t jump_index)
{
    switch (jump_index)
    {
        case 0:
            fna();
            break;

        case 1:
            fnb();
            break;

        …

        case 25:
            fnz();
            break;

        default:
            break;
    }
}

Indeed, Jack Crenshaw advocated this approach in a September 1998 column in Embedded Systems Programming. Well, I have never found myself disagreeing with Dr. Crenshaw before, but there is always a first time for everything! A quick survey of the documentation for a number of compilers revealed some interesting variations. They all claimed to potentially perform conversion of a switch statement into a jump table. However, the criteria for doing so varied considerably. One vendor simply said that it would attempt to perform this optimization. A second claimed to use a heuristic algorithm to decide which was "better," while a third permitted pragmas to let the user specify what was wanted. This sort of variation does not give one a warm fuzzy feeling!

In the case where one has, say, 26 contiguous indices, each associated with a single function call (such as the example above), the compiler will almost certainly generate a jump table. However, what about the case where you have 26 non-contiguous indices that vary in value from 0 to 1000? A jump table would have 975 null entries, or 1,950 "wasted" bytes on a microcontroller with two-byte pointers. Most compilers would deem this too high a penalty to pay and would eschew the jump table for an if-else sequence. However, if you have EPROM to burn, it actually costs nothing to implement this as a jump table, yet it buys you consistent (and fast) execution time. By coding this as a jump table, you ensure that the compiler does what you want.
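To make the trade-off concrete, a sparse table can be padded with a do-nothing filler so that dispatch remains a constant-time indexed call (a hedged sketch of my own; the table size, the three "live" indices, the function names, and the last_event variable are all invented for illustration):

```c
#define N_ENTRIES 16   /* stands in for the larger sparse table discussed above */

int last_event = 0;    /* records which handler ran, for demonstration only */

static void fnNop(void)   { /* filler for undefined indices */ }
static void fnStart(void) { last_event = 1; }
static void fnStop(void)  { last_event = 2; }
static void fnReset(void) { last_event = 3; }

/* Only indices 0, 7, and 12 are "live"; the rest spend ROM on the
 * filler entry but buy constant, predictable dispatch time. */
static void (* const handler[N_ENTRIES])(void) =
{
    fnStart, fnNop, fnNop, fnNop, fnNop, fnNop, fnNop,  fnStop,
    fnNop,   fnNop, fnNop, fnNop, fnReset, fnNop, fnNop, fnNop
};

void dispatch(unsigned int index)
{
    if (index < N_ENTRIES)
    {
        handler[index]();      /* undefined indices hit fnNop harmlessly */
    }
}
```

Undefined and out-of-range indices simply do nothing, so no per-index validity test is needed at the call site.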

There is a further problem with large switch statements. Once a switch statement gets much beyond a screen length, it becomes harder to see the big picture, and thus the code is more difficult to maintain. A function pointer array declaration, adequately commented to explain the declaration, is much more compact, allowing one to see the overall picture. Furthermore, the function pointer array is potentially more robust. Who has not written a large switch statement and forgotten to add a break statement on one of the cases?

Complexities

Complexity associated with jump table declaration and use is the real reason they are not used more often. In embedded systems, where pointers normally have mandatory memory space qualifiers, the declarations can quickly become horrific. For instance, the example above would be highly undesirable on most embedded systems, since the pf[] array would probably end up being stored in RAM, instead of ROM. The way to ensure the array is stored in ROM varies somewhat between compiler vendors. However, a first step that is portable to all systems is to add const qualifiers to the declaration. Thus, our array declaration now becomes:

static void (* const pf[])(void) = {fna, fnb, fnc, …, fnz};

Like many users, I find these declarations cryptic and very daunting. However, over the years, I have built up a library of declaration templates that I simply refer to as necessary. A compilation of useful templates appears below.

A handy trick is to learn to read complex declarations like this backwards, i.e., from right to left. Read that way, the declaration above says: pf is an array of constant pointers to functions that take no arguments and return void. The static keyword is needed only because the array is declared privately within the function that uses it; making it static also keeps it off the stack.
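By way of illustration (these are my own examples, not the compilation the text refers to; the function names fa, fb, and fc are dummies), here are a few such templates, each readable right to left:

```c
#include <stddef.h>

/* Dummy target functions, just so the declarations below compile. */
static void  fa(void)  { }
static int   fb(int x) { return x; }
static char *fc(void)  { return NULL; }

/* pf1: pointer to a function taking void, returning void */
void (*pf1)(void) = fa;

/* pf2: array of 2 constant pointers to functions taking int,
 * returning int */
int (* const pf2[2])(int) = { fb, fb };

/* pf3: constant pointer to a function taking void, returning
 * pointer to char */
char *(* const pf3)(void) = fc;
```

Each can be called through the pointer exactly as if the function were named directly, e.g. pf2[0](5).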

Arrays of function pointers

Most books about C programming cover function pointers in less than a page (while devoting entire chapters to simple looping constructs). The descriptions typically say something to the effect that you can take the address of a function, and thus one can define a pointer to a function, and the syntax looks like such and such. At which point, most readers are left staring at a complex declaration, and wondering what exactly function pointers are good for. Small wonder that function pointers do not feature heavily in their work.

Well then, where are jump tables useful? In general, arrays of function pointers are useful whenever there is the potential for a range of inputs to a program that subsequently alters program flow. Some typical examples from the embedded world are given below.

Keypads

The most often cited example of function pointer use is with keypads. The general idea is obvious. A keypad is normally arranged to produce a unique keycode. Based on the value of the key pressed, some action is taken. This can be handled via a switch statement. However, an array of function pointers is far more elegant. This is particularly true when the application has multiple user screens, with some key definitions changing from screen to screen (i.e., the system uses soft keys). In this case, a two-dimensional array of function pointers is often used.

#define N_SCREENS  16
#define N_KEYS     6

/* Prototypes for functions that appear in the jump table */
INT fnUp(void);
INT fnDown(void);
…
INT fnMenu(void);
INT fnNull(void);

INT keypress(uint8_t key, uint8_t screen)
{
    static INT (* const pf[N_SCREENS][N_KEYS])(void) =
    {
        {fnUp,   fnDown, fnNull, …, fnMenu},
        {fnMenu, fnNull, …,         fnHome},
        …
        {fnF0,   fnF1,   …,         fnF5}
    };

    assert(key < N_KEYS);
    assert(screen < N_SCREENS);

    /* Call the function and return the result */
    return (*pf[screen][key])();
}

/* Dummy function - used as an array filler */
INT fnNull(void)
{
    return 0;
}

There are several points to note about the above example:

  • All functions to be named in a jump table should be prototyped. Prototyping is the best line of defense against including a function that expects the wrong parameters, or returns the wrong type.
  • As in the earlier examples, the function table is declared within the function that makes use of it (and is, thus, static)
  • The array is made const signifying that we wish it to remain unchanged
  • The indices into the array are unsigned, such that only single sided bounds checking need be done
  • In this case, I have chosen to use the assert() macro to provide the bounds checking. This is a good compromise between debugging ease and runtime efficiency.
  • A dummy function fnNull() has been declared. This is used where a keypress is undefined. Rather than explicitly testing to see whether a key is valid, the dummy function is invoked. This is usually the most efficient method of handling a function array that is only partially populated.
  • The functions that are called need not be unique. For example, a function such as fnMenu may appear many times in the same jump table.

Communication protocols

Although the keypad example is easy to appreciate, my experience in embedded systems is that communication links occur far more often than keypads. Communication protocols are a challenge ripe for a branch table solution. This is best illustrated by an example.

Last year, I worked on the design for an interface box to a very large industrial power supply. This interface box had to accept commands and return parameter values over an RS-232 link. The communications used a set of simple ASCII mnemonics to specify the action to be taken. The mnemonics consisted of a channel number (0, 1, or 2), followed by a two-character parameter. The code to handle a read request coming in over the serial link is shown below. The function process_read() is called with a pointer to a string fragment that is expected to consist of the three characters (null terminated) containing the required command.

const CHAR *fna(void);   // Example function prototype

static void process_read(const CHAR *buf)
{
    CHAR *cmdptr;
    UCHAR offset;
    const CHAR *replyptr;

    static const CHAR read_str[] =
        "0SV 0SN 0MO 0WF 0MT 0MP 0SW 1SP 1VO 1CC 1CA 1CB "
        "1ST 1MF 1CL 1SZ 1SS 1AZ 1AS 1BZ 1BS 1VZ 1VS 1MZ "
        "1MS 2SP 2VO 2CC 2CA 2CB 2ST 2MF 2CL 2SZ 2SS "
        "2AZ 2AS 2BZ 2BS 2VZ 2VS 2MZ 2MS ";

    static const CHAR *
        (* const readfns[sizeof(read_str) / 4])(void) =
    {
        fna, fnb, fnc, …
    };

    cmdptr = strstr(read_str, buf);

    if (cmdptr != NULL)
    {
        /*
         * cmdptr points to the valid command, so compute the offset
         * in order to index into the function jump table.
         */
        offset = (cmdptr - read_str) / 4;

        /* Call the function and get a pointer to the reply */
        replyptr = (*readfns[offset])();

        /* rest of the code goes here */
    }
}

The code above is quite straightforward. A constant string, read_str, is defined; it contains the list of all legal mnemonic combinations. Note the use of added spaces to aid clarity. Next, we have the array of function pointers, one pointer for each valid command. We determine whether we have a valid command sequence using the standard library function strstr(), which returns a pointer to the matching substring if a match is found, and NULL otherwise. We check for a valid pointer, compute the offset into the string, and then use the offset to call the appropriate handler function in the jump table. Thus, in four lines of code, we have determined whether the command is valid and called the appropriate function. Although the declaration of readfns[] is complex, the simplicity of the runtime code is tough to beat.
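For readers who want to experiment, here is a self-contained distillation of the same technique. The two-command string, the handler names (cmdFoo, cmdBar), and their reply strings are all inventions for illustration; each mnemonic is three characters plus a trailing space, so the match offset divided by four gives the table index.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handlers: each returns a reply string */
static const char *cmdFoo(void) { return "foo-reply"; }
static const char *cmdBar(void) { return "bar-reply"; }

/* Each mnemonic is three characters plus a trailing space */
static const char cmd_str[] = "0AA 0BB ";

static const char *(* const cmdfns[sizeof(cmd_str) / 4])(void) =
{
    cmdFoo, cmdBar
};

static const char *dispatch(const char *buf)
{
    const char *p = strstr(cmd_str, buf);

    if (p == NULL)
        return NULL;                        /* Unknown command */

    return (*cmdfns[(p - cmd_str) / 4])();  /* Call the handler */
}
```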

Timed task list

A third area where function pointers are useful is in timed task lists. In this case, the input to the system is the passage of time. Many projects cannot justify the use of an RTOS. Instead, all that is required is that a number of tasks run at predetermined intervals. This is very simply handled as shown below.

typedef struct
{
    UCHAR interval;     /* How often to call the task */
    void (*proc)(void); /* Pointer to function returning void */
} TIMED_TASK;

static const TIMED_TASK timed_task[] =
{
    { INTERVAL_16_MSEC,  fnA },
    { INTERVAL_50_MSEC,  fnB },
    { INTERVAL_500_MSEC, fnC },
    { 0, NULL }  /* End-of-table sentinel */
};

extern volatile UCHAR tick;

void main(void)
{
    const TIMED_TASK *ptr;
    UCHAR time;

    /* Initialization code goes here. Then enter the main loop. */

    while (1)
    {
        if (tick)
        {
            /* Check the timed task list */
            tick--;
            time = computeElapsedTime(tick);
            for (ptr = timed_task; ptr->interval != 0; ptr++)
            {
                if (!(time % ptr->interval))
                {
                    /* Time to call the function */
                    (ptr->proc)();
                }
            }
        }
    }
}

In this case, we define our own data type (TIMED_TASK), which consists simply of an interval and a pointer to a function. We then define an array of TIMED_TASK and initialize it with the list of functions to be called and their calling intervals. In main(), we have the startup code, which must enable a periodic timer interrupt that increments the volatile variable tick at a fixed rate. We then enter the infinite loop.

The infinite loop checks for a non-zero tick value, decrements the tick variable, and computes the elapsed time since the program started running. The code then simply steps through each of the tasks to see whether it is time for that one to be executed and, if so, calls it via the function pointer.

If your application only consists of two or three tasks, then this approach is probably overkill. However, if your project has a large number of timed tasks, or it is likely that you will have to add tasks in the future, then this approach is rather palatable. Note that adding tasks and/or changing intervals simply requires editing of the timed_task[] array. No code, per se, has to be changed.

Interrupt vector tables

The fourth application of function jump tables is the array of interrupt vectors. On most processors, the interrupt vectors are in contiguous locations, with each vector representing a pointer to an interrupt service routine function. Depending upon the compiler, the work may be done for you implicitly, or you may be forced to generate the function table. In the latter case, implementing the vectors via a switch statement will not work!

Here is the vector table from the industrial power supply project mentioned above. This project was implemented using a Whitesmiths' compiler and a 68HC11 microcontroller.

IMPORT VOID _stext();  /* 68HC11-specific startup routine */

static VOID (* const _vectab[])() =
{
SCI_Interrupt, /* SCI */
badSPI_Interrupt, /* SPI */
badPAI_Interrupt, /* Pulse acc input */
badPAO_Interrupt, /* Pulse acc overf */
badTO_Interrupt, /* Timer overf */
badOC5_Interrupt, /* Output compare 5 */
badOC4_Interrupt, /* Output compare 4 */
badOC3_Interrupt, /* Output compare 3 */
badOC2_Interrupt, /* Output compare 2 */
badOC1_Interrupt, /* Output compare 1 */
badIC3_Interrupt, /* Input capture 3 */
badIC2_Interrupt, /* Input capture 2 */
badIC1_Interrupt, /* Input capture 1 */
RTI_Interrupt, /* Real time */
Uart_Interrupt, /* IRQ */
PFI_Interrupt, /* XIRQ */
badSWI_Interrupt, /* SWI */
IlOpC_Interrupt, /* illegal */
_stext, /* cop fail */
_stext, /* cop clock fail */
_stext, /* RESET */
};

A couple of points are worth making:

  • The above is insufficient to locate the table correctly in memory. This has to be done via linker directives.
  • Note that unused interrupts still have an entry in the table. Doing so ensures that the table is correctly aligned and traps can be placed on unexpected interrupts.
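As a sketch of the second point, a trap placed in every unused vector slot can be very simple. The counter and the recovery policy below are illustrative assumptions; a real trap would typically disable interrupts and spin (or force a reset) so the fault is caught rather than silently ignored.

```c
/* Count of unexpected interrupts, for post-mortem inspection */
static volatile unsigned char bad_irq_count;

/* Hypothetical trap placed in every unused vector table slot */
static void badSPI_Interrupt(void)
{
    bad_irq_count++;
    /* In a real system: disable interrupts and spin here, or force
     * a reset, so the unexpected interrupt cannot go unnoticed */
}
```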

If any of these examples has whetted your appetite for using arrays of function pointers, but you are still uncomfortable with the declaration complexity, then fear not! Below you will find a variety of declarations, ranging from the straightforward to the downright appalling. The examples are all reasonably practical in the sense that the desired functionality is not outlandish (that is, there are no declarations for arrays of pointers to functions that take pointers to arrays of function pointers, and so on).

Declaration and use hints

All of the examples below adhere to conventions that I have found to be useful over the years, specifically:

1. All of the examples are preceded by static. This is done on the assumption that the scope of a function table should be highly localized, ideally within an enclosing function.

2. In every example the array pf[] is also preceded with const. This declares that the pointers in the array cannot be modified after initialization. This is the normal (and safe) usage scenario.

3. There are two syntactically different ways of invoking a function via a pointer. If we have a function pointer with the declaration:

void (*fnptr)(int); /* fnptr is a function pointer */

Then it may be invoked using either of these methods:

fnptr(3); /* Method 1 of invoking the function */
(*fnptr)(3); /* Method 2 of invoking the function */

The advantage of the first method is an uncluttered syntax. However, it makes it look as if fnptr is a function, rather than a function pointer; someone maintaining the code may end up searching in vain for the function fnptr(). With method 2, it is much clearer that we are dereferencing a pointer. However, when the declarations get complex, the added (*) can be a significant burden. Throughout the examples, both forms are shown. In practice, the latter syntax seems to be more popular--but whichever you choose, use only one.

4. In every example, the syntax for using a typedef is also given. It is quite permissible to use a typedef to define a complex declaration, and then use the new type like a simple type. If we stay with the example above, then an alternative declaration is:

typedef void (*PFV_I )(int);

/* Declare a PFV_I typed variable and init it */
PFV_I fnptr = fna;

/* Call fna with parameter 3 using method 1 */
fnptr(3);

/* Call fna with parameter 3 using method 2 */
(*fnptr)(3);

The typedef declares the type PFV_I to be a pointer to a function that returns void and is passed an integer. We then simply declare fnptr to be a variable of this type and use it. Typedefs are very useful when you regularly use a certain function pointer type, since they save you from having to remember and retype the declaration. The downside of using a typedef is that it is not obvious that the declared variable is a pointer to a function. Thus, just as with the two invocation methods above, you gain syntactic simplicity by hiding the underlying functionality.

In the typedefs, a consistent naming convention is used. Every type starts with PF (Pointer to Function) and is then followed with the return type, followed by an underscore, the first parameter type, underscore, second parameter type and so on. For void, boolean, char, int, long, float and double, the characters V, B, C, I, L, S, D are used. (Note the use of S(ingle) for float, to avoid confusion with F(unction)). For a pointer to a data type, the type is preceded with P. Thus PL is a pointer to a long. If a parameter is const, then a c appears in the appropriate place. Thus, cPL is a const pointer to a long, whereas a PcL is a pointer to a const long, and cPcL is a const pointer to a const long. For volatile qualifiers, v is used. For unsigned types, a u precedes the base type. For user defined data types, you are on your own!
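Spelled out as code, the pointer abbreviations above become the following typedefs (the names are simply the convention applied to plain long and unsigned char):

```c
typedef long *PL;                      /* Pointer to long */
typedef long * const cPL;              /* const pointer to long */
typedef const long *PcL;               /* Pointer to const long */
typedef const long * const cPcL;       /* const pointer to const long */
typedef volatile unsigned char *PvuC;  /* Pointer to volatile unsigned char */
```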

An extreme example: PFcPcI_uI_PvuC. This is a pointer to a function that returns a const pointer to a const Integer that is passed an unsigned integer and a pointer to a volatile unsigned char.
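Decoded into an actual typedef (using plain int and unsigned char for the base types), that name would read:

```c
/* PFcPcI_uI_PvuC: pointer to a function that returns a const pointer
 * to a const int, and is passed an unsigned int and a pointer to a
 * volatile unsigned char */
typedef const int * const
    (*PFcPcI_uI_PvuC)(unsigned int, volatile unsigned char *);
```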

Function pointer templates

The first eleven examples are generic in the sense that they do not use memory space qualifiers and hence may be used on any target. Example 12 shows how to add memory space qualifiers, such that all the components of the declaration end up in the correct memory spaces.

Example 1

pf[] is a static array of pointers to functions that take an INT as an argument and return void.

void fna(INT); // Example prototype of a function to be called

// Declaration using typedef
typedef void (* const PFV_I)(INT);
static PFV_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static void (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
pf[jump_index](a); // Calling method 1
(*pf[jump_index])(a); // Calling method 2

Example 2

pf[] is a static array of pointers to functions that take a pointer to an INT as an argument and return void.

void fna(INT *); // Example prototype of a function to be called

// Declaration using typedef
typedef void (* const PFV_PI)(INT *);
static PFV_PI pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static void (* const pf[])(INT *) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
pf[jump_index](&a); // Calling method 1
(*pf[jump_index])(&a); // Calling method 2

Example 3

pf[] is a static array of pointers to functions that take an INT as an argument and return a CHAR.

CHAR fna(INT);  // Example prototype of a function to be called

// Declaration using typedef
typedef CHAR (* const PFC_I)(INT);
static PFC_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static CHAR (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
CHAR res;
res = pf[jump_index](a); // Calling method 1
res = (*pf[jump_index])(a); // Calling method 2

Example 4

pf[] is a static array of pointers to functions that take an INT as an argument and return a pointer to a CHAR.

CHAR *fna(INT); // Example prototype of a function to be called

// Declaration using typedef
typedef CHAR * (* const PFPC_I)(INT);
static PFPC_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static CHAR * (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
CHAR * res;
res = pf[jump_index](a); // Calling method 1
res = (*pf[jump_index])(a); // Calling method 2

Example 5

pf[] is a static array of pointers to functions that take an INT as an argument and return a pointer to a const CHAR (i.e. the pointer may be modified, but what it points to may not).

const CHAR *fna(INT);  // Example prototype of a function to be called

// Declaration using typedef
typedef const CHAR * (* const PFPcC_I)(INT);
static PFPcC_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static const CHAR * (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
const CHAR * res;
res = pf[jump_index](a); // Calling method 1
res = (*pf[jump_index])(a); //Calling method 2

Example 6

pf[] is a static array of pointers to functions that take an INT as an argument and return a const pointer to a CHAR (i.e. the pointer may not be modified, but what it points to may be).

CHAR * const fna(INT i);  // Example prototype of a function to be called

// Declaration using typedef
typedef CHAR * const (* const PFcPC_I)(INT);
static PFcPC_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static CHAR * const (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
CHAR * const res = pf[jump_index](a); //Calling method 1
CHAR * const res = (*pf[jump_index])(a); //Calling method 2

Example 7

pf[] is a static array of pointers to functions that take an INT as an argument and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified).

const CHAR * const fna(INT i);  // Example function prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_I)(INT);
static PFcPcC_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
const CHAR* const res = pf[jump_index](a); // Calling method 1
const CHAR* const res = (*pf[jump_index])(a); // Calling method 2

Example 8

pf[] is a static array of pointers to functions that take a pointer to a const INT as an argument (i.e. the pointer may be modified, but what it points to may not) and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified).

const CHAR * const fna(const INT *i); // Example prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_PcI)(const INT *);
static PFcPcC_PcI pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(const INT *) = {fna, fnb, fnc, … fnz};

// Example use
const INT a = 6;
const INT *aptr;
aptr = &a;
const CHAR* const res = pf[jump_index](aptr); //Calling method 1
const CHAR* const res = (*pf[jump_index])(aptr);//Calling method 2

Example 9

pf[] is a static array of pointers to functions that take a const pointer to an INT as an argument (i.e. the pointer may not be modified, but what it points to may be) and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified).

const CHAR * const fna(INT *const i); // Example prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_cPI)(INT * const);
static PFcPcC_cPI pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(INT * const) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
INT *const aptr = &a;
const CHAR* const res = pf[jump_index](aptr); //Method 1
const CHAR* const res = (*pf[jump_index])(aptr); //Method 2

Example 10

pf[] is a static array of pointers to functions that take a const pointer to a const INT as an argument (i.e. neither the pointer nor what it points to may be modified) and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified).

const CHAR * const fna(const INT *const i); // Example prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_cPcI)(const INT * const);
static PFcPcC_cPcI pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(const INT * const) = {fna, fnb, fnc, … fnz};

// Example use
const INT a = 6;
const INT *const aptr = &a;

const CHAR* const res = pf[jump_index](aptr); // Method 1
const CHAR* const res = (*pf[jump_index])(aptr); // Method 2

This example manages to combine five incidences of const and one of static into a single declaration. For all of its complexity, however, this is not an artificial example. You could go ahead and remove all the const and static declarations and the code would still work. It would, however, be a lot less safe, and potentially less efficient.

Just to break up the monotony, here is the same declaration, but with a twist.

Example 11

pf[] is a static array of pointers to functions that take a const pointer to a const INT as an argument (i.e. neither the pointer nor what it points to may be modified) and return a const pointer to a volatile CHAR (i.e. the pointer may not be modified, but what it points to may change unexpectedly).

volatile CHAR * const fna(const INT *const i); // Example prototype

// Declaration using typedef
typedef volatile CHAR * const (* const PFcPvC_cPcI)(const INT * const);
static PFcPvC_cPcI pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static volatile CHAR * const (* const pf[])(const INT * const) = {fna, fnb, fnc, … fnz};

// Example use
const INT a = 6;
const INT * const aptr = &a;

volatile CHAR * const res = pf[jump_index](aptr); // Method 1
volatile CHAR * const res = (*pf[jump_index])(aptr); //Method 2

while (*res)
; //Wait for volatile register to clear

With memory space qualifiers, things can get even more hairy. For most vendors, the memory space qualifier is treated syntactically as a type qualifier (such as const or volatile) and thus follows the same placement rules. For consistency, I place type qualifiers to the left of the "thing" being qualified. Where there are multiple type qualifiers, alphabetic ordering is used. Since memory space qualifiers are typically compiler extensions, they are normally preceded by an underscore, and hence come first alphabetically. Thus, a nasty declaration may look like this:

_ram const volatile UCHAR status_register;

To demonstrate memory space qualifier use, here is example 11 again, except this time memory space qualifiers have been added. The qualifiers are named _m1 … _m5.

Example 12

pf[] is a static array of pointers to functions that take a const pointer to a const INT as an argument (i.e. neither the pointer nor what it points to may be modified) and return a const pointer to a volatile CHAR (i.e. the pointer may not be modified, but what it points to may change unexpectedly). Each element of the declaration lies in a different memory space. In this particular case, it is assumed that you can even declare the memory space in which parameters passed by value appear. This is extreme, but is justified on pedagogical grounds.

/* An example prototype. This declaration reads as follows.
* Function fna is passed a const pointer in _m5 space that points to a
* const integer in _m4 space. It returns a const pointer in _m2 space to
* a volatile character in _m1 space.
*/
_m1 volatile CHAR * _m2 const fna(_m4 const INT * _m5 const i);

/* Declaration using typedef. This declaration reads as follows.
* PFcPvC_cPcI is a pointer to function data type, variables based
* upon which lie in _m3 space. Each Function is passed a const
* pointer in _m5 space that points to a const integer in _m4 space.
* It returns a const pointer in _m2 space to a volatile character
* in _m1 space.
*/
typedef _m1 volatile CHAR * _m2 const (* _m3 const PFcPvC_cPcI) (_m4 const INT * _m5 const);

static PFcPvC_cPcI pf[] = {fna, fnb, fnc, … fnz};

/* Direct declaration. This declaration reads as follows. pf[] is
* a statically allocated constant array in _m3 space of pointers to functions.
* Each Function is passed a const pointer in _m5 space that points to
* a const integer in _m4 space. It returns a const pointer in _m2 space
* to a volatile character in _m1 space.
*/
static _m1 volatile CHAR * _m2 const (* _m3 const pf[]) (_m4 const INT * _m5 const) = {fna, fnb, fnc, … fnz};

// Declare a const variable that lies in _m4 space
_m4 const INT a = 6;

// Now declare a const pointer in _m5 space that points to a const
// variable that is in _m4 space
_m4 const INT * _m5 const aptr = &a;

// Make the function call, and get back the pointer
volatile CHAR * const res = pf[jump_index](aptr); // Method 1
volatile CHAR * const res = (*pf[jump_index])(aptr); // Method 2

while (*res)
; // Wait for volatile register to clear