Protecting your embedded software against memory corruption

Article By: Leandro Francucci

This article presents a software method for dealing with corruption of data sets stored in non-volatile memory devices.

The aim of this article is to present a software method for dealing with corruption of data sets stored in non-volatile memory devices, such as small EEPROM or flash memories. Such data sets are common in tiny embedded systems that store persistent data such as configuration parameters and critical system logs. They may be corrupted by a system crash, a power failure or an electrostatic discharge (ESD).

This article proposes a simple but effective mechanism for saving such data with a lower likelihood of corruption. The method also includes a well-known technique for detecting corrupted variables, since variables may be corrupted by many causes: environmental factors (e.g., EMI, heat, radiation), hardware faults (e.g., power fluctuations, power failures, memory cell faults, address line shorts), or software faults (other software erroneously modifying memory). Although this article implements the method in C, it can easily be implemented in other programming languages such as C++.

Suppose an embedded system can be configured at runtime through a set of parameters stored in non-volatile memory. These parameters are arranged in a C structure:

typedef struct ConfigData ConfigData;
struct ConfigData
{
    int optionA;
    long optionB;
};

What would happen if a power failure or system reset occurred while these data were being updated? The data could be left corrupted. To address this problem, a fixed-length check value called a CRC (cyclic redundancy check) is calculated over the configuration data to detect whether they have been corrupted. This CRC is stored in the non-volatile memory alongside the data values:

typedef struct Config Config;
struct Config
{
    ConfigData data;
    Crc32 crc;
};
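The article does not show Crc32_calc() or checkData(). As a minimal sketch, assuming a conventional reflected CRC-32 (polynomial 0xEDB88320, initial value 0xffffffff, final inversion), the following seals a Config by computing the CRC over its data member only, as the later two-block Config_init() listing does, and later checks it. The names crc32_sketch(), config_seal() and config_isValid(), and Crc32 being a uint32_t, are assumptions for illustration, not the repository's API. Note that sizeof(ConfigData) may include padding bytes between optionA and optionB on some targets; zeroing the structure with memset() before filling it in, and copying it with memcpy(), keeps those bytes deterministic in practice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Crc32;   /* assumption: the article's Crc32 is a 32-bit unsigned type */

typedef struct ConfigData ConfigData;
struct ConfigData
{
    int optionA;
    long optionB;
};

typedef struct Config Config;
struct Config
{
    ConfigData data;
    Crc32 crc;
};

/* Bitwise CRC-32: reflected polynomial 0xEDB88320, register starts at 'seed',
 * result is inverted on return (the common CRC-32 used by zlib and Ethernet). */
static Crc32
crc32_sketch(const uint8_t *buf, size_t len, Crc32 seed)
{
    Crc32 crc = seed;
    size_t i;
    int bit;

    for (i = 0; i < len; ++i)
    {
        crc ^= buf[i];
        for (bit = 0; bit < 8; ++bit)
        {
            /* Shift out one bit; fold in the polynomial if that bit was set. */
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
    }
    return ~crc;
}

/* Compute the CRC over the data member only, so the CRC never covers itself. */
static void
config_seal(Config *cfg)
{
    cfg->crc = crc32_sketch((const uint8_t *)&cfg->data, sizeof(ConfigData), 0xffffffffu);
}

/* Recalculate the CRC and compare it with the stored one. */
static bool
config_isValid(const Config *cfg)
{
    return crc32_sketch((const uint8_t *)&cfg->data, sizeof(ConfigData), 0xffffffffu) == cfg->crc;
}
```

Excluding the crc field from the CRC input matters: if the CRC were computed over the whole structure and then written into that same structure, the stored value would no longer match a recalculation.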

Suppose the software module that implements this method is called Config. It provides functions to initialize, set and get the configuration data, while protecting them through calls to the CRC calculator. These functions are declared in a header file named Config.h and implemented in a source file named Config.c. The following code snippet shows a fragment of Config.h.

typedef enum ConfigErrorCode ConfigErrorCode;
enum ConfigErrorCode
{
    NO_ERRORS, INIT_DATA, CORRUPT_DATA
};
...
typedef void (*ConfigErrorHandler)(ConfigErrorCode errorCode);
...
ConfigErrorCode Config_init(void);
void Config_setErrorHandler(ConfigErrorHandler errorHandler);
bool Config_getOptionA(int *value);
bool Config_getOptionB(long *value);
bool Config_setOptionA(int value);
bool Config_setOptionB(long value);

When the system starts, the data stored in the non-volatile memory are first checked and, if they are not corrupted, copied to a RAM variable. Otherwise, they are restored to the default values defined in the file ConfigDft.h. Although this procedure is useful when the system starts for the first time, a more sophisticated alternative is explored later on. The function Config_init() implements this behavior.

static const Config configDefault =
{
    { CONFIG_OPTA_DFT, CONFIG_OPTB_DFT }, 0
};
...
ConfigErrorCode
Config_init(void)
{
    ConfigErrorCode res = NO_ERRORS;

    Crc32_init();
    if (checkDataFromNVMem(&config) == false)
    {
        res = INIT_DATA;
        if (errorHandler != (ConfigErrorHandler)0)
        {
            errorHandler(res);
        }
        config = configDefault;
        /* The default's CRC field is 0, so compute a valid CRC before storing */
        config.crc = Crc32_calc((const uint8_t *)&config.data, sizeof(ConfigData), 0xffffffff);
        NVMem_storeData(CONFIG_ADDR_BEGIN, sizeof(Config), (const uint8_t *)&config);
    }
    return res;
}

The CRC value is set when the configuration is updated and checked when the configuration is read. Updating means storing the new configuration, together with its CRC, in both the private variable and the non-volatile memory.

As an example, the function Config_setOptionA() shows how to set a configuration option, in this case optionA.

bool
Config_setOptionA(int value)
{
    bool res = false;

    if (checkData((const Config *)&config) == false)
    {
        if (errorHandler != (ConfigErrorHandler)0)
        {
            errorHandler(CORRUPT_DATA);
        }
    }
    else
    {
        config.data.optionA = value;
        /* CRC covers the data member only, not the crc field itself */
        config.crc = Crc32_calc((const uint8_t *)&config.data, sizeof(ConfigData), 0xffffffff);
        NVMem_storeData(CONFIG_ADDR_BEGIN, sizeof(Config), (const uint8_t *)&config);
        res = true;
    }
    return res;
}

After updating the non-volatile memory, Config_setOptionA() could perform an additional check to ensure that the newly stored data have been written correctly.
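A minimal sketch of that extra verification, assuming the device can be read back right after writing: the data are written, read back into a scratch buffer and compared byte by byte. The RAM array nvmem, the fault-injection variable nvFaultAddr and the functions nv_write(), nv_read() and nv_writeVerified() are hypothetical stand-ins for the article's NVMem_storeData() driver, used only to make the example runnable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NV_SIZE 64u

static uint8_t nvmem[NV_SIZE];   /* hypothetical: RAM array standing in for the EEPROM */
static int nvFaultAddr = -1;     /* hypothetical fault injection: this cell ignores writes */

static void
nv_write(size_t addr, size_t len, const uint8_t *src)
{
    size_t i;

    for (i = 0; i < len; ++i)
    {
        if ((int)(addr + i) != nvFaultAddr)
        {
            nvmem[addr + i] = src[i];
        }
    }
}

static void
nv_read(size_t addr, size_t len, uint8_t *dst)
{
    memcpy(dst, &nvmem[addr], len);
}

/* Write, then read back and compare; returns false if any byte differs. */
static bool
nv_writeVerified(size_t addr, size_t len, const uint8_t *src)
{
    uint8_t readBack[NV_SIZE];

    if (addr > NV_SIZE || len > NV_SIZE - addr)
    {
        return false;   /* request does not fit in the device */
    }
    nv_write(addr, len, src);
    nv_read(addr, len, readBack);
    return memcmp(readBack, src, len) == 0;
}
```

In Config_setOptionA(), a false result could be reported through the registered error handler and returned to the caller instead of true.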

The proposed method reads the configuration from a RAM variable instead of directly from the non-volatile memory, since this variable is an up-to-date copy of the configuration stored there. When the configuration is read, the CRC is recalculated and compared with the stored CRC. If they differ, errorHandler() is called; otherwise, the retrieved data are returned to the client. The function Config_getOptionA() shown below demonstrates how to retrieve a configuration option.

bool
Config_getOptionA(int *value)
{
    bool res = false;

    if (checkData((const Config *)&config) == false)
    {
        if (errorHandler != (ConfigErrorHandler)0)
        {
            errorHandler(CORRUPT_DATA);
        }
    }
    else
    {
        if (value != (int *)0)
        {
            *value = config.data.optionA;
            res = true;
        }
    }
    return res;
}

The Config_init() function above deals with data corruption at startup by restoring the whole configuration to its default values. A more advanced alternative uses two blocks of non-volatile memory, each storing the configuration data arranged as the Config structure. One block is called main and the other backup, which will be justified later on. When the system starts, the configuration stored in each block is checked by recalculating its CRC and comparing it with the stored CRC. If only one block is corrupted, the healthy block is copied over the corrupted one. If both are corrupted, both are restored to the default values. In the remaining case both blocks are healthy: their stored CRCs are compared with each other, and if they differ, the main block is copied to the backup block.
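The four startup cases above can be condensed into a single decision function. This is a minimal sketch: the RecoveryAction enum and chooseRecovery() are hypothetical names, and the two health flags and CRCs are assumed to come from recalculating each block's CRC, as the two-block Config_init() listing does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical action codes for the main/backup recovery rules. */
typedef enum
{
    ACTION_RESTORE_DEFAULTS,     /* both blocks corrupted */
    ACTION_COPY_BACKUP_TO_MAIN,  /* only main corrupted */
    ACTION_COPY_MAIN_TO_BACKUP,  /* backup corrupted, or both healthy but different */
    ACTION_NONE                  /* both healthy and identical */
} RecoveryAction;

static RecoveryAction
chooseRecovery(bool mainOk, bool backupOk, uint32_t mainCrc, uint32_t backupCrc)
{
    if (!mainOk && !backupOk)
    {
        return ACTION_RESTORE_DEFAULTS;
    }
    if (!mainOk)
    {
        return ACTION_COPY_BACKUP_TO_MAIN;
    }
    if (!backupOk)
    {
        return ACTION_COPY_MAIN_TO_BACKUP;
    }
    /* Both healthy: differing CRCs most likely mean an update reached
     * main but not backup, so main is treated as the newer copy. */
    return (mainCrc != backupCrc) ? ACTION_COPY_MAIN_TO_BACKUP : ACTION_NONE;
}
```

Expressing the rules as one pure function keeps them easy to unit test independently of the memory driver; the article's listing achieves the same by indexing a table of recovery procedures.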

When using non-volatile devices such as flash memories, each block should be assigned its own physical sector, so that erasing one block's sector during an update cannot destroy the other block. The next diagram represents the behavior of this mechanism:

The Config_init() function is modified to implement the mechanism explained above:

...
static const RecProc recovery[] =
{
    proc_in_error,   /* status 0: main and backup corrupted */
    proc_recovery,   /* status 1: main corrupted, backup healthy */
    proc_backup,     /* status 2: main healthy, backup corrupted */
    proc_cmp         /* status 3: both healthy */
};
...
static ConfigErrorCode
proc_in_error(void)
{
    block = configDefault;
    block.crc = Crc32_calc((const uint8_t *)&block.data, sizeof(ConfigData), 0xffffffff);
    NVMem_storeData(CONFIG_MAIN_ADDR, sizeof(Config), (const uint8_t *)&block);
    NVMem_storeData(CONFIG_BACKUP_ADDR, sizeof(Config), (const uint8_t *)&block);
    return CORRUPT_DATA;
}

static ConfigErrorCode
proc_recovery(void)
{
    block = backupBlock;
    NVMem_storeData(CONFIG_MAIN_ADDR, sizeof(Config), (const uint8_t *)&block);
    return RECOVER_DATA;
}

static ConfigErrorCode
proc_backup(void)
{
    NVMem_storeData(CONFIG_BACKUP_ADDR, sizeof(Config), (const uint8_t *)&block);
    return BACKUP_DATA;
}

static ConfigErrorCode
proc_cmp(void)
{
    ConfigErrorCode res = NO_ERRORS;

    if (main.readCRC != backup.readCRC)
    {
        res = proc_backup();
    }
    return res;
}
...
ConfigErrorCode
Config_init(void)
{
    int status;

    Crc32_init();

    NVMem_readData(CONFIG_MAIN_ADDR, sizeof(Config), (uint8_t *)&block);
    main.readCRC = Crc32_calc((const uint8_t *)&block.data, sizeof(ConfigData), 0xffffffff);
    main.result = (main.readCRC == block.crc) ? 1 : 0;

    NVMem_readData(CONFIG_BACKUP_ADDR, sizeof(Config), (uint8_t *)&backupBlock);
    backup.readCRC = Crc32_calc((const uint8_t *)&backupBlock.data, sizeof(ConfigData), 0xffffffff);
    backup.result = (backup.readCRC == backupBlock.crc) ? 1 : 0;

    /* Bit 1: main block healthy; bit 0: backup block healthy */
    status = (main.result << 1) | backup.result;
    return (*recovery[status])();
}

If the system does not need to check the data set stored in RAM every time a configuration option is accessed by the set and get functions, an alternative version looks like this:

bool
Config_getOptionA(int *value)
{
    bool res = false;

    if (value != (int *)0)
    {
        *value = block.data.optionA;
        res = true;
    }
    return res;
}

bool
Config_setOptionA(int value)
{
    block.data.optionA = value;
    block.crc = Crc32_calc((const uint8_t *)&block.data, sizeof(ConfigData), 0xffffffff);
    /* Store the whole Config (data and CRC): main block first, then backup */
    NVMem_storeData(CONFIG_MAIN_ADDR, sizeof(Config), (const uint8_t *)&block);
    NVMem_storeData(CONFIG_BACKUP_ADDR, sizeof(Config), (const uint8_t *)&block);
    return true;
}
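To see why writing main before backup matters, the following sketch simulates both blocks in RAM and cuts the main-block write short, as a power failure would. It assumes fixed-width int32_t options (instead of the article's int and long) so the structure has no padding bytes on common targets, and every name in it (nvMain, nvBackup, storeOptionA(), recoverAtStartup()) is hypothetical; it illustrates the main/backup rules rather than reproducing the repository's implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-width members: 8 bytes with no padding on common targets. */
typedef struct
{
    int32_t optionA;
    int32_t optionB;
} Data;

typedef struct
{
    Data data;
    uint32_t crc;
} Block;

/* Hypothetical stand-ins for the main and backup non-volatile blocks. */
static Block nvMain;
static Block nvBackup;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR 0xffffffff). */
static uint32_t
crc32Sketch(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xffffffffu;
    int bit;

    while (n-- > 0)
    {
        crc ^= *p++;
        for (bit = 0; bit < 8; ++bit)
        {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
    }
    return ~crc;
}

static bool
isValid(const Block *b)
{
    return crc32Sketch((const uint8_t *)&b->data, sizeof(Data)) == b->crc;
}

/* Update optionA: main first, then backup. Only 'cutAfter' bytes of the main
 * write complete, modelling a power failure; pass sizeof(Block) for a full write. */
static void
storeOptionA(int32_t value, size_t cutAfter)
{
    Block b = nvMain;

    b.data.optionA = value;
    b.crc = crc32Sketch((const uint8_t *)&b.data, sizeof(Data));
    if (cutAfter > sizeof(Block))
    {
        cutAfter = sizeof(Block);
    }
    memcpy(&nvMain, &b, cutAfter);
    if (cutAfter < sizeof(Block))
    {
        return;     /* power lost: the backup write never starts */
    }
    memcpy(&nvBackup, &b, sizeof(Block));
}

/* Startup check following the main/backup rules; returns true if a block was rewritten. */
static bool
recoverAtStartup(void)
{
    bool mainOk = isValid(&nvMain);
    bool backupOk = isValid(&nvBackup);

    if (mainOk && backupOk && nvMain.crc == nvBackup.crc)
    {
        return false;               /* both healthy and identical */
    }
    if (mainOk)
    {
        nvBackup = nvMain;          /* backup corrupted or stale */
    }
    else if (backupOk)
    {
        nvMain = nvBackup;          /* main corrupted: roll back to backup */
    }
    else
    {
        memset(&nvMain, 0, sizeof nvMain);   /* both corrupted: defaults (all zero here) */
        nvMain.crc = crc32Sketch((const uint8_t *)&nvMain.data, sizeof(Data));
        nvBackup = nvMain;
    }
    return true;
}
```

Because main is always written before backup, an interrupted update leaves at least one block whose CRC matches, so startup either rolls main back to the previous backup or brings backup up to date from main.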

The source code shown here, written in C, and its unit test cases are available in the safety-mem-patterns repository. It contains three directories, Config.alt1/, Config.alt2/ and Config.recovery/, which correspond to different ways of implementing the Config module according to the proposed method. Config.alt1 and Config.alt2 are similar, but the latter does not check the data set stored in RAM every time a configuration option is accessed by the set and get functions. Config.recovery is derived from Config.alt2 and adds the recovery mechanism.

Although the method introduced here is an effective way to protect embedded software against non-volatile memory corruption, adding electronic circuitry is strongly recommended to handle power failures more reliably: for example, early power-failure detection plus an alternative power supply such as a battery or a supercapacitor. While a power failure is in progress, this circuitry not only lets the software finish updating the data set in non-volatile memory, but also lets it avoid starting a new update. As a result, memory corruption caused by a power failure becomes less likely.
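As a minimal sketch of the second point, assuming the early power-failure detector raises an interrupt, the write path can test a flag before starting a new update while letting an update already in progress finish on the reserve supply. PowerFail_isr(), NVMem_storeGuarded(), nvWriteRaw() and the bytesWritten counter are hypothetical names; nvWriteRaw() only counts bytes instead of programming a real device. Because the flag can be set just after the check, the reserve supply should still be sized to complete at least one full write.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag set by the early power-failure detection interrupt. */
static volatile bool powerFailing = false;

/* Hypothetical ISR attached to the power-fail detector. */
void
PowerFail_isr(void)
{
    powerFailing = true;
}

/* Hypothetical low-level write: counts bytes instead of programming a device. */
static size_t bytesWritten = 0;

static void
nvWriteRaw(size_t addr, size_t len, const uint8_t *src)
{
    (void)addr;
    (void)src;
    bytesWritten += len;
}

/* Refuse to start a new update once a power failure has been signaled. */
bool
NVMem_storeGuarded(size_t addr, size_t len, const uint8_t *src)
{
    if (powerFailing)
    {
        return false;
    }
    nvWriteRaw(addr, len, src);
    return true;
}
```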

This article was originally published on Embedded.

Leandro Francucci is an electronic engineer who has focused on real-time embedded system development using software models for more than ten years, in several industries such as railway, medical, IoT, telecom, and energy. Leandro is the author of the free and open-source RKH state machine framework, and he is also the co-founder and owner of VortexMakes, a startup that provides consulting and training services in embedded software to companies of all sizes. Leandro is always interested in new challenges, as well as knowledge transfer, research and constant learning.
