Remove all comments from C and C++ source code

At the moment I’m struggling with Microchip’s new “Harmony” framework for the PIC32. I don’t want to say bad things about it because (a) I haven’t used it enough to give a fair opinion and (b) I strongly suspect it’s a useful thing for some people, some of the time.

Harmony is extremely heavyweight. For example, the PDF documentation is 8769 pages long. That is not at all what I want – I want to work very close to the metal, and to personally control nearly every instruction executed on the thing, other than extremely basic things like <stdlib.h> and <math.h>.

Yet Microchip says they will be supporting only Harmony (and not their old “legacy” peripheral libraries) on their upcoming PIC32 parts with goodies like hardware floating point, which I’d like to use.

So I’m attempting to tease out the absolute minimum subset of Harmony needed to access register symbol names, etc., and do the rest myself.

My plan is to use Harmony to build an absolutely minimum configuration, then edit down the resulting source code to something manageable.

But I found that many of Microchip’s source files are > 99% comments, making it essentially impossible to read the code and see what it actually does. Often there will be 1 or 2 lines of code here and there separated by hundreds of lines of comments.

So I wrote the Python script below. Given a folder, it walks through every file in it (including subfolders) and overwrites each .c, .cpp, .h, and .hpp file with an identical copy that has all comments and blank lines removed.

I’ve only tested it on Windows, but I don’t see any reason why it shouldn’t work on Linux and Mac.
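The heart of the script is a single regular expression, taken from zvoase's StackOverflow answer that is credited in the code. It matches comments and also string and character literals, so a "//" inside "http://..." is consumed as part of the literal instead of being treated as a comment. Here is a minimal standalone sketch of the same technique run on a short string. The names strip_comments and sample are only for illustration and are not part of the script.

```python
import re

# Same regex as the script: a // line comment, a /* block */ comment,
# a '...' char literal, or a "..." string literal. Because literals are
# matched too, comment markers inside them are never seen as comments.
PATTERN = re.compile(
    r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
    re.DOTALL | re.MULTILINE,
)


def strip_comments(text):
    def replacer(match):
        s = match.group(0)
        # Comments become one space; literals are returned unchanged.
        return " " if s.startswith("/") else s

    stripped = PATTERN.sub(replacer, text)
    # Drop lines that are now empty or whitespace-only.
    return "\n".join(line for line in stripped.splitlines() if line.strip())


sample = '''/* File header
   spanning several lines */
int x = 1; // trailing comment

const char *url = "http://example.com"; /* keep the string */
char c = '/';
'''

print(strip_comments(sample))
```

Comments become a single space (so a/**/b can't collapse into ab), literals come back untouched, and lines left blank are dropped. One caveat: the regex knows nothing about C++11 raw string literals (R"(...)") or C++14 digit separators (1'000'000), so code using those can confuse it.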

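from __future__ import print_function, unicode_literals
import io, os, re, sys

# Works with Python 2.7 and Python 3.
# Use and modification permitted without limit; credit to NerdFever.com requested.

# thanks to zvoase at http://stackoverflow.com/questions/241327/python-snippet-to-remove-c-and-c-comments
# and Lawrence Johnston at http://stackoverflow.com/questions/1140958/whats-a-quick-one-liner-to-remove-empty-lines-from-a-python-string

# One regex, four alternatives: a // line comment, a /* block */ comment,
# a '...' char literal, or a "..." string literal. Literals are matched too,
# so comment markers inside them (e.g. "http://") are left alone.
PATTERN = re.compile(
    r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
    re.DOTALL | re.MULTILINE
)

def comment_remover(text):
    def replacer(match):
        s = match.group(0)
        if s.startswith('/'):
            return " " # note: a space and not an empty string
        else:
            return s   # a string or char literal: keep it unchanged

    r1 = PATTERN.sub(replacer, text)

    # Drop lines left empty or whitespace-only, and end the file with a
    # newline (compilers may warn about a last line without one).
    # split("\n") rather than splitlines(), which would also break on
    # characters such as \x0c or \x85 that can legitimately appear in a file.
    lines = [s for s in r1.split("\n") if s.strip()]
    return ("\n".join(lines) + "\n") if lines else ""


def NoComment(infile, outfile):

    ext = os.path.splitext(infile)[1]

    valid = [".c", ".cpp", ".h", ".hpp"]

    if ext.lower() in valid:

        # latin-1 maps each byte to one character, so ASCII, UTF-8 or
        # Windows-1252 files all round-trip without decode errors.
        with io.open(infile, "r", encoding="latin-1") as inf:
            dirty = inf.read()

        clean = comment_remover(dirty)

        # Text mode writes "\n" as the platform line ending (CRLF on
        # Windows) without producing 0d 0d 0a.
        with io.open(outfile, "w", encoding="latin-1") as outf:
            outf.write(clean)

        print("Comments removed:", infile, ">>>", outfile)

    else:

        print("Did nothing:     ", infile)

if __name__ == "__main__":

    if len(sys.argv) < 2:
        print("")
        print("C/C++ comment stripper v1.00 (c) 2015 Nerdfever.com")
        print("Syntax: nocomments path")
        sys.exit(1)

    top = sys.argv[1]

    # os.walk() recurses into every subfolder; files are overwritten in place.
    for folder, subfolders, fns in os.walk(top):

        for fn in fns:

            filePath = os.path.join(folder, fn)
            NoComment(filePath, filePath)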

To use it, put that in "nocomments.py", then do:

python nocomments.py foldername

Of course, make a backup of the original folder first, since the script overwrites the files in place.
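If you'd rather do the backup from Python too, here is a minimal sketch. It assumes you want the copy next to the original with an "_orig" suffix; that naming and the backup_folder function are only for illustration.

```python
import os
import shutil


def backup_folder(src, suffix="_orig"):
    """Copy the folder tree src to a sibling folder named src + suffix.

    Returns the backup path. Refuses to overwrite an existing backup, so
    running it twice can't replace the pristine copy with stripped files.
    """
    # abspath() also drops a trailing separator, so "foldername/" becomes
    # ".../foldername_orig" and not ".../foldername/_orig" inside the tree.
    src = os.path.abspath(src)
    dst = src + suffix
    if os.path.exists(dst):
        raise RuntimeError("backup already exists: " + dst)
    shutil.copytree(src, dst)  # copies all files and subfolders recursively
    return dst
```

Call backup_folder("foldername") once, then run nocomments.py on the original folder. shutil.copytree() creates the destination itself, which is why the existence check comes first.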

3 thoughts on “Remove all comments from C and C++ source code”

  1. If you look at just the Driver Libraries in Harmony, are those roughly the equivalent of the old peripheral libraries? The documentation for the Driver Libraries is a mere 1,129 pages.

    MPLAB X can collapse comments by default. Go to Tools | Options, Editor, Folding, and check the Comments box. Files you open after that should have the comments collapsed.

  2. Thanks; Tools>Options>Editor>Folding is very useful; I didn’t know about it.

    Another trick I found useful is to go to the Harmony folder, right-click, then Properties, and check Read-only (recursively).

    That sets all the standard Harmony files as read-only; NetBeans is smart enough to know it – it italicizes the file name and greys out the editor window – so you can read that code but not modify the “master” files.
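The same read-only trick can be scripted, which helps outside Windows Explorer. A rough sketch, assuming it's fine to clear the write permission on every file under the Harmony folder; make_tree_read_only is a hypothetical helper, not part of Harmony or MPLAB X.

```python
import os
import stat

# Write-permission bits for owner, group and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_tree_read_only(top):
    """Clear the write bits on every file under top; returns the file count.

    Folders are left writable. On Windows, os.chmod() only honours the
    S_IWRITE (== S_IWUSR) bit, mapping it onto the read-only attribute.
    """
    count = 0
    for folder, subfolders, files in os.walk(top):
        for fn in files:
            path = os.path.join(folder, fn)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            os.chmod(path, mode & ~WRITE_BITS)
            count += 1
    return count
```

On Windows, clearing the write bit sets the file's read-only attribute, so this should be equivalent to ticking the checkbox for the files. To undo it, set stat.S_IWUSR back on each file.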

    Here’s an example of the kind of thing I’m uncomfortable about with Harmony:

    MHC creates a file “system_config.h” that includes the FOSC clock rate you selected via the setup GUI:

    #define SYS_CLK_FREQ 40000000ul

    Now, what happens if my code goes and changes the clock rate while running? Will the rest of Harmony know it? I don’t see how. Will it assume it’s still running at 40 MHz (when it’s not) and get all kinds of timing things wrong? How can I be sure that doesn’t happen?

    I’d much rather manage this stuff myself.

    Maybe I’m just not trusting enough. But I’m not.

  3. Hi Dave,

    Originally I was going to post about the comment folding options, but see that Bob beat me to it.

    You are right that embedded MCUs have traditionally been more bare-metal programming exercises, particularly if you’ve had long experience with a platform and the base peripheral set. However, the industry is changing as more connectivity, file systems, and GUI expectations end up being part of modern projects.

    You might want to check out the Renesas Synergy Platform if you feel that Harmony isn’t exactly what you want. Synergy has the HAL, Framework and (optional) RTOS wrapped together with an Eclipse IDE and ARM Cortex cores. The nice part is that the product family was designed to the software API specification – not the other way around. That means the complexity in the drivers, stacks, and middleware is kept to a minimum while still supporting low-end and high-end performance. And the PDFs are fully hyperlinked, with the API document coming in at a modest 2700 pages. 😉

    Google Renesas Synergy or go to synergyxplorer dot com to learn more.

    –CG
