Removing HTML tags from a string



The exercise description that comes back from the API contains HTML. It looks like shit and is a potential security threat!

There are a couple of ways you can tackle this particular beast:

  • Use a regex:
     using System.Text.RegularExpressions;

     // Lazy quantifier (.*?) so every tag is matched on its own.
     const string HTML_TAG_PATTERN = "<.*?>";

     static string StripHTML(string inputString)
     {
         return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty);
     }

    Created by StackOverflow user capdragon

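  • HTML Agility Pack (needs the HtmlAgilityPack package, plus using HtmlAgilityPack; and using System.Linq;):
     public static string StripTagsAgilityPack(string source)
     {
         HtmlDocument htmlDoc = new HtmlDocument();
         htmlDoc.LoadHtml(source); // parse the markup before querying it
         // SelectNodes returns null when nothing matches, hence the "?."
         htmlDoc.DocumentNode.SelectNodes("//comment()")?.ToList().ForEach(c => c.Remove());
         return htmlDoc.DocumentNode.InnerText;
     }

    Created by users Ssilas777 and Thierry_S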

  • And lastly the simplest and fastest one:
     public static string StripTagsCharArray(string source)
     {
         char[] array = new char[source.Length];
         int arrayIndex = 0;
         bool inside = false; // true while we are between '<' and '>'
         for (int i = 0; i < source.Length; i++)
         {
             char let = source[i];
             if (let == '<')
             {
                 inside = true;
                 continue;
             }
             if (let == '>')
             {
                 inside = false;
                 continue;
             }
             if (!inside)
             {
                 array[arrayIndex] = let;
                 arrayIndex++;
             }
         }
         return new string(array, 0, arrayIndex);
     }

    Shared by AuthorProxy in the same thread as capdragon's solution.
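
One thing worth checking with the regex option is whether the quantifier is greedy or lazy, because the two behave very differently once a line holds more than one tag. Here is a minimal sketch of the difference. The class name, the helper names and the sample string are made up for illustration and do not come from the API.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexStripDemo
{
    // Hypothetical sample input, not real API output.
    const string Sample = "<p>Lie on a <b>flat</b> bench.</p>";

    // Greedy: ".*" runs from the first '<' to the LAST '>' on the line.
    public static string StripGreedy(string s) =>
        Regex.Replace(s, "<.*>", string.Empty);

    // Lazy: ".*?" stops at the first '>' it finds, so each tag is matched separately.
    public static string StripLazy(string s) =>
        Regex.Replace(s, "<.*?>", string.Empty);

    static void Main()
    {
        Console.WriteLine("greedy: \"" + StripGreedy(Sample) + "\""); // greedy: ""
        Console.WriteLine("lazy:   \"" + StripLazy(Sample) + "\"");   // lazy:   "Lie on a flat bench."
    }
}
```

With the greedy pattern the whole sample disappears, text included, so the lazy form is the one you want for tag stripping.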

I went for what is behind door no. 3. It suits my needs, does not require any additional libraries to be loaded, and does not process the HTML code, so any malicious code gets its ass ignored by being treated as a plain string.
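
To show what "treated as a string" means in practice, here is a rough sketch that runs a compact restatement of the char-array approach over a hostile-looking input. The class name and the sample description are hypothetical. Note that only the tags go away: whatever sat between them stays behind as inert text.

```csharp
using System;

static class CharArrayStripDemo
{
    // Same idea as option 3: copy every character that is not inside <...>.
    public static string StripTagsCharArray(string source)
    {
        char[] array = new char[source.Length];
        int arrayIndex = 0;
        bool inside = false;
        foreach (char let in source)
        {
            if (let == '<') { inside = true; continue; }
            if (let == '>') { inside = false; continue; }
            if (!inside) array[arrayIndex++] = let;
        }
        return new string(array, 0, arrayIndex);
    }

    static void Main()
    {
        // Hypothetical description, not real API output.
        string description = "<script>alert('x')</script>Do <b>10</b> reps";
        Console.WriteLine(StripTagsCharArray(description)); // alert('x')Do 10 reps
    }
}
```

The script body is still there, but without its tags it is just text to display, and nothing ever parses or runs it.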



