Search and Truncate / Trim Paragraph By Sentence C# (not word or character)

I’ve had the need on more than one occasion to preview or trim a paragraph around a search term, but I don’t want to just use a character count and cut words in half, or a word count and cut sentences in half. My simple method below takes a paragraph and a search word and returns a string that is truncated by sentences. With some additional params, you can decide how many sentences to include around the matching sentence, and whether to add a “read more” tag or a beginning tag to indicate that more text exists.

In this example, I’m searching for a word in a string object (‘para’).

truncateBySent(para, "redmond", 0, 1, true, " View More...", false, "...", false, "Not Found")

In this example paragraph:

By a board session on the weekend of Dec. 14, some board members were exhausted from the months of work, and were concerned that the process had dragged on, said a person familiar with the matter. They left the meeting, at a hotel less than 8 miles from Microsoft’s Redmond, Wash., headquarters, without a pick. The board had more potential CEO candidates with whom they wanted to meet, and were frustrated that research wasn’t ready on at least one new prospect, said people familiar with the situation.

It would return:

...They left the meeting, at a hotel less than 8 miles from Microsoft’s Redmond, Wash., headquarters, without a pick. The board had more potential CEO candidates with whom they wanted to meet, and were frustrated that research wasn’t ready on at least one new prospect, said people familiar with the situation.
 

Another variation:

truncateBySent(para, "board", 1, 1, true, " View More...", false, "...", false, "Not Found")

Returns:

By a board session on the weekend of Dec. 14, some board members were exhausted from the months of work, and were concerned that the process had dragged on, said a person familiar with the matter. They left the meeting, at a hotel less than 8 miles from Microsoft’s Redmond, Wash., headquarters, without a pick. View More...
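
Why this second result ends with the view-more tag but has no leading tag comes down to index arithmetic on the array of sentences. Here is a minimal sketch of that arithmetic, not the function itself. The names `WindowSketch` and `Window` are hypothetical and exist only for this illustration, and the clamping with Math.Max / Math.Min is my own simplification of the range check the function performs.

```csharp
using System;

public static class WindowSketch
{
    // Hypothetical helper: returns the inclusive range of sentence indexes
    // to keep, clamped to the bounds of the sentence array.
    public static (int Start, int End) Window(
        int sentenceCount, int firstHit, int lastHit, int prepend, int append)
    {
        int start = Math.Max(0, firstHit - prepend);
        int end = Math.Min(sentenceCount - 1, lastHit + append);
        return (start, end);
    }

    public static void Main()
    {
        // The "board" call: 3 sentences, first match in sentence 0,
        // onlyShowFirst = true so the last match equals the first match.
        var w = Window(3, 0, 0, 1, 1);

        bool textBefore = w.Start > 0;     // decides the beginning tag
        bool textAfter = w.End < 3 - 1;    // decides the view-more tag

        Console.WriteLine(w.Start + ".." + w.End);  // 0..1
        Console.WriteLine(textBefore);               // False
        Console.WriteLine(textAfter);                // True
    }
}
```

Sentences 0 and 1 are kept, nothing is cut from the front, and one sentence is left over at the end, which is why only the view-more tag shows up.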
 

And here is the function. You’ll need to import these two namespaces:

using System.Linq;
using System.Text.RegularExpressions;
  public static string truncateBySent(string source, string searchWord, int sentPrepend = 1, int sentAppend = 1, bool onlyShowFirst = true, string viewMoreTag = "", bool alwaysShowViewMoreTag = false, string startTruncTag = "", bool returnSourceIfKeywordNotFound = false, string returnNotFound = "")
        {
            //going to be the final string
            string truncated = "";

            //parse source sentences: split on whitespace that follows . ? ! or ;
            //and is followed by an uppercase letter
            string[] sents = Regex.Split(source, @"(?<=[.?!;])\s+(?=\p{Lu})");

            //create some search start & end holders
            int i = 0;
            int ssent = -1;
            int esent = 0;

            //find start / end
            foreach (string sent in sents)
            {
                //search using regex for word boundaries \b
                //escape the search word so characters like . or + are matched literally
                if (Regex.IsMatch(sent, "\\b" + Regex.Escape(searchWord) + "\\b", RegexOptions.IgnoreCase))
                {
                    if (ssent == -1)
                    {
                        ssent = i;
                    }
                    else
                    {
                        esent = i;
                    }
                }

                i++;
            }

            //make final string:

            if (esent == 0 || onlyShowFirst == true) esent = ssent;

            i = 0;

            foreach (string sent in sents)
            {
                if (i >= ssent - sentPrepend && i <= esent + sentAppend)
                {
                    truncated = truncated + sent + " ";
                }

                i++;
            }

            //add view more

            if (esent + sentAppend + 1 < sents.Count() || alwaysShowViewMoreTag == true)
            {
                truncated = truncated + viewMoreTag;
            }

            //add beginning tag
            if (ssent - sentPrepend > 0)
            {
                truncated = startTruncTag + truncated;
            }

            //check if anything was even found:

            if (ssent == -1)
            {
                if (returnSourceIfKeywordNotFound)
                { truncated = source; }
                else
                {
                    truncated = returnNotFound;
                }
            }

           //and now return the final string - do a trim and remove double spaces.
            //did i ever mention how much i despise double spaces?
            return truncated.Trim().Replace("  ", " ");
        }
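
The sentence split is the part most worth trying on your own text before you rely on it. Here is a small sketch that uses the same pattern as the function. The names `SentenceSplitSketch` and `SplitSentences` are hypothetical and used only here. It shows that a period followed by a digit does not end a sentence, while an abbreviation followed by a capitalized word (such as "Mr. Smith") does get split.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceSplitSketch
{
    // Same pattern as in truncateBySent: split on whitespace that follows
    // . ? ! or ; and comes right before an uppercase letter.
    public static string[] SplitSentences(string text)
    {
        return Regex.Split(text, @"(?<=[.?!;])\s+(?=\p{Lu})");
    }

    public static void Main()
    {
        // "Dec. 14" stays together because a digit follows the period.
        foreach (string s in SplitSentences("We met on Dec. 14 at noon. Nobody was picked."))
        {
            Console.WriteLine("[" + s + "]");
        }

        // "Mr. Smith" is split in two because "Smith" starts with an uppercase letter.
        foreach (string s in SplitSentences("Mr. Smith left early."))
        {
            Console.WriteLine("[" + s + "]");
        }
    }
}
```

If your paragraphs are full of titles and abbreviations, expect some sentences to be cut short and adjust the pattern to suit.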

 

Happy Searching!
