How to Truncate / Trim Text By Sentence in JavaScript (not word or character)

August 22, 2019 | chrisbitting | js, node, string, text, truncate

When showing a preview of longer text, I often prefer to cut it at a sentence boundary rather than after a fixed number of characters (even though a fixed length looks more uniform visually). Below is a small JavaScript function (I use it in Node.js a lot) that does this. No, it’s not the most complicated thing ever, but I wanted to share it in case it helps anyone.

String.prototype.truncateBySent = function(sentCount = 3, moreText = "") {
  //match ".","!","?" - english ending sentence punctuation
  var sentences = this.match(/[^.!?]+[.!?]+/g);
  //only truncate when there are more sentences than requested
  if (sentences && sentences.length > sentCount) {
    //trim each sentence so joining with a space doesn't produce double spaces
    return sentences.slice(0, sentCount).map(function(s) { return s.trim(); }).join(" ") + moreText;
  }
  //return full text (as a primitive string) if nothing else
  return this.toString();
};

Easy to use, just do something like this:

someText.truncateBySent(2);

Sometimes you also want to include something at the end to indicate there is more text, like a link or ellipsis. You can do this:

someText.truncateBySent(2,' <a href="#">View More</a>');

It will take a line of text like this:

Sample sentence one? Another sentence two. Another three. Is there four? What about five? And six! Finally seven.

And return this (View More is a link once the HTML is rendered):

Sample sentence one? Another sentence two. View More
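
If you’d rather not extend String.prototype, the same idea can be written as a plain function. This is only a minimal sketch, not the code from the CodePen or the GitHub repo; the name truncateBySentence is made up for illustration, and it assumes English-style sentence endings (".", "!", "?").

```javascript
// Hypothetical standalone variant of the prototype method above.
function truncateBySentence(text, sentCount = 3, moreText = "") {
  // Match runs of text that end in ".", "!" or "?".
  const sentences = text.match(/[^.!?]+[.!?]+/g);

  // Nothing to cut if there is no punctuation or the text is short enough.
  if (!sentences || sentences.length <= sentCount) {
    return text;
  }

  // Trim each sentence so joining with a space never yields double spaces.
  return sentences
    .slice(0, sentCount)
    .map((s) => s.trim())
    .join(" ") + moreText;
}

const sample =
  "Sample sentence one? Another sentence two. Another three. Is there four?";

console.log(truncateBySentence(sample, 2, " ..."));
// Sample sentence one? Another sentence two. ...
```

A plain function avoids touching a built-in prototype, which some teams prefer in shared code. Text without any ending punctuation is returned unchanged.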

See a working sample here: https://codepen.io/chrisbitting/pen/oNvzKNz

Or grab it on GitHub: https://github.com/cbitting/jsTruncateBySentence

Fixing / Removing Invalid Characters from a File Path / Name – c#

April 14, 2014 | chrisbitting | .net, c#, char, path, string, System.IO

Below is a simple method for fixing bad filenames and paths. This uses the character lists from Path.GetInvalidPathChars and Path.GetInvalidFileNameChars (part of System.IO).

You should be able to pass a filename, a directory or a full path. For example, calling these three lines yields the results below:

cleanPath(@"c:\tem|<p\fi<>le.txt")
cleanPath(@"c:\tem|<p\")
cleanPath(@"fi<le.txt")

Returns:

c:\tem-p\fi-le.txt
c:\tem-p\
fi-le.txt

You can also pass a string that’s used to replace the bad characters.

cleanPath(@"c:\tem|<p\fi<>le.txt", string.Empty)

Returns:

c:\temp\file.txt

private string cleanPath(string toCleanPath, string replaceWith = "-")
{
    //get just the filename - can't use Path.GetFileName since the path might be bad!
    string[] pathParts = toCleanPath.Split(new char[] { '\\' });
    string newFileName = pathParts[pathParts.Length - 1];
    //get just the path
    string newPath = toCleanPath.Substring(0, toCleanPath.Length - newFileName.Length);
    //clean bad path chars
    foreach (char badChar in Path.GetInvalidPathChars())
    {
        newPath = newPath.Replace(badChar.ToString(), replaceWith);
    }
    //clean bad filename chars
    foreach (char badChar in Path.GetInvalidFileNameChars())
    {
        newFileName = newFileName.Replace(badChar.ToString(), replaceWith);
    }
    //remove duplicate "replaceWith" characters. ie: change "test-----file.txt" to "test-file.txt"
    //loop, because a single Replace pass only shortens a long run instead of collapsing it
    if (string.IsNullOrWhiteSpace(replaceWith) == false)
    {
        string doubled = replaceWith + replaceWith;
        while (newPath.Contains(doubled)) newPath = newPath.Replace(doubled, replaceWith);
        while (newFileName.Contains(doubled)) newFileName = newFileName.Replace(doubled, replaceWith);
    }
    //return new, clean path:
    return newPath + newFileName;
}
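
Collapsing runs of the replacement string generally takes more than one pass, because replacements don’t overlap: one pass over five dashes still leaves three. Here is a rough sketch of that step in JavaScript (the language of the first post on this page, not the C# used here); collapseRepeats is a made-up helper name for illustration.

```javascript
// Hypothetical helper: collapse runs of `token` down to a single occurrence.
function collapseRepeats(text, token) {
  if (token === "") return text; // nothing to collapse
  const doubled = token + token;
  let result = text;
  // Every pass shortens the string, so the loop always terminates.
  while (result.includes(doubled)) {
    result = result.split(doubled).join(token);
  }
  return result;
}

// One pass is not enough for longer runs:
console.log("test-----file.txt".split("--").join("-")); // test---file.txt
console.log(collapseRepeats("test-----file.txt", "-")); // test-file.txt
```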

Hope it helps!

Search and Truncate / Trim Paragraph By Sentence C# (not word or character)

February 6, 2014 | chrisbitting | c#, regex, string, trim, truncate

I’ve had the need on more than one occasion to preview or trim a paragraph around a search term, but I don’t want to use a character count and cut words in half, or a word count and cut sentences in half. My simple method below takes a paragraph and a search word and returns a string that is truncated by sentences. With some additional params, you can decide how many sentences to include around the matching sentence, and whether you want a “read more” tag or a beginning tag to indicate that more text exists.

In this example, I’m searching for a word in a string object (‘para’).

truncateBySent(para, "redmond", 0, 1, true, " View More...", false, "...", false, "Not Found")

In this example paragraph:

By a board session on the weekend of Dec. 14, some board members were exhausted from the months of work, and were concerned that the process had dragged on, said a person familiar with the matter. They left the meeting, at a hotel less than 8 miles from Microsoft’s Redmond, Wash., headquarters, without a pick. The board had more potential CEO candidates with whom they wanted to meet, and were frustrated that research wasn’t ready on at least one new prospect, said people familiar with the situation.

It would return:

…They left the meeting, at a hotel less than 8 miles from Microsoft’s Redmond, Wash., headquarters, without a pick. The board had more potential CEO candidates with whom they wanted to meet, and were frustrated that research wasn’t ready on at least one new prospect, said people familiar with the situation.
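
To see how the paragraph gets carved into sentences, here is a rough JavaScript sketch of the splitting step only (JavaScript being the language of the first post on this page). It reuses the same pattern as the C# function; the helper name splitSentences and the shortened sample paragraph are my own, for illustration. It assumes an engine with lookbehind and Unicode property escapes, such as a recent Node.js.

```javascript
// Split on whitespace that follows ., ?, ! or ; and precedes an uppercase letter.
// The "u" flag is required for \p{Lu}.
const SENTENCE_BREAK = /(?<=[.?!;])\s+(?=\p{Lu})/u;

// Hypothetical helper mirroring the Regex.Split call in the C# function.
function splitSentences(text) {
  return text.split(SENTENCE_BREAK);
}

// Shortened version of the sample paragraph.
const para =
  "By a board session on the weekend of Dec. 14, some members were exhausted. " +
  "They left the meeting near Redmond, Wash., without a pick. " +
  "The board had more candidates to meet.";

const sents = splitSentences(para);
console.log(sents.length); // 3

// Index of the first sentence containing the whole word "redmond".
const hit = sents.findIndex((s) => /\bredmond\b/i.test(s));
console.log(hit); // 1
```

"Dec. 14" and "Wash.," don’t trigger a split because the next character isn’t an uppercase letter. An abbreviation followed by a capitalized word (for example "Mr. Smith") would still be split, so treat the pattern as a heuristic.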

Another variation:

truncateBySent(para, "board", 1, 1, true, " View More...", false, "...", false, "Not Found")

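Returns:

By a board session on the weekend of Dec. 14, some board members were exhausted from the months of work, and were concerned that the process had dragged on, said a person familiar with the matter. They left the meeting, at a hotel less than 8 miles from Microsoft’s Redmond, Wash., headquarters, without a pick. View More...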

And here is the function. You’ll need these imports:

using System.Linq;
using System.Text.RegularExpressions;

  public static string truncateBySent(string source, string searchWord, int sentPrepend = 1, int sentAppend = 1, bool onlyShowFirst = true, string viewMoreTag = "", bool alwaysShowViewMoreTag = false, string startTruncTag = "", bool returnSourceIfKeywordNotFound = false, string returnNotFound = "")
        {
            //going to be the final string
            string truncated = "";

            //parse source sentences
            string[] sents = Regex.Split(source, @"(?<=[.?!;])\s+(?=\p{Lu})");

            //create some search start & end holders
            int i = 0;
            int ssent = -1;
            int esent = 0;

            //find start / end
            foreach (string sent in sents)
            {
                //search using regex for word boundaries \b (escape the word so regex characters in it are matched literally)
                if (Regex.IsMatch(sent, "\\b" + Regex.Escape(searchWord) + "\\b", RegexOptions.IgnoreCase))
                {
                    if (ssent == -1)
                    {
                        ssent = i;
                    }
                    else
                    {
                        esent = i;
                    }
                }

                i++;
            }

            //make final string:

            if (esent == 0 || onlyShowFirst == true) esent = ssent;

            i = 0;

            foreach (string sent in sents)
            {
                //keep every sentence from (first match - sentPrepend) through (last match + sentAppend)
                if (i >= ssent - sentPrepend && i <= esent + sentAppend)
                {
                    truncated = truncated + sent + " ";
                }

                i++;
            }

            //add view more

            if (esent + sentAppend + 1 < sents.Count() || alwaysShowViewMoreTag == true)
            {
                truncated = truncated + viewMoreTag;
            }

            //add beginning tag
            if (ssent - sentPrepend > 0)
            {
                truncated = startTruncTag + truncated;
            }

            //check if anything was even found:

            if (ssent == -1)
            {
                if (returnSourceIfKeywordNotFound)
                { truncated = source; }
                else
                {
                    truncated = returnNotFound;
                }
            }

            //and now return the final string - do a trim and remove double spaces.
            //did i ever mention how much i despise double spaces?
            return truncated.Trim().Replace("  ", " ");
        }

Happy Searching!

c# Whole Word Matching (RegEx)

July 22, 2013 | chrisbitting | .net, asp.net, c#, match, regex, string

If you’ve ever wanted to test whether a word exists in a string, but can’t use “.Contains” because it doesn’t respect whole words (not that I would expect it to), below is a fast, simple way using Regex:

Of course you’ll need: using System.Text.RegularExpressions;

Now set up your pattern and Regex object:

string pattern = @"\bteam\b";
Regex rx = new Regex(pattern, RegexOptions.IgnoreCase);

Now create a match:

Match m = rx.Match("Teamwork is working together.");

Does the word exist?

if (m.Success) {
//no - we never get here, "team" is only part of "Teamwork"
}

Try again using a string with the whole word:

m = rx.Match("I am just part of the team.");

Does the word exist now?

if (m.Success) {
//yes!
}
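
For comparison, here is a rough sketch of the same whole-word check in JavaScript (the language of the first post on this page). The helper names escapeRegExp and hasWholeWord are made up for illustration; the escaping step is an addition so that a search word containing regex characters is matched literally.

```javascript
// Escape regex metacharacters so the search word is matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true when `word` appears in `text` as a whole word.
function hasWholeWord(text, word) {
  const rx = new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
  return rx.test(text);
}

console.log(hasWholeWord("Teamwork is working together.", "team")); // false
console.log(hasWholeWord("I am just part of the team.", "team"));   // true
```

Keep in mind that \b only treats letters, digits and underscore as word characters, so a term that ends in punctuation, like "c#", needs different handling.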

Of course, this is just a tiny portion of the power of Regex. Happy matching!

Chris Bitting

Random, I know.
