More Fun with Regular Expressions : Word and Paragraph Parsing
Trolling the ASP.NET forums again this morning (I know, I do it a lot), I found a question about parsing the paragraphs out of a block of text, so I knew I had to answer it. The regular expression needed is '(.+)'. This tells the Regular Expression object to match a series of one or more characters of any kind except a line feed. This means each match is one non-empty line of text, so paragraphs separated by a line feed come back as separate matches. One caveat: '.' still matches a carriage return, so text with Windows-style line endings will leave a trailing carriage return on each match. Code for this solution would look like this:
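// Requires: using System.IO; using System.Text.RegularExpressions;
public static MatchCollection GetParagraphs()
{
    using (StreamReader sr = new StreamReader(@"{Path To Sample File}\SampleText.txt"))
    {
        // Read the whole file into memory, then match each non-empty line.
        string textFromFile = sr.ReadToEnd();
        Regex rg = new Regex(@"(.+)");
        return rg.Matches(textFromFile);
    }
}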
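To see the paragraph matching without needing a file on disk, here is a minimal sketch that runs against an in-memory string. The class name ParagraphDemo and the helper GetParagraphsFromText are made up for illustration, and the pattern is swapped for '[^\r\n]+' on the assumption that the text may contain Windows-style line endings; it is a variation on the approach above, not a replacement for it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParagraphDemo
{
    // Hypothetical helper: same idea as GetParagraphs, but takes the text directly.
    public static MatchCollection GetParagraphsFromText(string text)
    {
        // [^\r\n]+ matches a run of characters that are neither CR nor LF,
        // so Windows line endings do not leave a trailing '\r' on a match.
        return Regex.Matches(text, @"[^\r\n]+");
    }

    public static void Main()
    {
        string sample = "First paragraph.\r\n\r\nSecond paragraph.\r\nThird.";

        // Brackets make any stray whitespace at the ends visible.
        foreach (Match m in GetParagraphsFromText(sample))
        {
            Console.WriteLine("[" + m.Value + "]");
        }
    }
}
```

Blank lines between paragraphs are skipped because the pattern requires at least one character per match.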
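I thought I would extend this to get a word count as well as all the words. In this case the expression is '(\w+)', which matches a series of one or more word characters, meaning letters, digits, and the underscore.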
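// Requires: using System.IO; using System.Text.RegularExpressions;
public static MatchCollection GetWords()
{
    using (StreamReader sr = new StreamReader(@"{Path To Sample File}\SampleText.txt"))
    {
        // Read the whole file into memory, then match each run of word characters.
        string textFromFile = sr.ReadToEnd();
        Regex rg = new Regex(@"(\w+)");
        return rg.Matches(textFromFile);
    }
}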
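One thing to watch for with '\w+' is that an apostrophe is not a word character, so a contraction is counted as two words. The sketch below compares the original pattern with one possible alternative, '\w+(?:'\w+)*', which keeps contractions together. The class and method names are hypothetical, and whether a contraction should count as one word is an assumption about what you want from the count.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCountDemo
{
    // Counts words with the pattern used in GetWords.
    public static int CountSimple(string text)
    {
        return Regex.Matches(text, @"\w+").Count;
    }

    // Hypothetical variant: a word may contain apostrophes between runs of
    // word characters, so "Don't" stays a single match.
    public static int CountWithApostrophes(string text)
    {
        return Regex.Matches(text, @"\w+(?:'\w+)*").Count;
    }

    public static void Main()
    {
        string sample = "Don't stop, it's fine.";
        Console.WriteLine(CountSimple(sample));          // 6: Don, t, stop, it, s, fine
        Console.WriteLine(CountWithApostrophes(sample)); // 4: Don't, stop, it's, fine
    }
}
```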
Calling the Regex.Matches method returns a MatchCollection, which has a Count property that can be used to get the number of matches. It can also be enumerated to get the actual matches.
public static void WriteMatchCollectionResults(MatchCollection mc)
{
    Console.WriteLine(mc.Count);
    foreach (Match m in mc)
    {
        Console.WriteLine(m.Value);
    }
    Console.WriteLine("...........................................");
    Console.WriteLine("");
}
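If you would rather work with plain strings than Match objects, the same enumeration can copy the values into a list. This is only a sketch; the class MatchListDemo and the helper ToValues are names I made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchListDemo
{
    // Hypothetical helper: copies each match's text into a List<string>.
    public static List<string> ToValues(MatchCollection mc)
    {
        // Count gives the final size up front, so the list can be pre-sized.
        var values = new List<string>(mc.Count);
        foreach (Match m in mc)
        {
            values.Add(m.Value);
        }
        return values;
    }

    public static void Main()
    {
        List<string> words = ToValues(Regex.Matches("one two three", @"\w+"));
        Console.WriteLine(words.Count);
        Console.WriteLine(string.Join(", ", words));
    }
}
```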