Wednesday, September 27, 2006 - Posts

Lexical Scanning and Parsing Part 2

Yesterday in part one I discussed the honors project I did as a senior in college.  I left you with the statement that I had recently had the opportunity to use that "only academic" knowledge in the real world.

The company I work for builds a content management system.  I was recently asked to build a way to import content.  Now that's simple enough right? Right.  What's the difficulty and how does this fit in with Part 2 of Lexical Scanning and Parsing you ask?  The difficulty lies in being able to tell the CMS what the content is about.  Being that our company believes this idea of the World Wide Web we've built a way to interrelate pages based on what's in them.  If you have a page about a certain topic, then we think you might want to know about other articles that relate to the one you're looking at.  It's very similar Amazon's 'Customers who searched for "care bears" also expressed interest in: ' feature.  When I'm looking at buying the smash 1986 hit "Care Bears Movie II: New Generation" I want to know what other people like me also liked.

Yes, the difficulty, so how can you tell what a give blob of content is about?  You could read it and manually tell the system.  But when the pages start number over the 10's (try over 4,000) then it becomes a tedious, not-so-fun task.  So I set out to write a language by which we could figure out what to tell the system about the content blob.

First, we take a set of keywords about the content blob (luckily for me they're provided).  Currently though I'm reading some dissertation by some smart guy in Tokyo with some crazy algorithm to figure the keywords out by itself.  That's for version dos.  Back to version one.  I take these keywords and scan them using regular expressions.  I set up an expression that if evaluates to true, I tell our CMS system.

In plain English, I want to do the following:

If this article contains the words "care bears" then this article is about the Care Bears.

Using my knowledge of lexical scanning and parsing I set up a psuedo language that really evaluates boolean expressions.  It has support for AND, OR, XOR, NOT, and grouping parenthesis.  Precedence of the operators is also considered, just like C#.  Here is the Backus-Naur Form for how things are evaluated

 <b-expression> ::= <b-term> [<orop> <b-term>]*
 <b-term>       ::= <not-factor> [<andop> <not-factor>]*
 <not-factor>   ::= [<notop>] <b-factor>
 <b-factor>     ::= $(<b-expression>$) | <regular-expression>
 <orop>   ::= $| | $^
 <andop>  ::= $&
 <notop>  ::= $!

As you can hopefully see from the BNF, the following operators and grouping statements are defined as follows:

OR - $|

AND - $&

XOR - $^

NOT - $!

Begin Group - $(

End Group - $)

The reason operators are so cryptic is that they will separate regular expressions, so I needed combinations of characters that wouldn't be recognized as a regular expression.  I couldn't do '|' as OR because | is alternation in regular expressions.  '^' can be used to represent the start of the line or character negation.  And finally '(' and ')' are used  for grouping.  I added the '$' to the definition of each operator since $ represents the end of line in regular expression, $[plus any character] should never appear in a single line regular expression.  If I've lost you I'm sorry, trust me on this.

Here is an example of a real life expression one of our guys wrote:

$(Cat00138 $& $!speech$) $| $(Cat00192  $& $!eyes$) $| KidsHealthCat20030 $| $($(ear $| nose $| nasal $| throat$)  $& $!$(hearing $| speech $| piercing$)$)

Basically, it's says:

If this article is in (Category00138 AND NOT speech) OR (Category00192 AND NOT eyes) OR Category20030 OR ((ear OR nose OR nasal OR throat) AND NOT (hearing or speech OR piercing))

Though it's kind of complex, this system allows us flexibility like never before.  We can set up any number of regular expressions and join them together with other regular expressions using boolean operators and grouping.  If the whole expression evaluates to true for a given expression, we know quite a bit about an article.

The other positive is this system is fast.  I can scan all 4,000 documents, running about 100+ expressions on each in a few seconds.

If you've read this far you deserve a cookie.  If you've actually understood, you deserve two.  If you didn't follow, please let me know through the comments and I'll explain better.  I want people to understand, this is some cool stuff, and it points out that "worthless academic" knowledge isn't always worthless and academic.

 

with 3 Comments

Being Male Yet Loving, a Letter Home From a Civil War Soldier

Sorry, not a tech post, that's later today.  I just came from bible study.  Now I don't really care what you believe when you read this.  You can be christian, atheist, agnostic, muslim, hindu, or other.  The following is touching to say the least.  To give you some quick background.  The study I'm involved in is talking about how to be a strong man with quality character.  Common "movie men" that come to mind are Maximus (from "Gladiator"), William Wallace (fom "Braveheart"), Hawkeye aka. Nathaniel Poe (from "Last of the Mohicans").  No one would question either of those men's strength and courage, but their hearts were tender when needed.  Here's another, real-life example of Sullivan Ballou, a civil war soldier who wrote a letter home to his wife Sarah before the Battle of Bull Run.  The following stirs something in me, and I hope it does you too, whether married or not.  Here is his letter:

 My very dear Sarah:

     The indications are very strong that we will move in a few days -
perhaps tomorrow. Lest I should not be able to write you again, I feel
impelled to write a few lines that may fall under your eye when I shall
be no more.

     Our movement may be one of a few days duration and full of pleasure
- or it may be one of sever conflict and death to me. Not my will, but
thine, O God, be done. If it is necessary that I should fall on the
battlefield for my country, I am ready. I have no misgivings about, or
lack of confidence in, the cause in which I am engaged, and my courage
does not halt or falter. I know how strongly American Civilization now
leans upon the triumph of the government, and how great a debt we owe
to those who went before us through the blood and suffering of the
Revolution. And I am willing - perfectly willing - to lay down all my
joys in this life to help maintain this government, and to pay that debt.

     But, my dear wife, when I know that with my own joys I lay down
nearly all of yours, and replace them in this life with cares and
sorrows - when, after having eaten for long years the bitter fruit of
orphanage myself, I must offer it as their only sustenance to my dear
little children - is it weak or dishonorable, while the banner of my
purpose floats calmly and proudly in the breeze, that my unbounded
love for you, my darling wife and children, should struggle in fierce,
though useless, contest with my love of country?

     I cannot describe to you my feelings on this calm summer night, when
two thousand men are sleeping around me, many of them enjoying the last,
perhaps, before that of death - and I, suspicious that Death is creeping
behind me with his fatal dart, am communing with God, my country, and
thee.

     I have sought most closely and diligently, and often in my ***,
for a wrong motive in thus hazarding the happiness of those I loved, and
I could not find one. A pure love of my country and the principles I have
often advocated before the people and "the name of honor that I love more
than I fear death" have called upon me, and I have obeyed.

     Sarah, my love for you is deathless, it seems to bind me to you with
mighty cables that nothing but Omnipotence could break; and yet my love of
Country comes over me like a strong wind and bears me irresistibly on,
with all these chains, to the battlefield.

     The memories of the blissful moments I have spent with you come
creeping over me, and I feel most gratified to God and to you that I have
enjoyed them so long. And hard for me it is to give them up and burn to
ashes the hopes of future years when, God willing, we might still have
lived and loved together, and seen our sons grow up to honorable
manhood around us. I have, I know, but few and small claims upon Divine
Providence, but something whispers to me -  perhaps it is the wafted
prayer of my little Edgar - that I shall return to my loved ones
unharmed. If I do not, my dear Sarah, never forget how much I love you,
and when my last breath escapes me on the battlefield, it will whisper
your name.

     Forgive my many faults, and the many pains I have caused you. How
thoughtless and foolish I have often times been! How gladly would I
wash out with my tears every little spot upon your happiness, and
struggle with all the misfortune of this world, to shield you and my
children from harm. But I cannot. I must watch you from the spirit
land and hover near you, while you buffet the storms with your precious
little freight, and wait with sad patience till we meet to part no more.

     But, O Sarah! If the dead can come back to this earth and flit
unseen around those they loved, I shall always be near you; in the
garish day and in the darkest night - amidst your happiest scenes and
gloomiest hours - always, always; and if there be a soft breeze upon your
cheek, it shall be my breath; or the cool air fans your throbbing temple,
it shall be my spirit passing by.

     Sarah, do not mourn me dead; think I am gone and wait for thee, for
we shall meet again.

     As for my little boys, they will grow as I have done, and never know
a father's love and care. Little Willie is too young to remember me long,
and my blue-eyed Edgar will keep my frolics with him among the
dimmest memories of his childhood. Sarah, I have unlimited confidence
in your maternal care and your development of their characters. Tell our
mothers I call God's blessing upon them.

     O Sarah, I wait for you there! Come to me, and lead thither my
children.
                                                   - Sullivan

(From: http://www.naciente.com/essay19.htm)
with 3 Comments