2 questions: Assn.#3


A couple more questions about Assignment 3:
1) The example output is of the form [['NX', ['The/DT', 'green/JJ', 'dog/NN']], ['VX', ['was/VBD', 'eating/VBG']], ... etc.
Wouldn't it be much easier (to program as well as to manipulate) if the data were in the form [['NX', [['The', 'DT'], ['green', 'JJ'], ['dog', 'NN']]], ['VX', [['was', 'VBD'], ['eating', 'VBG']]], ... etc.
Maybe I'm wrong, but it seems that taking the word/tag pairs out of their list format and joining them into 'word/TAG' strings significantly reduces their usefulness (even though the result is easier to read).
2) I'm a little confused about Part 3. Is the named entity parsing meant to be a separate function from the regular parser, like the ambiguity_ratio was an additional function for the tagger? Or are we supposed to automatically get a listing of the named entities following (or interspersed in) our parsed text when we call parser.parse?
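For question 1, the two forms carry the same information, so converting between them is cheap. Here is a minimal sketch, assuming each token is a 'word/TAG' string with the tag after the last slash. The function names (split_tagged, to_pairs, to_strings) are made up for illustration and are not part of the assignment code.

```python
def split_tagged(token):
    """Split a 'word/TAG' string into ['word', 'TAG'].

    Splits on the LAST slash so a word that itself contains a slash
    keeps its text intact. Assumes every token contains a slash.
    """
    word, _sep, tag = token.rpartition('/')
    return [word, tag]


def to_pairs(chunks):
    """Convert [['NX', ['The/DT', ...]], ...] to [['NX', [['The', 'DT'], ...]], ...]."""
    return [[label, [split_tagged(tok) for tok in tokens]]
            for label, tokens in chunks]


def to_strings(chunks):
    """Inverse of to_pairs: join each [word, tag] pair back into 'word/TAG'."""
    return [[label, ['/'.join(pair) for pair in pairs]]
            for label, pairs in chunks]


example = [['NX', ['The/DT', 'green/JJ', 'dog/NN']],
           ['VX', ['was/VBD', 'eating/VBG']]]
pairs = to_pairs(example)
```

So one option is to work with the nested lists internally and only join them into strings when printing.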
Thanks.

-- Anonymous, March 29, 1999

Answers

In answer to your second question, I'm assuming the name parser is a separate module altogether. Judging by the specs, it looks rather like it's not meant to be used on the same text (it's specifically for Yahoo news articles).
HTH, Vivek

-- Anonymous, March 29, 1999

1.) I asked the same question about the weird formatting (I think), but I asked about the .. It's just a different syntax that equates to the same thing, correct?
2.) My understanding is that the 3rd part looks at the NNPs, compares them to a lexicon, and destructively replaces the 'NNP' tag with the appropriate type (TIME, PERSON, PLACE, etc.). There are a few mis-tags in the tagged text, like the "gen." tag I mentioned previously, but overall it's not too bad.
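The lookup described in 2.) could be sketched roughly like this. It is only an illustration under assumptions: the lexicon entries are invented, the function name tag_entities is hypothetical, and the chunks are assumed to hold [word, tag] lists rather than 'word/TAG' strings.

```python
# Hypothetical lexicon: these entries are invented for illustration and
# are not taken from the assignment text.
LEXICON = {
    'Clinton': 'PERSON',
    'Boston': 'PLACE',
    'Monday': 'TIME',
}


def tag_entities(chunks, lexicon):
    """Destructively replace 'NNP' with an entity type where the word is known.

    chunks is assumed to look like [['NX', [['Clinton', 'NNP'], ...]], ...].
    Words tagged NNP that are missing from the lexicon keep the 'NNP' tag.
    The input lists are modified in place and also returned.
    """
    for _label, pairs in chunks:
        for pair in pairs:
            word, tag = pair
            if tag == 'NNP' and word in lexicon:
                pair[1] = lexicon[word]
    return chunks


parsed = [['NX', [['Clinton', 'NNP']]],
          ['VX', [['visited', 'VBD']]],
          ['NX', [['Boston', 'NNP']]],
          ['NX', [['Acme', 'NNP']]]]
tag_entities(parsed, LEXICON)
```

Anything tagged NNP that the lexicon doesn't cover is left alone, which is why the lexicon's coverage matters so much.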

-- Anonymous, March 29, 1999

Sorry, I forgot something:
The lexicon used in #3 should be derived specifically to suit the text we are given, not _all_ newswire text! I don't see any other way to determine whether an NNP is a person, time, or place via a Python function :)
Andrew

-- Anonymous, March 29, 1999

Yeah, Vivek is right. So don't follow my advice; create your own lexicon.
I haven't gotten a clear answer yet, though: is #3 supposed to work on this article specifically or on all newswire text in general?
Andrew

-- Anonymous, March 29, 1999
