Saturday, January 19, 2013

Stanford Parser tutorial and example guide

A simple tutorial and example for Stanford Parser version 2.0.3 on Linux


1. Download the Stanford Parser:
http://nlp.stanford.edu/software/lex-parser.shtml


2. Decompress the archive:
tar -xzvf stanford-parser-2012-07-09.tgz
cd stanford-parser-2012-07-09


3. Parse a file
(1) Parsing an English file:
./lexparser.sh Inputfile
(Inputfile is the file you want to parse.)
You can edit lexparser.sh to change the parser options,
for example, change -outputFormat from "penn,typedDependencies" to "oneline".
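The option change above can also be scripted instead of edited by hand. The following is only a sketch: the function name set_output_format is hypothetical (it is not part of the parser distribution), and it assumes the script contains the option written as -outputFormat "..." with the value in double quotes, and that the new value contains no "/" or "&" characters.

```shell
#!/bin/sh
# Hypothetical helper: replace the quoted value that follows -outputFormat
# in a parser script. Assumes the option appears as: -outputFormat "value"
set_output_format() {
    file=$1
    format=$2
    # -i.bak edits the file in place and keeps the original as "$file.bak".
    sed -i.bak "s/-outputFormat \"[^\"]*\"/-outputFormat \"$format\"/" "$file"
}
```

For example, set_output_format lexparser.sh oneline would rewrite the script and leave a backup in lexparser.sh.bak.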


(2) Parsing a Chinese file:

(i) First, modify lexparser-lang.sh:
vi lexparser-lang.sh

     Change the memory option:
Change mem=10g to mem=2g (my computer is 32-bit and does not have enough memory for 10g)

     Modify line 48 to:
java -server -Xmx"$mem" -Xms"$mem" -cp "$scriptdir/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser -v -maxLength "$len" \

     Modify line 50 to:
-outputFilesExtension "$out_file"."$len".stp -outputFormat "oneline" \
   

(ii) Extract the Chinese grammar file from the models jar:
jar -xvf stanford-parser-2012-07-09-models.jar
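Before parsing, it can help to confirm that the grammar files were actually extracted. The following is a minimal sketch; the function name list_grammars is hypothetical, and it only assumes that serialized grammars end in .ser.gz, as chinesePCFG.ser.gz does.

```shell
#!/bin/sh
# Hypothetical helper: list every serialized grammar (*.ser.gz) found
# under a directory (defaults to the current directory).
list_grammars() {
    dir=${1:-.}
    find "$dir" -type f -name '*.ser.gz' | sort
}
```

Running list_grammars from inside the stanford-parser-2012-07-09 directory should include edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz in its output if the extraction worked.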

(iii) Now we can parse:
./lexparser-lang.sh Chinese 100 edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz hahaha Inputfile

The output file will be "Inputfile.hahaha.100.stp".

(Inputfile is the file you want to parse, 100 is the maximum length of the sentences to parse, edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz is the serialized grammar file, and hahaha is the prefix of the output file extension.)
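The output filename follows from the -outputFilesExtension "$out_file"."$len".stp option set earlier: the input filename, then the prefix, the maximum length, and "stp", joined by dots. A small sketch of that rule, with a hypothetical function name (expected_output is not part of the parser distribution):

```shell
#!/bin/sh
# Hypothetical helper: print the output filename that lexparser-lang.sh
# is expected to produce, given the input file, the output prefix, and
# the maximum sentence length.
expected_output() {
    input=$1
    prefix=$2
    len=$3
    echo "$input.$prefix.$len.stp"
}
```

With the arguments used above, expected_output Inputfile hahaha 100 prints Inputfile.hahaha.100.stp.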

You can edit lexparser-lang.sh to change the parser options,
for example, the -outputFormat value ("penn,typedDependencies" or "oneline").


Reference: http://linglit194.linglit.tu-darmstadt.de/linguisticsweb/bin/view/LinguisticsWeb/StanfordParser


Postscript: the lexparser-lang.sh in the 2012-07-09 release has some bugs... it took me a long time to figure out how to use it.
