Studying imdict-chinese-analyzer, the Chinese Academy of Sciences word segmentation tool (Java word segmentation)
Download link: http://ictclas.org/Down_OpenSrc.asp
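Brief introduction:
imdict-chinese-analyzer is the intelligent Chinese word segmentation module of the imdict
intelligent dictionary, written by Gao Xiaoping. Its algorithm is based on the Hidden Markov
Model (HMM), and it is a Java reimplementation of ictclas, the Chinese word segmentation
program from the Institute of Computing Technology, Chinese Academy of Sciences. It can
provide Chinese word segmentation support for the Lucene search engine directly.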
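Usage:
The downloaded archive unpacks into a Java project that can be imported into Eclipse
directly. Because the project was developed in a UTF-8 environment, the encoding of the
Eclipse workspace must also be set to UTF-8. AnalyzerTest in the test package shows how the
analyzer is used; after reading it you can use the analyzer right away.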
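Features: Chinese word segmentation and stop word filtering.
Advantages: open source, fast and efficient segmentation.
Disadvantages: no support for adding your own dictionary and no part-of-speech tagging (the
developer says this is to improve speed). The data folder ships with only two dictionaries:
coredict, the core dictionary, and bigramdict, the word-relation dictionary. These are the two
most important dictionaries, but there are no dictionaries of place names or personal names,
so recognizing such names is rather troublesome. Reportedly that requires a hierarchical HMM:
coarse segmentation first, then fine segmentation.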
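In-depth study: the main class is ChineseAnalyzer.java in net.imdict.analysis.chinese. It
extends Lucene's Analyzer class and has two constructors: public ChineseAnalyzer() and
public ChineseAnalyzer(Set<String> stopWords); the second one supports stop words. The most
important method is tokenStream, which uses SentenceTokenizer and WordTokenizer: the former
splits the text into sentences and the latter splits sentences into words. Both sentences and
words are stored in Lucene's Token class. (Token is a concrete class that holds one term.
TokenStream is an abstract class that produces a sequence of Tokens; it is not a subclass of
Token. Tokenizer and TokenFilter are the two kinds of TokenStream, and their subclasses
implement TokenStream's next() method. A Tokenizer's next() returns the raw terms cut from
the input, while a TokenFilter's next() returns terms that have passed through a filter.
Together they form the core structure of a Lucene analyzer.) For example, a token is created
with Token token = new Token() and then filled with token.reinit(buffer.toString(),
tokenStart, tokenEnd, "sentence"); the two middle parameters are the start and end positions
of the string stored in the Token, counted from 0, and token.term() returns the token's
string. The core segmentation algorithm is the segmentSentence method of WordSegmenter, which
the WordTokenizer class calls to obtain the segmentation result. I did not read the code
below that level.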
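The flow described above (sentences first, then words, then stop word filtering) can be
sketched in plain Java without Lucene or imdict. This is only an illustration under stated
assumptions: the class PipelineSketch, its Span type, the tiny dictionary, and the greedy
longest-match word splitter are all made up for this example, whereas the real WordSegmenter
uses the HMM-based algorithm. The sample text is written with Unicode escapes so that the
source compiles regardless of file encoding; it reads 我们喜欢搜索引擎。我们的引擎。

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: mimics the sentence -> word -> stop word flow without Lucene or imdict. */
final class PipelineSketch {

    /** A piece of text with 0-based start (inclusive) and end (exclusive) offsets into the input. */
    static final class Span {
        final String text;
        final int start;
        final int end;
        final String type; // "sentence" or "word", similar in spirit to a token type

        Span(String text, int start, int end, String type) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Full-width and ASCII sentence-ending punctuation (assumed set, for the sketch only).
    private static final String SENTENCE_END = "\u3002\uFF01\uFF1F\uFF1B.!?;";

    /** Stage 1: cut the input into sentences, keeping the closing punctuation with each one. */
    static List<Span> splitSentences(String input) {
        List<Span> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (SENTENCE_END.indexOf(input.charAt(i)) >= 0) {
                sentences.add(new Span(input.substring(start, i + 1), start, i + 1, "sentence"));
                start = i + 1;
            }
        }
        if (start < input.length()) {
            sentences.add(new Span(input.substring(start), start, input.length(), "sentence"));
        }
        return sentences;
    }

    /** Stage 2 stand-in: greedy longest match against a dictionary (NOT the HMM segmenter). */
    static List<Span> splitWords(Span sentence, Set<String> dictionary, int maxWordLength) {
        List<Span> words = new ArrayList<>();
        String s = sentence.text;
        int pos = 0;
        while (pos < s.length()) {
            int len = Math.min(maxWordLength, s.length() - pos);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (len > 1 && !dictionary.contains(s.substring(pos, pos + len))) {
                len--;
            }
            words.add(new Span(s.substring(pos, pos + len),
                    sentence.start + pos, sentence.start + pos + len, "word"));
            pos += len;
        }
        return words;
    }

    /** Stage 3: drop stop words, the role a filter plays after the tokenizers. */
    static List<String> analyze(String input, Set<String> dictionary, Set<String> stopWords) {
        List<String> result = new ArrayList<>();
        for (Span sentence : splitSentences(input)) {
            for (Span word : splitWords(sentence, dictionary, 4)) {
                if (!stopWords.contains(word.text)) {
                    result.add(word.text);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Dictionary entries: "we", "like", "search", "engine".
        Set<String> dictionary = new HashSet<>(Arrays.asList(
                "\u6211\u4EEC", "\u559C\u6B22", "\u641C\u7D22", "\u5F15\u64CE"));
        // Stop words: the particle "de" and the full-width full stop.
        Set<String> stopWords = new HashSet<>(Arrays.asList("\u7684", "\u3002"));
        String text = "\u6211\u4EEC\u559C\u6B22\u641C\u7D22\u5F15\u64CE\u3002"
                + "\u6211\u4EEC\u7684\u5F15\u64CE\u3002";
        System.out.println(analyze(text, dictionary, stopWords));
    }
}
```

With this sample dictionary, analyze returns the six words 我们, 喜欢, 搜索, 引擎, 我们, 引擎;
the particle 的 and both full stops are filtered out as stop words.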
Two modifications:
(1) ChineseAnalyzer can only segment files. To segment a string instead, change the code as
follows:
/* TokenStream ts = ca.tokenStream("sentence", new InputStreamRe
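The author's snippet is cut off above, so the exact change is not shown. As a general note,
Lucene's Analyzer.tokenStream(String, Reader) accepts any java.io.Reader, and one common way
to hand a String to such an API is java.io.StringReader. The sketch below demonstrates only
that idea with a stand-in method; the class StringInputSketch and the method readAll are
hypothetical names for this example and are not part of imdict or Lucene.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Illustrative only: shows that a String can be supplied wherever a Reader is expected. */
final class StringInputSketch {

    /** Stand-in for any API that consumes a Reader (hypothetical helper, not a Lucene call). */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "text to segment";
        // A StringReader wraps the String directly; no file or InputStream is needed.
        // Assumption: with the analyzer from the article, the analogous call would be
        //   TokenStream ts = ca.tokenStream("sentence", new StringReader(text));
        System.out.println(readAll(new StringReader(text)));
    }
}
```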