
Full-Text Indexing: Customizing the chinese_lexer Dictionary (Part 1)
2014-11-24 00:34:01
Tags: full-text index definition chinese_lexer dictionary

This article walks through how to customize the dictionary used by the chinese_lexer lexical analyzer.

Initializing the data

-- Sample table with two text columns
create table test2 (str1 varchar2(2000), str2 varchar2(2000));

insert into test2
  values ('地质图', '中国和反馈砀山龙卷风流口水地质图');
insert into test2
  values ('图片', '图');
commit;

Create the lexer preference and then the full-text index (note that the dictionary only takes effect for chinese_lexer)

exec ctx_ddl.create_preference('my_lexer1','CHINESE_LEXER');
 
EXEC ctx_ddl.create_preference('dataquery','MULTI_COLUMN_DATASTORE');
EXEC ctx_ddl.set_attribute('dataquery','columns', 'str1,str2');
 
CREATE INDEX test2_idx ON test2(str1) INDEXTYPE IS ctxsys.CONTEXT PARAMETERS('datastore dataquery LEXER my_lexer1');
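
Before inspecting the tokens, it can help to confirm that the index built without per-row errors. The query below is a minimal sketch, not part of the original walkthrough. It assumes the index is owned by the current user and reads the Oracle Text view CTX_USER_INDEX_ERRORS.

```sql
-- Any rows returned describe documents that failed to index;
-- an empty result suggests the build completed cleanly.
select err_index_name, err_textkey, err_text
  from ctx_user_index_errors
 where err_index_name = 'TEST2_IDX'
 order by err_timestamp desc;
```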

Look at the generated token table: it contains no entry for the term 地质图 ("geological map").

ctx@STARTREK>select * from DR$TEST2_IDX$I ;

TOKEN_TEXT  TOKEN_TYPE  TOKEN_FIRST  TOKEN_LAST  TOKEN_COUNT  TOKEN_INFO
----------  ----------  -----------  ----------  -----------  ----------------
STR1                 0            1           2            2  0090010301900102
STR2                 0            1           2            2  0090050B01900402
地质                 0            1           1            1  0090020C
反馈                 0            1           1            1  008808
和                   0            1           1            1  008807
口水                 0            1           1            1  00880D
流                   0            1           1            1  00880C
龙卷风               0            1           1            1  00880B
山                   0            1           1            1  00880A
图                   0            1           2            2  0090030C018805
图片                 0            2           2            1  008802
中国                 0            1           1            1  008806
砀山                 0            1           1            1  008809

13 rows selected.
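
Rather than scanning the whole token table, a narrower query can check for a single term. This is a minimal sketch that assumes the index name test2_idx used above, so that the token table is DR$TEST2_IDX$I. The LIKE pattern is only an illustrative choice.

```sql
-- List tokens that begin with 地质. With the stock dictionary the listing
-- above shows 地质 but no 地质图, so only the shorter token is expected here.
select token_text, token_count
  from DR$TEST2_IDX$I
 where token_text like '地质%';
```

Running the same query again after the index is rebuilt with a modified dictionary gives a quick before-and-after comparison.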

Now for the custom dictionary. First, export the existing dictionary to a text file with ctxlc and open it:

C:\Users\fengjun>ctxlc -zht -ocs zhs16GBK> zhs16gbk_102.txt
 
C:\Users\fengjun>zhs16gbk_102.txt

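Searching the exported file turned up no entry for 地质图.

Append 地质图 to the very end of the file.

Next, generate the three files that the custom dictionary needs, the ones whose names end in d, k, and i.

This step kept failing: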

C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -i zhs16gbk_102.txt
DRG-52107: ctxkbtc internal error
 
Adding the -n parameter lets the generation complete successfully:
 
C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -n -i zhs16gbk_102.txt
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
DRG-52118: Writing index file for terms
DRG-52117: Writing index file for IDs
DRG-52116: Done writing all terms
DRG-52115: Writing new terms in lexicon to files
DRG-52114: Writing lexicon to files
 
C:\Users\fengjun>dir dr*
 Volume in drive C is Windows8_OS
 Volume Serial Number is 6C5D-2B1F

 Directory of C:\Users\fengjun

2014/09/24 14:02         2,250,471 drold.dat
2014/09/24 14:02           391,326 droli.dat
2014/09/24 14:02            89,282 drolk.dat
2014/09/24 13:55           298,206 drolt.dat
               4 File(s)      3,029,285 bytes
               0 Dir(s) 113,255,260,160 bytes free

Back up the contents of $ORACLE_HOME\ctx\data\zhlx, then copy the files generated above into $ORACLE_HOME\ctx\data\zhlx and rename them.

Only the files ending in d, k, and i need to be copied over.

Be sure to back up the original files first.

ctx@STARTREK>drop index test2_idx force ;

Index dropped.

ctx@STARTREK>CREATE INDEX test2_idx ON test2(str1) INDEXTYPE IS ctxsys.CONTEXT PARAMETERS('datastore dataquery LEXER my_lexer1');

Index created.
 
ctx@STARTREK>select * from DR$TEST2_IDX$I ;

TOKEN_TEXT  TOKEN_TYPE  TOKEN_FIRST  TOKEN_LAST  TOKEN_COUNT  TOKEN_INFO
----------  ----------  -----------  ----------  -----------  ----------------
STR1                 0            1           2            2  0090010201900102
STR2                 0            1           2            2  0090040A01900402
地质图               0            1           1            1  0090020B
反馈                 0            1           1            1  008807
和                   0            1           1            1  008806
口水                 0            1           1            1  00880C
流                   0            1           1            1  00880B
龙卷风               0            1           1            1  00880A
山                   0            1           1            1  008809
图                   0            2           2            1  008805
图片                 0            2           2            1  008802
中国                 0            1           1            1  008805
砀山                 0            1           1            1  008808

13 rows selected.

The token table now contains 地质图 as a single token.
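
To exercise the rebuilt index from the query side, a CONTAINS search can be run against the indexed column. This is a sketch under assumptions: the score label 1 and the column alias are illustrative, and the query is not part of the original walkthrough. Because the stock lexer already produced the adjacent tokens 地质 and 图, a match here does not by itself show that the new dictionary is in use. The token listing is the direct evidence for that.

```sql
-- Search the CONTEXT index created on test2(str1) for the new dictionary term.
select score(1) as relevance, str1
  from test2
 where contains(str1, '地质图', 1) > 0
 order by score(1) desc;
```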

That completes the custom dictionary. For retrieval over large volumes of data
