匹配括号内文本的模式基本写法为:“\(.*\)”。\是转义字符,使跟在其后的字符变为字面值。但是这个模式用在REGEXP_SUBSTR时会连括号一起返回。为了忽略括号我们需要在字面括号内部包含子表达式:"\((.*)\)".
COLUMN with_parentheses FORMAT A20
COLUMN without_parentheses FORMAT A20
SELECT data,
REGEXP_SUBSTR(data, '\(.*\)') AS with_parentheses,
REGEXP_SUBSTR(data, '\((.*)\)', 1, 1, 'i', 1) AS without_parentheses
FROM t1
WHERE REGEXP_LIKE(data, '\(.*\)');
DATA WITH_PARENTHESES WITHOUT_PARENTHESES
-------------------------------------------------- -------------------- --------------------
This is some text (with parentheses) in it. (with parentheses) with parentheses
This text has (parentheses too). (parentheses too) parentheses too
2 rows selected.
SQL>
?
注意:REGEXP_SUBSTR(data,'\((.*)\)', 1, 1, 'i', 1) 中最后的i代码不区分大小写,最后1个“1”代表返回哪个子表达式匹配的文本。(范围0-9)
Example 7 : REGEXP_COUNT
We need to know how many times a block of 4 digits appears intext. The data looks like this.
我们需要知道4个数字的块在字符串中出现的次数。看原始数据:
DROP TABLE t1;
CREATE TABLE t1 (
data VARCHAR2(50)
);
INSERT INTO t1 VALUES ('1234');
INSERT INTO t1 VALUES ('1234 1234');
INSERT INTO t1 VALUES ('1234 1234 1234');
COMMIT;
We can identify digits using "\d" or "[0-9]"and the "{4}" operator signifies 4 of them, so using"\d{4}" or "[0-9]{4}" with the REGEXP_COUNT functionseems to be a valid option.
我们可以用表达式:\d 或[0-9]和{4}操作符识别4个数字的块。
SELECT REGEXP_COUNT(data, '[0-9]{4}') AS pattern_count_1,
REGEXP_COUNT(data, '\d{4}') AS pattern_count_2
FROM t1;
PATTERN_COUNT_1 PATTERN_COUNT_2
--------------- ---------------
1 1
2 2
3 3
3 rows selected.
SQL>
Example 8 : REGEXP_LIKE
We need to identify invalid email addresses. The data looks likethis.
我们需要校验邮箱地址,原始数据如下:
DROP TABLE t1;
CREATE TABLE t1 (
data VARCHAR2(50)
);
INSERT INTO t1 VALUES ('me@example.com');
INSERT INTO t1 VALUES ('me@example');
INSERT INTO t1 VALUES ('@example.com');
INSERT INTO t1 VALUES ('me.me@example.com');
INSERT INTO t1 VALUES ('me.me@ example.com');
INSERT INTO t1 VALUES ('me.me@example-example.com');
COMMIT;
The following test gives us email addresses that approximate toinvalid email address formats.
下列测试给我们近似不合法的邮箱。
SELECT data
FROM t1
WHERE NOT REGEXP_LIKE(data, '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}', 'i');
DATA
--------------------------------------------------
me@example
@example.com
me.me@ example.com
3 rows selected.
SQL>
?
-----------------------------
Dylan Presents.