Oracle正则表达式实战 - 数据库编程

n a source string, how do we split it up into separatecolumns, based on changes of case and alpha-to-numeric, such that this.

给定一个元字符串，要求按照指定规则（基于字母大小写和字母到数字的变化）分割为多个列：

ArtADB1234567e9876540

Becomes this. 分割后：

Art ADB 1234567 e 9876540

The source data is set up like this. 元数据如下：

DROP TABLE t1;

CREATE TABLE t1 (

  data VARCHAR2(50)

);

INSERT INTO t1 VALUES ('ArtADB1234567e9876540');

COMMIT;

The first part of the string is an initcap word, so it starts witha capital letter between "A" and "Z". We identify a singlecharacter using the "[]" operator, and ranges are represented using"-", like "A-Z", "a-z" or "0-9". So ifwe are looking for a single character that is a capital letter, we need to lookfor "[A-Z]". That needs to be followed by lower case letters, whichwe now know is "[a-z]", but we need 1 or more of them, which issignified by the "+" operator. So to find an initcap word, we need tosearch for "[A-Z][a-z]+". Since we want the first occurrence of this,we can use the following.

字符串第一部分为大写字母，可能为A-Z。我们使用[]操作符识别单个字符，至于范围则用“-”，例如“A-Z”，"a-z"或"0-9"。所以如果我们需要找大写的首字母则用“[A-Z]”。其后紧跟着的是若干小写字母，可以用+表示若干（1个或多个）。组合起来的正则表达式即为：[A-Z][a-z]+，这样拆分出的第一列方法有了。

REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1)

The second part of the string is a group of 1 or more uppercaseletters. We know we need to use the "[A-Z]+" pattern, but we need tomake sure we don't get the first capital letter, so we look for the secondoccurrence.

第二部分是一组包含1个或多个大写字母。我们知道需要用模式：[A-Z]+,但是为了不和第一部分冲突，我们指明匹配其第2次出现的文本。

REGEXP_SUBSTR(data, '[A-Z]+', 1, 2)

The next part is the first occurrence of a group of numbers.

下一部分是一组纯数字。

REGEXP_SUBSTR(data, '[0-9]+', 1, 1)

The next part is a group of lower case letters. We don't to pickup those from the initcap word, so we must look for the second occurrence oflower case letters.

下一部分是一组小写字母，同样考虑了不和第一部分冲突：

REGEXP_SUBSTR(data, '[a-z]+', 1, 2)

Finally, we have a group of numbers, which is the secondoccurrence of this pattern.

最后，是一组数字：

REGEXP_SUBSTR(data, '[0-9]+', 1, 2)

Putting that all together, we have the following query, whichsplits the data into separate columns.

将以上每一部分正则表达式的输出分别作为独立字段：

COLUMN col1 FORMAT A15

COLUMN col2 FORMAT A15

COLUMN col3 FORMAT A15

COLUMN col4 FORMAT A15

COLUMN col5 FORMAT A15

SELECT REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1) col1,

       REGEXP_SUBSTR(data, '[A-Z]+', 1, 2) col2,

       REGEXP_SUBSTR(data, '[0-9]+', 1, 1) col3,

       REGEXP_SUBSTR(data, '[a-z]+', 1, 2) col4,

       REGEXP_SUBSTR(data, '[0-9]+', 1, 2) col5

FROM   t1;

COL1          COL2          COL3            COL4        COL5

---------   ----------    ----------    -----------   ------------

Art           ADB          1234567         e             9876540

1 row selected.

SQL>

Example 3 : REGEXP_SUBSTR

We need to pull out a group of characters from a "/"delimited string, optionally enclosed by double quotes. The data looks likethis.

我们需要从一个字符串（含有分隔字符/和双引号” ”）中提取一组字符，原始数据如下：

DROP TABLE t1;

CREATE TABLE t1 (

  data VARCHAR2(50)

);

INSERT INTO t1 VALUES ('978/955086/GZ120804/10-FEB-12');

INSERT INTO t1 VALUES ('97/95508/BANANA/10-FEB-12');

INSERT INTO t1 VALUES ('97/95508/"APPLE"/10-FEB-12');

COMMIT;

We are looking for 1 or more characters that are not"/", which we do using "[^/]+". The "^" in thebrackets represents NOT and "+" means 1 or more. We also want toremove optional double quotes, so we add that as a character we don't want,giving us "[^/"]+". So if we want the data from the thirdcolumn, we need the third occurrence of t

Oracle正则表达式实战(二)

Example 3 : REGEXP_SUBSTR