Handling CAPTCHAs with the Asprise OCR Package (Part 2)

2014-11-24 11:03:50
```java
/*
 * $Id$
 *
 */
package com.asprise.util.ocr.demo;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import com.asprise.util.ocr.OCR;

public class Demo {
    public static void main(String[] args) throws IOException {
        // ImageIO requires Java 1.4+. Check the specification version ("1.4", "1.8", "11", ...);
        // java.vm.version is the VM build number, not the Java release.
        if (("1.4").compareTo(System.getProperty("java.specification.version")) > 0) {
            System.err.println("Warning:\n\nYou need Java version 1.4 or above for ImageIO to run this demo.");
            System.err.println("Your current Java version is: " + System.getProperty("java.version"));
            System.err.println("\nSolutions:\n");
            System.err.println("(1) Download JRE/JDK version 1.4 or above; OR\n");
            System.err.println("(2) Run DemoUI, which can run on your current Java virtual machine.");
            System.err.println("    Double click the 'runDemoUI' to invoke it.\n");
            return;
        }
        System.out.println("Welcome to Asprise OCR v4.0 Demo!\n");
        if (args.length < 1) {
            System.err.println("Usage: java Demo PATH_TO_IMAGE [Description]");
            return;
        }

        // Optional description: print every argument after the image path.
        if (args.length >= 2) {
            System.out.println("-----------------------------------------------------------");
            for (int i = 1; i < args.length; i++) {
                System.out.println(args[i]);
            }
            System.out.println("-----------------------------------------------------------\n");
        }

        File file = new File(args[0]);
        System.out.println("Trying to perform OCR on image: " + file.getAbsolutePath());

        // Optional: point the OCR class at the native AspriseOCR.dll explicitly.
        //OCR.setLibraryPath("E:/Twain/OCR/OCR+i/Release/AspriseOCR.dll");
        BufferedImage image = ImageIO.read(file);
        String s = new OCR().recognizeEverything(image);
        System.out.println("\n---- RESULTS: -------\n" + s);
    }
}
```


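`ImageIO.read` loads the `File` into a `BufferedImage`. That image is then passed to the `recognizeEverything` method of the `OCR` class, which returns the recognized text as a string (English letters and digits).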
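The demo skips one detail. `ImageIO.read` does not throw when it cannot decode a file, for example a format with no installed reader. It returns `null` instead, and that `null` only surfaces later inside the OCR call. Below is a minimal sketch of a guard, using a hypothetical helper class `ImageLoader`. The Asprise call is left as a comment so the snippet compiles with the JDK alone.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of the Asprise demo.
public class ImageLoader {
    /**
     * Reads an image file into a BufferedImage before it is handed to OCR.
     * ImageIO.read returns null (rather than throwing) when no installed
     * ImageReader recognizes the format, so turn that case into an exception.
     */
    public static BufferedImage readImage(File file) throws IOException {
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            throw new IOException("Unsupported or unreadable image format: " + file.getAbsolutePath());
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("Usage: java ImageLoader PATH_TO_IMAGE");
            return;
        }
        BufferedImage image = readImage(new File(args[0]));
        System.out.println("Loaded " + image.getWidth() + "x" + image.getHeight() + " image");
        // With the Asprise jar on the classpath, this is where the demo would call:
        // String s = new OCR().recognizeEverything(image);
    }
}
```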

That is the core of it. On most websites' CAPTCHAs, though, it does not work very well.

The reason is simple. Most sites add some amount of noise to their CAPTCHAs specifically to defeat automated OCR.

"Noise" here means things like a colored background, random stray dots, lines drawn every which way, distorted characters, and so on.
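A common first step against the colored-background kind of noise is binarization. This is a generic technique, not necessarily what the author does below. Each pixel is converted to a brightness value and then mapped to pure black or pure white. The sketch assumes dark text on a lighter background and uses a fixed threshold of 128. Both are assumptions you would tune per site. `Binarizer` is a hypothetical name.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper illustrating global-threshold binarization.
public class Binarizer {
    /** Pixels whose luminance is below this value are treated as text (assumed default). */
    public static final int DEFAULT_THRESHOLD = 128;

    /**
     * Returns a black-and-white copy of the image: dark pixels become black (text),
     * everything else becomes white, which wipes out light background colors.
     */
    public static BufferedImage binarize(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the ITU-R BT.601 luma weights (0.299, 0.587, 0.114)
                int luma = (r * 299 + g * 587 + b * 114) / 1000;
                out.setRGB(x, y, luma < threshold ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }
}
```

A single global threshold fails when the noise lines are as dark as the text. Those need a different filter.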

So how can this noise be removed so that the OCR engine can read the text?
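For the stray-dot kind of noise, one simple idea is to count each black pixel's black neighbours on an image that is already black and white. This is again a generic technique, not the author's code. A lone dot has few or no black neighbours, while a pixel that belongs to a character stroke usually has several. The sketch below uses a hypothetical `DotRemover` class, and the `minNeighbors` cutoff is an assumption.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper illustrating isolated-pixel removal on a black-and-white image.
public class DotRemover {
    private static boolean isBlack(BufferedImage img, int x, int y) {
        return (img.getRGB(x, y) & 0xFFFFFF) == 0x000000;
    }

    /**
     * Expects black text on a white background. A black pixel with fewer than
     * minNeighbors black pixels among its 8 neighbours is treated as a stray noise
     * dot and painted white. Neighbours are always read from the unmodified input,
     * and a new image is returned.
     */
    public static BufferedImage removeIsolatedDots(BufferedImage src, int minNeighbors) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean black = isBlack(src, x, y);
                if (black) {
                    int count = 0;
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            if (dx == 0 && dy == 0) continue; // skip the pixel itself
                            int nx = x + dx, ny = y + dy;
                            if (nx >= 0 && nx < w && ny >= 0 && ny < h && isBlack(src, nx, ny)) {
                                count++;
                            }
                        }
                    }
                    black = count >= minNeighbors;
                }
                out.setRGB(x, y, black ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }
}
```

With `minNeighbors = 1` only completely isolated pixels disappear. Raising the cutoff removes bigger specks, but it also starts eating thin strokes and stroke ends.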

To illustrate the approach, I will use a fragment of a program I wrote earlier for a Kaixin001 (开心网) bot.

That program no longer works. Kaixin001's CAPTCHAs now consist of Chinese characters, and Asprise can only recognize English letters and digits.


```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import javax.imageio.ImageIO;

// buffer is a byte[] holding the encoded CAPTCHA image
BufferedImage image;
image = ImageIO.read(new ByteArrayInputStream(buffer));
// A BufferedImage consists of a single tile, so these equal image.getWidth() / image.getHeight().
int width = image.getTileWidth();
int height = image.getTileHeight();
```

First, look at the code above. Here `buffer` is a `byte[]`. It can be a binary file opened through `File`, that is, the bytes read with an `InputStream`'s `read` method