
C++ CSV Parser
2011-10-31 10:19:15
Tags: CSV, parsing

From: http://www.zedwood.com/article/112/cpp-csv-parser


The function featured here is csvline_populate. It parses a line of data using a delimiter. Pass in a comma and it parses a comma-separated values (CSV) file. Pass in '\t' and it parses a tab-delimited file (.txt or .tsv). CSV data often contains commas inside the actual values, and the format handles this by surrounding such fields in double quotes. A literal quote inside a quoted field is written as two quotes. Those quotes then have to be stripped out during parsing, and this function handles that as well.
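It helps to look at the writing side of the same quoting convention, because that shows what the parser has to undo. The sketch below uses a hypothetical helper, csv_quote_field, which is not part of the original article. It assumes the common convention that a field is quoted only when it contains the delimiter, a double quote, or a line break, and that an embedded quote is doubled.

```cpp
#include <string>

// Hypothetical helper (not from the original article): prepares one field for
// writing so that a quote-aware parser such as csvline_populate reads it back intact.
std::string csv_quote_field(const std::string& field, char delimiter)
{
    // Quoting is only needed when the field contains the delimiter,
    // a double quote, or a line break.
    bool needs_quotes = field.find(delimiter) != std::string::npos ||
                        field.find('"') != std::string::npos ||
                        field.find('\n') != std::string::npos ||
                        field.find('\r') != std::string::npos;
    if (!needs_quotes)
        return field;

    std::string out = "\"";
    for (char c : field)
    {
        if (c == '"')
            out += "\"\"";   // an embedded quote is written as two quotes
        else
            out += c;
    }
    out += '"';
    return out;
}
```

For example, csv_quote_field("Smith, John", ',') returns the field wrapped in double quotes. In contrast, csv_quote_field("a,b", '\t') returns a,b unchanged, because a comma is ordinary data in a tab-delimited file.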



It would seem natural to pass in only the line and the delimiter and have the function return the vector. In terms of performance under heavy loads, however, that design makes less sense.


Passing in a predefined vector lets the function populate it directly, copying bytes from the line as it goes. Without a predefined vector, we would have to declare the vector as a local variable of the function, on the stack. That local is destroyed when the function completes, so returning it means building the caller's vector from it. Under C++03 rules this uses the vector's copy constructor, which in turn copies every string. Depending on the size of the vector and how often you do it, copying the return value can be expensive. In practice compilers commonly elide this copy (return value optimization). Since C++11, a returned local vector is moved rather than copied, so the per-element copy largely disappears. What a caller-owned vector still saves is a fresh allocation on every call: clear() keeps the vector's existing capacity for the next line in common implementations.
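Here is a minimal sketch of the two calling styles. It uses hypothetical helpers, split_into and split_value, which split on the delimiter without any quote handling, so they are not a replacement for csvline_populate. It assumes C++11 or later, where returning the local vector moves it instead of copying each string.

```cpp
#include <string>
#include <vector>

// Style 1 (the article's style): fill a caller-owned vector.
// Hypothetical helper; splits on the delimiter with no quote handling.
void split_into(std::vector<std::string>& record, const std::string& line, char delimiter)
{
    record.clear();   // destroys the old strings but keeps the vector's buffer
    std::string::size_type start = 0;
    for (;;)
    {
        std::string::size_type end = line.find(delimiter, start);
        if (end == std::string::npos)
        {
            record.push_back(line.substr(start));   // last field, possibly empty
            return;
        }
        record.push_back(line.substr(start, end - start));
        start = end + 1;
    }
}

// Style 2: return a new vector. Since C++11 the local is moved (or the move is
// elided), so the strings are not copied, but every call starts with a fresh buffer.
std::vector<std::string> split_value(const std::string& line, char delimiter)
{
    std::vector<std::string> record;
    split_into(record, line, delimiter);
    return record;
}
```

The article's main reuses row in the same way. Calling split_into repeatedly with one vector reuses that vector's buffer, whereas split_value allocates from an empty vector every time.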

#include <iostream>
#include <fstream>
#include <string>
#include <vector>

using namespace std;

void csvline_populate(vector<string> &record, const string& line, char delimiter);

int main(int argc, char *argv[])
{
    vector<string> row;
    string line;
    ifstream in("input.csv");
    if (in.fail()) { cerr << "File not found" << endl; return 1; }

    // Test the stream through getline itself: also requiring in.good() would
    // skip a final line that has no trailing newline (getline sets eofbit).
    while (getline(in, line))
    {
        csvline_populate(row, line, ',');
        for (size_t i = 0, leng = row.size(); i < leng; i++)
            cout << row[i] << "\t";
        cout << endl;
    }
    in.close();
    return 0;
}

void csvline_populate(vector<string> &record, const string& line, char delimiter)
{
    size_t linepos = 0;
    bool inquotes = false;
    size_t linemax = line.length();
    string curstring;
    record.clear();

    // Walk the line one character at a time until the end of the string.
    while (linepos < linemax)
    {
        char c = line[linepos];

        if (!inquotes && curstring.length() == 0 && c == '"')
        {
            // beginquotechar: a quote at the start of a field opens a quoted field
            inquotes = true;
        }
        else if (inquotes && c == '"')
        {
            // quotechar
            if ((linepos + 1 < linemax) && (line[linepos + 1] == '"'))
            {
                // encountered 2 double quotes in a row (resolves to 1 double quote)
                curstring.push_back(c);
                linepos++;
            }
            else
            {
                // endquotechar
                inquotes = false;
            }
        }
        else if (!inquotes && c == delimiter)
        {
            // end of field
            record.push_back(curstring);
            curstring = "";
        }
        else if (!inquotes && (c == '\r' || c == '\n'))
        {
            // end of record; this also discards the '\r' of a "\r\n" line ending
            record.push_back(curstring);
            return;
        }
        else
        {
            curstring.push_back(c);
        }
        linepos++;
    }
    // the last field has no delimiter after it
    record.push_back(curstring);
}


Note: 
The only other problem with this code is that the CSV format allows a newline character '\n' to appear inside a field if the field is enclosed in quotes. csvline_populate handles this correctly, but the main function that calls it does not. main reads the file with getline(in, line), which uses '\n' as its delimiter. A '\n' in the middle of a quoted field is therefore still treated as the end of the line, and the record is split in two. Most CSV files do not contain a '\n' inside a field, so this is usually not worth worrying about.
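The sketch below reproduces the problem with a hypothetical read_lines helper. It collects lines with the same getline loop as main, reading from a string instead of input.csv. The input is a header plus a single record whose quoted second field spans two lines.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects physical lines exactly the way main() reads them.
std::vector<std::string> read_lines(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))   // '\n' always ends a line, even inside quotes
        lines.push_back(line);
    return lines;
}
```

For the input "id,comment\n1,\"first line\nsecond line\"\n", read_lines returns three lines instead of two. The quoted field ends up split across the second line (1,"first line) and the third (second line"), so neither piece parses as the intended record when passed to csvline_populate.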

One way to fix this is to write an fgetcsv-style function that reads directly from the FILE* handle or the ifstream, instead of working on a line that has already been split. Such a function would also have to detect EOF (end of file).
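Here is a minimal sketch of such a reader. It rests on these assumptions:

- The name csv_read_record and its signature are hypothetical.
- It reads from any std::istream (an ifstream included) one character at a time with get and peek.
- It follows the same quoting rules as csvline_populate.
- It drops unquoted '\r' characters rather than treating them as the end of a record.

```cpp
#include <istream>
#include <string>
#include <vector>

// Hypothetical fgetcsv-style reader (not from the original article).
// Reads one logical CSV record straight from the stream, so a '\n' inside a
// quoted field stays part of that field. Returns false only when EOF is reached
// before any character of a new record has been read.
bool csv_read_record(std::istream& in, std::vector<std::string>& record, char delimiter)
{
    record.clear();
    std::string field;
    bool inquotes = false;
    bool any = false;   // becomes true once a character of this record is consumed
    char c;

    while (in.get(c))
    {
        any = true;
        if (inquotes)
        {
            if (c == '"')
            {
                if (in.peek() == '"')   // "" inside quotes is one literal quote
                {
                    in.get(c);
                    field.push_back('"');
                }
                else
                    inquotes = false;   // closing quote
            }
            else
                field.push_back(c);     // keeps '\n' and '\r' inside quotes
        }
        else if (c == '"' && field.empty())
            inquotes = true;            // a quote at the start of a field opens it
        else if (c == delimiter)
        {
            record.push_back(field);
            field.clear();
        }
        else if (c == '\n')
        {
            record.push_back(field);    // end of record
            return true;
        }
        else if (c != '\r')             // unquoted '\r' (e.g. from "\r\n") is dropped
            field.push_back(c);
    }

    // EOF: a last record without a trailing newline still counts.
    if (any)
        record.push_back(field);
    return any;
}
```

main could then loop with while (csv_read_record(in, row, ',')) in place of the getline loop. Because the function only returns false once nothing is left to read, that loop condition doubles as the EOF check.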
 