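I've recently been working with some of Wikipedia's XML files. Some of them are very large: the latest revision dump, for example, is 36 GB uncompressed. I've experimented with parsing XML in several languages, and in the end I found Go to be a very good fit.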
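Go has a general-purpose library for parsing XML, and it makes encoding convenient as well. One simple way to handle XML is to parse the whole document and load it into memory in one go, but that is not feasible for something that is 36 GB.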
We can also parse the file as a stream, but the examples available online tend to be simple and thin, so here is my sample code for parsing Wikipedia. (full example code at https://github.com/dps/go-xml-parse/blob/master/go-xml-parse.go)
Here is a fragment of the Wikipedia XML.
// <page>
//   <title>Apollo 11</title>
//   <redirect title="..." />
//   ...
//   <revision>
//     ...
//     <text>
//       {{Infobox Space mission
//       |mission_name=&lt;!--See above--&gt;
//       |insignia=Apollo_11_insignia.png
//       ...
//     </text>
//   </revision>
// </page>
In our Go code, we define a struct to match the <page> element.
type Redirect struct {
    Title string `xml:"title,attr"`
}

type Page struct {
    Title string   `xml:"title"`
    Redir Redirect `xml:"redirect"`
    Text  string   `xml:"revision>text"`
}
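To make the mapping between the struct tags and the XML concrete, here is a minimal sketch that decodes one small <page> element held in memory. The sample document, the redirect target value, and the helper name parsePage are made up for illustration; only the Redirect and Page structs come from the code above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Redirect and Page mirror the structs defined above.
type Redirect struct {
	Title string `xml:"title,attr"` // the title="..." attribute of <redirect>
}

type Page struct {
	Title string   `xml:"title"`         // text of the <title> child
	Redir Redirect `xml:"redirect"`      // the <redirect> child element
	Text  string   `xml:"revision>text"` // <text> nested inside <revision>
}

// samplePage is a hypothetical, heavily shortened <page> element.
const samplePage = `<page>
  <title>Apollo 11</title>
  <redirect title="Example target" />
  <revision>
    <text>{{Infobox Space mission}}</text>
  </revision>
</page>`

// parsePage (a hypothetical helper) decodes a single <page> element
// that is already fully loaded into memory.
func parsePage(data string) (Page, error) {
	var p Page
	err := xml.Unmarshal([]byte(data), &p)
	return p, err
}

func main() {
	p, err := parsePage(samplePage)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(p.Title)
	fmt.Println(p.Redir.Title)
	fmt.Println(p.Text)
}
```

Loading everything at once like this is fine for a single page. It is the approach that stops working once the input is the full 36 GB dump.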
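Now we tell the parser that the Wikipedia document contains <page> elements and try to read it. Let's look at how this works as a stream. It is actually very simple once you understand the principle: walk through the tokens in the file, and whenever you reach the StartElement of a <page> tag, use the magical decoder.DecodeElement API to unmarshal the whole element into an object, then move on to the next one.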
decoder := xml.NewDecoder(xmlFile)
for {
    // Read tokens from the XML document in a stream.
    // Token returns a nil token and io.EOF at the end of the input.
    t, err := decoder.Token()
    if t == nil || err != nil {
        break
    }
    // Inspect the type of the token just read.
    switch se := t.(type) {
    case xml.StartElement:
        // If we just read a StartElement token
        // ...and its name is "page"
        if se.Name.Local == "page" {
            var p Page
            // decode a whole chunk of following XML into the
            // variable p which is a Page (see above)
            if err := decoder.DecodeElement(&p, &se); err != nil {
                // Skip a page that fails to decode.
                continue
            }
            // Do some stuff with the page.
            p.Title = CanonicalizeTitle(p.Title)
            ...
        }
        ...
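Since the fragment above is abbreviated, here is a small self-contained sketch of the same streaming loop that can be run as is. It reads from an in-memory string rather than a real dump file, and the names streamPages, sampleTitles, and sampleDump, as well as the callback-based design, are assumptions made for illustration. This is not the code from the linked repository.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

type Redirect struct {
	Title string `xml:"title,attr"`
}

type Page struct {
	Title string   `xml:"title"`
	Redir Redirect `xml:"redirect"`
	Text  string   `xml:"revision>text"`
}

// streamPages (hypothetical helper) reads tokens from r one at a time and
// calls handle for every <page> element, so that only a single page is
// held in memory at any moment.
func streamPages(r io.Reader, handle func(Page)) error {
	decoder := xml.NewDecoder(r)
	for {
		t, err := decoder.Token()
		if err == io.EOF {
			return nil // normal end of the document
		}
		if err != nil {
			return err // malformed XML or a read failure
		}
		// Only the start of a <page> element is interesting here.
		se, ok := t.(xml.StartElement)
		if !ok || se.Name.Local != "page" {
			continue
		}
		var p Page
		// DecodeElement consumes everything up to the matching </page>.
		if err := decoder.DecodeElement(&p, &se); err != nil {
			return err
		}
		handle(p)
	}
}

// sampleDump is a made-up stand-in for a real Wikipedia dump.
const sampleDump = `<mediawiki>
  <page><title>Apollo 11</title><revision><text>first</text></revision></page>
  <page><title>Apollo 12</title><revision><text>second</text></revision></page>
</mediawiki>`

// sampleTitles collects the titles found in sampleDump.
func sampleTitles() ([]string, error) {
	var titles []string
	err := streamPages(strings.NewReader(sampleDump), func(p Page) {
		titles = append(titles, p.Title)
	})
	return titles, err
}

func main() {
	titles, err := sampleTitles()
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(titles)
}
```

For a real dump, the strings.NewReader call would be replaced by an opened file (for example from os.Open), which also satisfies io.Reader. Passing each page to a callback keeps memory use bounded by the largest single page rather than by the size of the file.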
I hope this saves you some time when you need to parse a large XML file of your own.