Parsing XML and JSON with boost::property_tree (Part 1): Overview

Contents

1. boost::property_tree
2. Interface for parsing XML
3. Interface for parsing JSON
   3.1. Getting an array
   3.2. Putting an array

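I have recently started working on a message queue, MQueue. While rereading my earlier DBPool and SocketPool code today, I noticed that the pool constructors in both projects use boost's property_tree to parse XML and JSON, and that this code is very verbose. I spent the day wrapping the XML and JSON parsing in two interfaces, parse_xml.cpp and parse_json.cpp, put them into MQueue, and synced the change back to DBPool and SocketPool.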

As usual, let's start with boost's property_tree itself. My reference is the book 《Boost程序库完全开发指南深入准标准库》.

boost::property_tree

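property_tree is a tree-shaped data structure that holds multiple property values. Any node's value can be accessed with a simple path-like syntax, and the children of every node can be traversed in an STL-like style. property_tree can handle text data in four formats: XML, JSON, INI, and INFO.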

Before using property_tree, include the header:

#include <boost/property_tree/ptree.hpp>

Interface for parsing XML

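property_tree does not implement its own XML parser; it uses the open-source RapidXML project, which is faster than most of the XML parsers you can find. RapidXML is well known enough that the open-source JSON library RapidJSON even borrowed its name, although boost does not integrate RapidJSON. I still strongly recommend RapidJSON; it is miloyip's work, after all.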

Before parsing XML, include the header:

#include <boost/property_tree/xml_parser.hpp>

boost::property_tree::read_xml


template<typename Ptree> 
  void read_xml(std::basic_istream< typename Ptree::key_type::value_type > & stream, 
                Ptree & pt, int flags = 0);

This is the stream overload. read_xml also has an overload that takes a file name, which is used like this:


std::string configPath_ = "../config/config.xml";
boost::property_tree::ptree pt;
boost::property_tree::read_xml(configPath_, pt);

This reads the config.xml file at ../config/config.xml into the pt data structure.

get

<root>
  <child>
   <a>"1"</a>
   <b>2</b>
 </child>
</root>

Use get directly to obtain the value of an XML tag, for example:

pt.get<string>("root.child.a");
pt.get<int>("root.child.b");

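property_tree uses the dot (.) as the path separator, so root.child.a fetches the value of the a tag under the child tag under root.
The template argument, string here, specifies the type the value is converted to.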

get_child

get_child returns the subtree object of a child node.
For example, in

<root>
   <child>
    <a>"1"</a>
    <b>2</b>
  </child>
 </root>

when I need to traverse every tag under child (which happens often when I do not know how many tags child contains), get alone is not enough:

auto child = pt.get_child("root.child");
for (auto pos = child.begin(); pos != child.end(); ++pos) {
  // first is the tag name, second is the child node itself
  cout << pos->first << "---" << pos->second.data() << endl;
}

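child.begin() returns an object of type boost::property_tree::ptree::iterator, which is an iterator.
Note that each entry is a pair: first is the tag name and second is the node itself. Traversing root.child therefore yields the children of child; first is the tag name, a and b here, and second is the child node, whose value is available through data().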

attribute

The biggest difference between XML and JSON is that XML has attributes while JSON has arrays. That is why parse_xml.cpp and parse_json.cpp expose different interfaces.

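property_tree converts every XML node into a corresponding node of the property tree; each entry in the tree is a value_type. A node's attributes are stored under its <xmlattr> child node, and comments are stored under a <xmlcomment> child node. That may be hard to picture, so let's look at an example: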

<root>
  <!-- root comment -->
   <child>
    <!-- child comment -->
    <a hello="style">"1"</a>
    <b>2</b>
  </child>
 </root>

Then:

read_xml("conf.xml", pt);
pt.get<string>("root.<xmlcomment>");   //root comment
pt.get<string>("root.child.<xmlcomment>");  //child comment
pt.get<string>("root.child.a.<xmlattr>.hello");  //style

Note the path root.child.a.<xmlattr>.hello: <xmlattr> alone is not enough, because a tag can have many attributes, so the attribute name must be given as well.

boost::property_tree::write_xml

template<typename Ptree> 
  void write_xml(std::basic_ostream< typename Ptree::key_type::value_type > & stream, 
                 const Ptree & pt, 
                 const xml_writer_settings< typename Ptree::key_type::value_type > & settings = xml_writer_settings< typename Ptree::key_type::value_type >());

write_xml likewise has an overload that takes a file name, which is used like this:

std::string configPath_ = "../config/config.xml";
boost::property_tree::write_xml(configPath_, pt);

This writes the contents of pt to ../config/config.xml. As you can see, property_tree can both read and write data; its operations are symmetric.

put

The counterpart of get is put.
For example:

<root>
 <child>
    <a></a>
    <b></b>
 </child>
</root>

After calling pt.put("key", "testput"):

<root>
  <child>
    <a></a>
    <b></b>
  </child>
</root>
<key>testput</key>

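Usage is very simple. key is the path in the XML and can be replaced with any path you like. A top-level key such as this one creates a second top-level element, whereas a path such as root.child.key places the new tag inside the existing tree.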

add_child

The counterpart of get_child is add_child. When I need to insert a whole group of tags at once, add_child is the tool for the job.

<root>
  <child>
    <a>"1"</a>
    <b>2</b>
  </child>
</root>

After executing

read_xml("config.xml", pt);
ptree child;
child.put("a", "3");
child.put("b", "4");
pt.add_child("root.newchild", child);
write_xml("config.xml", pt);

the file becomes:

<root>
  <child>
    <a>"1"</a>
    <b>2</b>
  </child>
  <newchild>
    <a>3</a>
    <b>4</b>
  </newchild>
</root>

attribute

Since an attribute is just a child under <xmlattr>, as explained above, put can write an attribute into the XML text directly.
For example, executing:

pt.put("error.<xmlattr>.id", "DB_ERROR_EXECUTE");

simply produces:

<error id="DB_ERROR_EXECUTE"/>

Interface for parsing JSON

As mentioned above, the biggest difference between JSON and XML is JSON's arrays.

Getting an array

{ 
  "root":
     {
        "child":
         [
           { 
              "a": "1",
               "b": "2" 
           },
           { 
              "a": "1",
              "b": "2" 
           }
         ]
     }
}

In this JSON, root.child is an array.
Reading the values in the array requires iterating over value_type entries:

ptree child = pt.get_child("root.child");
for (ptree::value_type &v : child.get_child("")) {
  // v is one element of the child array
  auto nextchild = v.second.get_child("");
  // nextchild holds the members "a" and "b" of that element
  for (auto pos = nextchild.begin(); pos != nextchild.end(); ++pos) {
    cout << pos->first << "---" << pos->second.data() << endl;
  }
}

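To see why this works, go back to the structure of property_tree. By definition, every entry in a tree is a value_type: a pair whose first is the node's name and whose second is the node itself. Here child is the ptree for the array; calling get_child("") with an empty path returns the node itself, and iterating over it yields all members of the array, each a value_type whose name is empty.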

Putting an array

To put an array, use the push_back method.

ptree temp;
temp.put("a", "3");
temp.put("b", "4");
ptree temp_second;
temp_second.put("a", "3");
temp_second.put("b", "4");

ptree myTree;
// An array is built by pushing elements with an empty key
myTree.push_back(make_pair("", temp));
myTree.push_back(make_pair("", temp_second));

// read_json and write_json require <boost/property_tree/json_parser.hpp>
read_json("config.json", pt);
pt.add_child("root.newchild", myTree);
write_json("config.json", pt);

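myTree is a ptree, and push_back is a member of ptree that appends a value_type, that is, a pair of a key and a subtree. Because every element is pushed with an empty key, the JSON writer emits the node as an array.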

Result:

{ 
  "root":
     {
        "child":
         [
           { 
              "a": "1",
               "b": "2" 
           },
           { 
              "a": "1",
              "b": "2" 
           }
         ],
        "newchild":
         [
           { 
              "a": "3",
               "b": "4" 
           },
           { 
              "a": "3",
              "b": "4" 
           }
         ]
     }
}

Adair's Home
