{"id":304,"date":"2014-07-22T11:18:40","date_gmt":"2014-07-22T03:18:40","guid":{"rendered":"http:\/\/www.ilezhizhe.com\/?p=304"},"modified":"2014-07-22T20:46:03","modified_gmt":"2014-07-22T12:46:03","slug":"effectivexml-4","status":"publish","type":"post","link":"http:\/\/www.ilezhi.cn\/?p=304","title":{"rendered":"Effective Xml Part 4: Let me project this (Xml file) for you"},"content":{"rendered":"<p>Source: <a title=\"http:\/\/blogs.msdn.com\/b\/xmlteam\/archive\/2011\/09\/26\/effective-xml-part-4-let-me-project-this-xml-file-for-you.aspx\" href=\"http:\/\/blogs.msdn.com\/b\/xmlteam\/archive\/2011\/09\/26\/effective-xml-part-4-let-me-project-this-xml-file-for-you.aspx\" target=\"_blank\">http:\/\/blogs.msdn.com\/b\/xmlteam\/archive\/2011\/09\/26\/effective-xml-part-4-let-me-project-this-xml-file-for-you.aspx<\/a><\/p>\n<p style=\"color: #424242;\">Xml is ubiquitous. No doubt about it. It is used almost everywhere and by almost everyone. This includes places where huge amounts of data are processed, which means the Xml files (or streams) used there are also huge. And the bigger the Xml file, the harder it is to process. The two biggest problems are:<\/p>\n<ul style=\"color: #424242;\">\n<li>You need to query the document with a couple of XPath expressions or transform it with an Xslt file, but the document is too big even to be loaded (the rule of thumb is that an Xml document needs up to 5 times as much memory as its size on disk). When you try to load the document you get an OutOfMemoryException, and that\u2019s about where your Xml processing ends.<\/li>\n<li>You are able to load the document but all the queries or transformations are sloooow (and I assume it\u2019s not because the queries or Xslt stylesheets are poorly written; if you are not sure, see Effective Xml Part 3).<\/li>\n<\/ul>\n<p style=\"color: #424242;\"><!--more-->These are problems indeed but there is a good chance they are solvable. 
First, take a look at the structure of the source Xml. Then look at the XPath expressions or Xslt stylesheet. How much information from the source Xml are you actually using? Probably the bigger the file is and the more complex its structure, the smaller the fraction of the data you are actually using. So, if you don\u2019t actually use some data, what\u2019s the point of even trying to load it? Filter this data out. You can do it in a streaming fashion. Instead of using the XmlReader from the Xml API, implement your own which will report the stuff you really need and ignore everything you don\u2019t (i.e. project). Depending on how much you need, you can save a lot. Now your document can fit in memory and the queries or transformations will be faster, because they don\u2019t need to process nodes or attributes that are never used. If you don\u2019t feel like writing your own reader you can try using XPathReader <a style=\"color: #707070;\" href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms950778.aspx\">http:\/\/msdn.microsoft.com\/en-us\/library\/ms950778.aspx<\/a> (note that the article is aged and may be using some old APIs but the basic idea is the same).<\/p>\n<p style=\"color: #424242;\">If the above steps don\u2019t help, you may try splitting your one big task into a few smaller tasks you can run sequentially. 
Doing this will hopefully enable you to achieve your goal.<\/p>\n<p style=\"color: #424242;\">Pawel Kadluczka<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Source: http:\/\/blogs.msdn.com\/b\/xmlteam\/archive\/2011\/09\/26\/e &hellip; <a href=\"http:\/\/www.ilezhi.cn\/?p=304\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> Effective Xml Part 4: Let me project this (Xml file) for you<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,3],"tags":[60,24,61],"class_list":["post-304","post","type-post","status-publish","format-standard","hentry","category-share","category-program","tag-linq","tag-vb-net","tag-xml"],"_links":{"self":[{"href":"http:\/\/www.ilezhi.cn\/index.php?rest_route=\/wp\/v2\/posts\/304","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.ilezhi.cn\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.ilezhi.cn\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.ilezhi.cn\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.ilezhi.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=304"}],"version-history":[{"count":0,"href":"http:\/\/www.ilezhi.cn\/index.php?rest_route=\/wp\/v2\/posts\/304\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.ilezhi.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=304"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.ilezhi.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=304"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.ilezhi.cn\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=304"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}