In this series of posts we will be creating InfoPath form instance (XML) documents programmatically using the SharePoint object model.
In contrast to other articles about this subject, we will extract the goo required directly from the InfoPath form XSN file itself. The correct XSN file will be determined by the Content Type associated with the form.
So what goo do we need to create an InfoPath instance document?
- The form's (default) template XML; this is an XML document whose structure is defined by the form schema.
- Form metadata, the values of which are used to create the XML PIs (processing instructions), which serve two purposes:
- Used by SharePoint to determine the content type associated with the InfoPath form, as I discussed in this post.
- Used by InfoPath to determine among other things the form (schema) version and InfoPath client version.
If you open up an InfoPath form XML file you’ll see something like below, note that I’ve prettied it up for display purposes;

How do we get the goo?
It’s all available in the form template XSN file itself. An XSN file is nothing more than a renamed Cabinet (.CAB) format file, and the InfoPath designer has an option on the File menu to Save As Source Files;

If you save an InfoPath form as its source files you’ll typically see something like;

The files of interest here are template.xml, which contains the template XML, and manifest.xsf, which is an XML file containing all the form's metadata.
We can also see the form schema file (.xsd), the sample data file and the XSLT transformations (.xsl) which create the form's views.
Getting back to the point in hand: knowing that all of the goo we need is in the form XSN file, we need to extract the template.xml and manifest.xsf files from it. The former gives us the template XML; by parsing the latter we can extract the required metadata, which is available as attributes on the root document element, as shown;

Extracting Files from an InfoPath Form XSN File.
What we need is a library which allows us to uncompress (specific, named) files from a CAB format file. Google throws up a few possibilities, but the one I’ve used is called “Cabinet File (*.CAB) Compression and Extraction”; it’s a Managed C++ project and is signed, so it’s good for installing into the GAC. Using this project we can write some code to extract files from the XSN file;
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

public class InfopathFormGrocker
{
    private readonly bool RemovePIs;

    public InfopathFormGrocker(bool removePIs)
    {
        RemovePIs = removePIs;
    }

    // The component extracted by the last call, or null if nothing was found
    public XmlDocument ComponentContent { get; private set; }

    public bool ExtractComponent(Stream formStream, string componentFilename)
    {
        if (formStream == null) throw new ArgumentNullException("formStream");
        if (string.IsNullOrEmpty(componentFilename)) throw new ArgumentNullException("componentFilename");
        ComponentContent = null;
        // reset the stream
        formStream.Seek(0, SeekOrigin.Begin);
        // do the extraction
        var cabExtractor = new CabLib.Extract();
        cabExtractor.SetSingleFile(componentFilename);
        cabExtractor.evAfterCopyFile += OnAfterCopyFile;
        cabExtractor.ExtractStream(formStream, "MEMORY");
        return ComponentContent != null;
    }

    private void OnAfterCopyFile(string fileName, byte[] u8FileContent)
    {
        // decode as UTF-8; ASCII decoding would turn the BOM and any non-ASCII text into '?'
        var content = Encoding.UTF8.GetString(u8FileContent);
        // remove Unicode BOM if present
        if (content.Length > 1 && content[0] == 0xFEFF) content = content.Substring(1);
        // load into xmldoc
        ComponentContent = new XmlDocument();
        ComponentContent.LoadXml(content);
        if (!RemovePIs) return;
        // remove PIs
        var piNodes = ComponentContent.SelectNodes("/processing-instruction()");
        if (piNodes == null) return;
        // copy the node list first so that removing nodes cannot disturb the enumeration
        foreach (XmlNode piNode in piNodes.Cast<XmlNode>().ToList())
        {
            if (piNode.LocalName == "mso-infoPathSolution" || piNode.LocalName == "mso-application")
                ComponentContent.RemoveChild(piNode);
        }
    }
}
For the template XML we can just extract the template.xml file and use it as it is.
For the form metadata, we have to parse the manifest.xsf file and get the following attribute (values) from the root document element;
- name
- solutionVersion
- productVersion
<xsf:xDocumentClass
    solutionFormatVersion="2.0.0.0"
    solutionVersion="1.0.0.4"
    productVersion="12.0.0"
    requireFullTrust="yes"
    publishUrl="E:\Development\Projects\Sandbox\Forms\MyReportForm\manifest.xsf"
    name="urn:schemas-microsoft-com:office:infopath:MyReportForm:-myXSD-2010-07-23T09-09-33"
    xmlns:xsf="http://schemas.microsoft.com/office/infopath/2003/solutionDefinition"
    xmlns:xsf2="http://schemas.microsoft.com/office/infopath/2006/solutionDefinition/extensions"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:xd="http://schemas.microsoft.com/office/infopath/2003"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xdUtil="http://schemas.microsoft.com/office/infopath/2003/xslt/Util"
    xmlns:xdXDocument="http://schemas.microsoft.com/office/infopath/2003/xslt/xDocument"
    xmlns:xdMath="http://schemas.microsoft.com/office/infopath/2003/xslt/Math"
    xmlns:xdDate="http://schemas.microsoft.com/office/infopath/2003/xslt/Date"
    xmlns:xdExtension="http://schemas.microsoft.com/office/infopath/2003/xslt/extension"
    xmlns:xdEnvironment="http://schemas.microsoft.com/office/infopath/2006/xslt/environment"
    xmlns:xdUser="http://schemas.microsoft.com/office/infopath/2006/xslt/User"
    xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-07-23T09:09:33">
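As a rough sketch of the parsing step described above, the three attributes can be read straight off the root element once manifest.xsf has been loaded into an XmlDocument. The FormMetadata and ManifestReader names are hypothetical (they are not part of the original code), and the sketch assumes the root element is xsf:xDocumentClass, as in the sample.

```csharp
using System;
using System.Xml;

// Hypothetical container for the three values the post lists
public sealed class FormMetadata
{
    public string Name { get; set; }
    public string SolutionVersion { get; set; }
    public string ProductVersion { get; set; }
}

// Hypothetical helper, not part of the original post
public static class ManifestReader
{
    // Reads name, solutionVersion and productVersion from the root element
    public static FormMetadata Read(XmlDocument manifest)
    {
        if (manifest == null) throw new ArgumentNullException("manifest");

        var root = manifest.DocumentElement;
        if (root == null || root.LocalName != "xDocumentClass")
            throw new ArgumentException("Not a manifest.xsf document.", "manifest");

        return new FormMetadata
        {
            // GetAttribute returns an empty string when the attribute is absent
            Name = root.GetAttribute("name"),
            SolutionVersion = root.GetAttribute("solutionVersion"),
            ProductVersion = root.GetAttribute("productVersion")
        };
    }
}
```

The three attributes are unprefixed, so no namespace manager is needed for this particular lookup.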
A Word on Namespaces.
If you’ve done much (or any :-)) work with InfoPath you’ll no doubt be aware that dealing with its XML files involves extensive use of XML namespaces, and that in order to query an XmlDocument or XPathNavigator you need to provide an XmlNamespaceManager instance. Here’s a code snippet that creates an XmlNamespaceManager populated with the namespace prefixes and URIs found in the document (on the root document element anyway);
public static XmlNamespaceManager CreateNamespaceManager(XmlDocument document)
{
    if (document == null) throw new ArgumentNullException("document");
    if (document.DocumentElement == null)
        throw new ArgumentException("The root document element is null!", "document");
    var ns = new XmlNamespaceManager(document.NameTable);
    foreach (XmlAttribute xatt in document.DocumentElement.Attributes)
    {
        // namespace declarations look like xmlns:prefix="uri" (or xmlns="uri" for the default namespace)
        var prefixPair = xatt.Name.Split(new[] {':'}, StringSplitOptions.RemoveEmptyEntries);
        if (prefixPair.Length < 1) continue;
        if (!prefixPair[0].Equals("xmlns", StringComparison.OrdinalIgnoreCase)) continue;
        var prefix = prefixPair.Length == 2 ? prefixPair[1] : string.Empty;
        var uri = xatt.Value;
        ns.AddNamespace(prefix, uri);
    }
    return ns;
}
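To illustrate why the namespace manager matters, here is a small sketch of setting a field value in the template XML with a prefixed XPath query. TemplateFiller and SetField are hypothetical names used only for illustration; the namespace manager passed in could be one built by a helper such as CreateNamespaceManager.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of the original post
public static class TemplateFiller
{
    // Sets the text of the single node selected by a namespace-prefixed XPath
    public static void SetField(XmlDocument template, XmlNamespaceManager ns,
        string xpath, string value)
    {
        if (template == null) throw new ArgumentNullException("template");
        if (ns == null) throw new ArgumentNullException("ns");

        // Without the namespace manager, a prefixed XPath such as
        // /my:myFields/my:HeaderText cannot be resolved
        var node = template.SelectSingleNode(xpath, ns);
        if (node == null)
            throw new InvalidOperationException("No node matches " + xpath);

        node.InnerText = value;
    }
}
```

For a form whose fields use the my prefix, a call might look like TemplateFiller.SetField(template, ns, "/my:myFields/my:HeaderText", "Weekly report").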
InfoPath Form XML Processing Instructions?
So, in order to create an InfoPath form instance (XML) document, we need the template XML;
<my:myFields xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-07-23T09:09:33" xml:lang="en-US">
    <my:HeaderText></my:HeaderText>
    <my:BodyText></my:BodyText>
    <my:SummaryText></my:SummaryText>
</my:myFields>
and we then need to add the InfoPath specific processing instructions, mso-infoPathSolution and mso-application;
<?mso-infoPathSolution name="urn:schemas-microsoft-com:office:infopath:MyReportForm:-myXSD-2010-07-23T09-09-33" solutionVersion="1.0.0.2" productVersion="12.0.0.0" PIVersion="1.0.0.0" href="http://portal/FormServerTemplates/MyReportForm.xsn"?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
And of course the XML declaration itself (note that InfoPath forms are expected to be UTF-8 encoded);
<?xml version="1.0" encoding="utf-8"?>
The sharp-eyed amongst you may have noticed the href attribute in the mso-infoPathSolution PI….where’s this from I hear you ask?
This attribute is a key component used by SharePoint in determining what the form's associated content type is, as I discussed in this post. Remember that for the most part InfoPath forms are deployed into SharePoint as content types.
An InfoPath form content type is configured with the form XSN file as the document template (rather than an Office document template), with the template usually being located in the hidden FormServerTemplates folder (which you can see using SharePoint Designer). Looking in the content type's advanced settings shows this;

This is why, when you click on New -> MyInfoPathForm in a document or form library your form is launched either by FormServer.aspx or the InfoPath client;

So the href attribute value needs to be set to the document template URL of the InfoPath form's content type, which is accessible using the SPContentType.DocumentTemplateUrl property.
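Putting the pieces above together, a minimal sketch of the assembly step might look like the following. FormInstanceBuilder and Build are hypothetical names, and the PIVersion, progid and versionProgid values are simply copied from the sample PIs shown earlier. The metadata values are written as passed in, with no escaping or reformatting; whether productVersion needs adjusting between manifest.xsf and the PI is not something this sketch attempts to settle.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the original post
public static class FormInstanceBuilder
{
    // Returns the UTF-8 bytes of: XML declaration + InfoPath PIs + form body
    public static byte[] Build(XmlDocument template, string name,
        string solutionVersion, string productVersion, string href)
    {
        if (template == null || template.DocumentElement == null)
            throw new ArgumentException("The template has no root element.", "template");

        var doc = new XmlDocument();
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "utf-8", null));

        // PIVersion is taken from the sample PI in this post (an assumption)
        var solutionData = string.Format(
            "name=\"{0}\" solutionVersion=\"{1}\" productVersion=\"{2}\" " +
            "PIVersion=\"1.0.0.0\" href=\"{3}\"",
            name, solutionVersion, productVersion, href);
        doc.AppendChild(doc.CreateProcessingInstruction("mso-infoPathSolution", solutionData));
        doc.AppendChild(doc.CreateProcessingInstruction("mso-application",
            "progid=\"InfoPath.Document\" versionProgid=\"InfoPath.Document.2\""));

        // Copy the form body (the template root and its children) into the new document
        doc.AppendChild(doc.ImportNode(template.DocumentElement, true));

        // UTF8Encoding(false) writes UTF-8 without a byte order mark
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }
}
```

The href argument would be the content type's document template URL described above; the other three values come from manifest.xsf.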
Where are We?
- We can extract files from an InfoPath XSN form
- We know what form metadata we need and where to get it from
- We know how to assemble all these bits to form a valid InfoPath form instance (XML) document
What’s left?
In part 2 I’ll show the components of a function which creates an InfoPath form instance document using a Content Type name (or ID) and the XML form body (based on the template XML of course).
Taking the assembled piece of XML, it’s then a simple matter to publish this into a document or form library.
Having done this, you’ve got yourself a bona fide InfoPath form, a first class SharePoint citizen in fact 🙂
Hi Phil,
If I have xml infopath files that have been generated by code rather than by the infopath form itself (using the method you mention above to generate the .VBS file), how do I go about correcting the namespace issues so that I can automatically extract the metadata when the form(xml) is uploaded to the forms library using the same .XSN file as the template?
Regards
Michael
Michael, not sure if I understand you correctly, but the XML form instance document should be a fully qualified XML instance document. In InfoPath terms, the document elements should all be namespace prefixed (e.g. ) with the namespace URI / Prefix being declared, probably on the root document element.
If you create the form instance document correctly, your promoted property columns in SharePoint will work as expected and show the form data in the SharePoint columns.
Let me know if I can help further.
Is there a way that I can find out the name or GUID of the library on a SharePoint site where I am creating an XML document from an InfoPath content type?
When I publish an InfoPath template to a library, the document has in its processing instructions the href which gives me the library where it was published. But when I publish it as a content type on a site and use it in multiple libraries, the href contains the URL of the folder where I saved the content type. I need the GUID or href of the library where I opened, and am about to save, the XML document based on the InfoPath content type. And this InfoPath content type is client based, not browser-enabled.
Thank you in advance
Hi Phil,
I need to programmatically load an InfoPath 2010 form submitted to SharePoint 2010 and then convert those InfoPath forms to PDF documents. I’m looking at the InfoPath Forms Services object model to accomplish this task. Could you please point me in the right direction?
Thanks Phil for the nice post. I am trying to use this code and it throws the error "cannot access a closed stream" at the line formStream.Seek(0, SeekOrigin.Begin) when I try to extract the .xsf file; it did work for the template.xml file. Please advise.
Thanks
Ronak
Hi Phil,
Can you please share the code to itzikf6@gmail.com
The code is included in the blog post