Embed XMP Metadata in a PDF Document
Adobe Extensible Metadata Platform (XMP) is an XML-based ISO metadata standard, originally created by Adobe Systems Inc. It defines the data structure, serialization model, and basic metadata properties intended to form a unified metadata package that can be embedded into different media formats.
PDF Document API allows you to embed XMP metadata in your documents. You can load metadata from a stream or a string, edit existing metadata or generate a new XMP data model.
Add New Metadata
The PdfDocument.SetMetadata method allows you to embed XMP metadata in the document. You can pass a string with metadata or an XmpDocument object to this method.
The XmpDocument object is an instance of the XMP data model (XMP packet). You can load data to the packet from a stream or a string.
The code sample below loads metadata from a file and embeds it in the document:
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;
using System.IO;
//...
using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    pdfDocumentProcessor.LoadDocument("Documents/Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;
    // Load the XMP packet from a file and embed it in the document:
    using (FileStream xmlStream = new FileStream("Documents/metadata.xml", FileMode.Open, FileAccess.Read))
    {
        XmpDocument metadata = XmpDocument.FromStream(xmlStream);
        document.SetMetadata(metadata);
    }
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
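Because SetMetadata also accepts a string, a packet can be assembled in memory instead of being loaded from a file. The sketch below builds a minimal packet with System.Xml.Linq only. The XmpPacketBuilder class, the BuildPacket method, and the sample values are hypothetical names used for illustration and are not part of PDF Document API. The sketch also omits the xpacket wrapper, so treat it as a starting point rather than a complete packet.

```csharp
using System;
using System.Xml.Linq;

public static class XmpPacketBuilder
{
    // Namespaces used by XMP packets (RDF/XML serialization).
    static readonly XNamespace X = "adobe:ns:meta/";
    static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
    static readonly XNamespace Xmp = "http://ns.adobe.com/xap/1.0/";

    // Hypothetical helper: returns a packet that holds dc:title and xmp:CreatorTool.
    public static string BuildPacket(string title, string creatorTool)
    {
        XElement description = new XElement(Rdf + "Description",
            new XAttribute(Rdf + "about", ""),
            new XAttribute(XNamespace.Xmlns + "dc", Dc.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "xmp", Xmp.NamespaceName),
            // dc:title is a language alternative: an rdf:Alt with an x-default item.
            new XElement(Dc + "title",
                new XElement(Rdf + "Alt",
                    new XElement(Rdf + "li",
                        new XAttribute(XNamespace.Xml + "lang", "x-default"),
                        title))),
            // xmp:CreatorTool is a simple value.
            new XElement(Xmp + "CreatorTool", creatorTool));

        XElement xmpMeta = new XElement(X + "xmpmeta",
            new XAttribute(XNamespace.Xmlns + "x", X.NamespaceName),
            new XElement(Rdf + "RDF",
                new XAttribute(XNamespace.Xmlns + "rdf", Rdf.NamespaceName),
                description));

        return xmpMeta.ToString();
    }

    public static void Main()
    {
        string packet = BuildPacket("Invoice", "PDF Document API");
        Console.WriteLine(packet);
    }
}
```

The returned string could then be passed to the string overload of SetMetadata described above, assuming the library accepts a packet without the xpacket wrapper.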
Access Document Metadata
You can obtain metadata associated with a document, page or Form XObject. The PdfMetadata.Data property retrieves the object’s metadata. Use the XmpDocument.FromString method to convert the retrieved data to an XMP packet, as shown in the example below.
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;
//...
using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    pdfDocumentProcessor.LoadDocument("Documents/Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;
    // Retrieve the raw metadata string and parse it into an XMP packet:
    string metadata = document.Metadata.Data;
    XmpDocument xmpDocument = XmpDocument.FromString(metadata);
}
Manage Metadata Nodes
The XmpDocument.Values property returns the dictionary that contains packet nodes (name-value pairs for metadata properties). You can access a packet node by its name or value. A node name uses the prefix:localName format, for example dc:creator.
You can add new nodes and change an existing node’s value. The table below lists available node value types and API used to create and access each node type.
Value type | Description | Created By | Retrieved By |
---|---|---|---|
Simple | A Unicode string. The string may be empty. A simple value can be a regular string or URI string. | XmpDocument.Add, XmpDocument.CreateSimpleValue | XmpDocument.GetSimpleValue, XmpDocument.GetBoolean, XmpDocument.GetDate, XmpDocument.GetString, XmpDocument.GetFloat, XmpDocument.GetInteger |
Array | A container for items of any available value type, including a nested array and structure. | XmpDocument.CreateArray | XmpDocument.GetArray |
Structured value | A container for fields with unique names. Field values can have any available type. | XmpDocument.CreateStructure | XmpDocument.GetStructure |
When you add a new node to the packet, you can use an XmpName class object or a string to specify the node name. In the latter case, make sure that the specified prefix is registered. You can call the XmpDocument.RegisterNamespace method to register the prefix.
The code sample below edits document metadata:
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;
//...
using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    // Load a document:
    pdfDocumentProcessor.LoadDocument("Documents/Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;
    // Retrieve metadata:
    XmpDocument metadata = XmpDocument.FromString(document.Metadata.Data);
    // Add items to the Creator array (skipped if the node does not exist):
    XmpArray creators = metadata.GetArray("dc:creator");
    if (creators != null)
    {
        creators.Add("PDF Document API");
        creators.Add("Office File API");
    }
    // Change the CreatorTool node value (skipped if the node does not exist):
    XmpSimpleNode creatorTool = metadata.GetSimpleValue("xmp:CreatorTool");
    if (creatorTool != null)
    {
        creatorTool.SetValue("PDF Document API");
    }
    // Add the MaxPageSize structure:
    XmpName structureName = XmpName.Get("MaxPageSize", "http://ns.adobe.com/xap/1.0/t/pg/");
    XmpStructure dimensions = metadata.CreateStructure(structureName);
    // Register the stDim prefix before it is used in string node names:
    metadata.RegisterNamespace("http://ns.adobe.com/xap/1.0/sType/Dimensions#", "stDim");
    dimensions.Add("stDim:h", 11);
    dimensions.Add("stDim:w", 8.5f);
    dimensions.Add("stDim:Unit", "inch");
    // Embed modified metadata in the document:
    document.SetMetadata(metadata);
    // Save the result:
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
Use XMP Schemas
An XMP schema (or namespace) is a set of metadata properties. Each schema is identified by a unique namespace URI and can hold an arbitrary number of properties. The XMP specification contains a definition of predefined schemas that include standard general-purpose namespaces, and namespaces that are specialized for Adobe applications.
PDF Document API supports the following predefined XMP namespaces:
Namespace | Description | Namespace URI and Prefix | Class |
---|---|---|---|
Basic XMP | Contains basic description information. | Namespace URI: http://ns.adobe.com/xap/1.0/ Prefix: xmp | XmpProperties |
Adobe PDF | Specifies properties used in Adobe PDF documents. | Namespace URI: http://ns.adobe.com/pdf/1.3/ Prefix: pdf | AdobePdfProperties |
PDF/A | Used to define a document’s PDF/A conformance level and version. | Namespace URI: http://www.aiim.org/pdfa/ns/id/ Prefix: pdfaid | PdfAProperties |
Dublin Core | Contains information defined in the Dublin Core Metadata Element Set, created by the Dublin Core Metadata Initiative (DCMI). | Namespace URI: http://purl.org/dc/elements/1.1/ Prefix: dc | DublinCoreProperties |
Rights Management | Contains information regarding the legal restrictions associated with a PDF document. | Namespace URI: http://ns.adobe.com/xap/1.0/rights/ Prefix: xmpRights | XmpRightsManagementProperties |
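To illustrate how a prefix:local name pair relates to a schema, the sketch below resolves a qualified node name to its namespace URI and local name. It uses only the .NET base class library. The XmpNames class and the TryResolve method are hypothetical helpers for illustration and are not part of PDF Document API.

```csharp
using System;
using System.Collections.Generic;

public static class XmpNames
{
    // Prefixes of the predefined schemas listed above, with the namespace URIs
    // published for them by Adobe, AIIM, and DCMI.
    static readonly Dictionary<string, string> Namespaces = new Dictionary<string, string>
    {
        { "xmp", "http://ns.adobe.com/xap/1.0/" },
        { "pdf", "http://ns.adobe.com/pdf/1.3/" },
        { "pdfaid", "http://www.aiim.org/pdfa/ns/id/" },
        { "dc", "http://purl.org/dc/elements/1.1/" },
        { "xmpRights", "http://ns.adobe.com/xap/1.0/rights/" }
    };

    // Splits "prefix:localName" and looks up the namespace URI for the prefix.
    // Returns false if the name is malformed or the prefix is not in the table.
    public static bool TryResolve(string qualifiedName, out string namespaceUri, out string localName)
    {
        namespaceUri = null;
        localName = null;
        if (string.IsNullOrEmpty(qualifiedName))
            return false;
        int colon = qualifiedName.IndexOf(':');
        // Both the prefix and the local name must be non-empty.
        if (colon <= 0 || colon == qualifiedName.Length - 1)
            return false;
        string prefix = qualifiedName.Substring(0, colon);
        // Prefix lookup is case-sensitive: "dc" is known, "DC" is not.
        if (!Namespaces.TryGetValue(prefix, out namespaceUri))
            return false;
        localName = qualifiedName.Substring(colon + 1);
        return true;
    }

    public static void Main()
    {
        string uri, local;
        if (TryResolve("dc:title", out uri, out local))
            Console.WriteLine(uri + " | " + local);
    }
}
```

A custom prefix is not in this lookup, which mirrors the requirement to register a prefix before it is used in a string node name.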
The code sample below adds items from the Rights Management schema to the packet:
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;
//...
using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    // Load a document:
    pdfDocumentProcessor.LoadDocument("Documents/Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;
    // Create a new XMP packet:
    XmpDocument metadata = new XmpDocument();
    // Fill the Rights Management schema:
    XmpRightsManagementProperties rightsManagementSchema = metadata.RightsManagementProperties;
    rightsManagementSchema.Certificate = "https://www.devexpress.com/";
    rightsManagementSchema.Owner.Add("DevExpress");
    rightsManagementSchema.Marked = true;
    rightsManagementSchema.WebStatement = "https://www.devexpress.com/support/eulas/";
    rightsManagementSchema.UsageTerms.AddString("Copyright (C) 2021 DevExpress. All Rights Reserved.", "x-default");
    // Embed metadata in the document:
    document.SetMetadata(metadata);
    // Save the result:
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
Create Custom Schema
Create a CustomProperties class object and fill it with items to create a custom schema. Assign this object to the XmpDocument.CustomProperties property to add your schema to the packet.
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;
//...
using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    // Load a document and retrieve its metadata:
    pdfDocumentProcessor.LoadDocument("Documents/Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;
    XmpDocument metadata = XmpDocument.FromString(document.Metadata.Data);
    // Register the custom namespace and its prefix:
    metadata.RegisterNamespace("https://www.devexpress.com/", "dx");
    // Create the custom schema and fill it with items:
    CustomProperties customProperties = new CustomProperties(metadata, "https://www.devexpress.com/");
    customProperties["Team"] = "Office";
    customProperties["Checked"] = "true";
    customProperties["Project"] = "PDF Document API";
    // Embed metadata and save the result:
    document.SetMetadata(metadata);
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
Remove Metadata
Call the XmpDocument.Remove method to remove an XMP node with the specified name.
The code sample below removes the dc:title node:
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;
//...
using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    // Load a document:
    pdfDocumentProcessor.LoadDocument("Documents/Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;
    // Retrieve metadata:
    XmpDocument metadata = XmpDocument.FromString(document.Metadata.Data);
    // Delete the node:
    metadata.Remove("dc:title");
    // Apply changes:
    document.SetMetadata(metadata);
    // Save the result:
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
Note
By default, the PdfDocumentProcessor.SaveDocument method call updates the following metadata nodes:
- xmp:CreateDate
- xmp:ModifyDate
- xmp:MetadataDate
Set the PdfSaveOptions.DisableCreationDateUpdate property to true and pass the PdfSaveOptions object as the SaveDocument method parameter to disable the xmp:CreateDate node update.
The PdfSaveOptions.DisableModDateUpdate property allows you to disable only the xmp:ModifyDate node update. Use the PdfSaveOptions.DisableMetadataUpdate property to disable updates of all the nodes listed above.