You are viewing help content for pre-release software. This document and the features it describes are subject to change.
V26.1
  • Word Processing Document API: Process Document Content Safely

    The Word Processing Document API includes configurable security layers for safe document processing. These layers address document security requirements in regulated industries:

    • GDPR (General Data Protection Regulation) requires the removal of metadata such as author names, internal file paths, and edit history before documents are shared.
    • HIPAA (Health Insurance Portability and Accountability Act) requires protection against malware in documents that contain PHI (Protected Health Information), and safeguards against PHI leakage through document properties and tracked changes.
    • SOX (Sarbanes-Oxley Act) requires documented controls over financial document integrity. The structured findings list returned by the Sanitize API supports Section 404 audit trail requirements.

    The Safe Document Processing API covers three areas:

    Establish Security Loading Limits
    Reject documents that exceed structural thresholds before they are fully parsed.
    Remove Dangerous Content
    Strip active threats such as macros, embedded objects, and dangerous links during loading.
    Sanitize Private Information
    Remove metadata, revision history, and hidden content from already-loaded documents before you share or archive them.

    Establish Security Loading Limits

    Security loading limits protect your application against document-based denial-of-service attacks. A specially crafted file with deeply nested XML structures, excessive cell counts, or an inflated file size can cause memory exhaustion or application hangs during parsing. Security limits apply to all supported formats and take effect before the API fully loads the document.

    Use the RichEditDocumentServer.Options.SecurityLoadingLimits property to configure thresholds for the following document characteristics:

    • Maximum allowed file size in bytes
    • Maximum total number of XML elements across all document parts
    • Maximum nesting depth of any XML element
    • Maximum number of paragraphs in the document
    • Maximum number of tables
    • Maximum number of rows across all tables
    • Maximum number of document sections
    • Maximum number of sub-documents (headers, footers, text boxes, and similar embedded document parts)

    The following code snippet sets security loading limits:

    using DevExpress.XtraRichEdit;
    
    using (RichEditDocumentServer wordProcessor = new RichEditDocumentServer())
    {    
        WordProcessingSecurityLoadingLimits securityLimits = wordProcessor.Options.SecurityLoadingLimits;
    
        securityLimits.MaxFileSize = 50 * 1024 * 1024; // 50 MB
        securityLimits.MaxParagraphCount = 100_000;
        securityLimits.MaxTableCount = 1_000;
        securityLimits.MaxXmlElementDepth = 128;
    
        wordProcessor.LoadDocument("Documents\\Sample.docx");
    }
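
    To make the element-count and nesting-depth thresholds more concrete, the sketch below measures both values for a small XML string with the standard .NET XmlReader. It is only an illustration of what such limits count, not a description of how the Word Processing Document API enforces them. The XmlStructureProbe class and its Measure method are hypothetical names introduced for this example.

```csharp
using System;
using System.IO;
using System.Xml;

static class XmlStructureProbe
{
    // Returns the total element count and the deepest element nesting level.
    public static (int ElementCount, int MaxDepth) Measure(string xml)
    {
        int elementCount = 0, maxDepth = 0;
        // Prohibit DTDs so the probe itself is not exposed to entity-expansion attacks.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Prohibit };
        using var reader = XmlReader.Create(new StringReader(xml), settings);
        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element) continue;
            elementCount++;
            // XmlReader.Depth is 0 for the root element; add 1 for a 1-based level.
            maxDepth = Math.Max(maxDepth, reader.Depth + 1);
        }
        return (elementCount, maxDepth);
    }

    static void Main()
    {
        // Simplified stand-in for a document part: body > paragraph > run > text.
        var (count, depth) = Measure("<body><p><r><t>Hi</t></r></p><p/></body>");
        Console.WriteLine($"elements={count}, maxDepth={depth}"); // elements=5, maxDepth=4
    }
}
```

    A crafted file can push either number far beyond what a legitimate document needs, which is why both are worth capping before the document is fully parsed.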
    

    When a document exceeds a configured limit, the RichEditDocumentServer.SecurityLoadingLimitExceeded event fires. Set e.Handled = true in the event handler to let the document continue to load despite the violation (useful for log-only scenarios or staged rollouts), or set it to false to abort loading.

    The following code snippet handles the SecurityLoadingLimitExceeded event:

    using System;
    using DevExpress.XtraRichEdit;
    
    var wordProcessor = new RichEditDocumentServer();
    
    wordProcessor.SecurityLoadingLimitExceeded += (sender, e) => {
        Console.WriteLine($"Limit exceeded: {e.PropertyName}");
        e.Handled = false; // abort loading
    };
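
    For a staged rollout, it can help to keep the "log only or enforce" decision in one place. The sketch below is a minimal example of that idea in plain C#. The LimitPolicy class is hypothetical, and the sketch assumes that e.PropertyName contains the name of the violated limit property (for example, "MaxParagraphCount"), which this document does not state explicitly. The event wiring appears only as a comment so that the block runs without the DevExpress assemblies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: decides whether a limit violation is only logged or also enforced.
sealed class LimitPolicy
{
    private readonly HashSet<string> logOnlyLimits;

    // Human-readable record of every violation that was reported.
    public List<string> Log { get; } = new List<string>();

    public LimitPolicy(IEnumerable<string> logOnlyLimits) =>
        this.logOnlyLimits = new HashSet<string>(logOnlyLimits, StringComparer.Ordinal);

    // Returns the value to assign to e.Handled:
    // true lets loading continue, false aborts it.
    public bool OnLimitExceeded(string propertyName)
    {
        bool continueLoading = logOnlyLimits.Contains(propertyName);
        Log.Add($"{propertyName}: {(continueLoading ? "logged" : "blocked")}");
        return continueLoading;
    }
}

static class Program
{
    static void Main()
    {
        // MaxParagraphCount is still in its log-only phase; every other limit is enforced.
        var policy = new LimitPolicy(new[] { "MaxParagraphCount" });

        // Assumed wiring, based on the event shown above:
        // wordProcessor.SecurityLoadingLimitExceeded += (sender, e) =>
        //     e.Handled = policy.OnLimitExceeded(e.PropertyName);

        Console.WriteLine(policy.OnLimitExceeded("MaxParagraphCount")); // True
        Console.WriteLine(policy.OnLimitExceeded("MaxFileSize"));       // False
    }
}
```

    Moving a limit from log-only to enforced then amounts to removing its name from the list passed to the constructor.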
    

    Remove Dangerous Content

    The RichEditDocumentServerOptions.SecurityLoadingOptions property instructs the API to detect specific threats during a file loading operation. If matching content is located, the RichEditDocumentServer.SecurityLoadingOptionsViolation event fires. Set e.Handled = false in the event handler to remove the detected content, or leave it set to true to retain the content (useful for audit-only modes).

    The following code snippet sets SecurityLoadingOptions and handles the SecurityLoadingOptionsViolation event:

    using System;
    using DevExpress.XtraRichEdit;
    
    var wordProcessor = new RichEditDocumentServer();
    WordProcessingSecurityLoadingOptions securityLoadingOptions = wordProcessor.Options.SecurityLoadingOptions;
    
    securityLoadingOptions.RestrictedHyperlinkRemovalMode = RestrictedHyperlinkRemovalMode.Full;
    securityLoadingOptions.RemoveRestrictedLinks = true;
    securityLoadingOptions.RemoveExternalImages = true;
    securityLoadingOptions.RemoveOleObjects = true;
    securityLoadingOptions.RemoveActiveXContent = true;
    securityLoadingOptions.RemoveMacros = true;
    securityLoadingOptions.RemoveDDEFields = true;
    securityLoadingOptions.RemoveIncludePictureFields = true;
    securityLoadingOptions.RemoveCustomXMLParts = true;
    
    wordProcessor.SecurityLoadingOptionsViolation += (sender, e) => {
        Console.WriteLine($"Dangerous content found: {e.PropertyName}");
        e.Handled = false; // false = remove the content
    };
    
    wordProcessor.LoadDocument("external_submission.docm");
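
    To check independently that macro removal took effect, one option is to look for a VBA project part in the saved file. Macro-enabled Open XML documents are ZIP packages that store VBA code in a part named vbaProject.bin (word/vbaProject.bin in Word files). The sketch below uses only System.IO.Compression. The MacroProbe class and the HasVbaProject method are hypothetical names and are not part of the DevExpress API. The demo builds small in-memory packages in place of real documents.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

static class MacroProbe
{
    // Returns true when the package contains a VBA project part.
    public static bool HasVbaProject(Stream package)
    {
        using var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true);
        return zip.Entries.Any(entry =>
            entry.FullName.EndsWith("vbaProject.bin", StringComparison.OrdinalIgnoreCase));
    }

    // Builds an in-memory ZIP with empty entries; a stand-in for a real document file.
    public static MemoryStream BuildPackage(params string[] entryNames)
    {
        var stream = new MemoryStream();
        using (var zip = new ZipArchive(stream, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (string name in entryNames)
                zip.CreateEntry(name);
        }
        stream.Position = 0;
        return stream;
    }

    static void Main()
    {
        using var withMacros = BuildPackage("word/document.xml", "word/vbaProject.bin");
        using var clean = BuildPackage("word/document.xml");
        Console.WriteLine(HasVbaProject(withMacros)); // True
        Console.WriteLine(HasVbaProject(clean));      // False
    }
}
```

    In practice, the stream would come from File.OpenRead on the document saved after loading with RemoveMacros enabled.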
    

    Remove Private Information (Sanitize Content)

    Call RichEditDocumentServer.Sanitize(WordProcessingSanitizeOptions) to remove personal data and internal organizational information from loaded documents before sharing, publishing, or archiving them. The method accepts a WordProcessingSanitizeOptions object that specifies which content categories to sanitize.

    The method returns a list of WordProcessingSanitizeResult objects. Each object records the detected content type and the action taken. Together, these objects provide a structured record of sanitization operations in the document. The following content types are included in the results list:

    Metadata

    Document metadata frequently contains personal data subject to GDPR Article 5 data minimization requirements: author full name and username, organization name, internal file paths, and edit timestamps.

    The Metadata property accepts a MetadataRemovalScope value. MetadataRemovalScope.All clears all built-in and extended document properties. Set the Metadata property to MetadataRemovalScope.None to skip metadata handling.

    Revision History and Comments

    Revision history can expose internal review workflows, contributor identities, and, in healthcare or legal contexts, information that may qualify as personal data under GDPR or protected health information under HIPAA.

    The TrackedChanges property accepts a TrackedChangesSanitizeMode value. Use TrackedChangesSanitizeMode.Accept to accept all pending revisions and remove the associated markup, or TrackedChangesSanitizeMode.Reject to roll back all revisions. Both modes strip reviewer attribution from the document. Set the TrackedChanges property to TrackedChangesSanitizeMode.Ignore to leave revision history untouched.

    The WordProcessingSanitizeOptions.RemoveComments property removes all comments from Word documents, including threaded replies.

    Hidden Content

    Documents can contain content that is present in the file structure but not visible during editing. When the document is shared, this content remains in the file and can be retrieved by anyone who opens it with an appropriate tool or processes it in code.

    The HiddenText property accepts an InvisibleContentSanitizeMode value. Use InvisibleContentSanitizeMode.Remove to delete paragraphs and text runs marked with the hidden character property, or InvisibleContentSanitizeMode.MakeVisible to expose them in the document for review.

    The InvisibleText property detects and handles text made visually invisible through a foreground color that matches the page background. Set the InvisibleText property to InvisibleContentSanitizeMode.Remove to delete such content or InvisibleContentSanitizeMode.MakeVisible to restore its visibility.

    The following code snippet removes metadata and accepts all tracked changes in a document:

    using System;
    using System.Collections.Generic;
    using DevExpress.XtraRichEdit;
    
    var wordProcessor = new RichEditDocumentServer();
    wordProcessor.LoadDocument("submission.docx");
    
    WordProcessingSanitizeOptions sanitizeOptions = new WordProcessingSanitizeOptions() {
        Metadata = MetadataRemovalScope.All,
        TrackedChanges = TrackedChangesSanitizeMode.Accept,
    };
    
    IList<WordProcessingSanitizeResult> findings = wordProcessor.Sanitize(sanitizeOptions);
    Console.WriteLine($"{findings.Count} finding(s) removed.");
    wordProcessor.SaveDocument("submission_clean.docx", DocumentFormat.OpenXml);
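
    Because each finding records a content type and the action taken, the findings list can be condensed into audit log lines. The sketch below shows one possible way to do this. This document does not list the members of WordProcessingSanitizeResult, so the example uses a hypothetical AuditEntry record as a stand-in. Real code would first map each result object to such an entry. The AuditLog class and the tab-separated line format are also assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the data a sanitize finding carries.
record AuditEntry(string ContentType, string Action);

static class AuditLog
{
    // Produces one tab-separated line per (content type, action) pair with its count.
    public static List<string> Summarize(string fileName, IEnumerable<AuditEntry> entries) =>
        entries.GroupBy(entry => (entry.ContentType, entry.Action))
               .OrderBy(group => group.Key.ContentType, StringComparer.Ordinal)
               .Select(group => $"{fileName}\t{group.Key.ContentType}\t{group.Key.Action}\t{group.Count()}")
               .ToList();
}

static class Program
{
    static void Main()
    {
        var entries = new[]
        {
            new AuditEntry("Metadata", "Removed"),
            new AuditEntry("Comment", "Removed"),
            new AuditEntry("Comment", "Removed"),
        };

        // Prints two lines: the Comment group (count 2), then the Metadata group (count 1).
        foreach (string line in AuditLog.Summarize("submission.docx", entries))
            Console.WriteLine(line);
    }
}
```

    Writing these lines to an append-only store, together with a timestamp and the operator identity, is one way to keep the kind of structured record that audits typically ask for.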
    

    Inspect Documents Before Sanitization

    Inspect a document to identify content types present in the file. This inspection helps when you need to report document contents, prompt the user before removing content, or tailor sanitization options.

    The RichEditDocumentServer.Inspect(WordProcessingInspectOptions) method scans a loaded Word document and returns a WordProcessingInspectResult that contains detected content types.

    Call WordProcessingInspectResult.CreateSanitizeOptions() to build a WordProcessingSanitizeOptions instance that targets only detected types. You can also call WordProcessingSanitizeOptions.FromInspectResult(WordProcessingInspectResult) for the same result. Pass the resulting options to the Sanitize method.

    The following code snippet inspects a document, creates sanitize options based on the inspection results, and then sanitizes the document:

    using System;
    using DevExpress.XtraRichEdit;
    
    var wordProcessor = new RichEditDocumentServer();
    wordProcessor.LoadDocument("submission.docm");
    
    // Inspect first — discover what is present without modifying anything
    WordProcessingInspectResult inspectResult =
        wordProcessor.Inspect(WordProcessingInspectOptions.All);
    
    Console.WriteLine($"Detected: {string.Join(", ", inspectResult.ContentTypes)}");
    
    // Build sanitize options targeting only what was found
    WordProcessingSanitizeOptions sanitizeOptions = inspectResult.CreateSanitizeOptions();
    var findings = wordProcessor.Sanitize(sanitizeOptions);
    
    Console.WriteLine($"{findings.Count} finding(s) removed.");
    wordProcessor.SaveDocument("submission_clean.docx", DocumentFormat.OpenXml);