Spreadsheet Document API: Process Spreadsheet Content Safely
- 7 minutes to read
The Spreadsheet Document API includes configurable security layers for safe document processing. These layers address document security requirements in regulated industries:
- GDPR (General Data Protection Regulation) requires the removal of metadata such as author names, internal file paths, and edit history before documents are shared.
- HIPAA (Health Insurance Portability and Accountability Act) requires protection against malware in documents that contain patient health information and safeguards against PHI (Protected Health Information) leakage through document properties and tracked changes.
- SOX (Sarbanes-Oxley Act) requires documented controls over financial document integrity. The structured findings list returned by the Sanitize API supports Section 404 audit trail requirements.
The Safe Document Processing API covers three areas:
- Establish Security Loading Limits
- Reject documents that exceed structural thresholds before they are fully parsed.
- Remove Dangerous Content
- Strip active threats such as macros, embedded objects, and dangerous links during loading.
- Sanitize Private Information
- Remove metadata, revision history, and hidden content from already-loaded documents before you share or archive them.
Establish Security Loading Limits
Security loading limits protect your application against document-based denial-of-service attacks. A specially crafted file with deeply nested XML structures, excessive cell counts, or an inflated file size can cause memory exhaustion or application hangs during parsing. Security limits apply to all supported formats and take effect before the API fully loads the document.
Use the Workbook.Options.SecurityLoadingLimits property to configure thresholds for the following document characteristics:
- Maximum allowed file size in bytes
- Maximum total number of XML elements across all workbook parts
- Maximum nesting depth of any XML element
- Maximum number of worksheets in the workbook
- Maximum number of rows per worksheet
- Maximum number of columns per worksheet
- Maximum number of cells per worksheet
- Maximum number of charts in the workbook
The following code snippet sets security loading limits:
using DevExpress.Spreadsheet;
using (Workbook workbook = new Workbook())
{
WorkbookSecurityLoadingLimits securityLimits = workbook.Options.SecurityLoadingLimits;
securityLimits.MaxFileSize = 50 * 1024 * 1024; // 50 MB
securityLimits.MaxSheetColumnCount = 100;
securityLimits.MaxSheetRowCount = 50;
securityLimits.MaxWorksheetCount = 10;
workbook.LoadDocument("Documents\\Sample.xlsx");
}
When a document exceeds the set limit, the Workbook.SecurityLoadingLimitExceeded event fires. Set e.Handled = true in the event handler to let the spreadsheet continue to load despite the violation (useful for log-only scenarios or staged rollouts).
The following code snippet handles the SecurityLoadingLimitExceeded event:
using DevExpress.Spreadsheet;
Workbook workbook = new Workbook();
workbook.SecurityLoadingLimitExceeded += (sender, e) => {
Console.WriteLine($"Limit exceeded: {e.PropertyName}");
e.Handled = false; // abort loading
};
Remove Dangerous Content
The WorkbookOptions.SecurityLoadingOptions property instructs the API to detect specific threats during a file loading operation. If matching content is located, the Workbook.SecurityLoadingOptionsViolation event fires. Set e.Handled = false in the event handler to remove the detected content, or leave it set to true to retain the content (useful for audit-only modes).
The following code snippet sets SecurityLoadingOptions and handles the SecurityLoadingOptionsViolation event:
using DevExpress.Spreadsheet;
using (Workbook workbook = new Workbook())
{
WorkbookSecurityLoadingOptions securityLoadingOptions = workbook.Options.SecurityLoadingOptions;
securityLoadingOptions.RemoveMacros = true;
securityLoadingOptions.RemoveActiveXContent = true;
securityLoadingOptions.RemoveOleObjects = true;
securityLoadingOptions.RemoveRestrictedFormulas = true;
securityLoadingOptions.RemoveExternalWorkbooks = true;
securityLoadingOptions.RemoveExternalConnections = true;
securityLoadingOptions.RemovePivotCaches = true;
securityLoadingOptions.RemoveCustomXMLParts = true;
workbook.SecurityLoadingOptionsViolation += (_, e) => {
Console.WriteLine($"Dangerous content found: {e.PropertyName}");
e.Handled = false; // false = remove the content
workbook.LoadDocument("Documents\\Sample.xlsx");
}
}
Remove Private Information (Sanitize Content)
Call Workbook.Sanitize(WorkbookSanitizeOptions) to remove personal data and internal organizational information from loaded documents before sharing, publishing, or archiving them. The method accepts a WorkbookSanitizeOptions object that specifies which content categories to sanitize.
The method returns a list of WorkbookSanitizeResult objects. Each object records the detected content type and the action taken. Together, these objects provide a structured record of sanitization operations in the document. The following content types are included in the results list:
- Metadata
Document metadata frequently contains personal data subject to GDPR Article 5 data minimization requirements: author full name and username, organization name, internal file paths, and edit timestamps.
The Metadata property accepts a MetadataRemovalScope value.
MetadataRemovalScope.Allclears all built-in and extended document properties. Set theMetadataproperty toMetadataRemovalScope.Noneto skip metadata handling.- Threaded Comments
Threaded comments and legacy comments can expose internal review workflows, contributor identities, and, in healthcare or legal contexts, information that may qualify as personal data under GDPR or protected health information under HIPAA.
The SanitizeOptions.RemoveComments property removes all comments from a spreadsheet document. The RemoveThreadedComments property removes all threaded replies from spreadsheet documents.
- Hidden Content
Documents can contain content that is present in the file structure but not visible during editing. When the document is shared, this content remains in the file and can be retrieved by anyone who opens it with an appropriate tool or processes it in code.
The InvisibleCellText property detects and handles text made visually invisible through a foreground color that matches the page background. Set the
InvisibleCellTextproperty toInvisibleContentSanitizeMode.Removeto delete such content orInvisibleContentSanitizeMode.MakeVisibleto restore its visibility.The WorkbookSanitizeOptions.HiddenRows and WorkbookSanitizeOptions.HiddenColumns properties detect and handle content in hidden rows and columns. Set these properties to
HiddenContentSanitizeMode.Removeto delete content in hidden rows/columns orHiddenContentSanitizeMode.MakeVisibleto make them visible.The WorkbookSanitizeOptions.HiddenSheets property allows you to handle content in hidden worksheets.
The following code snippet sanitizes hidden sheets and comments from a spreadsheet:
using DevExpress.Office;
using DevExpress.Spreadsheet;
using (Workbook workbook = new Workbook())
{
workbook.LoadDocument("Documents\\Sample.xlsx");
WorkbookSanitizeOptions sanitizeOptions = new WorkbookSanitizeOptions
{
RemoveThreadedComments = true,
HiddenSheets = HiddenContentSanitizeMode.MakeVisible
};
var results = workbook.Sanitize(sanitizeOptions);
foreach (var result in results)
{
Console.WriteLine($"Content type: {result.Type}, Action taken: {result.Action}");
}
}
Inspect Documents Before Sanitization
Inspect a document to identify content types present in the file. This inspection helps when you need to report document contents, prompt the user before removing content, or tailor sanitization options.
The Workbook.Inspect(WorkbookInspectOptions) method scans a loaded spreadsheet document and returns a WorkbookInspectResult that contains detected content types.
Call WorkbookInspectResult.CreateSanitizeOptions() to build a WorkbookSanitizeOptions instance that targets only detected types. You can also call WorkbookSanitizeOptions.FromInspectResult(WorkbookInspectResult) for the same result. Pass the resulting options to the Sanitize method.
The following code snippet inspects a spreadsheet, creates sanitize options based on the inspection results, and then sanitizes the spreadsheet:
using DevExpress.Spreadsheet;
using (Workbook workbook = new Workbook())
{
workbook.LoadDocument("Documents\\Sample.xlsx");
WorkbookInspectResult inspectResult = workbook.Inspect(new WorkbookInspectOptions());
WorkbookSanitizeOptions sanitizeOptions = inspectResult.CreateSanitizeOptions();
var results = workbook.Sanitize(sanitizeOptions);
foreach (var result in results)
{
Console.WriteLine($"Content type: {result.Type}, Action taken: {result.Action}");
}
}