Skip to main content
All docs
V21.2

.NET Data-Processing API (MVP)

  • 4 minutes to read

Warning

DevExpress Data Processing API is a prototype (MVP) of a new product and is therefore not yet a part of the DevExpress product line.

DevExpress Data Processing API is a .NET library that allows you to convert your data (including ETL and data analysis) into usable and desired form.

Data Processing API

Typical Scenarios

  • Add ETL (Extract, Transform, Load) capabilities to .NET apps.
  • Shape data (group, sort, filter, apply analytics functions) before you display data in a UI application, regardless of platform.

Main Features of the Presented Library

You can do the following with our Data Processing API:

  • Connect to different data sources (relational databases, web services, Excel spreadsheets, JSON data, and so on) using a unified interface.
  • Process data at runtime in the application memory.
  • Embed business logic written in .NET at any point in your data processing.
  • Use functions to clean and structure data alongside with analytical functions.
  • Debug your app using a wide range of API.
  • Transform your data quickly from raw data to final output.

Common Concepts

This section shows how to work with DevExpress Data Processing API.

The common algorithm is the following:

  1. Create a new data flow (DataFlow) and use one of the functions to load data (for example, FromCsv or FromDatabase).
  2. Use functions to clean and structure your data and apply analytical functions (for example, ProcessColumn, AddColumn, Join, Aggregate, and so on).
  3. Define the output data format (for example, ToExcel, ToDataTable).
  4. Execute the previously defined data flow to generate resulting data (Execute).

You can use the API below to transform your data.

Load Data

  • From a CVS file: FromCsv
  • From a database: FromDatabase
  • From Excel spreadsheets (XLSX and XLS files): FromExcel
  • From a Web Service (JSON): FromJsonFile and FromJsonUrl
  • From .NET object: FromObject

Transform Data

  • Join data from different sources: Join
  • Unfold array values and display a new data row for every element in the array: Unfold
  • Add columns: AddColumn (using criteria operator or in code)
  • Modify column data: ProcessColumn
  • Filter data: Filter (using criteria operator or in code)
  • Sort data: Sort
  • Manage columns: SelectColumns, RenameColumns, RemoveColumn, RemoveColumns

Analyze Data

  • TopN: Top
  • Data aggregation: Aggregate

Upload Data

  • Upload data to a stream that contains an XLSX file: ToExcel
  • Upload data to a JSON string: ToJsonString
  • Upload data to a .NET object: ToDataTable, ToEnumerable

Debug

  • Get data for each step in the processed data flow: Debug

Performance

  • Data is stored by column (column-oriented DBMS). This approach allows us to optimize data analysis operations, such as data aggregation, join data from different sources, and so on.
  • Data engine supports multi-threaded data calculation to handle a large amount of data efficiently.
  • An optimized graph of data operations.

DevExpress Data Processing API can be faster or equal to Parallel Linq in aggregation calculation tasks (grouping and sums calculation) according to our experiments.

Note

A number of assumptions was made in the MVP implementation which do not fully reveal the performance. At the same time, performance can depend on many factors (for example, just-in-time (JIT) compilation). If you encounter performance issues, please feel free to describe your scenario in our Support Center.

Example

The example transforms the user survey data in the JSON format and data about users from an XLSX file in the following way:

  • Joins these two data flows to get a one data source. The “Feature list” column is an array of data. The Unfold operation creates a new row for each item in the array.
  • Aggregates data by “RegionCountryName” and “Feature list”.
  • Calculates the top 3 achievements for each country.
  • Sorts data.
  • Uploads data to an XLSX file.

View Example: DevExpress Data Processing API (MVP)

Note

This example does not contain a solution for Visual Basic. If you encounter any difficulties with the DevExpress Data Processing API prototype on VB.NET, please feel free to contact us in our Support Center.

var surveyFlow = DataFlow
    .FromJsonFile(Path.Combine("Data", "survey.json"))
    .Debug(e => {
        Console.WriteLine("JSON input columns:");
        e.PrintColumns();
        Console.WriteLine();
    })
    .SelectColumns("Submitted", "Customer ID", "Which of the following product features are important to you?")
    .RenameColumn("Which of the following product features are important to you?", "Feature list");

var customersFlow = DataFlow
    .FromExcel(Path.Combine("Data", "Customers.xlsx"), "Grid 1")
    .Debug(e => {
        Console.WriteLine("Excel input columns:");
        e.PrintColumns();
        Console.WriteLine();
    });

var resultFlow = surveyFlow.Join(customersFlow, "Customer ID", "CustomerKey")
    .Unfold("Feature list")
    .Aggregate(e => {
        e.GroupBy("RegionCountryName", "Feature list")
         .CountRows("Count");
    })
    .Top(3, "Count", new[] { "RegionCountryName" })
    .Sort(e => {
        e.SortColumns.Add("RegionCountryName", SortOrder.Ascending);
        e.SortColumns.Add("Count", SortOrder.Descending);
    })
    .Debug(e => {
        Console.WriteLine("Resulting columns:");
        e.PrintColumns();
        Console.WriteLine();
    })
    .ToExcelFile(Path.Combine(".", "survey_analysis.xlsx"));

var filePath = resultFlow.Execute();

The resulting XLSX file:

Data Processing API console app Excel file result

The repository also contains unit tests that cover different scenarios and shows the API usage, and the performance example that compares the following data processing technologies:

  • DevExpress Data-Processing API
  • Microsoft Linq
  • Microsoft Parallel Linq

Tip

Your opinion matters to us. You can use the feedback form on the page to share your thoughts or submit a Support Center ticket.