Data Director: Huge performance improvement in version 3.0

Stefano Viani

Management

19 May 2022 Digital Agency Pimcore

With version 3.0, Blackbit takes a big step and equips the Data Director with a more efficient storage method that makes imports significantly faster than with Pimcore's standard storage logic.

Blackbit releases version 3.0 of the Data Director for Pimcore

New, resource-saving storage method

Pimcore's default storage method is not optimal from a performance perspective, because it does not just store the changed data but reprocesses every aspect of all class fields: all fields are checked for validity, dependencies are recalculated on every save, and so on. Even if you change only a single input field, all of these processing steps are performed.

That is why Data Director 3.0 comes with its own storage mechanism that saves only the data that has actually changed. This optimised storage process increases performance by about 200 %.
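
The idea of saving only what has changed can be illustrated with a minimal sketch. It is not the Data Director's implementation (which runs in PHP inside Pimcore); the class `TrackedObject`, the function `save_changes` and the dictionary used as storage are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TrackedObject:
    """Hypothetical stand-in for a data object; not a Pimcore or Data Director class."""

    values: dict[str, Any]
    _loaded: dict[str, Any] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Remember the values as they were loaded so changes can be detected later.
        self._loaded = dict(self.values)

    def changed_fields(self) -> dict[str, Any]:
        # Keep only fields that are new or differ from the loaded state.
        return {
            name: value
            for name, value in self.values.items()
            if name not in self._loaded or self._loaded[name] != value
        }

    def mark_saved(self) -> None:
        # After a save, the current values become the new baseline.
        self._loaded = dict(self.values)


def save_changes(obj: TrackedObject, store: dict[str, Any]) -> list[str]:
    """Write only the changed fields to `store` and return their names."""
    changes = obj.changed_fields()
    store.update(changes)
    obj.mark_saved()
    return sorted(changes)
```

In this sketch, changing one field and calling `save_changes` touches exactly that field; a second call without further changes writes nothing.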

As this is a major change, all existing dataports are set to a so-called "compatibility mode" after the upgrade to version 3.0. This means that they will initially continue to use the old storage mechanism to reduce the risk of many dataports not working. You can deactivate the "compatibility mode" in the advanced settings for dataports. For new dataports, "compatibility mode" is automatically turned off to ensure optimal performance.

More efficient change detection without loading the latest version

Previously, to check whether an object had changed during import, the latest object version was loaded. However, since the versions are stored serialised in the file system, this process is quite time-consuming. In version 3.0, the old values of the mapped import fields are read before they are changed, which makes the resource-intensive loading of versions unnecessary.

Parameterisation of dataport runs

You can now parameterise all elements of a dataport:

  • Access URL / CLI parameters in data query selectors of Pimcore-based dataports, e.g. image:thumbnail# returns the thumbnail path of the thumbnail definition "500px" when the dataport is accessed via the URL /api/export/dataport-name?format=500px
  • Access to URL/CLI parameters in callback functions via . This can be used, for example, to specify the output format of an export.
  • Access to URL / CLI parameters in import resource / SQL condition (exists since version 2.6)

It is now also possible to access fields of the source data class in data query selectors of Pimcore-based dataports. For example, if you have the data query selector myMethod#, the article number of the current source data class object is automatically used (as long as no URL parameter "articlenumber" overrides this). You can also call service classes with parameters in this way: @service_name::method#3108 calls the method "method" of a Symfony service "service_name" with the id of the current source class object.

Import into Calculated-Value fields

It is now possible to import data into Calculated-Value fields without having to create an extra PHP class for the calculation logic. This is useful for any data that is meant for display only and not for editing, for example to visualise data quality.

Display raw data in the Pimcore report

Version 3.0 provides a report adapter for Dataport raw data. This has two main use cases:

  1. Making raw data reusable between multiple dataports,
  2. Simplifying report creation, as no SQL/Pimcore database knowledge is required.

Prevent duplicate assets

With a single checkbox, it is now possible to create assets only if they do not already exist. This even works if the asset images have different file names or different image sizes.

UI changes

Dataport settings

  • Improved auto-completion for data query selectors.
  • Sorting of suggested Data Query Selectors by Levenshtein distance to the desired Data Query Selector -> more relevant sorting.
  • When creating dataports, the dataport name is parsed to determine the dataport source type and the source/target class.
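
Sorting suggestions by Levenshtein distance, as mentioned in the list above, can be sketched as follows. This is an illustration of the general technique only; the function names and the sample selectors are assumptions and do not reflect the Data Director's actual code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def rank_suggestions(typed: str, candidates: list[str]) -> list[str]:
    """Order candidates by closeness to the typed text; ties are sorted alphabetically."""
    return sorted(candidates, key=lambda c: (levenshtein(typed, c), c))
```

For example, `rank_suggestions("nam", ["articleNo", "name", "names"])` places "name" first, because it is only one edit away from the typed text.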

Attribute mapping

  • The language of localised fields is displayed as a flag, which makes it easier to identify
  • Visualisation of dependencies when clicking on an attribute mapping field
  • Speed up generation of callback function templates
  • Updating the preview of dependent fields when updating the callback function
  • Maximise callback function window

History panel

  • Support for searching by dataport log file name to facilitate access to import archive file
  • Format start date according to user's current locale (derived from user's language setting)
  • No new window opens if the result callback function does not generate an output (e.g. for imports calling a dependent import)

Raw Data Extraction/Data Query Selectors

  • A warning is triggered if CSV/Excel file contains the same column header more than once.
  • Support ":url" data query selector for assets and image fields to get the absolute URL of the associated asset(s).
  • Support for searching for reverse relations via Category:products: when the Category class manages the relation to products and the source data class of the export is Product.
  • Support for "ancestors" / "descendants" in data query selectors to get all objects above / below the current object.
  • Support for filtering arrays in data query selectors, e.g. a many-to-many relationship by category can be filtered with categories:filter#published,true to get only the published related category objects.
    Another use case: if you have a field collection of prices with validity dates, you can use prices:filter#validFrom,now,>=:filter#validTo,now,<= to get all field collection items that are valid today.
  • Support for "withInheritance" / "withoutInheritance" helper functions to enable / disable inheritance for individual data query selectors.
  • Support for suffix aliases in data query selectors, e.g. (scalar;object:scalar) as group1
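
To make the filter examples from the list above more concrete, the following sketch shows the described results in plain Python: keeping only published items, and keeping only price entries whose validity period includes a given day. It does not reproduce the data query selector engine; the function names and the dictionary layout of the items are assumptions made for this illustration.

```python
from datetime import date
from typing import Any


def filter_by(items: list[dict[str, Any]], field: str, expected: Any) -> list[dict[str, Any]]:
    """Keep the items whose field equals the expected value."""
    return [item for item in items if item.get(field) == expected]


def valid_on(items: list[dict[str, Any]], day: date) -> list[dict[str, Any]]:
    """Keep the items whose validity period (validFrom to validTo) includes the day."""
    return [item for item in items if item["validFrom"] <= day <= item["validTo"]]
```

Both helpers return a new list and leave the input untouched, so they can be chained in the same way the selector chains two filters.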

Processing of raw data

  • Provision of $params['transfer'] also for field callback functions
  • Support for searching relational objects via a unique index: No data query selector "Manufacturer:Name:".$params['value'] needs to be returned if the field "Name" in the class "Manufacturer" is unique. In this case, it is sufficient to assign the manufacturer name as a raw data field.
  • Support for searching multiple objects via data query selectors
  • Streaming of result documents keeps memory consumption low even when creating large export documents (currently only implemented for CSV).
  • Bugfix: Object key was not valid if the key was 255 characters long and an object with the same key already existed. The length of the suffix is now subtracted to get back to 255 characters.
  • Added new option to automatically create Classification Store fields.
  • Added support for automatic text creation via OpenAI API
  • Support for language mapping to translation provider, e.g. to use en-gb as target language for "en".
  • When a dataport run is restarted after an unexpected termination, a check determines whether the run can be continued: non-incremental exports cannot be continued and must be restarted completely, whereas imports and incremental exports can continue as before.
  • Support for assigning elements to asset metadata (previously only the type "input" was supported).
  • Bugfix: Processing of virtual fields used in key fields.
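
The object key bugfix in the list above (subtracting the suffix length so that the key stays within 255 characters) can be sketched like this. The suffix format "-1", "-2", ... and the function name are assumptions for illustration; the source does not state which suffix the Data Director appends.

```python
MAX_KEY_LENGTH = 255  # maximum key length named in the bugfix description


def unique_key(key: str, existing: set[str]) -> str:
    """Return a key of at most MAX_KEY_LENGTH characters that is not in `existing`."""
    candidate = key[:MAX_KEY_LENGTH]
    counter = 1
    while candidate in existing:
        suffix = f"-{counter}"  # hypothetical suffix format
        # Shorten the base key by the suffix length so the total never exceeds the limit.
        candidate = key[:MAX_KEY_LENGTH - len(suffix)] + suffix
        counter += 1
    return candidate
```

Without the subtraction, a 255-character key plus a suffix would exceed the limit, which is the situation the bugfix describes.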

Other changes

  • Raw data is deleted in packages, as extensive deletion processes otherwise take too much time.
  • Restructuring of logging to use memory capacities more efficiently.
  • spatie/once is removed as this caused many unnecessary debug_backtrace() calls.
  • Logs in the application logger are grouped: Certain messages are listed only once and given a frequency index, e.g. "happened 3x".
  • The data query selector Product:articleNo:.:name#en for determining the current field value "articleNo" is no longer supported. This is because this data query selector is supposed to find products with articleNo=".". You can use $params['currentObjectData']['articleNo'] to get the current value of the field "articleNo".
  • Automatically correct misconfigured default timezones between web server PHP and CLI PHP by always storing data in UTC. Otherwise dataport runs could be aborted because they take too long, or negative run times are displayed in the history panel, etc.
  • Fixed: Notification mail about queue processor that could not be started was also sent if the queue processor was started but finished in less than 5 seconds.
  • Automatically reload elements if they were changed by automatic imports after saving.
  • Skip hash check for Pimcore-based imports.
    Use case: Automatic import that sets published based on a specific logic of raw data fields.
    • 1st pass: object is published -> import logic sets published to false -> hash of raw data is saved -> object is saved & republished without changes.
    • 2nd pass: raw data is the same -> but published has been changed -> we need to run the dataport again, otherwise the object will be published even though the published logic would unpublish it.
  • Support different request contexts to be able to change the behaviour of overridden getter methods.
  • When renaming dataports, all redirects for old REST API endpoint URLs of that dataport point to new URL. This prevents redirect chains.
  • Bugfix: An edit-lock message is no longer triggered if the current user has just saved an object.
  • Result document action "send as mail" supports sending response documents as attachments.
  • Fixed: Automatic start did not work for Excel imports.
  • More efficient deletion of temporary files: they are now deleted after each raw data block, because by default they are only deleted when the whole process is finished, which wastes a lot of memory.
  • Bugfix: Cleanup of the application logger log files did not work correctly.
  • Logging of the user who started the Dataport run.
  • Removed the automatic setting that manually uploaded files should use the "default" dataport resource. Instead, a separate resource is created for the uploaded file. Consequence: When a dataport is run with the same file, the previous file is overwritten and the file name is displayed in the history panel instead of the generated uniqid().
  • Multiple raw data items are no longer processed in one database transaction, as a problem with one item would otherwise also prevent the import of all other items in the same transaction.
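
The last point, processing each raw data item in its own transaction, can be illustrated with a small sketch using Python's built-in sqlite3 module. The table `product`, its columns and the function names are hypothetical; the Data Director itself works against the Pimcore database from PHP.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical product table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def import_items(conn: sqlite3.Connection, items: list[tuple[str, str]]) -> list[str]:
    """Insert each item in its own transaction and return the SKUs that failed."""
    failed = []
    for sku, name in items:
        try:
            # The connection context manager commits on success and rolls back on error,
            # so a failing item cannot undo items that were imported before it.
            with conn:
                conn.execute("INSERT INTO product (sku, name) VALUES (?, ?)", (sku, name))
        except sqlite3.IntegrityError:
            failed.append(sku)
    return failed
```

If the second of three items violates a constraint, only that item is rolled back; the first and third are still stored.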

Blackbit Data Director on YouTube

Do you already know our video tutorials about the Data Director? For useful tips and detailed application examples, visit Blackbit on YouTube!

Still have questions?

Curious and would like to get to know our Data Director better? Contact us now and we will show you in a free demo what the Data Director can do for you.

About the Author

As Executive Director of Blackbit digital Commerce GmbH, Stefano Viani manages all areas of the agency in the offices in Göttingen, Hamburg, Berlin and Kiev. His passion is the development of marketing strategies and their implementation in concrete measures.

In his free time, Stefano is passionate about riding his motorbike or working out in the gym.