Comparison
Information about comparing documents with the database
Documents can be used to compare a file to entities in the database. This is configured by document definitions.
Prerequisites
To fully understand this documentation, you need to have prior knowledge about definitions in general.
Introduction
To configure a comparison you need 2 document definitions:
- A definition which creates the comparison file as a source
- A definition which creates the comparison report as a target
Following steps are executed when comparing a input file to the database:
- The user creates a document with the definition for the comparison file
- In the background a comparison service is started and waits for the file to be uploaded
- The user uploads a file (blob) to the document
- In the background the comparison service is going to execute query requests and process the file according to the configuration stored on the document
- When all data is queried, it is compared to the comparison file and mismatches are determined
- The comparison service creates a document with the definition for the comparison report
- The comparison services uploads the file containing the mismatches
- The comparison report contains 3 types of mismatches:
- IN_FILE_NOT_DB
- IN_DB_NOT_FILE
- MISMATCH_ATTRIBUTES
Comparison File Definition
This document definition defines how the comparison file document is created.
The definition contains following properties.
Property | Description |
inputDefinitions | Contains the required inputs when creating the document from a definition |
document | Contains the template for creating the document, using the inputs as variables |
Until now it is the same as any other definition, but for comparison there is more configuration required on the document template.
Document configuration
The document template contains following additional properties in order to configure comparison.
Property | Description |
type | Must be COMPARISON_FILE |
extensions | If the encoding of the file is not utf-8, it need to be specified as a keyword
If the input file is a csv, some extensions need to be specified how the csv is structured (separator, quoted, headers). This is mainly for visualisation/preview purposes in the screen |
processActions | Contains an api request to the document comparison service with an identification of the document |
data | Firstly, it contains the query request for the database and mapping of database entries.
Secondly, it contains the mapping of the input file.
Thirdly, it contains configuration for the report to be created. |
Extensions
Example extensions
"extensions": {
"keywords": {
"csvSeparator": ",",
"encoding": "ISO-8859-1"
},
"booleans": {
"csvQuoted": true,
"csvHeaders": false
}
}Process Actions
Example process actions
"processActions": {
"apiRequest": {
"api": "documentcomparison",
"method": "POST",
"body": {
"identification": {
"id": "{{response.identification.id}}"
}
}
}
}Data
The data part contains the βlogicβ for the comparison. The
Property | Description |
api | required
Contains the api of the entity to be compared. |
database.request | required
Contains the request to be send. Pagination and sorting are important here, because the request will be send in iterations. To optimise queries, you can provide metadata with a custom nuql query to retrieve the data. But the regular request properties are also possible. |
database.mappingtype | The type of mapping to use. For now handlebars is supported. |
database.mapping | required
Contains the mapping of the request response (dbEntries) into a flat list. |
inputfile.skiprow | Determines which rows of the csv file to skip. |
inputfile.mappingtype | The type of mapping to use. For handlebars is supported. |
inputfile.mapping | required
Contains the mapping of the input file (rows) into a flat list. |
report.definition | required
Contains the definition to be used to create the comparison report. |
report.mimeType | required
Defines the mime type of the report file. Possible values: text/csv application/json |
report.text/csv | required
Contains additional properties to configure the csv (separator, quoted, headers). |
It is very important to keep the output mapping of the database and the input file synchronised. Because both need to produce comparable output.
Also the mapping requires a key property, this is used to do the actual comparison.
- If a
keyis present in the input file but not in the database β mismatch typeIN_FILE_NOT_DBis reported
- If a
keyis present in the database but not in the input file β mismatch typeIN_DB_NOT_IN_FILEis reported
- If a
keyis present in both the database and in the input file, all properties in the file are compared - If then the attributes are not matching β mismatch type
MISMATCH_ATTRIBUTESis reported - If all attributes are matching, no mismatch is reported.
Example data configuration
"data": {
"api": "accesspoints",
"database": {
"request": {
"metadata": {
"nuql": "AccessPoint:?accesspoint link:isDefinedIn #IndexSet:?indexSet. AccessPoint:?accesspoint attribute:accessPointRole ['accessPointRole#METERING_POINT']. AccessPoint:?accesspoint attribute:externalId value:any. AccessPoint:?accesspoint entity:type'AccessPoint'. %%sorting%% %%pagination%% Party:?filterPartyAccessPointRelation01Party attribute:externalId ['{{marketPartyExternalId}}']; link:isDefinedIn ?indexSet. PartyAccessPointRelation:?filterPartyAccessPointRelation01 link:hasTarget AccessPoint:?accesspoint. PartyAccessPointRelation:?filterPartyAccessPointRelation01 link:hasSource #Party:?filterPartyAccessPointRelation01Party. PartyAccessPointRelation:?filterPartyAccessPointRelation01 attribute:partyRole ['partyRole#{{marketPartyRole.id}}']. PartyAccessPointRelation:?filterPartyAccessPointRelation01 attribute:fromDate lte:'{{referenceDate}}'. PartyAccessPointRelation:?filterPartyAccessPointRelation01 attribute:toDate gt:'{{referenceDate}}'. PartyAccessPointRelation:?filterPartyAccessPointRelation01 attribute:active 1. PartyAccessPointRelation:?filterPartyAccessPointRelation01 if:exists reset. Party:?filterPartyAccessPointRelation01Party if:exists reset. AccessPointDetail:?accesspointDetail link:isAssignedTo #AccessPoint:?accesspoint. AccessPointDetail:?accesspointDetail attribute:fromDate lte:'{{referenceDate}}'. AccessPointDetail:?accesspointDetail attribute:toDate gt:'{{referenceDate}}'. AccessPointDetail:?accesspointDetail attribute:active 1. AccessPointDetail:?accesspointDetail limit 20000. AccessPointDetail:?accesspointDetail offset 0. PartyAccessPointRelation:?accesspointPartyAccessPointRelation link:hasTarget #AccessPoint:?accesspoint. PartyAccessPointRelation:?accesspointPartyAccessPointRelation limit 400. PartyAccessPointRelation:?accesspointPartyAccessPointRelation offset 0. PartyAccessPointRelation:?accesspointPartyAccessPointRelation link:hasSource Party:?accesspointPartyAccessPointRelationParty. PartyAccessPointRelation:?accesspointPartyAccessPointRelation attribute:toDate gt:'{{referenceDate}}'. PartyAccessPointRelation:?accesspointPartyAccessPointRelation attribute:active 1. AccessPointAccessPointRelation:?accesspointAccessPointAccessPointRelationTarget link:hasSource #AccessPoint:?accesspoint. AccessPointAccessPointRelation:?accesspointAccessPointAccessPointRelationTarget limit 400. AccessPointAccessPointRelation:?accesspointAccessPointAccessPointRelationTarget offset 0. AccessPointAccessPointRelation:?accesspointAccessPointAccessPointRelationTarget link:hasTarget AccessPoint:?accesspointAccessPointAccessPointRelationTargetAccessPoint. AccessPointAccessPointRelation:?accesspointAccessPointAccessPointRelationTarget attribute:toDate gt:'{{referenceDate}}'. AccessPointAccessPointRelation:?accesspointAccessPointAccessPointRelationTarget attribute:active 1. AccessPointAccessPointRelation:?accesspointAccessPointAccessPointRelationTarget attribute:accessPointRole 'accessPointRole#GRID_AREA'."
},
"pagination": {
"offset": 0,
"limit": 1000
},
"sorting": {
"order": [
"externalId+"
]
}
},
"mappingtype": "handlebars",
"mapping": "[<%#each dbEntries%> { \"key\": \"<%identification.externalId%>\", \"eanId\": \"<%identification.externalId%>\", \"productType\": \"<%extensions.tenantCodes.edsnProductType.0%>\", \"gridArea\": \"<%#each targets%><%#eq role 'GRID_AREA'%><%accessPoint.externalId%><%/eq%><%/each%>\", \"gridOperatorId\": \"<%#each relatedParties%><%#eq targetRole 'GRID_OPERATOR'%><%party.externalId%><%/eq%><%/each%>\", \"meteringResponsibleId\": \"<%#each relatedParties%><%#eq targetRole 'METERING_RESPONSIBLE'%><%party.externalId%><%/eq%><%/each%>\", \"balanceSupplierId\": \"<%#each relatedParties%><%#eq targetRole 'BALANCE_SUPPLIER'%><%party.externalId%><%/eq%><%/each%>\", \"balanceResponsibleId\": \"<%#each relatedParties%><%#eq targetRole 'BALANCE_RESPONSIBLE'%><%party.externalId%><%/eq%><%/each%>\", \"allocationMethod\": \"<%details.0.extensions.tenantCodes.edsnAllocationMethod.0%>\", \"energyDeliveryStatus\": \"<%details.0.extensions.tenantCodes.edsnEnergyDeliveryStatus.0%>\", \"energyFlowDirection\": \"<%details.0.extensions.tenantCodes.edsnEnergyFlowDirection.0%>\", \"marketSegment\": \"<%details.0.extensions.tenantCodes.edsnMarketSegment.0%>\", \"meteringMethod\": \"<%details.0.extensions.tenantCodes.edsnMeteringMethod.0%>\", \"physicalCapacity\": \"<%details.0.extensions.tenantCodes.edsnPhysicalCapacity.0%>\",\"physicalStatus\": \"<%details.0.extensions.tenantCodes.edsnPhysicalStatus.0%>\", \"profileCategory\": \"<%details.0.extensions.tenantCodes.edsnProfileCategory.0%>\", \"contractCapacity\": \"<%details.0.extensions.keywords.edsnContractCapacity.0%>\", \"eaEnergyConsumptionNettedPeak\": \"<%details.0.extensions.keywords.edsnEaEnergyConsumptionNettedPeak.0%>\", \"eaEnergyConsumptionNettedOffPeak\": \"<%details.0.extensions.keywords.edsnEaEnergyConsumptionNettedOffPeak.0%>\", \"eaEnergyProductionNettedPeak\": \"<%details.0.extensions.keywords.edsnEaEnergyProductionNettedPeak.0%>\", \"eaEnergyProductionNettedOffPeak\": \"<%details.0.extensions.keywords.edsnEaEnergyProductionNettedOffPeak.0%>\" }<%#unless @last%>,<%/unless%> <%/each%>]"
},
"inputfile": {
"skiprow": [
0
],
"mappingtype": "handlebars",
"mapping": "[<%#each rows as |row|%>{\"key\": \"<%row.0%>\",\"eanId\": \"<%row.0%>\", \"productType\": \"<%row.18%>\", \"gridArea\": \"<%row.5%>\", \"gridOperatorId\": \"<%row.1%>\", \"meteringResponsibleId\": \"<%row.3%>\", \"balanceSupplierId\": \"<%row.2%>\", \"balanceResponsibleId\": \"<%row.4%>\", \"allocationMethod\": \"<%row.39%>\", \"energyDeliveryStatus\": \"<%row.21%>\", \"energyFlowDirection\": \"<%row.20%>\", \"marketSegment\": \"<%row.19%>\", \"meteringMethod\": \"<%row.24%>\", \"physicalCapacity\": \"<%row.28%>\", \"physicalStatus\": \"<%row.22%>\", \"profileCategory\": \"<%row.31%>\", \"contractedCapacity\": \"<%row.27%>\", \"eaEnergyConsumptionNettedPeak\": \"<%row.33%>\", \"eaEnergyConsumptionNettedOffPeak\": \"<%row.32%>\", \"eaEnergyProductionNettedPeak\": \"<%row.35%>\", \"eaEnergyProductionNettedOffPeak\": \"<%row.34%>\" }<%#unless @last%>,<%/unless%><%/each%>]"
},
"report": {
"definition": {
"externalId": "NL_CAR_METERING_POINT_EXTRACT_REPORT_CSV"
},
"mimeType": "text/csv",
"text/csv": {
"separator": ";",
"quoted": true,
"headers": true
}
}
}Optimise queries with custom nuQL
In order to optimise queries custom nuQL queries can be written to get only the response required and also to order the nuQL statements in specific order to improve query performance for your use case.
A couple of things are important to keep in mind for query optimisation:
- Limited the nuQL query as much as possible, only query the data which is really necessary.
- For example: only retrieve entities where attribute externalId has a value by using this statement:
attribute:externalId value:any. - For example: limiting number of relations by filtering on fromDate, toDate and roles.
- βThrow awayβ temporary filter variables which are not necessary for the response, once the constraint is applied by using this statement:
if:exists reset. This prevents to keep unnecessary data all the time in memory.
- The variables used in the nuQL query need to comply with a convention to be able to translate the result in an API response.
- You can execute a βregularβ query request on the API with the postfix
/nuqlin the URL to see the generated nuQL with variables: - For example:
/accesspoints/_query/nuql
- The sorting and pagination are included as dynamic parameters such that requests can be iterated per page.
- Using this syntax:
%%sorting%%%%pagination%%
Comparison Report Definition
This document definition defines how the comparison report document is created. It does not contain any advanced logic, like the comparison file.
The definition contains following properties.
Property | Description |
inputDefinitions | Contains the required inputs when creating the document from a definition |
document | Contains the template for creating the document, using the inputs as variables |
Input Definitions
For the report document there is a required set of input definitions which need to be available to ensure the comparison service can create the report document with the necessary data.
Input | Type | Description |
comparisonFileDocument | entity
(entityType Document) | This input provides a reference to the source document which produced this report |
countInFileNotInDb | number | This input represents the number of mismatches with type IN_FILE_NOT_IN_DB |
countInDbNotInFile | number | This input represents the number of mismatches with type IN_DB_NOT_IN_FILE |
countMismatchAttributes | number | This input represents the number of mismatches with type MISMATCH_ATTRIBUTEES |
Required input definitions for comparison report
"inputDefinitions": [
{
"id": "comparisonFileDocument",
"label": "Source for comparison report",
"type": "entity",
"entityType": "Document",
"validations": [
{
"name": "required",
"config": "true"
},
{
"name": "min",
"config": "1"
},
{
"name": "max",
"config": "1"
},
{
"name": "filter",
"config": {
"filter": {
"attributes": [
{
"attribute": "type",
"operator": "equals",
"values": [
"COMPARISON_FILE"
]
}
]
}
}
}
]
},
{
"id": "countInFileNotInDb",
"label": "Count of AP's in file but not in DB",
"type": "number",
"validations": [
{
"name": "required",
"config": "true"
}
]
},
{
"id": "countInDbNotInFile",
"label": "Count of AP's in DB but not in file",
"type": "number",
"validations": [
{
"name": "required",
"config": "true"
}
]
},
{
"id": "countMismatchAttributes",
"label": "Count of AP's with attribute mismatches",
"type": "number",
"validations": [
{
"name": "required",
"config": "true"
}
]
}
]Document
The document part contains the template for creating the report document. This is freely configurable. Below an example which uses the input definitions.
- The inputs with number of mismatches per type are stored as extensions
- Some CSV extensions are stored, such that the attached file can be previewed
- The comparison file document input is related as a source to this report document
- And when the report document is successfully created, a business event will be sent containing the counts and a reference to the report document
Example document template
"document": {
"idGeneration": [
"publicId"
],
"identification": {
"externalId": "{{comparisonFileDocument.0.identification.externalId}}_REPORT"
},
"name": "NL C-AR Metering Point Extract Report (csv)",
"type": "COMPARISON_REPORT",
"roles": [
"TARGET"
],
"fileTimestamp": "{{now}}",
"extensions": {
"values": {
"countInFileNotInDb": "{{countInFileNotInDb}}",
"countInDbNotInFile": "{{countInDbNotInFile}}",
"countMismatchAttributes": "{{countMismatchAttributes}}"
},
"keywords": {
"csvSeparator": ";"
},
"booleans": {
"csvQuoted": true,
"csvHeaders": true
}
},
"sources": [
{
"document": {
"id": "{{comparisonFileDocument.0.identification.id}}"
},
"role": "SOURCE",
"fromDate": "{{today}}"
}
],
"processActions": {
"businessEvent": [
{
"order": {
"businessKey": "{{comparisonFileDocument.0.identification.publicId}}"
},
"name": "NL_CAR_METERING_POINT_EXTRACT_REPORT_MESSAGE",
"type": "message",
"correlationRequired": false,
"variables": {
"countInFileNotInDb": "{{countInFileNotInDb}}",
"countInDbNotInFile": "{{countInDbNotInFile}}",
"countMismatchAttributes": "{{countMismatchAttributes}}",
"reportDocument": "{{response.identification.id}}"
}
}
]
}
}Last updated on January 5, 2023