1+N avoidance: Fetch of relationship working, but ignored when accessed


Vaal
 

Hi,

I posted my question on StackOverflow, but realized it might be better to post here as on SO there's only 5-ish DN questions per year :-)

https://stackoverflow.com/questions/75422362/datanucleus-relationship-fetch-group-confusion

Models, queries and logs are all listed there.

I have:
- Model with relationships
- Added the relationships to the fetch group
- Retrieved a list of models via a query
- Observed in the logs that DN is "bulk fetching" the relationship for all models in the result set to avoid 1+N query problem.

So far so good. But when I then traverse the resultset of the query and access a relationship, DN seems to ignore the prefetched results and performs another query for each model.
So I end up with 1+1+N queries. I have looked at the fetch groups and those should be OK as the extra query happens even before accessing any fields.
The (debug) logs do not state why the prefetched results are not used.

I can repeat all the details here from SO, let me know if that's needed.

Valentijn


Andy
 
Edited

Hi,

perhaps your situation is as per https://github.com/datanucleus/datanucleus-rdbms/issues/170 ?

Stackoverflow is not used by this project, so anything you post on there won't get a reply from people involved in this project


Vaal
 

I don't think it is. Mine is a very basic case and not doing anything with FETCH JOIN.


Andy
 
Edited

A simple demonstration of BulkFetch of a Collection field is to be found at
https://github.com/andyjefferson/jdo-bulkfetch-collection

Does a query, where a Collection field is in the DFG for the query candidate, and as a result comes up with this SQL

SELECT 'mydomain.model.B' AS DN_TYPE,A1.ID,A1."NAME",A0.ID_OID,A0.IDX AS NUCORDER0
FROM A_BS A0 INNER JOIN B A1 ON A0.ID_EID = A1.ID
WHE
RE A0.IDX >= 0 AND EXISTS
    (SELECT 'mydomain.model.A' AS DN_TYPE,A0_SUB.ID AS DN_APPID
     FROM A A0_SUB
     WHERE A0.ID_OID = A0_SUB.ID) ORDER BY NUCORDER0

and that is all.

Suggest you debug what you're doing different.


Vaal
 

I'm not really doing much different and as the logs show that SQL is indeed generated and executed, as in the demonstration you posted.

2023-02-11 18:41:36,476 DEBUG [Native] SELECT 'org.dependencytrack.model.Vulnerability' AS "DN_TYPE","A1"."CREATED","A1"."CREDITS","A1"."CVSSV2BASESCORE","A1"."CVSSV2EXPLOITSCORE","A1"."CVSSV2IMPACTSCORE","A1"."CVSSV2VECTOR","A1"."CVSSV3BASESCORE","A1"."CVSSV3EXPLOITSCORE","A1"."CVSSV3IMPACTSCORE","A1"."CVSSV3VECTOR","A1"."CWES","A1"."DESCRIPTION","A1"."DETAIL","A1"."EPSSPERCENTILE","A1"."EPSSSCORE","A1"."FRIENDLYVULNID","A1"."ID" AS "NUCORDER0","A1"."OWASPRRBUSINESSIMPACTSCORE","A1"."OWASPRRLIKELIHOODSCORE","A1"."OWASPRRTECHNICALIMPACTSCORE","A1"."OWASPRRVECTOR","A1"."PATCHEDVERSIONS","A1"."PUBLISHED","A1"."RECOMMENDATION","A1"."REFERENCES","A1"."SEVERITY","A1"."SOURCE","A1"."SUBTITLE","A1"."TITLE","A1"."UPDATED","A1"."UUID","A1"."VULNID","A1"."VULNERABLEVERSIONS","A0"."COMPONENT_ID" FROM "COMPONENTS_VULNERABILITIES" "A0" INNER JOIN "VULNERABILITY" "A1" ON "A0"."VULNERABILITY_ID" = "A1"."ID" WHERE EXISTS (SELECT 'org.dependencytrack.model.Component' AS "DN_TYPE","A0_SUB"."ID" AS "DN_APPID" FROM "COMPONENT" "A0_SUB" WHERE "A0_SUB"."PROJECT_ID" = ? AND "A0"."COMPONENT_ID" = "A0_SUB"."ID") ORDER BY "NUCORDER0"
But for some reason when accessing the relation/prefetched results, DN reverts back to fetching the relationship again, but only for that specific model instance.
2023-02-11 18:41:36,515 DEBUG [Native] SELECT 'org.dependencytrack.model.Vulnerability' AS "DN_TYPE","A1"."CREATED","A1"."CREDITS","A1"."CVSSV2BASESCORE","A1"."CVSSV2EXPLOITSCORE","A1"."CVSSV2IMPACTSCORE","A1"."CVSSV2VECTOR","A1"."CVSSV3BASESCORE","A1"."CVSSV3EXPLOITSCORE","A1"."CVSSV3IMPACTSCORE","A1"."CVSSV3VECTOR","A1"."CWES","A1"."DESCRIPTION","A1"."DETAIL","A1"."EPSSPERCENTILE","A1"."EPSSSCORE","A1"."FRIENDLYVULNID","A1"."ID" AS "NUCORDER0","A1"."OWASPRRBUSINESSIMPACTSCORE","A1"."OWASPRRLIKELIHOODSCORE","A1"."OWASPRRTECHNICALIMPACTSCORE","A1"."OWASPRRVECTOR","A1"."PATCHEDVERSIONS","A1"."PUBLISHED","A1"."RECOMMENDATION","A1"."REFERENCES","A1"."SEVERITY","A1"."SOURCE","A1"."SUBTITLE","A1"."TITLE","A1"."UPDATED","A1"."UUID","A1"."VULNID","A1"."VULNERABLEVERSIONS" FROM "COMPONENTS_VULNERABILITIES" "A0" INNER JOIN "VULNERABILITY" "A1" ON "A0"."VULNERABILITY_ID" = "A1"."ID" WHERE "A0"."COMPONENT_ID" = ? ORDER BY "NUCORDER0"
It could still be something simple/obvious/stupid, but I don't see it nor do the people I have asked to look at it.


Andy
 

Focus on the log entries ... for the JDOQL ... and for whatever accessor methods you call after that ... by putting entries in the log to separate your calls. Maybe you have some field in the related object that is not in the fetch plan?


Vaal
 

I did all that and was hoping DN would log why it was retrieving the relationship again, but it's not clear. I'm not accessing any fields from models inside the relationship, just doing getVulnerabilities()

Copying everything from my linked SO post:
 

Question applies mostly to Component.java (this is a link to the sourcecode).

It has a many-to-many relationship (but I see the same problem with 1-N):

@Persistent(table = "COMPONENTS_VULNERABILITIES")
@Join(column = "COMPONENT_ID")
@Element(column = "VULNERABILITY_ID")
@Order(extensions = @Extension(vendorName = "datanucleus", key = "list-ordering", value = "id ASC"))
private List<Vulnerability> vulnerabilities;

@FetchGroup(name = "METRICS_UPDATE", members = {
        @Persistent(name = "id"),
        @Persistent(name = "lastInheritedRiskScore"),
        @Persistent(name = "uuid"),
        @Persistent(name = "vulnerabilities"),
        @Persistent(name = "analysises"),
        @Persistent(name = "policyViolations")                
})

Query<Component> query = pm.newQuery(Component.class);
query.setOrdering("id DESC");
query.setRange(0, 500);
query.getFetchPlan().setGroup(Component.FetchGroup.METRICS_UPDATE.name());
query.getFetchPlan().setFetchSize(FetchPlan.FETCH_SIZE_GREEDY);
query.getFetchPlan().setMaxFetchDepth(10);
components = query.executeList();

LOGGER.debug("Fetched " + components.size() + " components for project " + project.getUuid());
for (final Component component : components) {
    try {
        LOGGER.debug("Printing vulnerabilities count: " + component.getVulnerabilities().stream().count());

FetchSize and MaxDepth are added because I was trying stuff.

Logs

2023-02-11 18:41:36,458 DEBUG [Query] JDOQL Query : Executing "SELECT FROM org.dependencytrack.model.Component WHERE project == :project ORDER BY id DESC RANGE 0,500" ...
2023-02-11 18:41:36,460 DEBUG [Connection] ManagedConnection OPENED : "org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl@3a5359fd [conn=com.zaxxer.hikari.pool.HikariProxyConnection@78a8a548, commitOnRelease=true, closeOnRelease=true, closeOnTxnEnd=true]" on resource "nontx" with isolation level "read-committed" and auto-commit=false
2023-02-11 18:41:36,460 DEBUG [Datastore] Using PreparedStatement "HikariProxyPreparedStatement@289630101 wrapping SELECT 'org.dependencytrack.model.Component' AS "DN_TYPE","A0"."ID" AS "NUCORDER0","A0"."LAST_RISKSCORE","A0"."UUID" FROM "COMPONENT" "A0" WHERE "A0"."PROJECT_ID" = ? ORDER BY "NUCORDER0" DESC FETCH NEXT 500 ROWS ONLY " for connection "com.zaxxer.hikari.pool.HikariProxyConnection@78a8a548"
2023-02-11 18:41:36,463 DEBUG [Native] SELECT 'org.dependencytrack.model.Component' AS "DN_TYPE","A0"."ID" AS "NUCORDER0","A0"."LAST_RISKSCORE","A0"."UUID" FROM "COMPONENT" "A0" WHERE "A0"."PROJECT_ID" = ? ORDER BY "NUCORDER0" DESC FETCH NEXT 500 ROWS ONLY
2023-02-11 18:41:36,466 DEBUG [Retrieve] SQL Execution Time = 3 ms
2

... some other bulk prefetches snipped

2023-02-11 18:41:36,476 DEBUG [Retrieve] JDOQL Bulk-Fetch of org.dependencytrack.model.Component.vulnerabilities
2023-02-11 18:41:36,476 DEBUG [Datastore] Using PreparedStatement "HikariProxyPreparedStatement@555511245 wrapping SELECT 'org.dependencytrack.model.Vulnerability' AS "DN_TYPE","A1"."CREATED","A1"."CREDITS","A1"."CVSSV2BASESCORE","A1"."CVSSV2EXPLOITSCORE","A1"."CVSSV2IMPACTSCORE","A1"."CVSSV2VECTOR","A1"."CVSSV3BASESCORE","A1"."CVSSV3EXPLOITSCORE","A1"."CVSSV3IMPACTSCORE","A1"."CVSSV3VECTOR","A1"."CWES","A1"."DESCRIPTION","A1"."DETAIL","A1"."EPSSPERCENTILE","A1"."EPSSSCORE","A1"."FRIENDLYVULNID","A1"."ID" AS "NUCORDER0","A1"."OWASPRRBUSINESSIMPACTSCORE","A1"."OWASPRRLIKELIHOODSCORE","A1"."OWASPRRTECHNICALIMPACTSCORE","A1"."OWASPRRVECTOR","A1"."PATCHEDVERSIONS","A1"."PUBLISHED","A1"."RECOMMENDATION","A1"."REFERENCES","A1"."SEVERITY","A1"."SOURCE","A1"."SUBTITLE","A1"."TITLE","A1"."UPDATED","A1"."UUID","A1"."VULNID","A1"."VULNERABLEVERSIONS","A0"."COMPONENT_ID" FROM "COMPONENTS_VULNERABILITIES" "A0" INNER JOIN "VULNERABILITY" "A1" ON "A0"."VULNERABILITY_ID" = "A1"."ID" WHERE EXISTS (SELECT 'org.dependencytrack.model.Component' AS "DN_TYPE","A0_SUB"."ID" AS "DN_APPID" FROM "COMPONENT" "A0_SUB" WHERE "A0_SUB"."PROJECT_ID" = ? AND "A0"."COMPONENT_ID" = "A0_SUB"."ID") ORDER BY "NUCORDER0"" for connection "com.zaxxer.hikari.pool.HikariProxyConnection@78a8a548"
2023-02-11 18:41:36,476 DEBUG [Native] SELECT 'org.dependencytrack.model.Vulnerability' AS "DN_TYPE","A1"."CREATED","A1"."CREDITS","A1"."CVSSV2BASESCORE","A1"."CVSSV2EXPLOITSCORE","A1"."CVSSV2IMPACTSCORE","A1"."CVSSV2VECTOR","A1"."CVSSV3BASESCORE","A1"."CVSSV3EXPLOITSCORE","A1"."CVSSV3IMPACTSCORE","A1"."CVSSV3VECTOR","A1"."CWES","A1"."DESCRIPTION","A1"."DETAIL","A1"."EPSSPERCENTILE","A1"."EPSSSCORE","A1"."FRIENDLYVULNID","A1"."ID" AS "NUCORDER0","A1"."OWASPRRBUSINESSIMPACTSCORE","A1"."OWASPRRLIKELIHOODSCORE","A1"."OWASPRRTECHNICALIMPACTSCORE","A1"."OWASPRRVECTOR","A1"."PATCHEDVERSIONS","A1"."PUBLISHED","A1"."RECOMMENDATION","A1"."REFERENCES","A1"."SEVERITY","A1"."SOURCE","A1"."SUBTITLE","A1"."TITLE","A1"."UPDATED","A1"."UUID","A1"."VULNID","A1"."VULNERABLEVERSIONS","A0"."COMPONENT_ID" FROM "COMPONENTS_VULNERABILITIES" "A0" INNER JOIN "VULNERABILITY" "A1" ON "A0"."VULNERABILITY_ID" = "A1"."ID" WHERE EXISTS (SELECT 'org.dependencytrack.model.Component' AS "DN_TYPE","A0_SUB"."ID" AS "DN_APPID" FROM "COMPONENT" "A0_SUB" WHERE "A0_SUB"."PROJECT_ID" = ? AND "A0"."COMPONENT_ID" = "A0_SUB"."ID") ORDER BY "NUCORDER0"
2023-02-11 18:41:36,480 DEBUG [Retrieve] SQL Execution Time = 4 ms

As far as I can see this is using the filters from the original query to fetch the vulnerabilities relationship. So far so good.

The first statement after the query above is to access getVulnerabilities(), and this results in, more or less, the exact same query:

2023-02-11 18:41:36,494 DEBUG [ProjectMetricsUpdateTask] Fetched 5 components for project f04cfba0-7b94-4380-8cbd-aca492f97a7f
2023-02-11 18:41:36,494 DEBUG [Persistence] Object with id "org.dependencytrack.model.Component:45106" has a lifecycle change : "HOLLOW"->"P_NONTRANS"
2023-02-11 18:41:36,508 DEBUG [Persistence] Object "org.dependencytrack.model.Component@68f46ca4" field "vulnerabilities" is replaced by a SCO wrapper of type "org.datanucleus.store.types.wrappers.backed.List" [cache-values=true, lazy-loading=true, allow-nulls=true]
2023-02-11 18:41:36,508 DEBUG [Persistence] Object "org.dependencytrack.model.Component@68f46ca4" field "vulnerabilities" loading contents to SCO wrapper from the datastore
2023-02-11 18:41:36,512 DEBUG [Connection] ManagedConnection OPENED : "org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl@73aeb4aa [conn=com.zaxxer.hikari.pool.HikariProxyConnection@30f8bfe1, commitOnRelease=true, closeOnRelease=true, closeOnTxnEnd=true]" on resource "nontx" with isolation level "read-committed" and auto-commit=false
2023-02-11 18:41:36,512 DEBUG [Datastore] Using PreparedStatement "HikariProxyPreparedStatement@1310502991 wrapping SELECT 'org.dependencytrack.model.Vulnerability' AS "DN_TYPE","A1"."CREATED","A1"."CREDITS","A1"."CVSSV2BASESCORE","A1"."CVSSV2EXPLOITSCORE","A1"."CVSSV2IMPACTSCORE","A1"."CVSSV2VECTOR","A1"."CVSSV3BASESCORE","A1"."CVSSV3EXPLOITSCORE","A1"."CVSSV3IMPACTSCORE","A1"."CVSSV3VECTOR","A1"."CWES","A1"."DESCRIPTION","A1"."DETAIL","A1"."EPSSPERCENTILE","A1"."EPSSSCORE","A1"."FRIENDLYVULNID","A1"."ID" AS "NUCORDER0","A1"."OWASPRRBUSINESSIMPACTSCORE","A1"."OWASPRRLIKELIHOODSCORE","A1"."OWASPRRTECHNICALIMPACTSCORE","A1"."OWASPRRVECTOR","A1"."PATCHEDVERSIONS","A1"."PUBLISHED","A1"."RECOMMENDATION","A1"."REFERENCES","A1"."SEVERITY","A1"."SOURCE","A1"."SUBTITLE","A1"."TITLE","A1"."UPDATED","A1"."UUID","A1"."VULNID","A1"."VULNERABLEVERSIONS" FROM "COMPONENTS_VULNERABILITIES" "A0" INNER JOIN "VULNERABILITY" "A1" ON "A0"."VULNERABILITY_ID" = "A1"."ID" WHERE "A0"."COMPONENT_ID" = ? ORDER BY "NUCORDER0"" for connection "com.zaxxer.hikari.pool.HikariProxyConnection@30f8bfe1"
2023-02-11 18:41:36,515 DEBUG [Native] SELECT 'org.dependencytrack.model.Vulnerability' AS "DN_TYPE","A1"."CREATED","A1"."CREDITS","A1"."CVSSV2BASESCORE","A1"."CVSSV2EXPLOITSCORE","A1"."CVSSV2IMPACTSCORE","A1"."CVSSV2VECTOR","A1"."CVSSV3BASESCORE","A1"."CVSSV3EXPLOITSCORE","A1"."CVSSV3IMPACTSCORE","A1"."CVSSV3VECTOR","A1"."CWES","A1"."DESCRIPTION","A1"."DETAIL","A1"."EPSSPERCENTILE","A1"."EPSSSCORE","A1"."FRIENDLYVULNID","A1"."ID" AS "NUCORDER0","A1"."OWASPRRBUSINESSIMPACTSCORE","A1"."OWASPRRLIKELIHOODSCORE","A1"."OWASPRRTECHNICALIMPACTSCORE","A1"."OWASPRRVECTOR","A1"."PATCHEDVERSIONS","A1"."PUBLISHED","A1"."RECOMMENDATION","A1"."REFERENCES","A1"."SEVERITY","A1"."SOURCE","A1"."SUBTITLE","A1"."TITLE","A1"."UPDATED","A1"."UUID","A1"."VULNID","A1"."VULNERABLEVERSIONS" FROM "COMPONENTS_VULNERABILITIES" "A0" INNER JOIN "VULNERABILITY" "A1" ON "A0"."VULNERABILITY_ID" = "A1"."ID" WHERE "A0"."COMPONENT_ID" = ? ORDER BY "NUCORDER0"

Disabling L2 Cache doesn't help. There are a couple more relationships in the Component class. They all have the samen problem. I also notice that DN only fetches 1 level of relationships, despite the max depth being set to 10.

So two questions:

  • Why is the prefetched contents of the relationship not used?
  • Why is only 1 level of relationships prefetched?

I do notice that the vulnerabilities relation is empty in the database. Could that be triggering the repeat of queries?


Andy
 
Edited

If some field is loaded because it isn't currently loaded then there is a log message saying exactly that. Suggest you look at your log settings. And think of cutting it down into something simpler to debug it.

Having 0 elements could have been the problem since that was not supported (and just fell back to doing the load on access). See https://github.com/datanucleus/datanucleus-rdbms/issues/462

I already said why only 1 level of 1-N relations is loaded ... since only 1 level of 1-N is supported .... my first reply.


Vaal
 
Edited

https://github.com/datanucleus/datanucleus-rdbms/issues/462 could well be it then.

If only 1 level is supported, I'm gonna do my own queries anyway. But your first reply is about FETCH JOINs which I am not using (at least not explicitly and the query being generated is different).
I interpreted the maxFetchDepth value in the Fetch Plan docs as a way to "prefetch" multiple levels of fields/relationships in one go: https://www.datanucleus.org/products/accessplatform_4_1/jdo/fetchgroup.html

I'm not sure less logging will help as there's currently maybe only 10-ish lines of logging. I have enabled DEBUG logging for the full DataNucleus category. Using TRACE or ALL level didn't give me any extra messages.


Vaal
 

Just tried 6.0.4-SNAPSHOT:
- Queries are no longer repeated for empty bulk fetch results. So the fix/change seems to be working, thanks!
- I can see prefetching happening more than 1 level deep, which is also good and in line with my understandig of the maxFetchDepth:

2023-02-19 20:02:10,158 [] DEBUG [DataNucleus.Datastore.Native] SELECT 'org.dependencytrack.model.PolicyViolation' AS "DN_TYPE","B0"."ID","B0"."STATE","B0"."COMPONENT_ID","B0"."ID","B0"."POLICYVIOLATION_ID","B0"."PROJECT_ID","B0"."SUPPRESSED","C0"."AUTHOR","C0"."BLAKE2B_256","C0"."BLAKE2B_384","C0"."BLAKE2B_512","C0"."BLAKE3","C0"."CLASSIFIER","C0"."COPYRIGHT","C0"."CPE","C0"."DESCRIPTION","C0"."DIRECT_DEPENDENCIES","C0"."EXTENSION","C0"."EXTERNAL_REFERENCES","C0"."FILENAME","C0"."GROUP","C0"."ID","C0"."INTERNAL","C0"."LAST_RISKSCORE","C0"."LICENSE","C0"."LICENSE_URL","C0"."MD5","C0"."NAME","C0"."TEXT","C0"."PROJECT_ID","C0"."PUBLISHER","C0"."PURL","C0"."PURLCOORDINATES","C0"."LICENSE_ID","C0"."SHA1","C0"."SHA_256","C0"."SHA_384","C0"."SHA3_256","C0"."SHA3_384","C0"."SHA3_512","C0"."SHA_512","C0"."SWIDTAGID","C0"."UUID","C0"."VERSION","A0"."ID","D0"."ID","D0"."OPERATOR","D0"."SUBJECT","D0"."UUID","D0"."VALUE","E0"."ACTIVE","E0"."AUTHOR","E0"."CLASSIFIER","E0"."CPE","E0"."DESCRIPTION","E0"."DIRECT_DEPENDENCIES","E0"."EXTERNAL_REFERENCES","E0"."GROUP","E0"."ID","E0"."LAST_BOM_IMPORTED","E0"."LAST_BOM_IMPORTED_FORMAT","E0"."LAST_RISKSCORE","E0"."NAME","E0"."PUBLISHER","E0"."PURL","E0"."SWIDTAGID","E0"."UUID","E0"."VERSION","A0"."TEXT","A0"."TIMESTAMP","A0"."TYPE","A0"."UUID","A0"."COMPONENT_ID" FROM "POLICYVIOLATION" "A0" LEFT OUTER JOIN "VIOLATIONANALYSIS" "B0" ON "A0"."ID" = "B0"."POLICYVIOLATION_ID" INNER JOIN "COMPONENT" "C0" ON "A0"."COMPONENT_ID" = "C0"."ID" INNER JOIN "POLICYCONDITION" "D0" ON "A0"."POLICYCONDITION_ID" = "D0"."ID" INNER JOIN "PROJECT" "E0" ON "A0"."PROJECT_ID" = "E0"."ID" WHERE EXISTS (SELECT 'org.dependencytrack.model.Component' AS "DN_TYPE","A0_SUB"."ID" AS "DN_APPID" FROM "COMPONENT" "A0_SUB" WHERE "A0_SUB"."PROJECT_ID" = ? AND "A0_SUB"."ID" < ? AND "A0"."COMPONENT_ID" = "A0_SUB"."ID")