1+N avoidance: Fetch of relationship working, but ignored when accessed
Vaal
Hi,
I posted my question on StackOverflow, but realized it might be better to post here as on SO there's only 5-ish DN questions per year :-) https://stackoverflow.com/questions/75422362/datanucleus-relationship-fetch-group-confusion Models, queries and logs are all listed there. I have: - Model with relationships - Added the relationships to the fetch group - Retrieved a list of models via a query - Observed in the logs that DN is "bulk fetching" the relationship for all models in the result set to avoid 1+N query problem. So far so good. But when I then traverse the resultset of the query and access a relationship, DN seems to ignore the prefetched results and performs another query for each model. So I end up with 1+1+N queries. I have looked at the fetch groups and those should be OK as the extra query happens even before accessing any fields. The (debug) logs do not state why the prefetched results are not used. I can repeat all the details here from SO, let me know if that's needed. Valentijn |
|
Hi,
perhaps your situation is as per https://github.com/datanucleus/datanucleus-rdbms/issues/170 ? Stackoverflow is not used by this project, so anything you post on there won't get a reply from people involved in this project |
|
Vaal
I don't think it is. Mine is a very basic case and not doing anything with FETCH JOIN.
|
|
A simple demonstration of BulkFetch of a Collection field is to be found at
https://github.com/andyjefferson/jdo-bulkfetch-collection Does a query, where a Collection field is in the DFG for the query candidate, and as a result comes up with this SQL SELECT 'mydomain.model.B' AS DN_TYPE,A1.ID,A1."NAME",A0.ID_OID,A0.IDX AS NUCORDER0 FROM A_BS A0 INNER JOIN B A1 ON A0.ID_EID = A1.ID WHERE A0.IDX >= 0 AND EXISTS (SELECT 'mydomain.model.A' AS DN_TYPE,A0_SUB.ID AS DN_APPID FROM A A0_SUB WHERE A0.ID_OID = A0_SUB.ID) ORDER BY NUCORDER0 and that is all. Suggest you debug what you're doing different. |
|
Vaal
I'm not really doing much different and as the logs show that SQL is indeed generated and executed, as in the demonstration you posted.
But for some reason when accessing the relation/prefetched results, DN reverts back to fetching the relationship again, but only for that specific model instance.
It could still be something simple/obvious/stupid, but I don't see it nor do the people I have asked to look at it. |
|
Focus on the log entries ... for the JDOQL ... and for whatever accessor methods you call after that ... by putting entries in the log to separate your calls. Maybe you have some field in the related object that is not in the fetch plan?
|
|
Vaal
I did all that and was hoping DN would log why it was retrieving the relationship again, but it's not clear. I'm not accessing any fields from models inside the relationship, just doing getVulnerabilities()
Copying everything from my linked SO post:
Question applies mostly to Component.java (this is a link to the sourcecode).
It has a many-to-many relationship (but I see the same problem with 1-N):
FetchSize and MaxDepth are added because I was trying stuff.
Logs
... some other bulk prefetches snipped
As far as I can see this is using the filters from the original query to fetch the vulnerabilities relationship. So far so good.
The first statement after the query above is to access getVulnerabilities(), and this results in, more or less, the exact same query:
Disabling L2 Cache doesn't help. There are a couple more relationships in the Component class. They all have the samen problem. I also notice that DN only fetches 1 level of relationships, despite the max depth being set to 10.
So two questions:
I do notice that the vulnerabilities relation is empty in the database. Could that be triggering the repeat of queries?
|
|
If some field is loaded because it isn't currently loaded then there is a log message saying exactly that. Suggest you look at your log settings. And think of cutting it down into something simpler to debug it.
Having 0 elements could have been the problem since that was not supported (and just fell back to doing the load on access). See https://github.com/datanucleus/datanucleus-rdbms/issues/462 I already said why only 1 level of 1-N relations is loaded ... since only 1 level of 1-N is supported .... my first reply. |
|
https://github.com/datanucleus/datanucleus-rdbms/issues/462 could well be it then.
If only 1 level is supported, I'm gonna do my own queries anyway. But your first reply is about FETCH JOINs which I am not using (at least not explicitly and the query being generated is different). I interpreted the maxFetchDepth value in the Fetch Plan docs as a way to "prefetch" multiple levels of fields/relationships in one go: https://www.datanucleus.org/products/accessplatform_4_1/jdo/fetchgroup.html I'm not sure less logging will help as there's currently maybe only 10-ish lines of logging. I have enabled DEBUG logging for the full DataNucleus category. Using TRACE or ALL level didn't give me any extra messages. |
|
Vaal
Just tried 6.0.4-SNAPSHOT:
- Queries are no longer repeated for empty bulk fetch results. So the fix/change seems to be working, thanks! - I can see prefetching happening more than 1 level deep, which is also good and in line with my understandig of the maxFetchDepth: 2023-02-19 20:02:10,158 [] DEBUG [DataNucleus.Datastore.Native] SELECT 'org.dependencytrack.model.PolicyViolation' AS "DN_TYPE","B0"."ID","B0"."STATE","B0"."COMPONENT_ID","B0"."ID","B0"."POLICYVIOLATION_ID","B0"."PROJECT_ID","B0"."SUPPRESSED","C0"."AUTHOR","C0"."BLAKE2B_256","C0"."BLAKE2B_384","C0"."BLAKE2B_512","C0"."BLAKE3","C0"."CLASSIFIER","C0"."COPYRIGHT","C0"."CPE","C0"."DESCRIPTION","C0"."DIRECT_DEPENDENCIES","C0"."EXTENSION","C0"."EXTERNAL_REFERENCES","C0"."FILENAME","C0"."GROUP","C0"."ID","C0"."INTERNAL","C0"."LAST_RISKSCORE","C0"."LICENSE","C0"."LICENSE_URL","C0"."MD5","C0"."NAME","C0"."TEXT","C0"."PROJECT_ID","C0"."PUBLISHER","C0"."PURL","C0"."PURLCOORDINATES","C0"."LICENSE_ID","C0"."SHA1","C0"."SHA_256","C0"."SHA_384","C0"."SHA3_256","C0"."SHA3_384","C0"."SHA3_512","C0"."SHA_512","C0"."SWIDTAGID","C0"."UUID","C0"."VERSION","A0"."ID","D0"."ID","D0"."OPERATOR","D0"."SUBJECT","D0"."UUID","D0"."VALUE","E0"."ACTIVE","E0"."AUTHOR","E0"."CLASSIFIER","E0"."CPE","E0"."DESCRIPTION","E0"."DIRECT_DEPENDENCIES","E0"."EXTERNAL_REFERENCES","E0"."GROUP","E0"."ID","E0"."LAST_BOM_IMPORTED","E0"."LAST_BOM_IMPORTED_FORMAT","E0"."LAST_RISKSCORE","E0"."NAME","E0"."PUBLISHER","E0"."PURL","E0"."SWIDTAGID","E0"."UUID","E0"."VERSION","A0"."TEXT","A0"."TIMESTAMP","A0"."TYPE","A0"."UUID","A0"."COMPONENT_ID" FROM "POLICYVIOLATION" "A0" LEFT OUTER JOIN "VIOLATIONANALYSIS" "B0" ON "A0"."ID" = "B0"."POLICYVIOLATION_ID" INNER JOIN "COMPONENT" "C0" ON "A0"."COMPONENT_ID" = "C0"."ID" INNER JOIN "POLICYCONDITION" "D0" ON "A0"."POLICYCONDITION_ID" = "D0"."ID" INNER JOIN "PROJECT" "E0" ON "A0"."PROJECT_ID" = "E0"."ID" WHERE EXISTS (SELECT 'org.dependencytrack.model.Component' AS "DN_TYPE","A0_SUB"."ID" AS "DN_APPID" FROM "COMPONENT" "A0_SUB" WHERE "A0_SUB"."PROJECT_ID" = ? AND "A0_SUB"."ID" < ? AND "A0"."COMPONENT_ID" = "A0_SUB"."ID") |
|