Topics

Migration from Kodo: Any way to make it store foreign keys to later get already loaded objects from cache (for performance)?


dosiwelldi
 

Hi

I try to migrate a legacy application (on Kodo 3.4, develped over 10+ years).
I now created all needed plugins so our very much not JDO compliant code runs.
However, now that it works I have a massive performance problem. The problem is that datanucleus does not seem to use the foreign keys to check if the related object is already in the cache. That means if using datanucleus an order of magnitude more queries to the DB happen.

Example:
We have Enterprise and Person. Person has a foreign key to Enterprise.If I load 2 Person from the same enterprise, p1 and p2 and then call p1.getEnterprise() it will load the enterprise from the DB.
If I then call p2.getEnterprise() it will again load the same Enterprise from the DB.
Kodo 3.4 will store the foreign key for related objects and then use that to lookup the object in the cache.

  • Is there a way to make datanucleus store the foreign key and later use that to get the related object from cache?
  • Can anybody say how much work it would be to make datanucleus do that?

Because with this problem I probably have to cancel the migration :(

Thanks and kind regards
Dirk


Andy
 

Only you know what is in the fetch plan for a retrieve, and what sql is invoked to retrieve what it retrieves. To check a cache it needs the id of a related object, and ut has to get that from somewhere.

Impossible to say more without basic info


dosiwelldi
 

Hi Andy

Thanks.
There is no fetch plan involved. On loading the persons the sql gets the foreign key to the enterprise according to the logs. So it will not load the enterprise (and I don't want it to).

Just to be clear: this is not about loading the enterprise on loading the person.

This is about using the the foreign key (PerEntID) that was retrieved with the person on later call to getEnterprise() to see if the enterprise is now in the cache. Instead of always going to the DB.

Regards


Andy
 
Edited

Why not present facts, starting with the definition of the relation in the classes, the jdo api call(s) in question, and the associated log entries for those calls? Any retrieve will have a FetchPlan, whether it be "default" or user defined. Maybe it didnt in JDO 1, but this isn't JDO 1.
That way it goes away from wordy descriptions that are, by nature, imprecise.


dosiwelldi
 

Thanks a lot for your effort.

I just had a quick look with the debugger and I did not see where it would store the foreign key (to later be able to use that to do the lookup in the cache).

Mapping:
<class name="Enterprise" identity-type="application" table="Enterprise">
    <field name="enterpriseIDValue" column="EntEnterpriseID" primary-key="true"/>
    <field name="name">
        <column name="EntName"/>
    </field>
    ...
</class>
<class name="Person" table="Person">
    <datastore-identity strategy="identity" column="PerPersonID"/>
    <field name="name">
        <column name="PerName"/>
    </field>
    <field name="enterprise">
        <column name="PerEnterpriseID" target="EntEnterpriseID"/>
    </field>
    ...
</class>

Java:
//Both persons have the same enterprise, we assume it is not in cache currently
Person p1 = IdHelper.getObjByStrId(pm, "270799", Person.class);
Person p2 = IdHelper.getObjByStrId(pm, "325506", Person.class);

//we go to the DB for the next access, to get the enterprise, that is to be expected
System.out.println("Access p1 ent: " + p1.getEnterprise());

//here is my problem: it goes to the DB again even that enterprise is now in cache:
System.out.println("Access p2 ent: " + p2.getEnterprise());

Log:
SELECT A0.PerName,A0.PerEnterpriseID,... FROM Person A0 WHERE A0.PerPersonID = <270799>
SELECT A0.PerName,A0.PerEnterpriseID,... FROM Person A0 WHERE A0.PerPersonID = <325506>
SELECT B0.EntEnterpriseID,B0.EntName,.. FROM Person A0 LEFT OUTER JOIN Enterprise B0 ON A0.PerEnterpriseID = B0.EntEnterpriseID WHERE A0.PerPersonID = <270799>
SELECT B0.EntEnterpriseID,B0.EntName,.. FROM Person A0 LEFT OUTER JOIN Enterprise B0 ON A0.PerEnterpriseID = B0.EntEnterpriseID WHERE A0.PerPersonID = <325506>

Is this the expected behavior or are my mappings or plugins wrong?
If it is expected, how much work would it be to implement caching of the foreign keys to use them for cache lookups?


Andy
 

And the classes, for example the relation field, assuming it is an actual relation field and not an integer.

Since the relation is NOT in the default fetch group then it (and the FK) are not fetched in the Person load, as per the JDO spec.
When it comes to the Enterprise field it has no "foreign key" (well, the "id" of the related object, so it calls the DB, as per the JDO spec.

So what is the problem? You don't want the extra DB call to get the Enterprise? Put the Enterprise field in the DFG, or at least in the fetch plan for that retrieve. Or if you don't want the complete Enterprise object loading when it is found, then specify the metadata extension "fetch-fk-only" on the relation field, as per the docs https://www.datanucleus.org/products/accessplatform_5_2/jdo/mapping.html#one_one_uni

Log entries tell you whether the cache is involved or not, but you haven't posted such entries.


dosiwelldi
 

On Fri, Jan 15, 2021 at 02:43 PM, Andy wrote:
Log entries tell you whether the cache is involved or not, but you haven't posted such entries.
There are none because it did not look at the cache on call to person.getEnterprise().

Thanks a thousand times!

With this:
<field name="enterprise" default-fetch-group="true">
    <column name="PerEnterpriseID"/>
    <extension vendor-name="datanucleus" key="fetch-fk-only" value="true"/>
</field>

It does what I want.
I think this is even better than what Kodo does. With Kodo any call to person.getEnterprise() would trigger a query if the Enterprise is not in cache. E.g. for `p1.getEnterprise() != p2.getEnterprise()`, even if the actual Enterprise is not needed. With datanucleus this comparison would not tigger a query.

Now I probably have problems with code that puts person.enterprise in a fetchGroup/Plan and expecting it to be loaded in one query... But we have few of those compared to the thousands of normal queries, so I can probably adapt that code if I have to.

2 other thing if you still have patience for me:
  • We have much code that uses == and != on the instances. With datanucleus pm.evictAll() leads to later queries returning different instances than before the evictAll(), while Kodo returns the same ones. I assume the JDO spec allows that?
  • I solved the previous problem by extending `SoftRefCache` and have `clear()` do nothing. That seems to work. However, if I do the same with `WeakRefCache` I get many strange messages in the log. As far as you know, is `WeakRefCache` fully functional at the moment or are there known problems with it (we have PM multithreaded if that matters)?


Andy
 

All code is "fully functional", and JDO compliant (DN implements pm.evictAll correctly unless proven otherwise, and the JDO TCK underlines that), and the L1 caches provided all use standard Java classes visible in the code, though you can easily plug in your own L1 cache should you wish.
Specifying a PM as "multithreaded" is prone to issues since that persistence flag only caters for some specific locking conditions, and many others could cause it to have problems if really using multiple threads on a single PM; would be far better advised to keep a PM per thread, and then less chance of your head exploding due to timing issues when you have problems.

If anyone has a problem they can easily demonstrate it with a standard testcase and optionally provide a GitHub pull request to fix it, but then that is a significant benefit of using open source instead of the likes of Kodo ...


dosiwelldi
 

Hi again

There is a behavior change if we use 'fetch-fk-only' that kind of hurts us.

We have foreign keys that point to not existing primary keys. With DN and Kodo this will work and just return null for that field. But when I set 'fetch-fk-only' it will instead throw an exception.

Minor things with 'fetch-fk-only':
  • pm.refreshAll() can now throw exceptions where it did not before (again FK to not existing PK: it will create a hollow object in the cache for the not existing object and on trying to refresh that it will fail). This is no problem for us, I will just put 'try/catch' around pm.refreshAll().
  • if 'fetch-fk-only' is set on a field that does not point to a PK but to a field in an other class (using 'mapped-by') it will still create a hollow object in the cache for the not existing object setting the key to the value of the 'mapped-by' field, but that is not the value of the PK. This is no problem for us, I will just not set 'fetch-fk-only' in these cases.

Thanks and kind regards
Dirk


Andy
 

"foreign keys to not existing primary keys"
You mean you have zero data integrity? Down to you persisting some data without using JDO and it was rubbish? Nice; good luck with that one.
If you have actual foreign keys (in the DB) then you can't have that. Why not give an example?


If you get an exception, what is the exception?


dosiwelldi
 

I just wrote you so you know using 'fetch-fk-only' can lead to behavior changes that may not be obvious to the user. But I assume if we use 'extensions' we have to expect changes in behavior. If you say that is to be expected that is OK, I will have to try to fix the integrity of our data anyway.

>> You mean you have zero data integrity?
Yes, I try to change that.

>> Down to you persisting some data without using JDO and it was rubbish? Nice; good luck with that one.
Yes. Coders that did not care about clean code, standards or cleaning up old stuff. Unrelated people that run their own SQL directly on DB, everything ugly you can imagine. A over 12+ years growing pile of spaghetti...

>> If you have actual foreign keys (in the DB) then you can't have that.
We don't have that currently. But I am working on it...

>> Why not give an example?

For the exception below:
<class name="DoseCalculationParameter" table="DoseCalculationParameter">
    <datastore-identity strategy="identity" column="DcpID"/>
    <field name="comment">
        <column name="DcpComment" allows-null="true"/>
    </field>
    ...
</class>
<class name="DoseCalculationParameterList" table="DoseCalculationParameterList">
    <datastore-identity strategy="identity" column="DcplID"/>
    <field name="parameter" default-fetch-group="true"> <!-- FIELD OF TYPE DoseCalculationParameter -->
        <column name="DcplParameter" target="DcpID" allows-null="true"/>
        <extension vendor-name="datanucleus" key="fetch-fk-only" value="true"/>
    </field>
    ...
</class>

We have DoseCalculationParameterList.DcplParameter that point to not existing DoseCalculationParameter. Without "fetch-fk-only" this is fine. But with that it will not work because it creates a hollow object in cache and on trying to load that it will fail.

>> If you get an exception, what is the exception?

Below I loaded a DoseCalculationParameterList with such a not existing 'parameter'. ToString() is called and wants to access member 'comment' on the hollow instance (of the not existing object) in cache. This triggers a load that fails.
Error is 'No such database row' (same in refreshAll() case).


    [junit] javax.jdo.JDOObjectNotFoundException: No such database row
    [junit] FailedObject:xxx.yyy.DoseCalculationParameter:999123999
    [junit]     at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:563)
    [junit]     at org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:543)
    [junit]     at org.datanucleus.state.StateManagerImpl.isLoaded(StateManagerImpl.java:4128)
    [junit]     at xxx.yyy.DoseCalculationParameter.dnGetcomment(DoseCalculationParameter.java)
    [junit]     at xxx.yyy.DoseCalculationParameter.toString(DoseCalculationParameter.java:130)
    [junit]     at java.base/java.lang.String.valueOf(String.java:2951)
    [junit]     at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
    [junit]     at xxx.yyy.test.datanucleus.DatanucleusFkObjectNotExistingTest.testFkNotExisting(DatanucleusFkObjectNotExistingTest.java:73)
    [junit]     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit]     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    [junit]     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    [junit] NestedThrowablesStackTrace:
    [junit] No such database row
    [junit] org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row
    [junit]     at org.datanucleus.store.rdbms.request.FetchRequest.execute(FetchRequest.java:440)
    [junit]     at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.fetchObject(RDBMSPersistenceHandler.java:316)
    [junit]     at org.datanucleus.state.StateManagerImpl.loadFieldsFromDatastore(StateManagerImpl.java:1542)
    [junit]     at org.datanucleus.state.StateManagerImpl.loadUnloadedFieldsInFetchPlan(StateManagerImpl.java:3897)
    [junit]     at org.datanucleus.state.StateManagerImpl.isLoaded(StateManagerImpl.java:4112)
    [junit]     at xxx.yyy.DoseCalculationParameter.dnGetcomment(DoseCalculationParameter.java)
    [junit]     at xxx.yyy.DoseCalculationParameter.toString(DoseCalculationParameter.java:130)
    [junit]     at java.base/java.lang.String.valueOf(String.java:2951)
    [junit]     at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
    [junit]     at xxx.yyy.test.datanucleus.DatanucleusFkObjectNotExistingTest.testFkNotExisting(DatanucleusFkObjectNotExistingTest.java:73)
    [junit]     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit]     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    [junit]     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)


dosiwelldi
 

Now that I think about it there is probably no easy way to change that in DN.
From what I saw until now I assume DN has no cache for the FKs. What 'fetch-fk-only' does is to assign a hollow object to the real field on the object. So it "caches the FK in the hollow". This has more implications like behavior of ==/!= (for null) can be different...

Kodo has a real cache for the FKs, so the behavior does not change if doing "fetch-fk-only" (because there the field stays null and nothing is in L1 cache until a real access to such a field happens, and at that point it sees that this FK is invalid and it will keep the field as null).


Andy
 
Edited

Use of "fetch-fk-only" simply omits the join to the related table (for performance, as the log shows). When processing the result of the query it will populate the (relation) field with an object of the required type with just the "identity" field(s) populated. This is just like you could get if you called the JDO standard "pm.getObjectById(id, false)". In this specific case here (dodgy data in the DB) such a call to pm.getObjectById(id, false) for the related object would also result in exceptions downstream, since the object is later "validated" (checked for existence) and that would cause it to be invalidated.

There is no "cache" of "FKs" (not that you can call these Foreign Keys, when the key sometimes doesnt exist). An object has a StateManager and that StateManager sets the value of the fields in the object. In this case it stores an instance of the related object in that field (with just id field(s) set). Caching is based around persistable objects.

[EDIT]
The only possible extension to what DN does currently that arguably could have (very minor) advantage is have a different setting that pulls in the "id" of the related object (and stores it internally in the StateManager) but DOESN'T instantiate it (unlike "fetch-fk-only" which does instantiate). When the field is referenced the StateManager could then check for whether the field is loaded, and if not then check whether it has the related object "id" and if so then trigger a load of that object (and if it doesn't have the "id" then load with a join like current codebase). It doesn't get away from the issue of the user having a related object with an "id" that doesn't exist ... I'd expect the user to be "informed" about that one way or another (e.g exception). This extension would be left for people to contribute since its (IMHO) very minor advantage.


dosiwelldi
 

Hi again, 2 more things:

1) Parser Bug

I think I found a bug in the JDOQL query parser: booleans fields in parentheses lead to javax.jdo.JDOUserException: Method/Identifier expected

While these are ok (`active` is a boolean field on Class enterprise):
pm.newQuery(Enterprise.class, "active");
pm.newQuery(Enterprise.class, "!active");
pm.newQuery(Enterprise.class, "(!active)");

This one will throw an exception:
pm.newQuery(Enterprise.class, "(active)");     //Method/Identifier expected

Stack trace:
    [junit] Method/Identifier expected at character 9 in "(active)"
    [junit] org.datanucleus.store.query.QueryCompilerSyntaxException: Method/Identifier expected at character 9 in "(active)"
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.processPrimary(JDOQLParser.java:802)
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.processUnaryExpression(JDOQLParser.java:643)
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.processMultiplicativeExpression(JDOQLParser.java:569)
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.processAdditiveExpression(JDOQLParser.java:540)
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.processRelationalExpression(JDOQLParser.java:454)
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.processAndExpression(JDOQLParser.java:437)
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.processExclusiveOrExpression(JDOQLParser.java:423)
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.processInclusiveOrExpression(JDOQLParser.java:409)
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.processConditionalAndExpression(JDOQLParser.java:395)
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.processConditionalOrExpression(JDOQLParser.java:376)
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.processExpression(JDOQLParser.java:365)
    [junit]     at org.datanucleus.query.compiler.JDOQLParser.parse(JDOQLParser.java:88)
    [junit]     at org.datanucleus.query.compiler.JavaQueryCompiler.compileFilter(JavaQueryCompiler.java:600)
    [junit]     at org.datanucleus.query.compiler.JDOQLCompiler.compile(JDOQLCompiler.java:103)
    [junit]     at org.datanucleus.store.query.AbstractJDOQLQuery.compileGeneric(AbstractJDOQLQuery.java:392)
    [junit]     at org.datanucleus.store.query.AbstractJDOQLQuery.compileInternal(AbstractJDOQLQuery.java:450)
    [junit]     at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:263)
    [junit]     at org.datanucleus.store.query.Query.executeQuery(Query.java:1936)
    [junit]     at org.datanucleus.store.query.Query.executeWithArray(Query.java:1864)
    [junit]     at org.datanucleus.store.query.Query.execute(Query.java:1846)
    [junit]     at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:439)
    [junit]     at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:263)


2) query.getFetchPlan().setFetchSize(1) will still retrieve the whole table

In short:
I probably understood wrongly what query.getFetchPlan().setFetchSize(1) does. I wanted to prevent queries from loading whole tables initially, to just get rows as needed. But I see in the debugger that JDOFetchPlan.getFetchSize() is never called for my code so it is probably not meant for that.

In long:
I wrote a script that tries to extract most of our queries into a unit test to see if they still work. I only care if they compile so I do not iterate over the result collection. Kodo takes about 1 minute to execute all of them. If I run the same test with DN I get an out of memory exception (I think from junit that tries to cache the log file). Without logging it runs but will never finish.

I then added query.getFetchPlan().setFetchSize(1) to each query so it should only get one row. But it still gets the whole table.

//Example: Takes 68s with DN and reads 710k rows:
Query query = pm.newQuery(OperationalEvents.class);
query.getFetchPlan().setFetchSize(1);
query.execute();

I now added query.setRange(0, 1) to all tests as a workaround. This works fine for my test. But we have some queries in the real code that ask for a lot of data but then only access the first row... Will be fun to try to find them all.

Log after many seconds: now getting rows 101542 and 101543:
    [junit] 2021-01-28 19:36:20,481 DEBUG [main] DataNucleus.Persistence - Retrieved object with OID "xxx.yyy.OperationalEvents:101542"
    [junit] 2021-01-28 19:36:20,481 DEBUG [main] DataNucleus.Cache - Object with id "xxx.yyy.OperationalEvents:101542" not found in Level 1 cache
    [junit] 2021-01-28 19:36:20,481 DEBUG [main] DataNucleus.Cache - Object with id "xxx.yyy.OperationalEvents:101542" not found in Level 2 cache
    [junit] 2021-01-28 19:36:20,481 DEBUG [main] DataNucleus.Cache - Object "xxx.yyy.OperationalEvents@394851d7" (id="xxx.yyy.OperationalEvents:101542") added to Level 1 cache (loadedFlags="[NNNNNNN]")
    [junit] 2021-01-28 19:36:20,481 DEBUG [main] DataNucleus.Cache - Object "xxx.yyy.OperationalEvents@394851d7" (id="xxx.yyy.OperationalEvents:101542") added to Level 2 cache (fields="[0, 1, 2, 3, 4, 5, 6]", version="")
    [junit] 2021-01-28 19:36:20,482 DEBUG [main] DataNucleus.Persistence - Retrieved object with OID "xxx.yyy.OperationalEvents:101543"
    [junit] 2021-01-28 19:36:20,482 DEBUG [main] DataNucleus.Cache - Object with id "xxx.yyy.OperationalEvents:101543" not found in Level 1 cache
    [junit] 2021-01-28 19:36:20,482 DEBUG [main] DataNucleus.Cache - Object with id "xxx.yyy.OperationalEvents:101543" not found in Level 2 cache


Andy
 

1). A CAST has syntax "(identifier)", as does your use of parentheses with a field (just like with Java). Hence the compiler would need to know all about candidate class, imports etc etc at the point of the generic compile to try to guess which you mean. It currently doesn't hence that is classified as a cast. See the code at this link. You can develop an update to support both if you really want to put "(myBooleanField)" into JDOQL queries.

2). Fetch size is used in extracting results from a query, as shown in the code at this link. Suggest that you debug your usage around that


dosiwelldi
 

Does mapped-by="id" have special semantics? Can a field that is not the primary key not be named "id"? If I have fields named "id" then mapped-by="id" will use the primary key instead of the field for joining.

Example:

<class name="Article" table="Article">
    <datastore-identity strategy="identity" column="ArtID"/>
    <field name="id">
        <column name="ArtItemID" allows-null="true"/>
    </field>
</class>

<class name="ClearingItem" table="ClearingItem">
    <datastore-identity strategy="identity" column="ClitActivityID"/>
    <field name="article">                                             <!-- field of type Article -->
        <column name="ClitItemID" mapped-by="id" allows-null="true"/>
    </field>
</class>

clearingItem.getArticle() will try to join with Article on ArtID instead of ArtItemID.

Query:
SELECT B0.ArtItemName,B0.ArtItemID,B0.ArtPrice,B0.ArtRevenueAccount,B0.ArtERPID,B0.ArtID FROM ClearingItem A0 LEFT OUTER JOIN Article B0 ON A0.ClitItemID = B0.ArtID WHERE A0.ClitActivityID = <3872418>


Andy
 

The mapped-by refers to the FIELD in the other class that is an object of this objects type. Joining is on the PK of this class, always. There are no "special" field names.
JDO does not support "natural" keys, assuming that is what you're expecting it to offer.


dosiwelldi
 

Ah ok, thanks a lot.

I have a script going from Kodo mapping to DN mapping. I could not map the "natural" keys until I tried "mapped-by", then DN was happy with the mapping. So that will not work and I have to correct our DB Schema...