Ticket #921 (closed defect: fixed)

Opened 2 years ago

Last modified 19 months ago

Remove CitationInfo and use Citation instead

Reported by: ronald Owned by: russ
Priority: high Milestone:
Component: ambra Version: 0.9-SNAPSHOT
Keywords: Cc:

Description (last modified by pradeep)

Citations are still being generated by loading the article xml, transforming that into an xml description of CitationInfo, and deserializing that xml description. Almost all of CitationInfo's info is available already in Citation - what's missing (is this really needed?) is the concept of collaborative authors and the marking of primary vs other authors.

Please see #742 for additional description; it has been marked as a duplicate of this ticket.

Change History

  Changed 2 years ago by amit

Clean up citation-info caching. Please see r5681.

  Changed 2 years ago by amit

  • milestone set to 0.9.1

  Changed 2 years ago by pradeep

  • description modified

  Changed 2 years ago by amit

  • owner changed from ronald to rich
  • component changed from topaz to ambra

  Changed 2 years ago by amit

  • owner changed from rich to ronald

  Changed 2 years ago by amit

  • type changed from unassigned to defect

  Changed 23 months ago by amit

  • priority changed from unassigned to high
  • owner changed from ronald to dragisak

Reassigning...

  Changed 22 months ago by dragisak

  • status changed from new to assigned

  Changed 22 months ago by dragisak

Citation class is missing some properties that CitationInfo has:

  1. Title formatting: Foo <italic>bar</italic> needs to be converted to Foo <i>bar</i>, the way it is done in the XSL (the same issue exists in the Article class). This can be done either during ingestion or on the fly at display time.
  2. Journal short name (in the XML: article/front/journal-meta/journal-id[@journal-id-type='nlm-ta']). The journal property always seems to be null.
  3. Authors are listed as UserProfile objects, not as Author objects, and UserProfile is missing a suffix property.
  4. startPage (article/front/article-meta/elocation-id) is missing from Citation.

This will require data migration.
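A minimal sketch of reading the two values that Citation lacks (items 2 and 4) at ingestion time, using the XPaths quoted above and the JDK's `javax.xml.xpath` API. The `CitationFields` and `CitationFieldExtractor` names are hypothetical, not ambra classes. Real article XML normally carries a DOCTYPE, so a production parser would also need DTD resolution configured; this sketch assumes an XML string without one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical holder for the two values the Citation entity was missing. */
final class CitationFields {
    final String journalShortName;
    final String startPage;

    CitationFields(String journalShortName, String startPage) {
        this.journalShortName = journalShortName;
        this.startPage = startPage;
    }
}

/** Hypothetical helper: reads the values straight from the article XML. */
final class CitationFieldExtractor {
    // XPath expressions quoted in the ticket comment.
    static final String JOURNAL_SHORT_NAME =
        "/article/front/journal-meta/journal-id[@journal-id-type='nlm-ta']";
    static final String START_PAGE = "/article/front/article-meta/elocation-id";

    private CitationFieldExtractor() {}

    /** Missing elements come back as empty strings (XPath string value of an empty node-set). */
    static CitationFields extract(String articleXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(articleXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return new CitationFields(
                xpath.evaluate(JOURNAL_SHORT_NAME, doc).trim(),
                xpath.evaluate(START_PAGE, doc).trim());
        } catch (Exception e) {
            // Parser configuration, I/O, SAX and XPath errors all mean the XML is unusable here.
            throw new IllegalArgumentException("cannot read article XML", e);
        }
    }
}
```

Storing these values on Citation during ingestion is what makes the XSLT round trip unnecessary at display time.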

  Changed 22 months ago by amit

The change from <italic>bar</italic> to <i>bar</i> should be done on the fly at display time.
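Converting at display time leaves the stored title untouched. A minimal sketch, assuming the stored title contains bare `<italic>` tags without attributes (other inline tags are not handled here); `TitleFormatter` is a hypothetical class, not part of ambra.

```java
import java.util.regex.Pattern;

/** Hypothetical display-time helper; the stored title keeps its original markup. */
final class TitleFormatter {
    // Matches <italic> and </italic>; group 1 captures the optional closing slash.
    private static final Pattern ITALIC = Pattern.compile("<(/?)italic>");

    private TitleFormatter() {}

    /** Rewrites <italic>...</italic> to <i>...</i> for HTML output. */
    static String toHtml(String storedTitle) {
        if (storedTitle == null) {
            return null;
        }
        return ITALIC.matcher(storedTitle).replaceAll("<$1i>");
    }
}
```

For example, `TitleFormatter.toHtml("Foo <italic>bar</italic>")` yields `Foo <i>bar</i>`.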

  Changed 22 months ago by dragisak

(In [6692]) Remove CitationInfo, which was generated from the article XML by an XSLT transformation, and use the Citation entity from the triple store instead. Modify article ingestion so that it includes the new properties in Citation and UserProfile. This will require data migration. Addresses #921

  Changed 22 months ago by amit

  • owner changed from dragisak to ronald
  • status changed from assigned to new

Reassigning to Ronald to do the migration as part of the search work as well.

  Changed 22 months ago by amit

  • owner changed from ronald to pradeep

Assigning migration to Pradeep.

  Changed 22 months ago by dragisak

(In [6741]) Should have been done in r6692. Addresses #921

  Changed 21 months ago by pradeep

(In [6953]) Migrator for ambra v0.9 Citations. See applicationContext.xml for tuning information. By default this runs in a background thread, allowing web traffic to go through. The migrations are cache-aware, so there is no need to restart peers after the migrations are done.

If the 'background' strategy is chosen, tune the txn timeouts and blobThrottle to adjust the maximum concurrency. The migration requires a write txn from mulgara, so ingests and similar operations should not be attempted until all migrations are complete. Also note that a mulgara txn-related error will terminate the migration, and ambra must be restarted to continue with the remaining migrations.

If the 'background' strategy is not chosen, ambra startup will succeed only once all Citations in the database have been migrated.
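The two startup strategies described above can be sketched as follows. This is not the ambra migrator: `CitationMigrationRunner`, `TxnException` and the `migrateOne` callback are hypothetical, and txn timeouts and blobThrottle tuning are not modelled. It only illustrates the control flow: in background mode the work runs on a worker thread so startup can proceed, in blocking mode startup waits for the migration, and in both modes a txn error aborts the remaining migrations.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch of the 'background' vs. blocking migration strategies. */
final class CitationMigrationRunner {
    /** Hypothetical stand-in for a mulgara transaction failure. */
    static final class TxnException extends RuntimeException {
        TxnException(String msg) { super(msg); }
    }

    private final AtomicInteger migrated = new AtomicInteger();

    int migratedCount() { return migrated.get(); }

    /** Migrates citations in order; the first TxnException propagates and skips the rest. */
    private void migrateAll(List<String> citationIds, Consumer<String> migrateOne) {
        for (String id : citationIds) {
            migrateOne.accept(id);
            migrated.incrementAndGet();
        }
    }

    /** 'background' strategy: returns immediately; failures surface through the Future. */
    Future<?> startInBackground(List<String> ids, Consumer<String> migrateOne) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return worker.submit(() -> migrateAll(ids, migrateOne));
        } finally {
            worker.shutdown();  // the submitted task still runs; the thread exits afterwards
        }
    }

    /** Blocking strategy: any exception propagates, so startup fails unless all citations migrate. */
    void runAtStartup(List<String> ids, Consumer<String> migrateOne) {
        migrateAll(ids, migrateOne);
    }
}
```

Resuming after a failure is left out on purpose, matching the note that ambra has to be restarted to continue.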

Addresses #921

  Changed 21 months ago by amit

  • owner changed from pradeep to russ

Re-assigning for testing.

  Changed 21 months ago by russ

  • owner changed from russ to pradeep

Citation migrator is failing on many articles (21/44 on branch).

pradeep explained on the list that this is due to authors being out of order in mulgara, and that the solution is to reingest by hand.

I don't think it will be possible to reingest 50% of our articles by hand.

Perhaps this belongs to dragisa?

  Changed 21 months ago by pradeep

  • status changed from new to assigned

Replying to russ:

citation migrator is failing on many articles (21/44 on branch). pradeep explained on the list that this is due to authors being out of order in mulgara, and that the solution is to reingest by hand. i don't think it will be possible to reingest 50% of our articles by hand.

Hmm. Something is strange then. The serverbackup.gz that I used for a couple of rounds of migration tests migrated successfully on moody.topazproject.org (barring just one article that failed because of a duplicate citation key). My understanding was that this was a backup of the production data from a few weeks back. But if that is not the case, and you suspect that 50% of the articles in production have the wrong author order, that can also be addressed during this migration.

I'm not sure how 50% of articles could have the author order wrong. Were the article XMLs on production edited after ingestion to correct some of these things, without a re-ingestion being done?

The current version of the migrator flags this as an error to bring it to the admin's attention so that it can be fixed, mainly by re-ingesting. But if the mismatch is as high as 50%, as you say, the migrator can easily be modified to take care of it.
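One way the migrator could "take care of" an author-order mismatch instead of only flagging it is sketched below. The ticket does not show how the real migrator compares authors; this hypothetical `AuthorOrderCheck` assumes each author is identified by a string and that the article XML order is authoritative. Only a pure reordering is repaired; any other difference is still reported for the admin to fix by re-ingesting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical author-order check; not the ambra migrator's actual logic. */
final class AuthorOrderCheck {
    enum Result { MATCH, REORDERED, MISMATCH }

    private AuthorOrderCheck() {}

    /**
     * Compares the stored author list with the order in the article XML.
     * If fixOrder is true and only the order differs, the stored list is rewritten
     * in XML order; every other difference is reported as MISMATCH.
     */
    static Result check(List<String> xmlOrder, List<String> stored, boolean fixOrder) {
        if (xmlOrder.equals(stored)) {
            return Result.MATCH;
        }
        // Sorted copies are equal exactly when both lists hold the same authors
        // (duplicates included), i.e. when only the order is wrong.
        List<String> a = new ArrayList<>(xmlOrder);
        List<String> b = new ArrayList<>(stored);
        Collections.sort(a);
        Collections.sort(b);
        if (fixOrder && a.equals(b)) {
            stored.clear();
            stored.addAll(xmlOrder);
            return Result.REORDERED;
        }
        return Result.MISMATCH;
    }
}
```

Keeping `fixOrder` as a switch preserves the current behaviour (flag and re-ingest) as the default choice.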

  Changed 21 months ago by pradeep

  • status changed from assigned to new
  • owner changed from pradeep to russ

Replying to pradeep:

I realized what the confusion was with the 50% number (see the message on the mailing list). Since this is not a migration issue, I'm assigning it back to you to figure out what is going on with the data used in the test.

  Changed 20 months ago by russ

  • status changed from new to closed
  • resolution set to fixed

Pretty much confirmed that the branch has corrupt data. Migrations on the stage corpus and on a clean small corpus did not reproduce this issue.

  Changed 19 months ago by anonymous

  • milestone 0.9.1 deleted

Milestone 0.9.1 deleted
