Changed to -PJANA:MAX_RELAUNCH_THREADS=20 (i.e., from 10 to 20) in the mcsmear call. Ten files are running now with minimal difference between wall and CPU times.
Update: Nope, they failed... Still failing at the mcsmear stage.
On the cluster: 10/10 sl_mu jobs (only 1000 events each) crashed.
Checking output files. One difference is the value of JANA_CALIB_URL.
On ifarm --> mysql://ccdb_user@hallddb.jlab.org/ccdb [works...]
On cluster --> sqlite:////work/halld/ccdb_sqlite/9/ccdb.sqlite [doesn't work???]
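A quick way to see which backend a job actually picks up is to print it at the top of the job script. This is only a minimal sketch: calib_backend and sqlite_path are hypothetical helper names, not part of JANA or CCDB, and it is written for sh rather than the csh used in the auger file. It only reports the URL scheme and, for sqlite, whether the file is readable from the node; it would not by itself explain an intermittent failure.

```shell
#!/bin/sh
# Hypothetical helper (not part of JANA or CCDB): report which backend a
# calibration URL points at, so ifarm and cluster logs can be compared.
calib_backend() {
    case "$1" in
        mysql://*)  echo "mysql" ;;
        sqlite://*) echo "sqlite" ;;
        "")         echo "unset" ;;
        *)          echo "unknown" ;;
    esac
}

# For a sqlite URL, strip the "sqlite:///" prefix to get the file path.
sqlite_path() {
    echo "${1#sqlite:///}"
}

url="${JANA_CALIB_URL:-}"
backend=$(calib_backend "$url")
echo "JANA_CALIB_URL backend: $backend"

# Only the sqlite case has a local file whose readability can be checked.
if [ "$backend" = "sqlite" ]; then
    path=$(sqlite_path "$url")
    if [ -r "$path" ]; then
        echo "sqlite file is readable: $path"
    else
        echo "sqlite file is NOT readable: $path"
    fi
fi
```

Running this on both ifarm and a cluster node should make the difference show up in the job log without digging through output files.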
Adding to the sl_mu auger file:
setenv JANA_CALIB_URL mysql://ccdb_user@hallddb.jlab.org/ccdb
Rerunning 10 files.
OK, the sqlite JANA_CALIB_URL seems to be the problem. I have no idea why it is intermittent, but I gave up on learning mysql looooong ago (2006).
Upping to 10k events and rerunning all mc.