Saturday, June 29, 2019

Update on GPU on wintermute

NVIDIA driver 430 installed and it's correctly detecting the GPU:

wintermute:~/cuda_test> sudo apt list --installed | grep nvidia

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libnvidia-cfg1-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-common-430/bionic,bionic,now 430.26-0ubuntu0~gpu18.04.1 all [installed,automatic]
libnvidia-compute-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed]
libnvidia-decode-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-encode-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-fbc1-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-gl-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-ifr1-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-compute-utils-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-dkms-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-driver-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed]
nvidia-kernel-common-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-kernel-source-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-prime/bionic-updates,bionic-updates,now 0.8.8.2 all [installed]
nvidia-settings/bionic,now 418.56-0ubuntu0~gpu18.04.1 amd64 [installed]
nvidia-utils-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
xserver-xorg-video-nvidia-430/bionic,now 430.26-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
wintermute:~/cuda_test>

wintermute:~> nvidia-smi
Sat Jun 29 14:10:50 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| 40%   27C    P8     9W / 160W |      0MiB /  5934MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
wintermute:~>


Looks like I don't need to install CUDA: https://www.pugetsystems.com/labs/hpc/How-To-Install-CUDA-10-together-with-9-2-on-Ubuntu-18-04-with-support-for-NVIDIA-20XX-Turing-GPUs-1236/


NEXT: Install TensorFlow through conda with GPU support.

Tuesday, June 18, 2019

MC running but sporadically crashing...

pim_gam and charged_mix MC is running on the cluster, but crashes happen sporadically.  It seems that the recent Slurm update means that log files no longer get written to .farm_output.

Sunday, June 16, 2019

Currently working on raw generator

Started working on the raw event generator.  Functionality to generate files with physical ratios of the charged decay modes is working, but I somehow broke the part of the code that assigns each particle's decay vertex index and parent particle index...

Update... seems to be working now.

Also working: the addition of a decay-mechanism flag in the data stream.  This required changing the raw file structure slightly AND adding this feature to the sld_raw_2_hddm.py code.

The mech flag may also need to be propagated further down the chain so that the mech info stays with the recon files.  Will check it out.

BUT!  At this point, I think that we're ready to gen some MC!

Generating 500 files (5M events) of pim_gam MC...

And also 500 files worth of charged mixed MC (#yolo).  This should produce an expected 1.5 sl_mu events per file for roughly 750 events total (yikes, that's grim) in the raw.  Detector effects will knock this down by a factor of 10.  So, it's likely that I'll need to generate roughly 100 times more at some point to actually do this.
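The yield estimate above can be sanity-checked with quick arithmetic (a sketch; the 1.5 events/file rate and the factor-of-10 detector/reconstruction loss are the rough estimates stated above, not measured numbers):

```python
# Back-of-the-envelope yield check for the charged-mix MC.
# Inputs are the rough estimates from the notes above.
n_files = 500
sl_mu_per_file = 1.5      # expected sl_mu events per raw file
detector_loss = 10        # assumed acceptance penalty from detector effects

raw_yield = n_files * sl_mu_per_file      # expected sl_mu events in the raw
recon_yield = raw_yield / detector_loss   # surviving after detector effects

print(raw_yield, recon_yield)
```

So roughly 750 raw sl_mu events, of order 75 after detector effects, which is why a ~100x larger generation pass will eventually be needed.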

Thursday, June 13, 2019

It's also time to begin analyzing the L --> p pi- gamma background

BF for muonic sld: 1.57e-4
BF for L --> p pi- gamma: 8.4e-4

Time to generate some MC for the pi gam background and see how much it contaminates.
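For scale, the ratio of the two branching fractions quoted above (here I'm assuming "muonic sld" refers to the muonic semileptonic decay; the numbers are the ones in these notes):

```python
# Ratio of background to signal branching fractions (values from above).
bf_sl_mu = 1.57e-4     # muonic sld (assumed: Lambda semileptonic, mu mode)
bf_ppimgam = 8.4e-4    # L --> p pi- gamma

ratio = bf_ppimgam / bf_sl_mu
print(round(ratio, 2))   # the radiative background is roughly 5x the signal
```

So even before detector effects, the pi gam background outnumbers the signal by about a factor of five, which is why the contamination study matters.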

This might be a situation where a KF to the background reaction could be good for rejection.


Thoughts on generating physical mixed MC

Looking ahead to generating a large MC set that includes both ppim and sl_mu events in roughly physical proportion.  Here are some thoughts:

0. This will likely have to be done at the raw generator level.  Should see if it will be easy to add mixing... suspect yes, but might take some reworking of the generator and some vetting of the rest of the toolchain.

1. One could simply make a proportional mixture of signal and background at the post-reconstruction stage, but this would assume that detector and reconstruction acceptance is the same for signal and background, which might not be true.  Weighting events post-recon would suffer from the same assumption.

2. Need to include which decay mechanism (truth) each event is in the data stream.  This can be done with the "mech" tag in the hddm format.  Need to add to the raw files, and to the post-recon code.

3. Additionally, we want this mixed data set to be events that the signal-separation models were NOT trained on.  Right now, it looks like the models are over-training (more below), so this will be a problem.
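A minimal sketch of what point 0 could look like inside the raw generator: draw a decay mode per event with probabilities proportional to the branching fractions. The mode names and BF values here are illustrative placeholders, not the generator's actual interface:

```python
import random

# Illustrative decay-mode mixing at the raw-generator level.
# Mode labels and weights are placeholders; real weights would come
# from the actual branching fractions used in the generator.
modes = ["ppim", "sl_mu"]
bfs = [0.64, 1.57e-4]    # approx. Lambda -> p pi- BF vs. muonic sld BF

def draw_mech(rng):
    """Pick the truth decay mechanism ("mech" tag) for one event."""
    return rng.choices(modes, weights=bfs, k=1)[0]

rng = random.Random(42)
sample = [draw_mech(rng) for _ in range(10000)]
# sl_mu should appear only a few times per 10k events at these weights
print(sample.count("sl_mu"))
```

The per-event mode draw is also the natural place to stamp the truth mech flag into the raw record, which is exactly the tag that point 2 says needs to survive into the post-recon files.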



A note on model training.  With the increased number of input features (many of which are technically redundant), overtraining seems to be more prevalent.  This is probably because the dropout rate (previously 0.1) no longer provides effective regularization when features are redundant (for example, I'm now supplying cylindrical coordinate components for the p4's and x4's in addition to the cartesian components).  Will have to try increasing the dropout rate!
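To make the knob concrete, here is a toy plain-Python dropout (the real models use whatever framework the tf fitting code is built on; this just shows what raising the rate changes):

```python
import random

def dropout(features, rate, rng):
    """Zero each feature with probability `rate` (training-time dropout),
    scaling survivors by 1/(1-rate) to keep the expected sum fixed."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in features]

rng = random.Random(0)
x = [1.0] * 1000

# At rate=0.1, a redundant feature's duplicate usually survives, so the
# network can keep leaning on it; rate=0.5 kills far more copies per pass.
low = dropout(x, 0.1, rng)
high = dropout(x, 0.5, rng)
print(sum(1 for v in low if v == 0.0), sum(1 for v in high if v == 0.0))
```

With duplicated inputs, the effective drop probability for a piece of information is roughly rate^k for k redundant copies, which is why 0.1 stops biting once cylindrical duplicates of the cartesian components are added.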

Tuesday, June 11, 2019

Forgot to add beam E to TTree output

Might want to check that the KF does not affect the beam energy... I read this somewhere, but find it hard to believe.  For now, adding the energy from locBeamP4 = dComboBeamWrapper->Get_P4().

Update on TTree to features files...

I have adapted the DSelector that matches proton to the thrown particle ID to output flat trees.  This runs at the lab in the /work/halld/gluex/home/mmccrack/dsel2_protonTRUTH directory.

These TTrees have the TLorentzVectors for momenta (measured and kf) and vertex positions (kf) in them, but they ALSO have a separate branch for each component of these vectors.  This allows them to play well with the next link in the toolchain (but makes the files larger than necessary).

The next step is to convert these TTree files to csv files that are nicely formatted for Pandas.  This happens on my laptop with the script sld_ttree_2_pandas.py.  This code removes any features that SHOULD NOT BE FIT (truth/thrown quantities, etc.) and outputs a large-ish csv file.  This code uses root_numpy, which doesn't seem to be maintained anymore (built for numpy 1.9, but current version is 1.16???).
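The truth-stripping step amounts to something like the following stdlib sketch (column names here are made up; the real feature list lives in sld_ttree_2_pandas.py):

```python
import csv
import io

# Drop any column whose name marks it as thrown/truth information,
# since those quantities SHOULD NOT BE FIT.  Marker strings are a guess.
TRUTH_MARKERS = ("thrown", "truth")

def strip_truth_columns(src, dst):
    """Copy a csv stream, dropping truth/thrown columns."""
    reader = csv.reader(src)
    header = next(reader)
    keep = [i for i, name in enumerate(header)
            if not any(m in name.lower() for m in TRUTH_MARKERS)]
    writer = csv.writer(dst)
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])

src = io.StringIO("px_kf,py_kf,thrown_pid,E_truth\n1.0,2.0,13,3.1\n")
dst = io.StringIO()
strip_truth_columns(src, dst)
print(dst.getvalue().splitlines()[0])   # px_kf,py_kf
```

Filtering by a naming convention like this is fragile if a fit-able feature happens to contain a marker string, so the real code presumably keeps an explicit list.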

These csv files will then be processed with the tf fitting machinery.  This code (tf) needs to be modified to work with the different feature names and a much larger number of features (95-ish).

NB: It is likely that NNs are not the way to go with these new features.  Some of the detector quantities do not standardize well (e.g., FCAL energy) since not all tracks pass through the FCAL.  For tracks that don't, some large placeholder value is assigned (e.g., 1e4), and this would be difficult for NNs to work with.  BDTs might be best.  (Grrrrrr.)
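A quick illustration of why the placeholder fights standardization (toy energies; 1e4 is the placeholder value mentioned above):

```python
import statistics

# FCAL energies (GeV) for four tracks that hit the FCAL, plus a 1e4
# placeholder for one track that missed it entirely.
energies = [0.4, 0.7, 1.1, 0.9, 1e4]

mu = statistics.mean(energies)
sigma = statistics.stdev(energies)
standardized = [(e - mu) / sigma for e in energies]

# The placeholder dominates both mu and sigma, so the four real
# measurements collapse into nearly identical standardized values.
print([round(z, 3) for z in standardized])
```

After standardization the real measurements are indistinguishable to a network, while a tree-based model would happily split on "energy > some threshold" and recover the hit/miss information, which is the argument for BDTs here.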

For 1e5 sl_mu events, the TTree file is 113 MB and the csv file is 97 MB.  Each contains 110948 events.  (So assuming event uniqueness, the acceptance to this stage is about 11%.)



Saturday, June 8, 2019

[non-]Deep thought for features...

In addition to cartesian coordinates for momenta and vertices, add perp and phi to the data stream.
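The conversion itself is trivial; for the transverse part of a momentum it is just (function name is mine):

```python
import math

def cylindrical(px, py):
    """Return (perp, phi) for the transverse part of a momentum."""
    return math.hypot(px, py), math.atan2(py, px)

perp, phi = cylindrical(3.0, 4.0)
print(perp, round(phi, 4))   # 5.0 and atan2(4, 3) = 0.9273
```

These components are redundant with the cartesian ones by construction, which is the redundancy blamed for the overtraining noted above.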

Friday, June 7, 2019

Lots of problems easily converting flat trees to pandas objects...

Long story short: I have spent the last several days trying to fix the pyroot build on my machine.  Making ROOT build against the Anaconda version of python now seems to be not worth the effort (though I did have it working a few months back).  After five days of trying different combinations of ROOT 6.14/16 and python 3.6/3.7, I broke down and installed ROOT 6.16 via conda.  Seems to be working well.

The root_numpy module will allow easy conversion from root ttrees to numpy arrays or pandas dfs.  HOWEVER, it doesn't seem to work easily if the TTree branches are TLorentzVectors. 

Solution: rewrite the DSelector so that the TLorentzVectors are instead entered into the tree as flattened components.  Remove the TLorentzVectors from the output stream to keep the file sizes smaller.  Should take a few hours' worth of work.
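The flattening itself is mechanical; in Python terms the DSelector change amounts to something like this (branch names are illustrative, not the actual tree layout):

```python
# Illustrative: flatten a 4-vector into scalar branches so that
# root_numpy / pandas can ingest the tree.  Branch names are made up.
def flatten_p4(prefix, p4):
    """Turn one (px, py, pz, E) tuple into per-component columns."""
    px, py, pz, E = p4
    return {f"{prefix}_px": px, f"{prefix}_py": py,
            f"{prefix}_pz": pz, f"{prefix}_E": E}

row = {}
row.update(flatten_p4("proton_kf", (0.2, -0.1, 1.3, 1.6)))
row.update(flatten_p4("mu_kf", (0.05, 0.3, 0.8, 0.9)))
print(sorted(row))   # eight scalar columns, no TLorentzVector objects
```

One scalar branch per component is exactly the layout the earlier TTree-to-csv step expects, at the cost of more branches in the tree.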

Relegator update

Kripa has produced some really nice plots of significance vs decision function threshold for the regressor.  NICE. We also have plots of a...