These TTrees have the TLorentzVectors for momenta (measured and kf) and vertex positions (kf) in them, but they ALSO have a separate branch for each component of these vectors. This allows them to play well with the next link in the toolchain (but makes the files larger than necessary).
The next step is to convert these TTree files to csv files that are nicely formatted for Pandas. This happens on my laptop with the script sld_ttree_2_pandas.py. This code removes any features that SHOULD NOT BE FIT (truth/thrown quantities, etc.) and outputs a large-ish csv file. This code uses root_numpy, which doesn't seem to be maintained anymore (built for numpy 1.9, but current version is 1.16???).
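As a rough illustration of the feature-removal step described above, here is a minimal sketch using only the Python standard library. It is not the contents of sld_ttree_2_pandas.py: the branch names, the "truth"/"thrown" name fragments, and the helper names `fittable_columns` and `write_fittable_csv` are all assumptions made for the example, and the rows are a toy stand-in for whatever is actually read out of the TTree.

```python
import csv
import io

# Hypothetical name fragments that mark truth-level branches; the real
# script's list is not shown in this post.
BANNED_FRAGMENTS = ("truth", "thrown")


def fittable_columns(columns, banned=BANNED_FRAGMENTS):
    """Return the column names that contain none of the banned fragments."""
    return [c for c in columns
            if not any(frag in c.lower() for frag in banned)]


def write_fittable_csv(rows, columns, out):
    """Write only the fittable columns of `rows` (a list of dicts) to `out`."""
    keep = fittable_columns(columns)
    # extrasaction="ignore" silently drops the keys that are not in `keep`.
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return keep


# Toy stand-in for rows read out of a TTree (branch names are made up).
columns = ["mu_px_kf", "mu_px_measured", "mu_px_thrown", "vertex_z_truth"]
rows = [
    {"mu_px_kf": 0.31, "mu_px_measured": 0.29,
     "mu_px_thrown": 0.30, "vertex_z_truth": 65.0},
]
buffer = io.StringIO()
kept = write_fittable_csv(rows, columns, buffer)
```

Filtering on the column names before writing means the truth-level quantities never reach the csv file at all, so they cannot leak into a fit by accident later in the chain.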
These csv files will then be processed with the tf fitting machinery. This code (tf) needs to be modified to work with the different feature names and a much larger number of features (95-ish).
NB: It is likely that NNs are not the way to go with these new features. Some of the detector quantities do not standardize well (e.g., FCAL energy) since not all tracks pass through the FCAL. For tracks that don't, some large placeholder value is assigned (e.g. 1e4), and this would be difficult for NNs to work with. BDTs might be best. (Grrrrrr.)
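To make the standardization problem concrete, here is a small sketch with made-up FCAL energies and the 1e4 placeholder mentioned above. The first function is a plain z-score. The second is one common workaround (scale only the real hits and carry a separate hit/no-hit flag); it is shown purely for comparison and is not something tried in this post. The function names and the energy values are assumptions.

```python
import statistics

PLACEHOLDER = 1e4  # sentinel for "track did not reach the FCAL" (value from the text)


def standardize(values):
    """Plain z-score: subtract the mean, divide by the population std dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def standardize_with_flag(values, placeholder=PLACEHOLDER):
    """Z-score only the real hits; return (scaled, has_hit), misses set to 0.0."""
    real = [v for v in values if v != placeholder]
    mean = statistics.fmean(real)
    std = statistics.pstdev(real)
    scaled = [(v - mean) / std if v != placeholder else 0.0 for v in values]
    has_hit = [int(v != placeholder) for v in values]
    return scaled, has_hit


# Made-up FCAL energies (GeV) for five tracks, two of which miss the FCAL.
fcal_energy = [0.5, 1.2, 0.8, PLACEHOLDER, PLACEHOLDER]

naive = standardize(fcal_energy)
scaled, has_hit = standardize_with_flag(fcal_energy)

# Spread of the three real measurements after each kind of scaling.
spread_naive = max(naive[:3]) - min(naive[:3])
spread_masked = max(scaled[:3]) - min(scaled[:3])
```

With the placeholder included, the mean and standard deviation are dominated by the 1e4 entries, so the real energies collapse into a sliver of the scaled range. A tree-based method only cares about the ordering of values at each split, which is why the placeholder is much less of a problem for BDTs.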
For 1e5 sl_mu events, the TTree file is 113 MB and the csv file is 97 MB. Each contains 110948 events. (So assuming event uniqueness, the acceptance to this stage is about 11%.)