Auto-provision batch folder from KB
scripts/provision_jawal_batch.py · reads /mnt/aljeel-ap_kb/current/<BATCH>/
Reads the Aljeel-prepared workbook (J26-XXX.xlsx) and the Jawal invoice from the KB volume. Builds:
- batches/jawal-J26-XXX/: working dir
- Spreadsheet.xlsx: Oracle Fusion template populated with description, ticket #, amount, date (NO emp_no or GL combo)
- raw/: symlink to the day-folder root with all .msg files and OPEX PDFs
After this stage the pipeline can run blind: it never sees Aljeel's final coding.
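A minimal sketch of the provisioning step, assuming the Aljeel workbook has already been read into a list of row dicts. The names `provision_batch`, `blind_rows`, and the column keys are hypothetical, and writing the actual Spreadsheet.xlsx (which would need an xlsx library) is left out. It shows the two properties the stage relies on: raw/ is a symlink, so nothing is copied off the KB volume, and only the blind columns (no emp_no, no GL combo) reach the template.

```python
from pathlib import Path

# Hypothetical column keys for the fields the template is allowed to carry.
# emp_no and the GL combo are deliberately absent, so the pipeline stays blind.
BLIND_COLUMNS = ("description", "ticket", "amount", "date")


def blind_rows(aljeel_rows: list[dict]) -> list[dict]:
    """Project Aljeel's coded rows onto the blind template columns only."""
    return [{col: row.get(col) for col in BLIND_COLUMNS} for row in aljeel_rows]


def provision_batch(batches_root: Path, batch_id: str, day_folder: Path) -> Path:
    """Create batches/jawal-<batch_id>/ with raw/ symlinked to the day folder."""
    work_dir = batches_root / f"jawal-{batch_id}"
    work_dir.mkdir(parents=True, exist_ok=True)
    raw_link = work_dir / "raw"
    if not raw_link.is_symlink():
        # Symlink instead of copying: .msg files and OPEX PDFs stay on the KB volume.
        raw_link.symlink_to(day_folder.resolve(), target_is_directory=True)
    return work_dir
```

Because the symlink check makes the function idempotent, re-provisioning the same batch leaves the working dir untouched.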
Routing decision per row
scripts/run_hybrid_v15_12.py · first-match-wins on 5 triggers
- OPEX-*.pdf AND cascade didn't produce clean sponsorship coding
- HF/CRM-/EP-/AATS/IEPC/SIS-/ISHLT/DDW
- 000000, 999999, or blank
Trust deterministic output
~85% of rows on a typical batch
No LLM call: the cascade result becomes the final answer. This branch covers travel rows where the Manpower fuzzy match plus the Aljeel email anchor resolved the employee cleanly; family-cluster unification has already been applied.
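A minimal sketch of first-match-wins routing, covering only the triggers listed above. The row keys (`has_opex_pdf`, `clean_sponsorship_coding`, `code`, `emp_no`) are hypothetical, as are two assumptions: that the HF/CRM-/EP-/AATS/IEPC/SIS-/ISHLT/DDW prefixes are matched against a single code field, and that "000000, 999999, or blank" refers to emp_no. Triggers are evaluated in order and the first hit decides; a row with no hit takes the deterministic branch.

```python
from typing import Callable, Optional

Row = dict

# Assumed: prefixes are checked against one hypothetical "code" field.
SPONSORSHIP_PREFIXES = ("HF", "CRM-", "EP-", "AATS", "IEPC", "SIS-", "ISHLT", "DDW")
# Assumed: these placeholder values refer to emp_no.
PLACEHOLDER_EMP_NOS = {"000000", "999999", ""}


def opex_without_clean_coding(row: Row) -> bool:
    return bool(row.get("has_opex_pdf")) and not row.get("clean_sponsorship_coding")


def sponsorship_prefix(row: Row) -> bool:
    return str(row.get("code") or "").startswith(SPONSORSHIP_PREFIXES)


def placeholder_emp_no(row: Row) -> bool:
    return str(row.get("emp_no") or "").strip() in PLACEHOLDER_EMP_NOS


# Order matters: the first predicate that returns True wins.
TRIGGERS: list[tuple[str, Callable[[Row], bool]]] = [
    ("opex_pdf_unclean", opex_without_clean_coding),
    ("sponsorship_prefix", sponsorship_prefix),
    ("placeholder_emp_no", placeholder_emp_no),
]


def route(row: Row) -> tuple[str, Optional[str]]:
    """Return ("llm", trigger_name) on the first hit, else ("deterministic", None)."""
    for name, predicate in TRIGGERS:
        if predicate(row):
            return "llm", name  # later triggers are never evaluated
    return "deterministic", None
```

Recording which trigger fired, not just the branch, makes it easy to count routing reasons per batch.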
Read entire ticket folder
~15% of rows · sponsorships, unresolved, edge cases
For each routed row, builds a multimodal prompt containing:
- Every .msg body in the folder, complete (no truncation)
- OPEX-*.pdf attached as Gemini file_data (full PDF, no excerpt)
Model cascade: gemini-pro-latest (3 Pro) → gemini-2.5-pro → gemini-2.5-flash. Results are cached by ticket #, so re-runs are free.
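A minimal sketch of the model cascade with a per-ticket cache, assuming a file-per-ticket JSON cache directory. `classify_ticket` is hypothetical, and `call_model` is an injected callable standing in for the real Gemini client (this is not the Gemini SDK's API); any exception it raises makes the cascade fall through to the next model.

```python
import json
from pathlib import Path
from typing import Callable

MODEL_CASCADE = ("gemini-pro-latest", "gemini-2.5-pro", "gemini-2.5-flash")


def classify_ticket(ticket_no: str, prompt: dict,
                    call_model: Callable[[str, dict], dict],
                    cache_dir: Path, models=MODEL_CASCADE) -> dict:
    """Return cached coding for ticket_no, or try each model in order."""
    # Make the ticket # safe to use as a file name.
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in ticket_no)
    cache_file = cache_dir / f"{safe}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())  # re-runs never hit the API

    errors = []
    for model in models:
        try:
            result = call_model(model, prompt)
        except Exception as exc:  # quota, timeout, bad response: try the next model
            errors.append(f"{model}: {exc}")
            continue
        result = {**result, "model": model}  # record which model answered
        cache_dir.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(result))
        return result
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Only successful answers are cached, so a ticket that failed on every model is retried on the next run.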
Returns JSON: emp_no, account, cost_center, div, solution, agency, confidence, reasoning. For sponsorships, "emp_no" is the requesting employee (not the doctor or guest traveler), found in the OPEX form or the approval chain.
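A minimal sketch of validating the returned JSON before it is written to the sheet. `RowCoding` and `parse_coding` are hypothetical names, and the 0 to 1 confidence scale is an assumption. Keeping emp_no as a string preserves leading zeros, and unknown keys in the model output are ignored.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class RowCoding:
    emp_no: str          # for sponsorships: the requesting employee, not the traveler
    account: str
    cost_center: str
    div: str
    solution: str
    agency: str
    confidence: float    # assumed to be on a 0..1 scale
    reasoning: str


def parse_coding(raw: str) -> RowCoding:
    """Parse and validate one model response; raise ValueError if malformed."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")
    expected = {f.name for f in fields(RowCoding)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    values = {name: data[name] for name in expected}
    values["confidence"] = float(values["confidence"])
    if not 0.0 <= values["confidence"] <= 1.0:
        raise ValueError(f"confidence out of range: {values['confidence']}")
    for name in expected - {"confidence"}:
        # Strings keep leading zeros in emp_no and account codes.
        values[name] = "" if values[name] is None else str(values[name])
    return RowCoding(**values)
```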
Output: Oracle-ingestible xlsx + summary
Spreadsheet-J26-XXX-FILLED-v15.12.xlsx
Saved to batches/jawal-J26-XXX/output/:
- Spreadsheet-J26-XXX-FILLED-v15.12.xlsx: the Oracle Fusion upload sheet
- summary-v15.12.json: match-method breakdown, flag counts, row status
- catches-within-batch.json · catches-cross-batch.json
If a ground-truth Details sheet exists for the batch, score_against_truth.py emits a comparison Markdown (.md) report next to the output.
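A minimal sketch of how a summary file and a truth comparison could be assembled, assuming each processed row is a dict with hypothetical `match_method`, `status`, and `flags` keys, and that predictions and ground truth are keyed by ticket #. `build_summary`, `write_summary`, and `comparison_markdown` are illustrative only; they are not the contents of score_against_truth.py.

```python
import json
from collections import Counter
from pathlib import Path


def build_summary(rows: list[dict]) -> dict:
    """Count match methods, flags, and row statuses across a batch."""
    return {
        "row_count": len(rows),
        "match_methods": dict(Counter(r.get("match_method", "unknown") for r in rows)),
        "flag_counts": dict(Counter(f for r in rows for f in r.get("flags", []))),
        "row_status": dict(Counter(r.get("status", "unknown") for r in rows)),
    }


def write_summary(rows: list[dict], output_dir: Path, version: str = "v15.12") -> Path:
    path = output_dir / f"summary-{version}.json"
    path.write_text(json.dumps(build_summary(rows), indent=2, sort_keys=True))
    return path


def comparison_markdown(predicted: dict, truth: dict,
                        fields=("emp_no", "account", "cost_center")) -> str:
    """Per-field exact-match counts over tickets present in both sources."""
    tickets = sorted(predicted.keys() & truth.keys())
    lines = ["| field | matches | total |", "|---|---|---|"]
    for field in fields:
        hits = sum(1 for t in tickets if predicted[t].get(field) == truth[t].get(field))
        lines.append(f"| {field} | {hits} | {len(tickets)} |")
    return "\n".join(lines) + "\n"
```

Scoring only tickets present in both sources keeps rows missing from the Details sheet from counting as misses.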