Use word-boundary matching for personnel name blocklist

Devin Review correctly flagged that the bare "NO" and "KET" entries
in the blocklist would silently drop common Indonesian names (KETUT,
NOVA, NOOR, NORMAN, NOVIANTI, ...) because the check used startswith
rather than a word boundary.

Replaced the per-prefix loop with a single compiled regex anchored at
^ with a trailing \b, which still matches column headers like "NO"
or "KET" on their own line but no longer rejects "NOOR HIDAYAT" or
"KETUT WARDANA". Also fixes the same bug in _following_jabatan.

Added two regression tests covering both directions: names starting
with the offending tokens are kept, bare column headers still rejected.

Co-Authored-By: adrian kuman firmansah <adriancuman@gmail.com>
This commit is contained in:
Devin AI
2026-04-26 05:46:21 +00:00
parent 58a2bf2648
commit 737f4999dd
2 changed files with 47 additions and 10 deletions

View File

@@ -92,6 +92,37 @@ class TestExtractPersonnelFromText:
text = "Some line\nFAKERANK / 12345678\nanother line"
assert extract_personnel_from_text(text) == []
def test_does_not_drop_indonesian_names_starting_with_no_or_ket(self) -> None:
# Regression: 'NO' / 'KET' are legitimate column header tokens but
# also prefix common Indonesian names (KETUT, NOVA, NOOR). The
# blocklist must use word boundaries, not a raw startswith check.
text = (
"DAFTAR PERSONIL\n"
"1.\n"
"KETUT WARDANA\n"
"AIPTU / 11111111\n"
"JABATAN A\n"
"2.\n"
"NOVA SARI\n"
"BRIPTU / 22222222\n"
"JABATAN B\n"
"3.\n"
"NOOR HIDAYAT\n"
"BRIPDA / 33333333\n"
"JABATAN C\n"
)
rows = extract_personnel_from_text(text)
names = [r.nama for r in rows]
assert names == ["KETUT WARDANA", "NOVA SARI", "NOOR HIDAYAT"]
def test_still_blocks_bare_column_header_tokens(self) -> None:
# Word-boundary fix must still reject the actual column-header
# rows that motivated the blocklist in the first place.
text = "NO\nNAMA\nPANGKAT / NRP\nJABATAN\nKET\n1.\nREAL NAME\nAIPTU / 12345678\n"
rows = extract_personnel_from_text(text)
assert len(rows) == 1
assert rows[0].nama == "REAL NAME"
class TestIsLowQuality:
def test_empty_list_is_low_quality(self) -> None: