About
This website performs Machine Learning of Non-trivial Optimal Regular Expressions.
These learned Regular Expressions are:
- Descriptive
- Non-trivial
- Optimal or near-optimal
- Executable
- Matching or Exact Matching
- Learned from positive example input strings
This has not been done before; it is a breakthrough in Computer Science and Machine Learning.
This solves the Regular Expression Induction (REI) problem in a significant and practical way.
Up to 17 regexes are learned for each input set, offering trade-offs among Optimality, Readability, and Abstractions.
Definitions
- Descriptive: The input strings can be reconstructed from the learned regex.
- Optimal: The shortest regex based on Significant Length.
- Executable: Compatible with standard regex engines.
- Matching: Matches all input strings.
- Exact Matching: Matches all and only the input strings.
- Abstractions: Use of character classes (e.g., \d, \w) and ranges (e.g., [3-5bg-j]).
- Plain Length: Total characters in the regex.
- Significant Length: Count of the characters in the regex that come from the input strings; regex syntax such as a quantifier is not counted.
- Expansion Factor: (matched strings count) ÷ (original input strings count). For exact matches: 1.0X.
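The Matching, Exact Matching, and Expansion Factor definitions above can be illustrated with a minimal Python sketch. It is not the website's implementation. It assumes that "matches" means a full-string match, and it approximates the set of matched strings by enumerating only strings over the inputs' own alphabet, up to the length of the longest input. A regex that also matches longer strings, or characters outside that alphabet (for example via \w), would be undercounted. The function names are hypothetical.

```python
import re
from itertools import product


def is_matching(regex, inputs):
    """Matching: the regex fully matches every input string (assumed full match)."""
    pattern = re.compile(regex)
    return all(pattern.fullmatch(s) for s in inputs)


def matched_strings(regex, inputs):
    """Bounded approximation of the strings the regex matches.

    Assumption: only strings built from the inputs' alphabet, no longer
    than the longest input, are considered.
    """
    pattern = re.compile(regex)
    alphabet = sorted(set("".join(inputs)))
    max_len = max(len(s) for s in inputs)
    found = set()
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            candidate = "".join(chars)
            if pattern.fullmatch(candidate):
                found.add(candidate)
    return found


def expansion_factor(regex, inputs):
    """(matched strings count) / (original input strings count), within the bound."""
    return len(matched_strings(regex, inputs)) / len(set(inputs))


def is_exact_matching(regex, inputs):
    """Exact Matching: matches all and only the input strings, within the bound."""
    return matched_strings(regex, inputs) == set(inputs)


inputs = ["ab", "ac", "bb"]
print(is_exact_matching("a[bc]|bb", inputs))            # True
print(expansion_factor("a[bc]|bb", inputs))             # 1.0
print(is_exact_matching("[ab][bc]", inputs))            # False: also matches "bc"
print(round(expansion_factor("[ab][bc]", inputs), 2))   # 1.33
```

In this sketch, [ab][bc] is Matching but not Exact Matching for the three inputs, because it also accepts bc; its Expansion Factor is 4 ÷ 3.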
Notes
- Purpose:
  - As close to optimal as possible
  - Executable and readable by humans
  - Supports human analysis of string/sequences
  - Introduces a new form of explainable machine learning
  - Shortest regex determined by Significant Length
- Significant vs. Plain Length example for input string aab:

  | Id | Regex | Significant Length | Plain Length |
  |----|-------|--------------------|--------------|
  | 1  | aab   | 3: aab             | 3: aab       |
  | 2  | a{2}b | 2: a{2}b           | 5: a{2}b     |

  - Using Plain Length, shortest is aab (3 < 5)
  - Using Significant Length, shortest is a{2}b (2 < 3)
- Why Significant Length? It emphasizes structure in the regex.
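The aab comparison can be reproduced with a rough Python sketch. The exact counting rule used by the website is not given here, so the heuristic below is an assumption: it treats counted quantifiers such as {2}, backslash escapes such as \d, and common metacharacters as regex syntax, and counts everything else as an input string character. The function names are hypothetical.

```python
import re

# Assumption: these pieces are regex syntax, not input string characters.
# Order matters: whole quantifiers like {2} or {1,3} are removed first,
# then backslash escapes, then single metacharacters.
_SYNTAX = re.compile(r"\{\d+(?:,\d*)?\}|\\.|[\[\]()|*+?.^$-]")


def plain_length(regex):
    """Plain Length: total characters in the regex."""
    return len(regex)


def significant_length(regex):
    """Heuristic Significant Length: characters left after removing syntax."""
    return len(_SYNTAX.sub("", regex))


candidates = ["aab", "a{2}b"]
for r in candidates:
    print(r, significant_length(r), plain_length(r))
# aab 3 3
# a{2}b 2 5

print(min(candidates, key=plain_length))        # aab
print(min(candidates, key=significant_length))  # a{2}b
```

Ranking by significant_length instead of plain_length is what makes a{2}b win over aab in this sketch, matching the table above.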
