Gallery
Actual Example 1: Simple example with 2 input strings and a choice of 2 learned regexes (The Optimal Exact Match and 1 Match):
MLREGEX results 2025-03-07 14:16:16 UTC
Your Learn Event ID:
6c34a2f2-0e25-4a65-bbf6-519bfde9238b
Your Learn Event Name/Description:
Coffee and Tea
Your set of input strings (2)
coffee
tea
Learned Regexes (2)
1. MOST OPTIMAL EXACT MATCH
ABSTRACTION TYPE: NONE
EXPANSION FACTOR: 1.0X
SIGNIFICANT LENGTH: 7
cof{2}e{2}|tea
2. MATCH
ABSTRACTION TYPE: Structural
EXPANSION FACTOR: 4.0X
SIGNIFICANT LENGTH: 6
(cof{2}|t)e{1,2}a?
In Example 1, two Regular Expressions are learned for the two input strings: coffee and tea. The first Regular Expression is the Optimal Exact Match, meaning it is the shortest Regular Expression that matches all and only the two input strings.
The second Regular Expression is a structural abstraction: it matches the two input strings, but not only those two. For example, it also matches the string “coffea”, which is not one of the inputs. Note that although the second Regular Expression has a shorter significant length than the first, it matches eight strings in total (Expansion Factor × number of input strings = 4.0 × 2). Which Regular Expression you want depends on your use case.
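The claim about eight matching strings can be checked independently with a small brute-force sketch. It is not part of MLREGEX: the helper name `language` and the idea of enumerating every string over the input alphabet are assumptions made for illustration. A maximum length of 7 is enough here because the longest string either pattern can produce ("coffeea") has 7 characters.

```python
import re
from itertools import product

inputs = ["coffee", "tea"]
exact = re.compile(r"cof{2}e{2}|tea")          # learned regex 1 from Example 1
structural = re.compile(r"(cof{2}|t)e{1,2}a?")  # learned regex 2 from Example 1


def language(pattern, alphabet, max_len):
    """Hypothetical helper: brute-force every string over `alphabet`
    (length 1..max_len) that the pattern matches in full."""
    found = []
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            candidate = "".join(chars)
            if pattern.fullmatch(candidate):
                found.append(candidate)
    return sorted(found)


# Only characters that occur in the inputs can appear in a match.
alphabet = sorted(set("".join(inputs)))

exact_lang = language(exact, alphabet, 7)
struct_lang = language(structural, alphabet, 7)

print(exact_lang)                       # ['coffee', 'tea']
print(struct_lang)                      # 8 strings, including 'coffea'
print(len(struct_lang) / len(inputs))   # 4.0
```

The ratio of matched strings to input strings reproduces the reported Expansion Factor of 4.0X for the structural regex and 1.0X for the exact match.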
Actual Example 2 (Text)
Input Strings:
“A regular expression (shortened as regex or regexp;[1] sometimes referred to as rational expression[2][3]) is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory.” - Wikipedia
Learned regex:
((s(ometim|pecifi)|techniqu)e|a|i)s|(refer{2}|\(shorten|develop|us)ed|string(-searching|s,)|(validation|text)\.|theor(etical|y\.)|s(equ|ci)ence|(Usual{2}|b)y|(languag|ar)e|(r|R)egular|algorithms|characters|operations|(mat|su)ch|computer|rational|replace\"|o(r|f|n)|t(hat|o)|A|expres{2}ion(\[2]\[3]\))?|regex(p;\[1])?|pat{2}erns?|for(mal)?|in(put)?|\"find\"?|a(nd)?
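A quick way to sanity-check this result is to split the quoted text on whitespace and test each token against the learned regex. The whitespace tokenization is an assumption; the page does not state how the text was split into input strings. The sketch below uses only Python's standard `re` module.

```python
import re

# The quoted Wikipedia text from Example 2, without the surrounding quotes.
text = (
    'A regular expression (shortened as regex or regexp;[1] sometimes '
    'referred to as rational expression[2][3]) is a sequence of characters '
    'that specifies a match pattern in text. Usually such patterns are used '
    'by string-searching algorithms for "find" or "find and replace" '
    'operations on strings, or for input validation. Regular expression '
    'techniques are developed in theoretical computer science and formal '
    'language theory.'
)

# The learned regex from Example 2, split across lines for readability only.
learned = re.compile(
    r'((s(ometim|pecifi)|techniqu)e|a|i)s|(refer{2}|\(shorten|develop|us)ed'
    r'|string(-searching|s,)|(validation|text)\.|theor(etical|y\.)'
    r'|s(equ|ci)ence|(Usual{2}|b)y|(languag|ar)e|(r|R)egular|algorithms'
    r'|characters|operations|(mat|su)ch|computer|rational|replace\"'
    r'|o(r|f|n)|t(hat|o)|A|expres{2}ion(\[2]\[3]\))?|regex(p;\[1])?'
    r'|pat{2}erns?|for(mal)?|in(put)?|\"find\"?|a(nd)?'
)

# Assumption: input strings are the whitespace-separated tokens of the text.
tokens = text.split()
unmatched = [t for t in tokens if not learned.fullmatch(t)]

print(len(tokens), "tokens,", len(unmatched), "unmatched")
```

Under that assumption every token, punctuation included, is matched in full by one of the alternatives.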
Actual Example 3 (Optimality, Readability and Abstraction: URLs)
Input Strings (13):
http://1.alpha.com
http://2.alpha.com
http://3.alpha.com
http://4.beta.com
http://5.beta.com
http://6.beta.org
http://7.beta.org
https://1.alpha.com
https://2.alpha.com
https://3.alpha.com
https://4.beta.com
https://5.beta.com
https://6.alpha.org
Learned Regexes (6)
1. MOST OPTIMAL EXACT MATCH
ABSTRACTION TYPE: Digit Ranges 1
EXPANSION FACTOR: 1.0X
SIGNIFICANT LENGTH: 46
ht{2}p(s?:/{2}([1-3]\.alph|[4-5]\.bet)a\.com|(s:/{2}6\.alph|:/{2}[6-7]\.bet)a\.org)
2. EXACT MATCH
ABSTRACTION TYPE: NONE
EXPANSION FACTOR: 1.0X
SIGNIFICANT LENGTH: 47
ht{2}p(s?:/{2}((1|2|3)\.alph|(4|5)\.bet)a\.com|(s:/{2}6\.alph|:/{2}(6|7)\.bet)a\.org)
3. MATCH
ABSTRACTION TYPE: Structural
EXPANSION FACTOR: 8.6X
SIGNIFICANT LENGTH: 28
ht{2}ps?:/{2}(1|2|3|4|5|6|7)\.(alph|bet)a\.c?o(rg|m)
4. MATCH
ABSTRACTION TYPE: Digit Class 1
EXPANSION FACTOR: 3.9X
SIGNIFICANT LENGTH: 40
ht{2}p(s?:/{2}(\d\.alph|\d\.bet)a\.com|(s:/{2}6\.alph|:/{2}\d\.bet)a\.org)
5. MATCH
ABSTRACTION TYPE: Word Class 1
EXPANSION FACTOR: 24.3X
SIGNIFICANT LENGTH: 40
ht{2}p(s?:/{2}(\w\.alph|\w\.bet)a\.com|(s:/{2}6\.alph|:/{2}\w\.bet)a\.org)
6. MATCH
ABSTRACTION TYPE: Character Ranges 2
EXPANSION FACTOR: 76923.1X
SIGNIFICANT LENGTH: 19
ht{2}p[.-/1-7:a-ceg-hl-mo-pr-t]{13,15}
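The Expansion Factors of regexes 1 to 5 can be reproduced by counting matches over a candidate grid and dividing by the 13 inputs. This is a sketch, not how MLREGEX computes the figure. The grid (two schemes, one ASCII word character, two host names, four endings) is an assumption, chosen so that it contains every string those five regexes can match when `\w` is restricted to ASCII. Regex 6 is only checked against the inputs, because its language is far too large to enumerate.

```python
import re
import string
from itertools import product

inputs = (
    [f"http://{d}.alpha.com" for d in "123"]
    + [f"http://{d}.beta.com" for d in "45"]
    + [f"http://{d}.beta.org" for d in "67"]
    + [f"https://{d}.alpha.com" for d in "123"]
    + [f"https://{d}.beta.com" for d in "45"]
    + ["https://6.alpha.org"]
)

learned = {  # regexes 1-5 copied from Example 3
    1: r"ht{2}p(s?:/{2}([1-3]\.alph|[4-5]\.bet)a\.com|(s:/{2}6\.alph|:/{2}[6-7]\.bet)a\.org)",
    2: r"ht{2}p(s?:/{2}((1|2|3)\.alph|(4|5)\.bet)a\.com|(s:/{2}6\.alph|:/{2}(6|7)\.bet)a\.org)",
    3: r"ht{2}ps?:/{2}(1|2|3|4|5|6|7)\.(alph|bet)a\.c?o(rg|m)",
    4: r"ht{2}p(s?:/{2}(\d\.alph|\d\.bet)a\.com|(s:/{2}6\.alph|:/{2}\d\.bet)a\.org)",
    5: r"ht{2}p(s?:/{2}(\w\.alph|\w\.bet)a\.com|(s:/{2}6\.alph|:/{2}\w\.bet)a\.org)",
}
char_ranges = re.compile(r"ht{2}p[.-/1-7:a-ceg-hl-mo-pr-t]{13,15}")  # regex 6

# Assumed candidate grid: 2 schemes x 63 ASCII word chars x 2 names x 4 endings.
word_chars = string.ascii_letters + string.digits + "_"
candidates = [
    f"{scheme}://{c}.{name}.{ending}"
    for scheme, c, name, ending in product(
        ("http", "https"), word_chars, ("alpha", "beta"), ("com", "org", "om", "corg")
    )
]

counts, factors = {}, {}
for number, pattern in learned.items():
    rx = re.compile(pattern, re.ASCII)  # re.ASCII keeps \w and \d to ASCII only
    assert all(rx.fullmatch(s) for s in inputs)
    counts[number] = sum(1 for s in candidates if rx.fullmatch(s))
    factors[number] = round(counts[number] / len(inputs), 1)

assert all(char_ranges.fullmatch(s) for s in inputs)
print(counts)   # {1: 13, 2: 13, 3: 112, 4: 51, 5: 316}
print(factors)  # {1: 1.0, 2: 1.0, 3: 8.6, 4: 3.9, 5: 24.3}
```

For instance, regex 3 accepts 2 schemes × 7 digits × 2 names × 4 endings = 112 strings, and 112 / 13 ≈ 8.6, which agrees with the reported 8.6X.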
Actual Example 4 (Nested Repeating Substrings)
Input Strings (4):
waabbccddaabbccddr
waabbcffggvcffggvcffggvddaabbccddaabbccddr
waabbcffggffggvcffggffggvcffggffggvddaabbccddaabbccddr
waabbcffgeegeevcffgeegeevcffgeegeevddaabbccddaabbccddr
Learned regex:
w(a{2}b{2}((c(f{2}g{2}){2}v){3}|(cf{2}(ge{2}){2}v){3}|(cf{2}g{2}v){3})d{2})?(a{2}b{2}c{2}d{2}){2}r
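To see how the nested quantifiers map back onto the inputs, the following sketch checks the learned regex against the four input strings and rebuilds the third input by hand from its repeating parts. The variable names are illustrative only.

```python
import re

inputs = [
    "waabbccddaabbccddr",
    "waabbcffggvcffggvcffggvddaabbccddaabbccddr",
    "waabbcffggffggvcffggffggvcffggffggvddaabbccddaabbccddr",
    "waabbcffgeegeevcffgeegeevcffgeegeevddaabbccddaabbccddr",
]

# The learned regex from Example 4.
learned = re.compile(
    r"w(a{2}b{2}((c(f{2}g{2}){2}v){3}|(cf{2}(ge{2}){2}v){3}|(cf{2}g{2}v){3})d{2})?"
    r"(a{2}b{2}c{2}d{2}){2}r"
)

all_match = all(learned.fullmatch(s) for s in inputs)

# Rebuild the third input from the branch (c(f{2}g{2}){2}v){3}:
inner = "c" + "ffgg" * 2 + "v"   # innermost repeat: (f{2}g{2}){2}
tail = "aabbccdd" * 2            # (a{2}b{2}c{2}d{2}){2}
rebuilt = "w" + "aabb" + inner * 3 + "dd" + tail + "r"

print(all_match)             # True
print(rebuilt == inputs[2])  # True
```

The first input takes the path where the optional group is skipped entirely, and each of the other three inputs corresponds to one of the three alternatives inside that group.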
Actual Example 5 (Scalability: 50 Random Strings, with lengths between 1 and 100)
Input Strings (50):
Learned regex:
1. MOST OPTIMAL EXACT MATCH
ABSTRACTION TYPE: NONE
EXPANSION FACTOR: 1.0X
SIGNIFICANT LENGTH: 1708
(dc){2}b{3}e{2}bd{2}cdcebce{2}b{2}c{2}ecbecebe{3}b{2}dc{2}bc{2}d{2}e{2}becedebe{2}cebd{2}bc{2}b(ed){2}c{2}bebdcd{4}bcbc{4}de{2}dcdedb{2}|ce{2}ceb{2}dcecbedcbdebd{2}bcbecd{2}becdbdc{2}bc{6}bc{3}e{2}bdc(bc){2}d{2}c{2}d{2}cd{3}bc{2}b{3}dec{2}dbedede{2}becdcedce{2}d|bdecde{2}dbe{2}dceb(ec){2}bcd{2}b{2}dbede{2}dcb{2}cd{2}e{2}dcb{2}(de){2}c{2}b{2}ec{2}dbecedb{2}cb{3}(cde){2}d{2}bcd{2}cecdbe|edecbdb{2}cdebcedbd{2}b{4}e{2}c{3}db{2}(e{2}d){2}bdeb{3}d{2}ebcde{2}ce{4}d{2}(eb){2}db{2}decdeb{2}edebedcb{3}edbebe{2}b{4}c|bcd{2}cbec(ece){2}dbe{3}bc{2}b{2}dcedeb{2}ec{2}be(edb){2}decd(bc){2}cdbdcebce{2}b{2}cdcb(cbde){2}(dc){2}e{2}bc(dc){2}e{3}|edcdbede{3}becb{3}dc{2}edcbec{4}dcecbdbcdc{2}becd{2}bc(dc){3}bdc{2}d{2}cdc{2}(bd){2}edbec(be){3}d{2}bebdcec{4}de{2}bdedc|cde{2}cedbedcbcdbd{2}b{2}db(edb){2}cb{2}de{2}cbecebd{2}c{2}bedeb(dc){3}ecbe{3}dbe{2}bcbedc{2}(e{2}d){2}cbcebe{2}c{3}d{5}|bc{2}ed{4}b{2}cde(dc){2}edebcecd(c{2}e){2}dc{2}eb{2}ed{2}ce{3}c{3}ecbd(ce){2}d{2}b{2}c{2}ebecdbcecdc{2}edbde{2}cecdced|cbdc{2}e{3}debced{3}ecdeb{3}dcdb{2}deced{2}bde{2}cb{2}e{3}c{3}ed{2}cde{2}dceb{2}dedce(c{2}e){2}db{2}cdc{2}b{3}c{3}|b{2}d{3}bed{3}cbedce{2}bcedecdbcd{2}ed{5}cedced{2}bce{2}bdbd{2}bcdbc{2}de(dc){2}e{2}bed(cb){3}dc{3}de{3}(cb{2}){2}|e{2}c{2}bcbeb(cd){2}e{3}cdcbced{2}ede{2}cebcdbd{2}ed{3}c{2}ecdedbebd{4}cecdcd{2}bdbc{2}d{2}bcbdbd{2}ebce{2}bd|d{2}(bd){2}eb{2}dc{2}ebcd{2}e(db){2}d{2}ecd{3}b{2}deb{2}ecbc(db){2}ce{2}b{4}(ebd){2}bcedbc{2}bd{4}bcbdc|de{3}deb{2}db{3}de{2}bec{2}be{2}d{2}ebec{2}edebc{2}d{2}e{2}bce{3}decebedbe{4}c{2}edbe{2}dc{2}b{2}cbdb|bced(ecd{2}){2}cb(bd){2}c{2}d{2}bcbde{3}b{2}c{2}db{2}ecdbdec(eb){2}(bd){2}ce{3}c{2}d{2}b(de){2}bcb|dce{3}c{3}e{3}dcbe{2}cbcd{2}ecb{2}dce{2}bdcedec{2}dbcbe{2}(de){2}c{2}bd(bdb){2}edcecd{2}bcdecd{2}|dcb(de){2}b{2}(ed){2}cbde{3}cbdbcebcd{2}ecbd{2}ed{2}cb{2}debe{2}cbdb{2}e{2}c(cde){2}ecbc{3}db{2}|cbc{3}d{2}cbdb{2}e(ce){3}(eb){2}db{3}d{2}bec{2}ebcd{2}ce{2}dcd(ec{2}){2}(bc){2}c{2}edc{2}dbeb|dbde{3}bcec{2}ed{2}e{2}dbcecbd{2}bdc{2}beb(edc){2}db(bed){2}ce{2}bed{3}
c{2}d{2}edbdeb{3}|d{2}cecbe{2}cdbdeb{2}e{4}db{3}c{4}ebecdbedcd{2}bedcded{2}cebec{2}bed{3}ede{2}bdc{2}d{2}|e(db){2}db{2}e(be){2}dc{2}ecb{2}c{2}bde{3}bdbedbdebcbe{2}db{3}dcb{2}ce{2}d{2}ec{4}ecdbd|eb{3}d{2}edb{2}cdedce{5}cdecb{2}decbd{2}b{3}cdecbdcd{2}ecb{2}cbd{2}e{2}c{2}be{2}c|dbe{3}d(ec){2}cedebed(ec){2}d{2}bdcdedbdced{2}ebc{2}b{2}edbe{2}dbdcbdedb{2}dec{2}|(dc){2}b{3}ecdbc{2}e{2}cdbcdc{3}e{2}(bd){2}cbcd(bd){2}ebcdeb(ec{2}){2}b{3}ed|decded{3}bede{3}d{2}cdbc{2}be{3}bcedbe{2}cdc(cd){2}ecedbdcd{2}ec{2}e(cd){2}|c{2}e{2}dcb{2}ecbeb{2}cec{2}dede{4}d{3}b{3}(db){2}ec{2}dcbcdec{2}bdcecdec|dcbd{3}cbcd{2}cdbecede{2}d{2}cedbc(bdc){2}cdced{2}c{2}b(bce){2}e|cbecd{2}cbdec{2}dcedb{2}de{3}be{2}cbebce{3}c{2}d{2}bec{2}bc|ed{2}cdb{3}c{2}db{2}d{2}cbdec{4}e{2}cdce{2}cdc{2}d{2}bdcbd|ecbe{3}c{3}dbecdcbede{3}cbdb(b{2}c){2}e(be){2}e{2}b{2}dceb|dedb{2}cdebcebdbe{2}c{3}ecbc{2}bedc{3}ebdbcb(ce){2}|cdc{3}de{2}cdb(dbc{2}){2}bc{2}de{2}(ec){2}be{2}|dcedebc{2}eb{2}ce{2}dcedb{2}dc{2}e{2}b{2}c|cdc(eb){3}(cb){2}db{2}edcebedeb(ebd){2}b|c{3}d{3}cd{2}bdc(ce){2}ecdcbdeced{3}cde|cdbd{2}ebdbcede{2}d{2}cbedebcdbeb{2}cd|d{3}cdbe(de){2}e{2}dcebd{2}bedce{2}bd|bdcbdbeb{2}cebc{3}dbeb(ec){2}b{2}cdc|cbdb{2}e{2}c{2}d{2}(db){2}b(be){2}e|cbdbc{2}e{2}cb{2}debecdc{2}|c{2}dedbe(bd{3}c){2}cbcedb|e{2}c{2}decbc{2}ed{3}ecb|ced{2}b{3}cd{2}|d{2}cbec{3}b{2}|b{5}ebe{2}dcd|dec{2}bebdbc|cd{2}b{4}|bd{2}|b{2}|dcdb|c