About

This website uses Machine Learning to learn Non-trivial Regular Expressions from example input strings.

These learned Regular Expressions are
  1. descriptive,
  2. non-trivial,
  3. optimal, or near optimal,
  4. executable,
  5. matching, or exact matching, and
  6. learned from positive example input strings.

This has not been done before -- it is a breakthrough in Computer Science and Machine Learning.

This solves the Regular Expression Induction (REI) problem for the first time, and does so in a significant and practical way.

Up to 19 regexes are learned for each input set of strings -- providing a choice between Optimality, Readability and Abstractions.

Definitions

Descriptive means that the original input strings can be reconstructed by examining the learned regex.

Optimal means that the learned regex is the shortest regex describing the input string set, where length is measured by the Significant Length of the regex.

Executable means that most normal regular expression engines can execute the learned regex.

Matching means that the learned regex matches all the input strings.

Exact Matching means that the learned regex matches all and only the input strings.
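
The difference between Matching and Exact Matching can be checked mechanically. The following is a minimal sketch, not this website's implementation: the helper names are hypothetical, "matches" is assumed to mean a full-string match, and "only the input strings" is tested in a bounded way, by enumerating every string over the input alphabet up to one character longer than the longest input.

```python
import itertools
import re


def matches_all(pattern, inputs):
    """Matching: the regex fully matches every input string (assumed full match)."""
    compiled = re.compile(pattern)
    return all(compiled.fullmatch(s) is not None for s in inputs)


def is_exact_match_bounded(pattern, inputs):
    """Exact Matching, checked only over a bounded set of candidate strings.

    Candidates are all strings over the characters seen in the inputs,
    with lengths from 0 to (longest input + 1). This is a spot check,
    not a proof, and it is only practical for small inputs.
    """
    compiled = re.compile(pattern)
    alphabet = sorted(set("".join(inputs)))
    max_len = max(len(s) for s in inputs) + 1
    accepted = set()
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                accepted.add(candidate)
    return accepted == set(inputs)


inputs = ["abc", "abd"]
print(matches_all(r"ab(c|d)", inputs))             # True
print(is_exact_match_bounded(r"ab(c|d)", inputs))  # True
print(matches_all(r"ab\w", inputs))                # True
print(is_exact_match_bounded(r"ab\w", inputs))     # False: also accepts "aba" and "abb"
```

Here `ab(c|d)` is both matching and exact matching for the two inputs, while `ab\w` is matching only, because it also accepts strings that were never in the input set.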

Abstractions use Character Classes (\d and \w), as well as computed Character Ranges (e.g. [3-5bg-j]).
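
To make the Character Range example concrete, the short sketch below lists which single characters the range [3-5bg-j] accepts, using Python's standard re module. The candidate string is an arbitrary choice made for illustration.

```python
import re

# The computed Character Range from the definition above.
char_range = re.compile(r"[3-5bg-j]")

# Try every digit and the letters a to k as single-character strings.
candidates = "0123456789abcdefghijk"
accepted = "".join(c for c in candidates if char_range.fullmatch(c))

print(accepted)  # 345bghij
```

The range stands for the digits 3 to 5, the letter b, and the letters g to j, so one short range replaces an eight-way alternation.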

Plain Length of a regex means the total number of characters in the regex.

Significant Length of a regex means the number of occurrences, in the regex, of characters from the original input strings.
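
The two length measures can be compared with a small sketch. This is one naive reading of the definitions, not the website's actual computation: the function names are hypothetical, backslash escapes such as \d are assumed not to count as input characters, and the input strings are assumed to contain no regex metacharacters.

```python
def plain_length(regex):
    """Plain Length: the total number of characters in the regex."""
    return len(regex)


def significant_length(regex, inputs):
    """Significant Length under a naive reading (an assumption).

    Counts each character of the regex that also occurs somewhere in the
    input strings. A backslash and the character after it are skipped,
    so an abstraction such as \\d is not counted as the letter d.
    """
    alphabet = set("".join(inputs))
    count = 0
    i = 0
    while i < len(regex):
        if regex[i] == "\\":
            i += 2  # skip the escape sequence entirely
            continue
        if regex[i] in alphabet:
            count += 1
        i += 1
    return count


inputs = ["abc", "abd"]
for regex in ["abc|abd", "ab(c|d)"]:
    print(regex, plain_length(regex), significant_length(regex, inputs))
# abc|abd 7 6
# ab(c|d) 7 4
```

Both regexes have the same Plain Length, but ab(c|d) has the smaller Significant Length because the shared prefix ab appears only once, which is why it would be preferred under this reading of Optimal.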

Notes


Microsoft for Startups