AlphaRegex: Automatic Regular Expression Synthesizer

Poster: prl.korea.ac.kr/~pronto/home/posters/regex-synthesis.pdf · 2020-01-30

Sunbeom So (소순범), Division of Computer and Communication Engineering, College of Information and Communications, Korea University
Mina Lee (이민아), Department of Computer Science, College of Informatics, Korea University
Korea University Programming Research Laboratory, Prof. Hakjoo Oh (오학주)

1. Motivation

While taking a theory of computation course: could regular expressions be synthesized automatically?

For Σ = {0, 1}, find a regular expression for the following language.
L = {w ∈ {0, 1}* | w has exactly one pair of consecutive 0s}

Positive examples: 00, 1001, 0101001010, 1111001111
Negative examples: 01, 11, 000, 00100

2. Problem and Goal

Automatically synthesize a regular expression that satisfies the given examples: AlphaRegex.

Positive examples: 00, 1001, 0101001010, 1111001111
Negative examples: 01, 11, 000, 00100
→ AlphaRegex →
Automatically synthesized regular expression: (0?1)*00(10?)* (in 0.5s)

Goal: be smarter than theory of computation students and professors!

3. Regular Expression Synthesis Algorithm

Basic algorithm: explore the entire state space generated by the regular expression grammar

$$e \to a \in \Sigma \mid \epsilon \mid \emptyset \mid e_1 + e_2 \mid e_1 \cdot e_2 \mid e^{*}$$

Challenge: the state space is very large. The number of states at depth $d$ is $O(7^{2^{d}-1})$.

Solution: devise an efficient state-space search algorithm.

Efficient state-space pruning
1. Best-first enumerative search: simple regular expressions are explored first
2. Pruning semantically-equivalent states
3. Pruning dead states (states that cannot lead to a solution)
4. Pruning redundant states

Grounded in rigorous theory: the pruning techniques are based on programming language theory, which guarantees the soundness of the results.

4. Experiments

Implementation: 840 lines in OCaml. The benchmarks focus on regular expression problems that students find difficult. The baseline is the basic algorithm with none of the search techniques applied; the speedup measures the improvement of the algorithm with all search techniques applied.

Problem | P | N | Synthesized regular expression | Basic algorithm (s) | Our algorithm (s) | Speedup
--- | --- | --- | --- | --- | --- | ---
The 5th symbol from the right of w is 1. | 3 | 3 | (0+1)*1(0+1)(0+1)(0+1)(0+1) | 148.0 | 8.2 | 18x
w has at most two 0s. | 8 | 7 | 1*0?1*0?1* | 425.0 | 1.2 | 354x
0 and 1 alternate in w. | 10 | 11 | 0?(10)*1? | 4073.9 | 1.6 | 2546x
The number of 0s in w is divisible by 3. | 8 | 7 | (1+01*01*0)* | > 7200.0 | 5.9 | n/a
If w starts with 0 it has odd length; if it starts with 1 it has even length. | 5 | 3 | (0+1(0+1))((0+1)(0+1))* | > 7200.0 | 10.9 | n/a
w has at least one 0 and at most one 1. | 12 | 10 | 0*(01?+100*) | > 7200.0 | 7.5 | n/a
w has at most one pair of consecutive 1s. | 9 | 8 | (1+(01?)*)(0+10*) | 465.1 | 24.4 | 19x

P and N are the numbers of positive and negative examples.

Experimental environment: MacBook Pro / OS X El Capitan 10.11.1 / 2.2 GHz Intel Core i7 / 16GB 1600 MHz DDR3

5. Conclusion

✓ Quickly synthesizes, from a small number of examples, regular expressions that even people find hard to write
✓ Presents several search techniques for exploring the state space efficiently
✓ Demonstrates performance on difficult problems taken from actual theory of computation textbooks
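As a quick sanity check on the running example from Section 1, the synthesized expression can be tested against the poster's positive and negative examples. The sketch below uses Python's standard `re` module; the helper name `consistent` is illustrative and not part of AlphaRegex. The expression uses only `?`, `*` and concatenation, which mean the same thing in Python's regex syntax, so no translation is needed (the poster's `+` for union would have to be written `|`).

```python
import re

# Expression and examples copied from the poster.
SYNTHESIZED = r"(0?1)*00(10?)*"
POSITIVE = ["00", "1001", "0101001010", "1111001111"]
NEGATIVE = ["01", "11", "000", "00100"]


def consistent(pattern, positive, negative):
    """True iff the pattern accepts every positive and rejects every negative example."""
    rx = re.compile(pattern)
    accepts_all_positive = all(rx.fullmatch(w) is not None for w in positive)
    rejects_all_negative = all(rx.fullmatch(w) is None for w in negative)
    return accepts_all_positive and rejects_all_negative


print(consistent(SYNTHESIZED, POSITIVE, NEGATIVE))
```

Consistency with the examples is all the synthesizer checks; it does not by itself prove that the expression denotes the intended language.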

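[Figure 1. Exhaustive Search. A search tree over states: the first level lists $a$, $\epsilon$, $\emptyset$, $\Box + \Box$, $\Box \cdot \Box$, $\Box^{*}$; below $\Box + \Box$ appear $a + a$, $a + \epsilon$, $a + \emptyset$, $a + (\Box + \Box)$, $a + (\Box \cdot \Box)$, $a + (\Box^{*})$, $\epsilon + a$, $\epsilon + \epsilon$, $\epsilon + \emptyset$, $\epsilon + (\Box + \Box)$, ...; below $a + (\Box + \Box)$ appear $a + (a + a)$, $a + (a + \epsilon)$, $a + (a + \emptyset)$, and so on.]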

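$$\frac{e_1 \to e_1'}{e_1 + e_2 \to e_1' + e_2} \qquad \frac{e_2 \to e_2'}{e_1 + e_2 \to e_1 + e_2'}$$

$$\frac{e_1 \to e_1'}{e_1 \cdot e_2 \to e_1' \cdot e_2} \qquad \frac{e_2 \to e_2'}{e_1 \cdot e_2 \to e_1 \cdot e_2'}$$

$$\frac{e \to e'}{e^{*} \to e'^{*}} \qquad \frac{e \to e'}{e? \to e'?}$$

$$\frac{a \in \Sigma}{\Box \to a} \qquad \Box \to \epsilon \qquad \Box \to \emptyset$$

$$\Box \to \Box + \Box \qquad \Box \to \Box \cdot \Box \qquad \Box \to \Box^{*} \qquad \Box \to \Box?$$

Figure 2. Transition Relation between States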

[...] regular expression (e.g. $c = 7$). The number of states at depth $d$ in the worst case is

$$N(0) = 1, \qquad N(d+1) = N(d) \cdot c^{2^{d}}.$$

When $c = 7$:

$$N(d) = 7^{\sum_{k=0}^{d-1} 2^{k}} \in O(7^{2^{d}-1}).$$
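To see how quickly this bound grows, the recurrence can be evaluated directly and compared with the closed form. This is a small illustrative sketch; the function names `states_at_depth` and `closed_form` are not from the source.

```python
def states_at_depth(d, c=7):
    """Worst-case state count from N(0) = 1 and N(k+1) = N(k) * c**(2**k)."""
    n = 1
    for k in range(d):
        n *= c ** (2 ** k)
    return n


def closed_form(d, c=7):
    """c**(2**d - 1), since the exponents 2**0 + ... + 2**(d-1) sum to 2**d - 1."""
    return c ** (2 ** d - 1)


for depth in range(6):
    print(depth, states_at_depth(depth))
```

The two functions agree for every depth because the exponent is a geometric sum; the count is doubly exponential in the depth, which is why exhaustive enumeration is impractical.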

Search Strategy. We pick a state that has a minimal cost, where the cost of states is defined as follows:

$$C(a) = C(\epsilon) = C(\emptyset) = c_1, \qquad C(\Box) = c_2 \quad (c_2 > c_1)$$

$$C(e_1 + e_2) > C(e_1) + C(e_2), \qquad C(e_1 \cdot e_2) > C(e_1) + C(e_2)$$

$$C(e^{*}) > C(e)$$

Intuitively, we prefer simpler expressions by following the principle of Ockham's razor, so that the solution found is the simplest regular expression that is consistent with the examples.
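The source only constrains the cost function by inequalities. One possible instance, shown here as a sketch, takes $c_1 = 1$, $c_2 = 2$ and charges one extra unit per operator. These constants, and the tuple encoding of expressions (`("sym", "a")`, `("eps",)`, `("empty",)`, `("hole",)`, `("or", e1, e2)`, `("cat", e1, e2)`, `("star", e)`), are assumptions made for illustration, not the values used by AlphaRegex.

```python
C1 = 1  # assumed cost of a symbol, epsilon, or the empty language
C2 = 2  # assumed cost of a hole; must exceed C1


def cost(e):
    """Cost of an expression under the assumed constants (+1 per operator)."""
    tag = e[0]
    if tag in ("sym", "eps", "empty"):
        return C1
    if tag == "hole":
        return C2
    if tag in ("or", "cat"):
        return cost(e[1]) + cost(e[2]) + 1   # strictly greater than the sum of the parts
    if tag == "star":
        return cost(e[1]) + 1                # strictly greater than the operand
    raise ValueError(f"unknown expression tag: {tag!r}")


a = ("sym", "a")
hole = ("hole",)
print(cost(a), cost(hole), cost(("or", a, hole)), cost(("star", hole)))
```

Because a hole costs more than a leaf, filling a hole with a symbol lowers the cost, so partially built expressions are penalized relative to finished ones of the same shape.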

Algorithm 1 Search Algorithm
Input: Positive and negative examples $(P, N)$
Output: A regular expression $E$ consistent with $(P, N)$

  $W := \{\Box\}$
  repeat
    pick a minimal-cost state $s$ from $W$ and remove it from $W$
    if solution($s$) then return $s$
    else
      $W := W \cup \mathit{next}(s)$
    end if
  until $W = \emptyset$
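The worklist loop above can be sketched in a few dozen lines. The following is an illustrative, unoptimized version without normalization or pruning; it is not the authors' OCaml implementation. Assumptions: the alphabet is {0, 1}, expressions are nested tuples, the cost constants are 1 for leaves and 2 for holes, membership is decided with Brzozowski derivatives, and a `max_steps` bound and a `seen` set are added so the sketch always terminates and never revisits a state.

```python
import heapq
import itertools

ALPHABET = "01"                      # assumed alphabet for this sketch
EPS, EMPTY, HOLE = ("eps",), ("empty",), ("hole",)


def next_states(e):
    """States reachable in one transition: rewrite exactly one hole."""
    tag = e[0]
    if tag == "hole":
        leaves = [("sym", ch) for ch in ALPHABET] + [EPS, EMPTY]
        return leaves + [("or", HOLE, HOLE), ("cat", HOLE, HOLE), ("star", HOLE)]
    if tag in ("or", "cat"):
        return ([(tag, l, e[2]) for l in next_states(e[1])]
                + [(tag, e[1], r) for r in next_states(e[2])])
    if tag == "star":
        return [("star", x) for x in next_states(e[1])]
    return []                        # symbols, eps and empty contain no hole


def is_closed(e):
    return e[0] != "hole" and all(is_closed(x) for x in e[1:] if isinstance(x, tuple))


def cost(e):
    """Assumed cost: 2 for a hole, otherwise 1 plus the cost of the operands."""
    if e[0] == "hole":
        return 2
    return 1 + sum(cost(x) for x in e[1:] if isinstance(x, tuple))


def nullable(e):
    """Does the closed expression e accept the empty string?"""
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("sym", "empty"):
        return False
    if tag == "or":
        return nullable(e[1]) or nullable(e[2])
    if tag == "cat":
        return nullable(e[1]) and nullable(e[2])
    raise ValueError("expression is not closed")


def deriv(e, ch):
    """Brzozowski derivative of the closed expression e with respect to ch."""
    tag = e[0]
    if tag in ("eps", "empty"):
        return EMPTY
    if tag == "sym":
        return EPS if e[1] == ch else EMPTY
    if tag == "or":
        return ("or", deriv(e[1], ch), deriv(e[2], ch))
    if tag == "star":
        return ("cat", deriv(e[1], ch), e)
    if tag == "cat":
        first = ("cat", deriv(e[1], ch), e[2])
        return ("or", first, deriv(e[2], ch)) if nullable(e[1]) else first
    raise ValueError("expression is not closed")


def accepts(e, word):
    for ch in word:
        e = deriv(e, ch)
    return nullable(e)


def synthesize(pos, neg, max_steps=100_000):
    """Best-first search from the single hole; returns a consistent closed state or None."""
    tie = itertools.count()          # tie-breaker so heap entries never compare states
    work = [(cost(HOLE), next(tie), HOLE)]
    seen = {HOLE}
    for _ in range(max_steps):
        if not work:
            return None
        _, _, s = heapq.heappop(work)
        if is_closed(s):
            if all(accepts(s, p) for p in pos) and not any(accepts(s, n) for n in neg):
                return s
            continue
        for t in next_states(s):
            if t not in seen:
                seen.add(t)
                heapq.heappush(work, (cost(t), next(tie), t))
    return None


print(synthesize(["0", "00", "000"], ["", "1", "01", "10"]))
```

On this toy problem the search returns some closed expression consistent with the examples; which one depends on tie-breaking among equal-cost states. Without the normalization and pruning described in the following sections, this baseline only scales to very small problems.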

3.2 Normalization

Examples:

$$[\![ s^{*} s^{*} ]\!] = [\![ s^{*} ]\!]$$

$$[\![ (s + s) ]\!] = [\![ s ]\!]$$

$$[\![ (s \cdot s^{*})^{*} ]\!] = [\![ s^{*} ]\!]$$

...
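The three example equivalences can be turned into rewrite rules applied bottom-up. The sketch below covers only these three rules (the source's list is elided with "..."), applies them in a single bottom-up pass, and uses a hypothetical tuple encoding of expressions; it makes no claim to produce a canonical form.

```python
def normalize(e):
    """Apply the three example rewrites once, from the leaves upward."""
    tag = e[0]
    if tag in ("or", "cat"):
        left, right = normalize(e[1]), normalize(e[2])
        if tag == "or" and left == right:
            return left                          # s + s  =>  s
        if tag == "cat" and left == right and left[0] == "star":
            return left                          # s* s*  =>  s*
        return (tag, left, right)
    if tag == "star":
        inner = normalize(e[1])
        if inner[0] == "cat" and inner[2] == ("star", inner[1]):
            return ("star", inner[1])            # (s . s*)*  =>  s*
        return ("star", inner)
    return e                                     # symbols, eps, empty, holes


a = ("sym", "a")
star_a = ("star", a)
print(normalize(("or", ("cat", star_a, star_a), ("star", ("cat", a, star_a)))))
```

In the usage line, both operands of the union reduce to `a*`, so the whole expression collapses to `("star", ("sym", "a"))`. Mapping equivalent states to one representative lets a search keep a single copy of them in its worklist.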

3.3 Pruning Search Space

Definition 1 (Dead States). Let $(P, N)$ be a regular expression problem. We say a state $s \in S$ is dead, denoted $\mathit{dead}(s)$, iff every closed state $s'$ reachable from $s$ is not a solution:

$$\mathit{dead}(s) \iff \big( (s \to^{*} s') \land s' \not\to \implies \neg \mathit{solution}(s') \big).$$

Intuitively, a state $s$ is dead if exploring further the reachable states of $s$ is guaranteed to fail to find a solution. Our search algorithm aims to identify as many dead states as possible and does not attempt to explore beyond them. Specifically, we identify two types of dead states: pdead and ndead.

Definition 2. A state $s$ is dead for positive examples, denoted $\mathit{pdead}(s)$, iff every closed state $s'$ reachable from $s$ fails to accept a positive example:

$$\mathit{pdead}(s) \iff \big( s \to^{*} s' \land s' \not\to \implies \exists p \in P.\ p \notin [\![ s' ]\!] \big).$$

Example 1. Suppose $b \in P$. Any closed state $s'$ reachable from state $s = a \cdot \Box$ is doomed to reject the positive example; no matter how the hole gets instantiated, the string $b$ cannot be accepted.

Definition 3. A state $s$ is dead for negative examples, denoted $\mathit{ndead}(s)$, iff every closed state $s'$ reachable from $s$ fails to reject a negative example:

$$\mathit{ndead}(s) \iff \big( s \to^{*} s' \land s' \not\to \implies \exists n \in N.\ n \in [\![ s' ]\!] \big).$$

Example 2. Suppose $a \in N$. Any closed state $s'$ reachable from state $s = a \cdot (\Box)^{*}$ is doomed to accept the negative example; no matter how the hole gets instantiated, the language of any reachable state includes the string $a$.

Efficient state-space search techniques
• Pruning dead states: the search does not explore parts of the state space that cannot contain a solution no matter how far the search proceeds.

[Figure: search trees with pruned subtrees. With $b \in P$, the subtree below the state $a \cdot \Box$ is pruned; with $a \in N$, the subtree below the state $a \cdot (\Box)^{*}$ is pruned.]

It is clear that a state is guaranteed to be dead if one of $\mathit{pdead}(s)$ and $\mathit{ndead}(s)$ holds:

Lemma 1. Let $s$ be any state. Then,

$$\big( \mathit{pdead}(s) \lor \mathit{ndead}(s) \big) \implies \mathit{dead}(s).$$

Note, however, that the converse of Lemma 1 is not true. Suppose $s$ is a dead state. This means that every reachable closed state $s'$ either rejects some positive example or accepts some negative example. However, $\mathit{pdead}(s)$ requires the stronger condition that every reachable state $s'$ rejects some positive example. Similarly, $\mathit{ndead}(s)$ requires the stronger condition that every reachable state $s'$ accepts some negative example.

Example 3. When $P = N$, no solution can exist and the initial state $s = \Box$ is dead. However, neither $\mathit{pdead}(s)$ nor $\mathit{ndead}(s)$ holds, because we can always find a regular expression (e.g., $(a + b)^{*}$) that accepts all positive examples and we can also always find a regular expression (e.g., $\emptyset$) that rejects all negative examples.

We identify the pdead states and ndead states by computing over- and under-approximations of states.

Definition 4. The over-approximation $\hat{s}$ and under-approximation $\tilde{s}$ of state $s$ are defined inductively as follows:

$$\hat{a} = a \qquad \hat{\epsilon} = \epsilon \qquad \hat{\emptyset} = \emptyset$$

$$\widehat{e_1 + e_2} = \hat{e_1} + \hat{e_2} \qquad \widehat{e_1 \cdot e_2} = \hat{e_1} \cdot \hat{e_2}$$

$$\widehat{e^{*}} = (\hat{e})^{*} \qquad \hat{\Box} = (a + b)^{*}$$

$$\tilde{a} = a \qquad \tilde{\epsilon} = \epsilon \qquad \tilde{\emptyset} = \emptyset$$

$$\widetilde{e_1 + e_2} = \tilde{e_1} + \tilde{e_2} \qquad \widetilde{e_1 \cdot e_2} = \tilde{e_1} \cdot \tilde{e_2}$$

$$\widetilde{e^{*}} = (\tilde{e})^{*} \qquad \tilde{\Box} = \emptyset$$

Intuitively, the over-approximation $\hat{s}$ is obtained by replacing all holes in $s$ by $(a + b)^{*}$, and the under-approximation $\tilde{s}$ is obtained by replacing the holes by $\emptyset$.

Example 4. Consider a state $s = a + (\Box \cdot \Box)$. Then, $\hat{s} = a + ((a + b)^{*} \cdot (a + b)^{*})$ and $\tilde{s} = a + (\emptyset \cdot \emptyset)$.

$\hat{s}$ is an over-approximation in the sense that the language of $\hat{s}$ contains the languages of all closed states reachable from $s$ (Lemma 2). Dually, $\tilde{s}$ is an under-approximation because the language of every closed state $s'$ reachable from $s$ subsumes the language of $\tilde{s}$ (Lemma 3).

Lemma 2. For any state $s$, we have

$$[\![ \hat{s} ]\!] \supseteq \bigcup_{s \to^{*} s' \land s' \not\to} [\![ s' ]\!].$$

Proof. Todo

Lemma 3. For any state $s$, we have

$$[\![ \tilde{s} ]\!] \subseteq \bigcap_{s \to^{*} s' \land s' \not\to} [\![ s' ]\!].$$

Proof. Todo

Given a state $s$, we conclude that $s$ is dead for positive examples (i.e. $\mathit{pdead}(s)$) if $\hat{s}$ rejects some positive example:

$$\exists p \in P.\ p \notin [\![ \hat{s} ]\!] \tag{3}$$

and we conclude that $s$ is dead for negative examples (i.e. $\mathit{ndead}(s)$) if $\tilde{s}$ accepts some negative example:

$$\exists n \in N.\ n \in [\![ \tilde{s} ]\!]. \tag{4}$$

Lemmas 4 and 5 show that our algorithm for identifying pdead and ndead states is both sound and complete.

Lemma 4. Let $s$ be any state. Then,

$$\mathit{pdead}(s) \iff \exists p \in P.\ p \notin [\![ \hat{s} ]\!].$$

Proof. Consider each direction.

• ($\Rightarrow$) Suppose $\mathit{pdead}(s)$ holds:

$$s \to^{*} s' \land s' \not\to \implies \exists p \in P.\ p \notin [\![ s' ]\!]. \tag{5}$$

From (5) and Lemma 6, we obtain $\exists p \in P.\ p \notin [\![ \hat{s} ]\!]$.

• ($\Leftarrow$) Suppose $p \notin [\![ \hat{s} ]\!]$. By Lemma 2, we have

$$p \notin \bigcup_{s \to^{*} s' \land s' \not\to} [\![ s' ]\!],$$

which implies that $p \notin [\![ s' ]\!]$ for all closed $s'$ reachable from $s$.

Lemma 5. Let $s$ be any state. Then,

$$\mathit{ndead}(s) \iff \exists n \in N.\ n \in [\![ \tilde{s} ]\!].$$

Proof. Consider each direction.

• ($\Rightarrow$) Suppose $\mathit{ndead}(s)$ holds:

$$s \to^{*} s' \land s' \not\to \implies \exists n \in N.\ n \in [\![ s' ]\!]. \tag{6}$$

From (6) and Lemma 7, we obtain $\exists n \in N.\ n \in [\![ \tilde{s} ]\!]$.

• ($\Leftarrow$) Suppose $n \in [\![ \tilde{s} ]\!]$. By Lemma 3, we have

$$n \in \bigcap_{s \to^{*} s' \land s' \not\to} [\![ s' ]\!],$$

which implies that $n \in [\![ s' ]\!]$ for all closed $s'$ reachable from $s$.

Lemma 6. For any state $s$, we have $s \to^{*} \hat{s}$ and $\hat{s} \not\to$.

Proof. By structural induction on $s$.

Lemma 7. For any state $s$, we have $s \to^{*} \tilde{s}$ and $\tilde{s} \not\to$.

Proof. By structural induction on $s$.

Final Algorithm. With normalization and pruning, the search algorithm uses the following next function:

$$\mathit{next}(s) = \begin{cases} \emptyset & \exists p \in P.\ p \notin [\![ \hat{s} ]\!] \\ \emptyset & \exists n \in N.\ n \in [\![ \tilde{s} ]\!] \\ \{ \mathit{normalize}(s') \mid s \to s' \} & \text{otherwise} \end{cases}$$

Excerpted from draft pages 3 to 5, dated 2016/6/4 and 2016/6/6.
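Definition 4 and the pruned next function translate almost directly into code. The sketch below is illustrative only. Assumptions: the alphabet is {a, b} as in the over-approximation $(a+b)^{*}$, expressions are nested tuples, membership is decided with Brzozowski derivatives, the `?` operator is left out because Definition 4 does not cover it, and the normalize step is omitted. All function names are hypothetical.

```python
EPS, EMPTY, HOLE = ("eps",), ("empty",), ("hole",)
A, B = ("sym", "a"), ("sym", "b")
ANY = ("star", ("or", A, B))         # (a+b)*, the over-approximation of a hole


def fill(e, filler):
    """Replace every hole in e by filler."""
    tag = e[0]
    if tag == "hole":
        return filler
    if tag in ("or", "cat"):
        return (tag, fill(e[1], filler), fill(e[2], filler))
    if tag == "star":
        return ("star", fill(e[1], filler))
    return e


def over(s):
    return fill(s, ANY)              # holes become (a+b)*


def under(s):
    return fill(s, EMPTY)            # holes become the empty language


def nullable(e):
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("sym", "empty"):
        return False
    if tag == "or":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])      # cat


def deriv(e, ch):
    tag = e[0]
    if tag in ("eps", "empty"):
        return EMPTY
    if tag == "sym":
        return EPS if e[1] == ch else EMPTY
    if tag == "or":
        return ("or", deriv(e[1], ch), deriv(e[2], ch))
    if tag == "star":
        return ("cat", deriv(e[1], ch), e)
    first = ("cat", deriv(e[1], ch), e[2])        # cat
    return ("or", first, deriv(e[2], ch)) if nullable(e[1]) else first


def accepts(e, word):
    for ch in word:
        e = deriv(e, ch)
    return nullable(e)


def pdead(s, pos):
    """Condition (3): the over-approximation rejects some positive example."""
    return any(not accepts(over(s), p) for p in pos)


def ndead(s, neg):
    """Condition (4): the under-approximation accepts some negative example."""
    return any(accepts(under(s), n) for n in neg)


def expand(e):
    """One-step transitions: rewrite exactly one hole."""
    tag = e[0]
    if tag == "hole":
        return [A, B, EPS, EMPTY, ("or", HOLE, HOLE), ("cat", HOLE, HOLE), ("star", HOLE)]
    if tag in ("or", "cat"):
        return ([(tag, l, e[2]) for l in expand(e[1])]
                + [(tag, e[1], r) for r in expand(e[2])])
    if tag == "star":
        return [("star", x) for x in expand(e[1])]
    return []


def next_pruned(s, pos, neg):
    """Successors of s, or nothing if s is dead (normalize omitted in this sketch)."""
    if pdead(s, pos) or ndead(s, neg):
        return []
    return expand(s)


print(pdead(("cat", A, HOLE), ["b"]))              # the state of Example 1
print(ndead(("cat", A, ("star", HOLE)), ["a"]))    # the state of Example 2
```

Each check costs one membership test per example on a closed expression, which is far cheaper than enumerating the subtree that the check rules out.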


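• Pruning redundant states: the search does not explore parts of the state space that may contain a solution when a simpler solution exists elsewhere.

[Figure: search trees with pruned subtrees. With $b \in P$, the subtree below $a \cdot \Box$ is pruned; with $a \in N$, the subtree below $a \cdot (\Box)^{*}$ is pruned; with $aab \in P$, the subtree below $a \cdot (b + \epsilon) \cdot \Box$ is pruned.]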

Definition 3. A state s is dead for negative examples, de-noted ndead(s), iff every closed state s0 reachable from sfails to reject a negative example:

ndead(s) ()�s !⇤ s0 ^ s0 6! =) 9n 2 N . n 2 [[s0]]

�.

Example 2. Suppose a 2 N . Any closed state s0 reachablefrom state s = a · (⇤)⇤ is doomed to accept the negativeexample; no matter how the hole gets instantiated, the lan-guage of any reachable state includes the string a.

It is clear that a state is guaranteed to be dead if one ofpdead(s) and ndead(s) holds:

Lemma 1. Let s be any state. Then,�pdead(s) _ ndead(s)

�=) dead(s).

Note that, however, the converse of Lemma 1 is not true.Suppose s is a dead state. This means that every reach-able state s0 either rejects some positive example or acceptssome negative example. However, pdead(s) requires thatthe reachable state s0 always rejects some positive example.Similarly, ndead(s) requires a strong condition that everyreachable state s0 accepts some negative example.

Example 3. When P = N , no solutions cannot exist andthe initial state ⇤ is dead. However, neither pdead(s) norndead(s) holds, because we can always find a regular ex-pression (e.g., (a + b)⇤) that accepts all positive examplesand we can also always find a regular expression (e.g., ;)that rejects all negative examples.

We identify the pdead states and ndead states by comput-ing over- and under-approximations of states.

Definition 4. The over-approximation bs and under-approximationee of state s are defined inductively as follows:

ba = ab✏ = ✏b; = ;

\e1 + e2 = be1 + be2\e1 · e2 = be1 · be2

be⇤ = (be)⇤b⇤ = (a+ b)⇤

ea = ae✏ = ✏e; = ;

e1 + e2 = ee1 + ee2e1 · e2 = ee1 · ee2

ee⇤ = (ee)⇤e⇤ = ;

Intuitively, the over-approximation bs is obtained by replac-ing all holes in s by (a + b)⇤, and the under-approximationes is obtained by replacing the holes by ;.

Example 4. Consider a state s = a + (⇤ · ⇤). Then,bs = a+ ((a+ b)⇤ + (a+ b)⇤) and es = a+ (;+ ;).

bs is over-approximated in a sense that the language of bscontains all the languages of states reachable from s (Lemma2). Dually, es is under-approximated because every state s0

reachable from s subsumes the language of es (Lemma 3).

Lemma 2. For any state s, we have

[[bs]] ◆[

s!⇤s0^s0 6![[s0]].

Proof. Todo

Lemma 3. for any state s, we have

[[es]] ✓\

s!⇤s0^s0 6![[s0]].

Proof. Todo

Given a state s, we conclude that s is dead with positiveexample (i.e. pdead(s)) if bs rejects some positive example:

9p 2 P. p 62 [[bs]] (3)

and we conclude that s is dead with negative example (i.e.ndead(s)) if es accepts some negative example:

9n 2 N . n 2 [[es]]. (4)

Lemma 4 and 5 show that our algorithm for identifyingpdead and ndead states is both sound and complete.

4 2016/6/4

효율적인 상태공간 탐색기법• Pruning dead states: 탐색을 아무리 진행해도 해를 가질수 없는 상태공간은 탐색하지 않음

no matter how the hole gets instantiated, the string b cannotbe accepted.

(b 2 P)

...

a ·⇤

......

...

(a 2 N )

...

a · (⇤)⇤

......

...

Definition 3. A state s is dead for negative examples, de-noted ndead(s), iff every closed state s0 reachable from sfails to reject a negative example:

ndead(s) ()�s !⇤ s0 ^ s0 6! =) 9n 2 N . n 2 [[s0]]

�.

Example 2. Suppose a 2 N . Any closed state s0 reachablefrom state s = a · (⇤)⇤ is doomed to accept the negativeexample; no matter how the hole gets instantiated, the lan-guage of any reachable state includes the string a.

It is clear that a state is guaranteed to be dead if one ofpdead(s) and ndead(s) holds:

Lemma 1. Let s be any state. Then,�pdead(s) _ ndead(s)

�=) dead(s).

Note that, however, the converse of Lemma 1 is not true.Suppose s is a dead state. This means that every reach-able state s0 either rejects some positive example or acceptssome negative example. However, pdead(s) requires thatthe reachable state s0 always rejects some positive example.Similarly, ndead(s) requires a strong condition that everyreachable state s0 accepts some negative example.

Example 3. When P = N , no solutions cannot exist andthe initial state ⇤ is dead. However, neither pdead(s) norndead(s) holds, because we can always find a regular ex-pression (e.g., (a + b)⇤) that accepts all positive examplesand we can also always find a regular expression (e.g., ;)that rejects all negative examples.

We identify the pdead states and ndead states by comput-ing over- and under-approximations of states.

Definition 4. The over-approximation bs and under-approximationee of state s are defined inductively as follows:

ba = ab✏ = ✏b; = ;

\e1 + e2 = be1 + be2\e1 · e2 = be1 · be2

be⇤ = (be)⇤b⇤ = (a+ b)⇤

ea = ae✏ = ✏e; = ;

e1 + e2 = ee1 + ee2e1 · e2 = ee1 · ee2

ee⇤ = (ee)⇤e⇤ = ;

Intuitively, the over-approximation bs is obtained by replac-ing all holes in s by (a + b)⇤, and the under-approximationes is obtained by replacing the holes by ;.

Example 4. Consider a state s = a + (⇤ · ⇤). Then,bs = a+ ((a+ b)⇤ + (a+ b)⇤) and es = a+ (;+ ;).

bs is over-approximated in a sense that the language of bscontains all the languages of states reachable from s (Lemma2). Dually, es is under-approximated because every state s0

reachable from s subsumes the language of es (Lemma 3).

Lemma 2. For any state s, we have

[[bs]] ◆[

s!⇤s0^s0 6![[s0]].

Proof. Todo

Lemma 3. for any state s, we have

[[es]] ✓\

s!⇤s0^s0 6![[s0]].

Proof. Todo

Given a state s, we conclude that s is dead with positiveexample (i.e. pdead(s)) if bs rejects some positive example:

9p 2 P. p 62 [[bs]] (3)

and we conclude that s is dead with negative example (i.e.ndead(s)) if es accepts some negative example:

9n 2 N . n 2 [[es]]. (4)

Lemma 4 and 5 show that our algorithm for identifyingpdead and ndead states is both sound and complete.

Lemma 4. Let s be any state. Then,

pdead(s) () 9p 2 P. p 62 [[bs]].

Proof. Consider each direction.

• (=)) Suppose pdead(s) holds:

s !⇤ s0 ^ s0 6! =) 9p 2 P. p 62 [[s0]]. (5)

From (5) and Lemma 6, we obtain 9p 2 P. p 62 [[bs]].

4 2016/6/4

no matter how the hole gets instantiated, the string b cannotbe accepted.

(b 2 P)

...

a ·⇤

......

...

(a 2 N )

...

a · (⇤)⇤

......

...

Definition 3. A state s is dead for negative examples, de-noted ndead(s), iff every closed state s0 reachable from sfails to reject a negative example:

ndead(s) ()�s !⇤ s0 ^ s0 6! =) 9n 2 N . n 2 [[s0]]

�.

Example 2. Suppose a 2 N . Any closed state s0 reachablefrom state s = a · (⇤)⇤ is doomed to accept the negativeexample; no matter how the hole gets instantiated, the lan-guage of any reachable state includes the string a.

It is clear that a state is guaranteed to be dead if one ofpdead(s) and ndead(s) holds:

Lemma 1. Let s be any state. Then,�pdead(s) _ ndead(s)

�=) dead(s).

Note that, however, the converse of Lemma 1 is not true.Suppose s is a dead state. This means that every reach-able state s0 either rejects some positive example or acceptssome negative example. However, pdead(s) requires thatthe reachable state s0 always rejects some positive example.Similarly, ndead(s) requires a strong condition that everyreachable state s0 accepts some negative example.

Example 3. When P = N , no solutions cannot exist andthe initial state ⇤ is dead. However, neither pdead(s) norndead(s) holds, because we can always find a regular ex-pression (e.g., (a + b)⇤) that accepts all positive examplesand we can also always find a regular expression (e.g., ;)that rejects all negative examples.

We identify the pdead states and ndead states by comput-ing over- and under-approximations of states.

Definition 4. The over-approximation bs and under-approximationee of state s are defined inductively as follows:

ba = ab✏ = ✏b; = ;

\e1 + e2 = be1 + be2\e1 · e2 = be1 · be2

be⇤ = (be)⇤b⇤ = (a+ b)⇤

ea = ae✏ = ✏e; = ;

e1 + e2 = ee1 + ee2e1 · e2 = ee1 · ee2

ee⇤ = (ee)⇤e⇤ = ;

Intuitively, the over-approximation bs is obtained by replac-ing all holes in s by (a + b)⇤, and the under-approximationes is obtained by replacing the holes by ;.

Example 4. Consider a state s = a + (⇤ · ⇤). Then,bs = a+ ((a+ b)⇤ + (a+ b)⇤) and es = a+ (;+ ;).

bs is over-approximated in a sense that the language of bscontains all the languages of states reachable from s (Lemma2). Dually, es is under-approximated because every state s0

reachable from s subsumes the language of es (Lemma 3).

Lemma 2. For any state s, we have

[[bs]] ◆[

s!⇤s0^s0 6![[s0]].

Proof. Todo

Lemma 3. for any state s, we have

[[es]] ✓\

s!⇤s0^s0 6![[s0]].

Proof. Todo

Given a state s, we conclude that s is dead with positiveexample (i.e. pdead(s)) if bs rejects some positive example:

9p 2 P. p 62 [[bs]] (3)

and we conclude that s is dead with negative example (i.e.ndead(s)) if es accepts some negative example:

9n 2 N . n 2 [[es]]. (4)

Lemma 4 and 5 show that our algorithm for identifyingpdead and ndead states is both sound and complete.

Lemma 4. Let s be any state. Then,

pdead(s) () 9p 2 P. p 62 [[bs]].

Proof. Consider each direction.

• (=)) Suppose pdead(s) holds:

s !⇤ s0 ^ s0 6! =) 9p 2 P. p 62 [[s0]]. (5)

From (5) and Lemma 6, we obtain 9p 2 P. p 62 [[bs]].

4 2016/6/4

• Pruning redundant states: 해를 가질 수 있더라도 다른 곳에 더 간단한 해가 존재하는 상태공간은 탐색하지 않음

no matter how the hole gets instantiated, the string b cannotbe accepted.

(b 2 P)

...

a ·⇤

......

...

(a 2 N )

...

a · (⇤)⇤

......

...

(aab 2 P)

...

a · (b+ ✏) ·⇤

......

...

Definition 3. A state s is dead for negative examples, de-noted ndead(s), iff every closed state s0 reachable from sfails to reject a negative example:

ndead(s) ()�s !⇤ s0 ^ s0 6! =) 9n 2 N . n 2 [[s0]]

�.

Example 2. Suppose a 2 N . Any closed state s0 reachablefrom state s = a · (⇤)⇤ is doomed to accept the negativeexample; no matter how the hole gets instantiated, the lan-guage of any reachable state includes the string a.

It is clear that a state is guaranteed to be dead if one ofpdead(s) and ndead(s) holds:

Lemma 1. Let s be any state. Then,�pdead(s) _ ndead(s)

�=) dead(s).

Note that, however, the converse of Lemma 1 is not true.Suppose s is a dead state. This means that every reach-able state s0 either rejects some positive example or acceptssome negative example. However, pdead(s) requires thatthe reachable state s0 always rejects some positive example.Similarly, ndead(s) requires a strong condition that everyreachable state s0 accepts some negative example.

Example 3. When P = N , no solutions cannot exist andthe initial state ⇤ is dead. However, neither pdead(s) norndead(s) holds, because we can always find a regular ex-pression (e.g., (a + b)⇤) that accepts all positive examplesand we can also always find a regular expression (e.g., ;)that rejects all negative examples.

We identify the pdead states and ndead states by comput-ing over- and under-approximations of states.

Definition 4. The over-approximation bs and under-approximationee of state s are defined inductively as follows:

ba = ab✏ = ✏b; = ;

\e1 + e2 = be1 + be2\e1 · e2 = be1 · be2

be⇤ = (be)⇤b⇤ = (a+ b)⇤

ea = ae✏ = ✏e; = ;

e1 + e2 = ee1 + ee2e1 · e2 = ee1 · ee2

ee⇤ = (ee)⇤e⇤ = ;

Intuitively, the over-approximation bs is obtained by replac-ing all holes in s by (a + b)⇤, and the under-approximationes is obtained by replacing the holes by ;.

Example 4. Consider a state s = a + (⇤ · ⇤). Then,bs = a+ ((a+ b)⇤ + (a+ b)⇤) and es = a+ (;+ ;).

bs is over-approximated in a sense that the language of bscontains all the languages of states reachable from s (Lemma2). Dually, es is under-approximated because every state s0

reachable from s subsumes the language of es (Lemma 3).

Lemma 2. For any state s, we have

[[bs]] ◆[

s!⇤s0^s0 6![[s0]].

Proof. Todo

Lemma 3. for any state s, we have

[[es]] ✓\

s!⇤s0^s0 6![[s0]].

Proof. Todo

Given a state $s$, we conclude that $s$ is dead with respect to the positive examples (i.e., $\mathit{pdead}(s)$) if $\widehat{s}$ rejects some positive example:

\[ \exists p \in P.\ p \notin [\![\widehat{s}]\!] \tag{3} \]

and we conclude that $s$ is dead with respect to the negative examples (i.e., $\mathit{ndead}(s)$) if $\widetilde{s}$ accepts some negative example:

\[ \exists n \in N.\ n \in [\![\widetilde{s}]\!]. \tag{4} \]

Lemmas 4 and 5 show that our algorithm for identifying pdead and ndead states is both sound and complete.
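Conditions (3) and (4) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: states are encoded as nested tuples, and membership is decided with Brzozowski derivatives, a standard technique that the text does not claim ALPHAREGEX uses. All function names are hypothetical.

```python
# States: ("sym", c) | ("eps",) | ("empty",) | ("hole",)
#         | ("alt", s1, s2) | ("cat", s1, s2) | ("star", s)
SIGMA_STAR = ("star", ("alt", ("sym", "a"), ("sym", "b")))  # (a + b)*


def fill_holes(s, filler):
    """Replace every hole in s by the closed expression `filler`."""
    tag = s[0]
    if tag == "hole":
        return filler
    if tag in ("alt", "cat"):
        return (tag, fill_holes(s[1], filler), fill_holes(s[2], filler))
    if tag == "star":
        return ("star", fill_holes(s[1], filler))
    return s


def nullable(e):
    """True iff the empty string is in the language of the closed expression e."""
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("sym", "empty"):
        return False
    if tag == "alt":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])  # "cat"


def deriv(e, c):
    """Brzozowski derivative: an expression for {w | c.w is in [[e]]}."""
    tag = e[0]
    if tag == "sym":
        return ("eps",) if e[1] == c else ("empty",)
    if tag in ("eps", "empty"):
        return ("empty",)
    if tag == "alt":
        return ("alt", deriv(e[1], c), deriv(e[2], c))
    if tag == "cat":
        left = ("cat", deriv(e[1], c), e[2])
        return ("alt", left, deriv(e[2], c)) if nullable(e[1]) else left
    return ("cat", deriv(e[1], c), e)  # "star"


def accepts(e, w):
    """Decide whether string w is in [[e]]."""
    for c in w:
        e = deriv(e, c)
    return nullable(e)


def pdead(s, positives):
    """Condition (3): the over-approximation rejects some positive example."""
    over = fill_holes(s, SIGMA_STAR)
    return any(not accepts(over, p) for p in positives)


def ndead(s, negatives):
    """Condition (4): the under-approximation accepts some negative example."""
    under = fill_holes(s, ("empty",))
    return any(accepts(under, n) for n in negatives)
```

With $s = a \cdot \Box$ and $b \in P$, `pdead` returns `True`; with $s = a \cdot (\Box)^*$ and $a \in N$, `ndead` returns `True`, because $\emptyset^*$ still contains the empty string. A search loop would discard such states instead of expanding them.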

4 2016/6/4

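[Figure 1: search tree rooted at the hole $\Box$. The first level contains $a$, $\epsilon$, $\emptyset$, $\Box+\Box$, $\Box\cdot\Box$, $\Box^*$, ...; below $\Box+\Box$ appear states such as $a+a$, $a+\epsilon$, $a+\emptyset$, $a+(\Box+\Box)$, $a+(\Box\cdot\Box)$, $a+(\Box^*)$, $\epsilon+a$, $\epsilon+\epsilon$, $\epsilon+\emptyset$, $\epsilon+(\Box+\Box)$, ...; below $a+(\Box+\Box)$ appear $a+(a+a)$, $a+(a+\epsilon)$, $a+(a+\emptyset)$, ...]

Figure 1. Exhaustive Search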

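\[
\frac{e_1 \to e_1'}{e_1 + e_2 \to e_1' + e_2}
\qquad
\frac{e_2 \to e_2'}{e_1 + e_2 \to e_1 + e_2'}
\qquad
\frac{e_1 \to e_1'}{e_1 \cdot e_2 \to e_1' \cdot e_2}
\qquad
\frac{e_2 \to e_2'}{e_1 \cdot e_2 \to e_1 \cdot e_2'}
\]

\[
\frac{e \to e'}{e^* \to e'^*}
\qquad
\frac{e \to e'}{e? \to e'?}
\]

\[
\frac{a \in \Sigma}{\Box \to a}
\qquad
\Box \to \epsilon
\qquad
\Box \to \emptyset
\]

\[
\Box \to \Box + \Box
\qquad
\Box \to \Box \cdot \Box
\qquad
\Box \to \Box^*
\qquad
\Box \to \Box?
\]

Figure 2. Transition Relation between States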

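Let $c$ be the number of inductive rules for regular expressions (e.g., $c = 7$). In the worst case, the number of states at depth $d$ is

\[ N(0) = 1, \qquad N(d+1) = N(d) \cdot c^{2^d}. \]

When $c = 7$:

\[ N(d) = 7^{\sum_{k=0}^{d-1} 2^k} \in O\big(7^{2^d - 1}\big). \]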
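To make the growth rate concrete, the recurrence can be unrolled numerically. The short Python sketch below (the function name is chosen here for illustration) iterates $N(d+1) = N(d) \cdot c^{2^d}$ and can be compared against the closed form $c^{2^d - 1}$, which follows from $\sum_{k=0}^{d-1} 2^k = 2^d - 1$.

```python
def worst_case_states(d, c=7):
    """Unroll N(0) = 1, N(d + 1) = N(d) * c**(2**d) for d steps."""
    n = 1
    for k in range(d):
        n *= c ** (2 ** k)
    return n


# Print the first few values next to the closed form c**(2**d - 1).
for depth in range(5):
    print(depth, worst_case_states(depth), 7 ** (2 ** depth - 1))
```

Already at depth 4 the bound is $7^{15}$, which is why exhaustive enumeration is impractical without pruning.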

Search Strategy. We pick a state that has minimal cost, where the cost of a state satisfies the following constraints:

\[
\begin{aligned}
C(a) = C(\epsilon) = C(\emptyset) &= c_1 \\
C(\Box) &= c_2 \quad (c_2 > c_1) \\
C(e_1 + e_2) &> C(e_1) + C(e_2) \\
C(e_1 \cdot e_2) &> C(e_1) + C(e_2) \\
C(e^*) &> C(e)
\end{aligned}
\]

Intuitively, we prefer simpler expressions by following the principle of Ockham's razor, so that the solution found is the simplest regular expression that is consistent with the examples.

Algorithm 1 Search Algorithm
Input: Positive and negative examples $(P, N)$
Output: A regular expression $E$ consistent with $(P, N)$

1: $W := \{\Box\}$
2: repeat
3:   pick and remove $s$ from $W$
4:   if $\mathit{solution}(s)$ then return $s$
5:   else
6:     $W := W \cup \mathit{next}(s)$
7:   end if
8: until $W = \emptyset$
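The workset loop combined with the minimal-cost search strategy can be sketched in Python as below. This is a sketch under stated assumptions rather than the authors' implementation: states are nested tuples, membership is decided with Brzozowski derivatives, the cost constants are the concrete values 1, 5, and 10 given elsewhere in the text, the optional operator `?` is omitted (leaving seven rules), and a `seen` set and a `max_steps` bound are added so that the sketch always terminates.

```python
import heapq
from itertools import count

ALPHABET = "ab"
HOLE = ("hole",)


def nullable(e):
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("sym", "empty"):
        return False
    if tag == "alt":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])  # "cat"


def deriv(e, c):
    tag = e[0]
    if tag == "sym":
        return ("eps",) if e[1] == c else ("empty",)
    if tag in ("eps", "empty"):
        return ("empty",)
    if tag == "alt":
        return ("alt", deriv(e[1], c), deriv(e[2], c))
    if tag == "cat":
        left = ("cat", deriv(e[1], c), e[2])
        return ("alt", left, deriv(e[2], c)) if nullable(e[1]) else left
    return ("cat", deriv(e[1], c), e)  # "star"


def accepts(e, w):
    for c in w:
        e = deriv(e, c)
    return nullable(e)


def next_states(s):
    """All states obtained from s by instantiating exactly one hole."""
    tag = s[0]
    if tag == "hole":
        return ([("sym", c) for c in ALPHABET] + [("eps",), ("empty",),
                ("alt", HOLE, HOLE), ("cat", HOLE, HOLE), ("star", HOLE)])
    if tag in ("alt", "cat"):
        return ([(tag, l, s[2]) for l in next_states(s[1])] +
                [(tag, s[1], r) for r in next_states(s[2])])
    if tag == "star":
        return [("star", t) for t in next_states(s[1])]
    return []  # primitives have no successors


def has_hole(s):
    return s[0] == "hole" or any(has_hole(t) for t in s[1:] if isinstance(t, tuple))


def cost(s):
    tag = s[0]
    if tag == "hole":
        return 10
    if tag in ("alt", "cat"):
        return cost(s[1]) + cost(s[2]) + 5
    if tag == "star":
        return cost(s[1]) + 5
    return 1


def synthesize(positives, negatives, max_steps=100_000):
    """Best-first workset search; returns a consistent closed state or None."""
    tie = count()  # tie-breaker so the heap never compares states directly
    work = [(cost(HOLE), next(tie), HOLE)]
    seen = {HOLE}
    for _ in range(max_steps):
        if not work:
            return None
        _, _, s = heapq.heappop(work)  # pick and remove a minimal-cost state
        if not has_hole(s):
            if (all(accepts(s, p) for p in positives)
                    and not any(accepts(s, n) for n in negatives)):
                return s
            continue
        for t in next_states(s):
            if t not in seen:
                seen.add(t)
                heapq.heappush(work, (cost(t), next(tie), t))
    return None
```

For instance, `synthesize({"", "a", "aa"}, {"b", "ab"})` explores the four primitives first, then expands $\Box^*$ and returns the tuple form of $a^*$. No pruning is applied here, so harder problems quickly exceed the step bound.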

3.2 Normalization

Examples:

\[ [\![s^* \cdot s^*]\!] = [\![s^*]\!] \]

\[ [\![(s + s)]\!] = [\![s]\!] \]

\[ [\![(s \cdot s^*)^*]\!] = [\![s^*]\!] \]

...
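The three example equivalences can be turned into rewrite rules that map a state to a smaller state with the same language. The Python sketch below applies them in a single bottom-up pass; the tuple encoding and the function name `normalize` are assumptions of this sketch, and the text's list of rules is open-ended, so this covers only the three shown.

```python
# States: ("sym", c) | ("eps",) | ("empty",) | ("hole",)
#         | ("alt", s1, s2) | ("cat", s1, s2) | ("star", s)

def normalize(s):
    """Apply the three example rewrite rules in one bottom-up pass."""
    tag = s[0]
    if tag in ("alt", "cat"):
        left, right = normalize(s[1]), normalize(s[2])
        if tag == "alt" and left == right:
            return left                      # s + s      ->  s
        if tag == "cat" and left == right and left[0] == "star":
            return left                      # s* . s*    ->  s*
        return (tag, left, right)
    if tag == "star":
        body = normalize(s[1])
        if body[0] == "cat" and body[2] == ("star", body[1]):
            return ("star", body[1])         # (s . s*)*  ->  s*
        return ("star", body)
    return s                                 # primitives and holes
```

Because children are normalized before their parent, a rewrite in a subterm can enable a rewrite higher up; for example, $(a^* \cdot a^*) + a^*$ reduces to $a^*$. The pass does not iterate to a fixed point, which keeps the sketch short.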


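Grounded in rigorous theory

The pruning techniques we devised are based on programming language theory and guarantee the soundness of the results.

Regular Expression Synthesis Algorithm

• Basic algorithm: explore the entire state space generated by the regular expression grammar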


Search Strategy (concrete costs). We pick a state that has minimal cost, where the cost of a state is computed as follows:

\[
\begin{aligned}
C(a) = C(\epsilon) = C(\emptyset) &= 1 \\
C(e_1 + e_2) &= C(e_1) + C(e_2) + 5 \\
C(e_1 \cdot e_2) &= C(e_1) + C(e_2) + 5 \\
C(e^*) &= C(e) + 5 \\
C(\Box) &= 10
\end{aligned}
\]

Intuitively, we prefer simpler expressions by following the principle of Ockham's razor, so that the solution found is the simplest regular expression that is consistent with the examples.
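The concrete cost function translates directly into a recursive definition. The following Python sketch assumes states are encoded as nested tuples; the encoding and the name `cost` are illustrative, not part of the original tool.

```python
# States: ("sym", c) | ("eps",) | ("empty",) | ("hole",)
#         | ("alt", s1, s2) | ("cat", s1, s2) | ("star", s)

def cost(s):
    """Cost of a state using the constants 1 (primitive), 5 (operator), 10 (hole)."""
    tag = s[0]
    if tag == "hole":
        return 10
    if tag in ("alt", "cat"):
        return cost(s[1]) + cost(s[2]) + 5
    if tag == "star":
        return cost(s[1]) + 5
    return 1  # a symbol, epsilon, or the empty language
```

For example, $a + (\Box \cdot \Box)$ costs $1 + (10 + 10 + 5) + 5 = 31$, while $a^*$ costs $6$ and $\Box^*$ costs $15$. Filling a hole with a primitive therefore lowers the cost, whereas filling it with an operator raises it.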


3.3 Pruning Search Space

Definition 1 (Dead States). Let $(P, N)$ be a regular expression problem. We say a state $s \in S$ is dead, denoted $\mathit{dead}(s)$, iff every closed state $s'$ reachable from $s$ is not a solution:

\[ \mathit{dead}(s) \iff \forall s'.\ \big((s \to^* s') \land s' \not\to \implies \neg\,\mathit{solution}(s')\big). \]

Intuitively, a state $s$ is dead if exploring the states reachable from $s$ is guaranteed to fail to find a solution. Our search algorithm aims to identify as many dead states as possible and does not attempt to explore beyond them. Specifically, we identify two types of dead states: pdead and ndead.

Definition 2. A state $s$ is dead for positive examples, denoted $\mathit{pdead}(s)$, iff every closed state $s'$ reachable from $s$ fails to accept a positive example:

\[ \mathit{pdead}(s) \iff \forall s'.\ \big(s \to^* s' \land s' \not\to \implies \exists p \in P.\ p \notin [\![s']\!]\big). \]

Example 1. Suppose $b \in P$. Any closed state $s'$ reachable from state $s = a \cdot \Box$ is doomed to reject the positive example; no matter how the hole gets instantiated, the string $b$ cannot be accepted.

Definition 3. A state $s$ is dead for negative examples, denoted $\mathit{ndead}(s)$, iff every closed state $s'$ reachable from $s$ fails to reject a negative example:

\[ \mathit{ndead}(s) \iff \forall s'.\ \big(s \to^* s' \land s' \not\to \implies \exists n \in N.\ n \in [\![s']\!]\big). \]


Challenge: a very large state space

Number of states at depth $d$: $N(d) \in O\big(7^{2^d - 1}\big)$

actively responds to each input, taking only a few seconds to derive new regular expressions that reflect the change.

Contributions. This paper makes the following contributions:

• We present a new algorithm for synthesizing regular expressions in real time from examples. The main novelty is a set of techniques that effectively prune the large search space using over- and under-approximations of regular expressions.

• We evaluate the proposed technique on 30 benchmark problems. The results show that our method derives regular expressions for all of the benchmarks within a few seconds.

• We implemented the technique in a tool, ALPHAREGEX, and made it publicly available at http://prl.korea.ac.kr/AlphaRegex.

2. Regular Expression Problems

2.1 Regular Expressions

Introductory textbooks on automata theory [? ? ? ] use the following syntax for regular expressions:

\[ e \to a \in \Sigma \mid \epsilon \mid \emptyset \mid e_1 + e_2 \mid e_1 \cdot e_2 \mid e^* \tag{1} \]

A symbol $a$ from an alphabet $\Sigma$, the empty string $\epsilon$, and the empty language $\emptyset$ constitute the primitive regular expressions. The remaining cases are defined inductively. Given regular expressions $e_1$ and $e_2$, we can construct regular expressions by taking the union $e_1 + e_2$ or the concatenation $e_1 \cdot e_2$. $e^*$ denotes the Kleene closure of $e$. In introductory courses, the alphabet is typically assumed to be binary; we assume $\Sigma = \{a, b\}$ in the rest of this paper.

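Formally, a regular expression $e$ denotes a language (i.e., a set of strings). We write $[\![e]\!] \subseteq \Sigma^*$ for the language that $e$ denotes, which is defined inductively as follows:

\[
\begin{aligned}
[\![a]\!] &= \{a\} & [\![\epsilon]\!] &= \{\epsilon\} & [\![\emptyset]\!] &= \emptyset \\
[\![e_1 + e_2]\!] &= [\![e_1]\!] \cup [\![e_2]\!] & [\![e_1 \cdot e_2]\!] &= [\![e_1]\!]\,[\![e_2]\!] \\
[\![e^*]\!] &= [\![e]\!]^*
\end{aligned}
\]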

2.2 Regular Expression Problems

In a regular expression problem, students are given a description of a regular language $L$. We assume that the description of a language is given by a pair $(P, N)$ of sets of example strings, where $P \subseteq \Sigma^*$ is a set of positive examples that must be included in the language and $N \subseteq \Sigma^*$ is a set of negative examples that must be excluded from the language. Given $(P, N)$, the regular expression problem asks students to find a regular expression $e$ that is consistent with the given examples:

\[ \forall p \in P.\ p \in [\![e]\!] \;\land\; \forall n \in N.\ n \notin [\![e]\!]. \]
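The denotation $[\![e]\!]$ and the consistency condition can be checked mechanically. The Python sketch below decides membership with Brzozowski derivatives, a standard technique chosen here for brevity; the text does not say how ALPHAREGEX decides membership, and the tuple encoding and function names are assumptions of this sketch.

```python
# Regular expressions as nested tuples (encoding assumed for this sketch):
#   ("sym", c) | ("eps",) | ("empty",)
#   ("alt", e1, e2) | ("cat", e1, e2) | ("star", e)

def nullable(e):
    """True iff the empty string belongs to [[e]]."""
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("sym", "empty"):
        return False
    if tag == "alt":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])  # "cat"


def deriv(e, c):
    """Brzozowski derivative: an expression for {w | c.w is in [[e]]}."""
    tag = e[0]
    if tag == "sym":
        return ("eps",) if e[1] == c else ("empty",)
    if tag in ("eps", "empty"):
        return ("empty",)
    if tag == "alt":
        return ("alt", deriv(e[1], c), deriv(e[2], c))
    if tag == "cat":
        left = ("cat", deriv(e[1], c), e[2])
        return ("alt", left, deriv(e[2], c)) if nullable(e[1]) else left
    return ("cat", deriv(e[1], c), e)  # "star"


def accepts(e, w):
    """Decide w in [[e]] by taking one derivative per input symbol."""
    for c in w:
        e = deriv(e, c)
    return nullable(e)


def consistent(e, positives, negatives):
    """e accepts every positive example and rejects every negative one."""
    return (all(accepts(e, p) for p in positives)
            and not any(accepts(e, n) for n in negatives))
```

For example, with $e = (a+b)^* \cdot a$ (strings ending in $a$), `consistent(e, {"a", "ba"}, {"", "b"})` holds, whereas adding `"ab"` to the positive examples makes it fail.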

3. Our Synthesis Algorithm

3.1 Basic Search Algorithm

Suppose a regular expression problem $(P, N)$ is given. We formulate this problem as a search problem and present an efficient algorithm to find a solution. The search problem is defined by a transition system $(S, \to, I, F)$, where $S$ is the set of states, $(\to) \subseteq S \times S$ is a transition relation, $I \in S$ is an initial state, and $F \subseteq S$ is a set of final (solution) states.

• States: A state $s \in S$ is a partial regular expression that may contain holes ($\Box$). A hole is a placeholder that can be replaced by another regular expression. The set $S$ of states is defined inductively as follows:

\[ s \to a \in \Sigma \mid \epsilon \mid \emptyset \mid s_1 + s_2 \mid s_1 \cdot s_2 \mid s^* \mid \Box \tag{2} \]

Note that a state may have multiple holes. For example, $(a + (\Box \cdot \Box))^*$ is a state with two holes.

• Initial State: The initial state is a single hole, i.e., $I = \Box$.

• Transition Relation: The transition relation $(\to) \subseteq S \times S$ determines the next states of a given state. It is defined inductively by the inference rules in Figure 2. For example, $(a+\Box)^* \to (a+(\Box\cdot\Box))^*$ because we can construct the following derivation from the inference rules:

\[
\dfrac{\dfrac{\Box \to \Box \cdot \Box}{(a+\Box) \to (a+(\Box\cdot\Box))}}{(a+\Box)^* \to (a+(\Box\cdot\Box))^*}
\]

We write $\mathit{next}(s)$ for the set of all states that immediately follow $s$:

\[ \mathit{next}(s) = \{ s' \mid s \to s' \}. \]

For example, when $\Sigma = \{a, b\}$, $\mathit{next}((a+\Box)^*) = \{(a+a)^*, (a+b)^*, (a+\epsilon)^*, (a+\emptyset)^*, (a+(\Box+\Box))^*, (a+(\Box\cdot\Box))^*, (a+(\Box^*))^*, (a+(\Box?))^*\}$. We write $s \not\to$ to indicate that $s$ has no next states; that is, $s$ is a closed expression with no holes.

• Solution States: A state $s$ is a solution state iff $s$ is a closed expression (i.e., $s \not\to$) and $s$ is consistent with the given positive and negative examples:

\[ \mathit{solution}(s) \iff s \not\to \;\land\; \forall p \in P.\ p \in [\![s]\!] \;\land\; \forall n \in N.\ n \notin [\![s]\!]. \]
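The transition system above can be sketched in Python as follows. The tuple encoding and the names `next_states` and `is_closed` are assumptions of this sketch, and it follows grammar (2), so the optional operator `?` that appears in Figure 2 is left out and a hole has seven one-step instantiations when $\Sigma = \{a, b\}$.

```python
# States: ("sym", c) | ("eps",) | ("empty",) | ("hole",)
#         | ("alt", s1, s2) | ("cat", s1, s2) | ("star", s)

ALPHABET = "ab"
HOLE = ("hole",)


def next_states(s):
    """next(s): every state obtained by instantiating exactly one hole of s."""
    tag = s[0]
    if tag == "hole":
        return ([("sym", c) for c in ALPHABET]
                + [("eps",), ("empty",)]
                + [("alt", HOLE, HOLE), ("cat", HOLE, HOLE), ("star", HOLE)])
    if tag in ("alt", "cat"):
        # Either the left or the right operand takes the step.
        return ([(tag, left, s[2]) for left in next_states(s[1])]
                + [(tag, s[1], right) for right in next_states(s[2])])
    if tag == "star":
        return [("star", body) for body in next_states(s[1])]
    return []  # primitives take no step


def is_closed(s):
    """s has no next states, i.e. it contains no holes."""
    return not next_states(s)
```

For the state $(a+\Box)^*$, `next_states` returns seven successors, including the tuple form of $(a+(\Box\cdot\Box))^*$ from the derivation above. A solution test would combine `is_closed` with a membership check on the positive and negative examples.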

Algorithm 1 presents a naive workset algorithm that solves the search problem. Initially, the workset consists of the initial state (line 1). We choose and remove a state $s$ from the workset (line 3). If $s$ is a solution, it is returned. Otherwise, we continue the search by adding the next states of $s$ to the workset.

Size of Search Space. The maximum number of holes in a state at depth $d$ is $2^d$. The number of next states for a state with $n$ holes is $c^n$, where $c$ is the number of inductive rules for regular expressions (e.g., $c = 7$).


✓ 840 lines in OCaml

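✓ Benchmarks focus on regular expression problems that students find difficult

✓ Baseline: the basic algorithm with none of the search techniques applied

✓ Measured the performance and speedup of the algorithm with all search techniques applied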