L10-Regular Expression Matching
/**
10. Regular Expression Matching
Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.
'.' Matches any single character.
'*' Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).
Note:
s could be empty and contains only lowercase letters a-z.
p could be empty and contains only lowercase letters a-z, and characters like . or *.
Example 1:
Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".
Example 2:
Input:
s = "aa"
p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".
Example 3:
Input:
s = "ab"
p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".
Example 4:
mmm
Input:
s = "aab"
p = "c*a*b"
Output: true
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore, it matches "aab".
Example 5:
Input:
s = "mississippi"
p = "mis*is*p*."
Output: false
* */
参数限制:
Constraints:
1 <= s.length <= 20
1 <= p.length <= 30
s contains only lowercase English letters.
p contains only lowercase English letters, '.', and '*'.
It is guaranteed for each appearance of the character '*', there will be a previous valid character to match.
第一种方法
先看官方的文档中的解法:首先就是递归处理,而且还介绍了一种方法,P中包含了两种字符,一个. 比较好处理,就是任意匹配就完活了,一个有点困难,特别是已经有了一个的情况下,所以从官方的solution中,我们学到了一个方法,我命名为:“假想降级”,如果题目中只有一个条件:
If there were no Kleene stars (the * wildcard character for regular expressions), the problem would be easier - we simply check from left to right if each character of the text matches the pattern.
class Solution {
public static boolean isMath_noKleeneStars(String s, String p) {
if (p == null || p.isEmpty()) return s == null || s.isEmpty();
boolean isMath = s.charAt(0) == p.charAt(0) || p.charAt(0) == '.';
if (p.length() > 1) {
// 迭代的匹配,s和p的匹配结果,取决于s和p的子串的匹配结果
return isMath && isMatch_resursive(s.substring(1), p.substring(1));
}
return isMath;
}
}
那么我们再去考虑,有*的情况. 然后我们就能够知道,我们需要修改上面递归的那一块的代码了. 递归的时候,考虑*的情况,怎么考虑尼?
这个时候,考虑具体的迭代逻辑,不考虑实际的题目意思,加上上面的迭代的关键点,考虑上面的代码,遇到*号的时候,这个时候需要考虑前一个元素的匹配结果,所以我们需要判定p.length() > 1 的情况
class Solution {
public boolean isMatch(String text, String pattern) {
if (pattern.isEmpty()) return text.isEmpty();
// 第一个字符匹配的情况,这里面并没有把*号特殊的标出来,因为:It is guaranteed for each appearance of the character '*', there will be a previous valid character to match.
boolean first_match = (!text.isEmpty() &&
(pattern.charAt(0) == text.charAt(0) || pattern.charAt(0) == '.'));
if (pattern.length() >= 2 && pattern.charAt(1) == '*'){
// isMatch(text, pattern.substring(2)) 代表*连带的前面的字符也给吃掉了
// (first_match && isMatch(text.substring(1), pattern)) 代表*匹配了多个字符
return (isMatch(text, pattern.substring(2)) ||
(first_match && isMatch(text.substring(1), pattern)));
} else {
return first_match && isMatch(text.substring(1), pattern.substring(1));
}
}
}
上面是使用了递归的思想,虽然在递归迭代的时候,*的匹配规则:
‘*’ Matches zero or more of the preceding element.
转化为代码的时候:
(isMatch(text, pattern.substring(2))
(first_match && isMatch(text.substring(1), pattern)))
有点难理解,但是还是解决了问题。
第二种方法
从前一种方法中,我们了解到,这个问题存在:optimal substructure
As the problem has an optimal substructure, it is natural to cache intermediate results.
dp[i][j] 代表: : S字符串[0..i] 和 P字符串[0….j] 是否匹配
We can describe our answer in terms of answers to questions involving smaller strings.
具体的状态转义方程的推导方式:
// ######a(i)
// ####a(j)
// 1. if p.charAt(j) == s.charAt(i), M[i][j] = M[i - 1][j - 1]
// #######a(i)
// ####.(j)
// 2. if p.charAt(j) == '.', M[i][j] = M[i - 1][j - 1]
// 3. if p.charAt(j) == '*':
// #####a(i)
// ####b*(j)
// 1. if p.charAt(j - 1) != '.' && p.charAt(j - 1) != s.charAt(i), then b* is counted as empty. M[i][j] = M[i][j - 2]
// 2.if p.charAt(j - 1) == '.' || p.charAt(j - 1) == s.charAt(i):
// ######a(i)
// ####.*(j)
//
// #####a(i)
// ###a*(j)
// 2.1 if p.charAt(j - 1) is counted as empty, then M[i][j] = M[i][j - 2]
// 2.2 if counted as one, then M[i][j] = M[i - 1][j - 2]
// 2.3 if counted as multiple, then M[i][j] = M[i - 1][j]
// Observation: from above, we can see to get M[i][j], we need to know previous elements in M, i.e. we need to compute them first.
// which determines i goes from 1 to m - 1, j goes from 1 to n + 1
// 我们在推导M[i][j], 依赖的都是M[i-1][j-1]之类的前面的元素,所以我们首先要从小的元素开始计算,i的循环也就是从1到m+1,j的循环就是从1到n+1了
代码:
class Solution {
public boolean isMatch(String s, String p) {
/**
* dp is a N+1 x M+1 matrix; N is the length of s and M is the length of p
* dp[i][j] represents if first i characters of s match first j characters of p
* */
boolean[][] dp = new boolean[s.length()+1][p.length()+1];
/**
* base case:
* empty string s matches empty string p -> dp[0][0] = true
* non-empty string s never match empty string p -> dp[i][0] = false
*/
dp[0][0] = true;
for (int i = 1; i < dp.length; i++) {
dp[i][0] = false;
}
// j is index of p ,not the dp[0] index
for (int j =1 ; j < p.length();j++){
if(p.charAt(j) == '*' && dp[0][j-1]){
dp[0][j+1]=true;
}
}
/**
* dynamic programming process
* skip the first column because already filled as base case
*/
for (int i = 1; i < dp.length; i++) {
for (int j = 1; j < dp[i].length; j++) {
if (p.charAt(j-1) == '.') {
// If p's latest character is '.', dp[i][j] is equivalent to dp[i-1][j-1] because the newest characters are guaranteed to match
dp[i][j] = dp[i-1][j-1];
} else if (p.charAt(j-1) == '*') {
if (dp[i-1][j]) {
/**
* if p's newest chracter is '*' and dp[i-1][j] is true (this substring in p has been matching)
* dp[i][j] is true when the newst character in s matches the one representeed by '*': s.charAt(i-1) == p.charAt(j-2)
* dp[i][j] is true when the '*' represents '.': p.charAt(j-2) == '.'
* dp[i][j] is true when dp[i][j-2] is true because 0 occurrence of the character represented by '*'
**/
dp[i][j] = s.charAt(i-1) == p.charAt(j-2) || p.charAt(j-2) == '.' || dp[i][j-2];
} else {
/**
* if p's newest chracter is '*' and dp[i-1][j] is false (this substring in p hasn't matched)
* check if this new one gives a match: if dp[i][j-2] is true, dp[i][j] is true because of 0 occurrence of the '*' character
**/
dp[i][j] = dp[i][j-2];
}
} else {
/**
* if jth character in p is just a normal letter, check if this matches with the ith character in s
* If they match, dp[i][j] is true only if dp[i-1][j-1] is true (same idea as palindrome)
*/
dp[i][j] = dp[i-1][j-1] && s.charAt(i-1) == p.charAt(j-1);
}
}
}
return dp[s.length()][p.length()];
}
}
/**
* Time: O(MN)
* Space: O(MN)
* N is the length of string s, M is the length of string p
*
* Intuition: (DP)
* f(i,j) = if the length i prefix of s matches the length j prefix of p
*
* Base:
* f(0,0) = true; empty s matches empty p
* f(i,0) = false, where i > 0; non-empty s never matches empty p
* f(0,1) = false; empty s never matches length 1 p
*
* Result: f(length of s, length of p)
*
* Tansition:
* f(i,j) = f(i-1,j-1) && ith character in s == jth character in p; exact character match
* OR
* f(i,j) = f(i-1,j-1) && jth character in p is '.'; '.' is a universial match
* OR
* if jth character in p is '*':
* f(i,j) = f(i-1,j) && (ith character in s == the character represented by * || * represents . || f(i,j-2))
* {if p already matches with s before the newest character in s,
* then the newest character in s just needs to equal to the character represented by *
* OR s matches p with 0 occurrence of the character represented by *, which is f(i,j-2)}
* f(i,j) = !f(i-1,j) && f(i,j-2)
* {if p didn't match with s before the newest character in s,
* then check if the newest character in s makes it a match: f(i,j-2) -> 0 occurrence of the character represented by *}
* f(i,j) = true if i == 0 && (j == 2 || f(i,j-2))
* {if s is empty, true, if the last character is '*' on the 2nd index or if f(i,j-2) because of 0 occurrence of character represented by *}
*
*
* s = 'aaa'
* p = 'ab*a*c*a'
*
* Base Case:
* - a b * a * c * a
* - T F
* a F
* a F
* a F
*
* Keep Going (non-empty p matches empty s olny when using '*' appropriately):
* - a b * a * c * a
* - T F F F F F F F F
* a F
* a F
* a F
*
* Keep Going (0 occurrence of character represented by * would give it true):
* - a b * a * c * a
* - T F F F F F F F F
* a F T F T F T F T F
* a F F F F T T F F T
* a F F F F F T F T T -> result
*
* */
从上面来看,这种理解的成本还是比较大,不如go的清晰
func isMatch(s string, p string) bool {
sLen, pLen := len(s), len(p)
var dp [][]bool
var t []bool
for i := 0; i <= sLen; i++ {
t = make([]bool, pLen+1)
dp = append(dp, t)
}
// dp[i][j] holds a flag that whether s[0:i] mathes to p[0:j].
// (Note that we take the 0th index as "empty" string case)
for i := 0; i <= sLen; i++ {
for j := 0; j <= pLen; j++ {
if i == 0 && j == 0 {
// in case both s and p are empty
dp[i][j] = true
continue
} else if i == 0 {
// in case only s is empty.
// To match an empty string, p should be like a*, a*b*, a*.*a*, ....
// (* should appear in all odd indexes of p)
dp[i][j] = p[j-1] == '*' && dp[i][j-2]
continue
} else if j == 0 {
// in case only p is mepty.
// No string s cannot match to empty regular expression
// unless s itself is empty too (which is already handled)
dp[i][j] = false
continue
}
// in case both s and p are non-empty
// consider what should be sufficed to be able to say s[i-1] matches to p[j-1]
// (= dp[i][j] is true).
if p[j-1] != '*' {
// if p[j-1] is not '*', then
// 1) p[j-1] and s[j-1] should match (same character or p[j-1] == '.')
// 2) p[0:j-1] should match to s[0:i-1]
dp[i][j] = p[j-1] == '.' && dp[i-1][j-1]
} else {
// if p[j-1] is '*', we need to consider two scenarios
// case1) consume the '*'
// to consume * and have s[:i] match to p[:j],
// p[j-2] should equal to s[i-1] (or p[j-2] can be '.' to be equal to any s[i-1])
// AND s[:i-1] should match to p[:j] as well.
// ↓ i-1
// s: [ ] [ ] [a] [a]
// p: [ ] [a] [*] [ ]
// ↑ j-1
//
// ↓ i-1
// s: [ ] [ ] [b] [a]
// p: [ ] [.] [*] [ ]
// ↑ j-1
// in other words, because '*' is used with its previous character,
// for s[i-1] to be included as "a*" or ".*" in p,
// s[i-2] should also be included in "a*" or ".*".
if p[j-2] == '.' || p[j-2] == s[i-1] {
dp[i][j] = dp[i-1][j]
}
// case2) skip the '*'
// ↓ i-1
// s: [ ] [ ] [b] [a]
// p: [a] [b] [*] [ ]
// ↑ j-3 ↑ j-1
// even if we can't fulfill the condition above, if s[:i-1] matches to p[j-3],
// we can say s[:i-1] also matches to p[j-1]
// because '*' can mean "zero" appearance of the previous character.
// in other words, we can just skip "b*", not using it.
if dp[i][j-2] == true {
dp[i][j] = true
}
}
}
}
return dp[sLen][pLen]
}
使用JAVA从新写一篇逻辑:
//动态规划,从go中剥离出来的算法,另类的理解
public boolean isMatch(String s,String p ){
boolean[][] dp = new boolean[s.length()+1][p.length()+1];
for (int i=0;i < dp.length;i++) {
for (int j = 0; j < dp[0].length; j++) {
if (i==0 && j==0) {
//in case both s and p are empty
dp[i][j]=true;
continue;
} else if(i==0){
//in this case only s is empty.
dp[i][j] = p.charAt(j-1)=="*" && dp[i][j-2];
continue;
}else if(j ==0){
dp[i][j] = false;
continue;
}
if(p.charAt(j-1) != '*'){
dp[i][j] = (s.charAt(i-1)==p.charAt(j-1) || p.charAt(j-1) == '.') && dp[i-1][j-1];
}else{
//case 1 consumer the *, in other words, '*' is used with its previous character,
// * as "a*" or ".*" in p
if (p.charAt(j-2) == '.' || p.charAt(j-2) == s.charAt(i-1)){
dp[i][j]=dp[i-1][j];
}
//case 2 skip the *, in other words, we can just skip "b*", not using it.
if(dp[i][j-2]){
dp[i][j]=true;
}
}
}
}
return dp[s.length()][p.length()];
}
从理解成本上,来说,最后这种方式,理解起来更加的顺滑