L10-Regular Expression Matching


/**

 10. Regular Expression Matching

 Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.

 '.' Matches any single character.
 '*' Matches zero or more of the preceding element.
 The matching should cover the entire input string (not partial).

 Note:

 s could be empty and contains only lowercase letters a-z.
 p could be empty and contains only lowercase letters a-z, and characters like . or *.
 Example 1:

 Input:
 s = "aa"
 p = "a"
 Output: false
 Explanation: "a" does not match the entire string "aa".
 Example 2:

 Input:
 s = "aa"
 p = "a*"
 Output: true
 Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".
 Example 3:

 Input:
 s = "ab"
 p = ".*"
 Output: true
 Explanation: ".*" means "zero or more (*) of any character (.)".
 Example 4:
mmm
 Input:
 s = "aab"
 p = "c*a*b"
 Output: true
 Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore, it matches "aab".
 Example 5:

 Input:
 s = "mississippi"
 p = "mis*is*p*."
 Output: false

 * */

参数限制:

Constraints:

1 <= s.length <= 20
1 <= p.length <= 30
s contains only lowercase English letters.
p contains only lowercase English letters, '.', and '*'.
It is guaranteed for each appearance of the character '*', there will be a previous valid character to match.

第一种方法

先看官方的文档中的解法:首先就是递归处理,而且还介绍了一种方法,P中包含了两种字符,一个. 比较好处理,就是任意匹配就完活了,一个有点困难,特别是已经有了一个的情况下,所以从官方的solution中,我们学到了一个方法,我命名为:“假想降级”,如果题目中只有一个条件:

If there were no Kleene stars (the * wildcard character for regular expressions), the problem would be easier - we simply check from left to right if each character of the text matches the pattern.

class Solution {
    public static boolean isMath_noKleeneStars(String s, String p) {
        if (p == null || p.isEmpty()) return s == null || s.isEmpty();
        boolean isMath = s.charAt(0) == p.charAt(0) || p.charAt(0) == '.';
        if (p.length() > 1) {
            // 迭代的匹配,s和p的匹配结果,取决于s和p的子串的匹配结果
            return isMath && isMatch_resursive(s.substring(1), p.substring(1));
        }
        return isMath;
    }
}

那么我们再去考虑,有*的情况. 然后我们就能够知道,我们需要修改上面递归的那一块的代码了. 递归的时候,考虑*的情况,怎么考虑尼?

这个时候,考虑具体的迭代逻辑,不考虑实际的题目意思,加上上面的迭代的关键点,考虑上面的代码,遇到*号的时候,这个时候需要考虑前一个元素的匹配结果,所以我们需要判定p.length() > 1 的情况

class Solution {
    public boolean isMatch(String text, String pattern) {
        if (pattern.isEmpty()) return text.isEmpty();
        // 第一个字符匹配的情况,这里面并没有把*号特殊的标出来,因为:It is guaranteed for each appearance of the character '*', there will be a previous valid character to match.
        boolean first_match = (!text.isEmpty() &&
                               (pattern.charAt(0) == text.charAt(0) || pattern.charAt(0) == '.'));

        if (pattern.length() >= 2 && pattern.charAt(1) == '*'){
            // isMatch(text, pattern.substring(2)) 代表*连带的前面的字符也给吃掉了
            // (first_match && isMatch(text.substring(1), pattern)) 代表*匹配了多个字符
            return (isMatch(text, pattern.substring(2)) ||
                    (first_match && isMatch(text.substring(1), pattern)));
        } else {
            return first_match && isMatch(text.substring(1), pattern.substring(1));
        }
    }
}

上面是使用了递归的思想,虽然在递归迭代的时候,*的匹配规则:

‘*’ Matches zero or more of the preceding element.

转化为代码的时候:

(isMatch(text, pattern.substring(2))
(first_match && isMatch(text.substring(1), pattern)))

有点难理解,但是还是解决了问题。

第二种方法

从前一种方法中,我们了解到,这个问题存在:optimal substructure

As the problem has an optimal substructure, it is natural to cache intermediate results.

dp[i][j] 代表: : S字符串[0..i] 和 P字符串[0….j] 是否匹配

We can describe our answer in terms of answers to questions involving smaller strings.

具体的状态转义方程的推导方式:

    //    ######a(i)
    //    ####a(j)
    // 1. if p.charAt(j) == s.charAt(i), M[i][j] = M[i - 1][j - 1]
    //    #######a(i)
    //    ####.(j)
    // 2. if p.charAt(j) == '.', M[i][j] = M[i - 1][j - 1]

    // 3. if p.charAt(j) == '*':
    //       #####a(i)
    //       ####b*(j)
    //    1. if p.charAt(j - 1) != '.' && p.charAt(j - 1) != s.charAt(i), then b* is counted as empty. M[i][j] = M[i][j - 2]

    //    2.if p.charAt(j - 1) == '.' || p.charAt(j - 1) == s.charAt(i):    
    //       ######a(i)
    //       ####.*(j)
    //
    //       #####a(i)
    //       ###a*(j)
    //      2.1 if p.charAt(j - 1) is counted as empty, then M[i][j] = M[i][j - 2]
    //      2.2 if counted as one, then M[i][j] = M[i - 1][j - 2]
    //      2.3 if counted as multiple, then M[i][j] = M[i - 1][j]
                

    // Observation: from above, we can see to get M[i][j], we need to know previous elements in M, i.e. we need to compute them first. 
    // which determines i goes from 1 to m - 1, j goes from 1 to n + 1
    //  我们在推导M[i][j], 依赖的都是M[i-1][j-1]之类的前面的元素,所以我们首先要从小的元素开始计算,i的循环也就是从1到m+1,j的循环就是从1到n+1了

代码:


class Solution {
    public boolean isMatch(String s, String p) {
        /**
         * dp is a N+1 x M+1 matrix; N is the length of s and M is the length of p
         * dp[i][j] represents if first i characters of s match first j characters of p
         * */
        boolean[][] dp = new boolean[s.length()+1][p.length()+1];
        /**
         * base case:
         * empty string s matches empty string p -> dp[0][0] = true
         * non-empty string s never match empty string p -> dp[i][0] = false
         */
        dp[0][0] = true;
        for (int i = 1; i < dp.length; i++) {
            dp[i][0] = false;
        }

        // j is index of p ,not the dp[0] index
        for (int j =1 ; j < p.length();j++){
            if(p.charAt(j) == '*' && dp[0][j-1]){
                dp[0][j+1]=true;
            }
        }

        /**
         * dynamic programming process
         * skip the first column because already filled as base case
         */
        for (int i = 1; i < dp.length; i++) {
            for (int j = 1; j < dp[i].length; j++) {
                if (p.charAt(j-1) == '.') {
                    // If p's latest character is '.', dp[i][j] is equivalent to dp[i-1][j-1] because the newest characters are guaranteed to match
                    dp[i][j] = dp[i-1][j-1];
                } else if (p.charAt(j-1) == '*') {
                    if (dp[i-1][j]) {
                        /**
                         * if p's newest chracter is '*' and dp[i-1][j] is true (this substring in p has been matching)
                         * dp[i][j] is true when the newst character in s matches the one representeed by '*': s.charAt(i-1) == p.charAt(j-2)
                         * dp[i][j] is true when the '*' represents '.': p.charAt(j-2) == '.'
                         * dp[i][j] is true when dp[i][j-2] is true because 0 occurrence of the character represented by '*'
                         **/
                        dp[i][j] = s.charAt(i-1) == p.charAt(j-2) || p.charAt(j-2) == '.' || dp[i][j-2];
                    } else {
                        /**
                         * if p's newest chracter is '*' and dp[i-1][j] is false (this substring in p hasn't matched)
                         * check if this new one gives a match: if dp[i][j-2] is true, dp[i][j] is true because of 0 occurrence of the '*' character
                         **/
                        dp[i][j] = dp[i][j-2];
                    }
                } else {
                    /**
                     * if jth character in p is just a normal letter, check if this matches with the ith character in s
                     * If they match, dp[i][j] is true only if dp[i-1][j-1] is true (same idea as palindrome)
                     */
                    dp[i][j] = dp[i-1][j-1] && s.charAt(i-1) == p.charAt(j-1);
                }
            }
        }

        return dp[s.length()][p.length()];
    }
}
        

/**
 * Time: O(MN)
 * Space: O(MN)
 * N is the length of string s, M is the length of string p
 *
 * Intuition: (DP)
 * f(i,j) = if the length i prefix of s matches the length j prefix of p
 *
 * Base:
 * f(0,0) = true; empty s matches empty p
 * f(i,0) = false, where i > 0; non-empty s never matches empty p
 * f(0,1) = false; empty s never matches length 1 p
 *
 * Result: f(length of s, length of p)
 *
 * Tansition:
 * f(i,j) = f(i-1,j-1) && ith character in s == jth character in p; exact character match
 * OR
 * f(i,j) = f(i-1,j-1) && jth character in p is '.'; '.' is a universial match
 * OR
 * if jth character in p is '*':
 *     f(i,j) = f(i-1,j) && (ith character in s == the character represented by * || * represents . || f(i,j-2))
 *         {if p already matches with s before the newest character in s,
 *         then the newest character in s just needs to equal to the character represented by *
 *         OR s matches p with 0 occurrence of the character represented by *, which is f(i,j-2)}
 *     f(i,j) = !f(i-1,j) && f(i,j-2)
 *          {if p didn't match with s before the newest character in s,
 *          then check if the newest character in s makes it a match: f(i,j-2) -> 0 occurrence of the character represented by *}
 *     f(i,j) = true if i == 0 && (j == 2 || f(i,j-2))
 *          {if s is empty, true, if the last character is '*' on the 2nd index or if f(i,j-2) because of 0 occurrence of character represented by *}
 *
 *
 * s = 'aaa'
 * p = 'ab*a*c*a'
 *
 * Base Case:
 *   - a b * a * c * a
 * - T F
 * a F
 * a F
 * a F
 *
 * Keep Going (non-empty p matches empty s olny when using '*' appropriately):
 *   - a b * a * c * a
 * - T F F F F F F F F
 * a F
 * a F
 * a F
 *
 * Keep Going (0 occurrence of character represented by * would give it true):
 *   - a b * a * c * a
 * - T F F F F F F F F
 * a F T F T F T F T F
 * a F F F F T T F F T
 * a F F F F F T F T T -> result
 *
 * */

从上面来看,这种理解的成本还是比较大,不如go的清晰

func isMatch(s string, p string) bool {
	sLen, pLen := len(s), len(p)
	var dp [][]bool
	var t []bool
	for i := 0; i <= sLen; i++ {
		t = make([]bool, pLen+1)
		dp = append(dp, t)
	}

	// dp[i][j] holds a flag that whether s[0:i] mathes to p[0:j].
	// (Note that we take the 0th index as "empty" string case)
	for i := 0; i <= sLen; i++ {
		for j := 0; j <= pLen; j++ {
			if i == 0 && j == 0 {
				// in case both s and p are empty
				dp[i][j] = true
				continue
			} else if i == 0 {
				// in case only s is empty.
				// To match an empty string, p should be like a*, a*b*, a*.*a*, ....
				// (* should appear in all odd indexes of p)
				dp[i][j] = p[j-1] == '*' && dp[i][j-2]
				continue
			} else if j == 0 {
				// in case only p is mepty.
				// No string s cannot match to empty regular expression
				// unless s itself is empty too (which is already handled)
				dp[i][j] = false
				continue
			}

			// in case both s and p are non-empty
			// consider what should be sufficed to be able to say s[i-1] matches to p[j-1]
			// (= dp[i][j] is true).
			if p[j-1] != '*' {
				// if p[j-1] is not '*', then
				// 1) p[j-1] and s[j-1] should match (same character or p[j-1] == '.')
				// 2) p[0:j-1] should match to s[0:i-1]
				dp[i][j] = p[j-1] == '.' && dp[i-1][j-1]
			} else {
				// if p[j-1] is '*', we need to consider two scenarios
				// case1) consume the '*'
				// to consume * and have s[:i] match to p[:j],
				// p[j-2] should equal to s[i-1] (or p[j-2] can be '.' to be equal to any s[i-1])
				// AND s[:i-1] should match to p[:j] as well.
				//                 ↓ i-1
				// s: [ ] [ ] [a] [a]
				// p: [ ] [a] [*] [ ]
				//             ↑ j-1
				//
				//                 ↓ i-1
				// s: [ ] [ ] [b] [a]
				// p: [ ] [.] [*] [ ]
				//             ↑ j-1
				// in other words, because '*' is used with its previous character,
				// for s[i-1] to be included as "a*" or ".*" in p,
				// s[i-2] should also be included in "a*" or ".*".
				if p[j-2] == '.' || p[j-2] == s[i-1] {
					dp[i][j] = dp[i-1][j]
				}
				// case2) skip the '*'
				//                 ↓ i-1
				// s: [ ] [ ] [b] [a]
				// p: [a] [b] [*] [ ]
				//     ↑ j-3   ↑ j-1
				// even if we can't fulfill the condition above, if s[:i-1] matches to p[j-3],
				// we can say s[:i-1] also matches to p[j-1]
				// because '*' can mean "zero" appearance of the previous character.
				// in other words, we can just skip "b*", not using it.
				if dp[i][j-2] == true {
					dp[i][j] = true
				}
			}
		}
	}
	return dp[sLen][pLen]
}

使用JAVA从新写一篇逻辑:

 //动态规划,从go中剥离出来的算法,另类的理解
    public boolean isMatch(String s,String p ){
        boolean[][] dp = new boolean[s.length()+1][p.length()+1];
        for (int i=0;i < dp.length;i++) {
            for (int j = 0; j < dp[0].length; j++) {
                if (i==0 && j==0) {
                    //in case both s and p are empty
                    dp[i][j]=true;
                    continue;
                } else if(i==0){
                    //in this case only s is empty.
                    dp[i][j] = p.charAt(j-1)=="*" && dp[i][j-2];
                    continue;
                }else if(j ==0){
                    dp[i][j] = false;
                    continue;
                }

                if(p.charAt(j-1) != '*'){
                    dp[i][j] = (s.charAt(i-1)==p.charAt(j-1) || p.charAt(j-1) == '.') && dp[i-1][j-1];
                }else{
                    //case 1 consumer the *, in other words, '*' is used with its previous character,
                    // * as "a*" or ".*" in p
                    if (p.charAt(j-2) == '.' || p.charAt(j-2) == s.charAt(i-1)){
                        dp[i][j]=dp[i-1][j];
                    }

                    //case 2 skip the *, in other words, we can just skip "b*", not using it.
                    if(dp[i][j-2]){
                        dp[i][j]=true;
                    }
                }
            }
        }
        return dp[s.length()][p.length()];
    }

从理解成本上,来说,最后这种方式,理解起来更加的顺滑

Powered by andiHappy and Theme by AndiHappy