Topics
Given two strings s
and t
of lengths m
and n
respectively, return the minimum window substring of s
such that every character in t
(including duplicates) is included in the window. If there is no such substring, return the empty string ""
.
Idea
We can apply the sliding window with variable window size (snake framework) approach to solve this. As duplicates are also allowed, we need to use frequency array as our state. The letters can only be a-z
or A-Z
, so max size of arr needs to be 26+26=52
(in code, I just used 64). The sliding window algo is simple: for each tail
, try to consume/advance head
only if consuming it doesn’t fulfil the property P.
Property P: window [tail, head]
contains all characters of t
(with required frequencies)
Thus, if our window stops at head
, then the seq [tail, head+1]
is the smallest one that starts at tail
and has all chars of t
. There can be an edge case, where we reach end of s
while consuming head
. To simplify implementation, I added a dummy char #
at the end of s
which indicates that if head
ever reaches till #
, then that window isn’t valid.
Tip
If
head
reaches sentinel char#
, we can early exit! Advancingtail
will be of no use.s = "ADOBECODEBANC#" t = "ABC"
When
tail
reaches the last B,head
expands until we reachN
, because if we consume the next charC
, we would have all chars fromt
. So our valid window:[tail, head + 1]
, i.e.BANC
of len 4. Next, we movetail
forward, so it’s at A now, we can now consume C and#
, and whoa, reached the end ofs
. At this stage, we know that this window has to invalid:ANC#
so movingtail
forward is useless:NC#
,C#
are all invalid.
Time Complexity:
Space Complexity:
Code
bool check(char c, vector<int> &freq, vector<int> &target_freq) {
if (c == '#')
return true;
bool ok = false;
freq[c - 'A']++;
// atleast one char in t is not yet in s
for (int i = 0; i < 64; ++i) {
if (target_freq[i] == 0)
continue;
if (target_freq[i] > freq[i]) {
ok = true;
break;
}
}
freq[c - 'A']--;
return ok;
}
void solve(string s, string t) {
int n = s.length();
int tail = 0, head = -1;
vector<int> freq(64, 0);
vector<int> target_freq(64, 0);
for (char &x : t) {
target_freq[x - 'A']++;
}
int ans_start = -1;
int ans_len = INT_MAX;
while (tail < n) {
while (head < n - 1 and (check(s[head + 1], freq, target_freq))) {
// if we can "eat" next element
head++;
if (s[head] != '#')
freq[s[head] - 'A']++;
}
// process *valid* windows only
if (head != n - 1) {
// update answer, valid window: [tail, head + 1]
int curr_len = (head + 1 - tail + 1);
if (curr_len < ans_len) {
ans_len = curr_len;
ans_start = tail;
}
} else break; // early exit
// undo effect of tail
if (tail <= head and s[tail] != '#') {
freq[s[tail] - 'A']--;
}
tail++;
head = max(head, tail - 1);
}
// print substring
if (ans_start == -1) {
cout << "" << endl;
return;
}
cout << s.substr(ans_start, ans_len) << endl;
}