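Given a paragraph (paragraph) and a list of banned words (banned), return the most frequent word that is not in the banned list.
It is guaranteed that at least one word is not banned, and that the answer is unique.
Words in the banned list are given in lowercase and contain no punctuation. Words in the paragraph are case-insensitive. The answer is returned in lowercase.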
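Example:
Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"
Explanation: "hit" occurs 3 times, but it is a banned word. "ball" occurs 2 times (and no other word occurs 2 times), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph are case-insensitive and that punctuation is ignored, even when it is adjacent to a word, as in "ball,". "hit" is not the answer: although it occurs more often, it is in the banned list.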
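The counts quoted in the example can be checked with a few lines of Python. This is only an illustrative sketch: the variable names `words`, `counts`, and `allowed` are introduced here and are not part of the problem statement.

```python
import re
from collections import Counter

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = {"hit"}

# Lowercase the text, then treat every maximal run of letters as one word,
# so "ball," and "BALL" both become "ball".
words = re.findall(r"[a-z]+", paragraph.lower())
counts = Counter(words)

# Drop banned words before picking the most frequent one.
allowed = {w: c for w, c in counts.items() if w not in banned}
answer = max(allowed, key=allowed.get)

print(counts["hit"], counts["ball"], answer)
```

Here "hit" is counted 3 times and "ball" 2 times; once "hit" is filtered out, "ball" is the most frequent remaining word.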
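Constraints:
- 1 <= length of paragraph <= 1000
- 0 <= number of banned words <= 100
- 1 <= length of each banned word <= 10
- The answer is unique and written in lowercase (even if it appears in uppercase in paragraph, and even if it is a proper noun).
- paragraph contains only letters, spaces, and the punctuation symbols !?',;.
- There are no hyphens or hyphenated words.
- Words contain only letters; apostrophes and other punctuation never appear inside a word.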
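Approach 1: Regex matching / two pointers + hash table
Extract all words with a regular expression (or a two-pointer scan), count each word's frequency in a hash table, and return the word with the highest frequency that does not appear in banned.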
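The two-pointer variant mentioned above can be sketched in Python as follows. This is a minimal sketch rather than the reference solution: the function name `most_common_word_two_pointers` and its local names are hypothetical, and it assumes the input follows the stated constraints (letters, spaces, and a small set of punctuation).

```python
from collections import Counter
from typing import List


def most_common_word_two_pointers(paragraph: str, banned: List[str]) -> str:
    banned_set = set(banned)
    counts = Counter()
    best, best_count = "", 0
    i, n = 0, len(paragraph)
    while i < n:
        # Skip separators (spaces and punctuation).
        if not paragraph[i].isalpha():
            i += 1
            continue
        # Advance j to the end of the current run of letters: [i, j) is one word.
        j = i
        while j < n and paragraph[j].isalpha():
            j += 1
        word = paragraph[i:j].lower()
        i = j
        if word in banned_set:
            continue
        # Update the running maximum as counts grow, so no second pass is needed.
        counts[word] += 1
        if counts[word] > best_count:
            best, best_count = word, counts[word]
    return best
```

Tracking the best word during the scan avoids a separate pass over the hash table; because the answer is guaranteed to be unique, how ties are handled during the scan does not affect the result.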
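import re
from collections import Counter
from typing import List
class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        s = set(banned)
        # Lowercase, then take every maximal run of letters as a word.
        p = Counter(re.findall('[a-z]+', paragraph.lower()))
        # most_common() is sorted by descending count; return the first word not banned.
        return next(word for word, _ in p.most_common() if word not in s)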
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Solution {
    // Matches one word: a maximal run of lowercase letters.
    private static final Pattern pattern = Pattern.compile("[a-z]+");

    public String mostCommonWord(String paragraph, String[] banned) {
        Set<String> bannedWords = new HashSet<>();
        for (String word : banned) {
            bannedWords.add(word);
        }
        // Count every non-banned word of the lowercased paragraph.
        Map<String, Integer> counter = new HashMap<>();
        Matcher matcher = pattern.matcher(paragraph.toLowerCase());
        while (matcher.find()) {
            String word = matcher.group();
            if (bannedWords.contains(word)) {
                continue;
            }
            counter.put(word, counter.getOrDefault(word, 0) + 1);
        }
        // Pick the word with the largest count.
        int max = Integer.MIN_VALUE;
        String ans = null;
        for (Map.Entry<String, Integer> entry : counter.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                ans = entry.getKey();
            }
        }
        return ans;
    }
}
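function mostCommonWord(paragraph: string, banned: string[]): string {
    // toLowerCase is locale-independent, unlike toLocaleLowerCase.
    const s = paragraph.toLowerCase();
    const map = new Map<string, number>();
    const set = new Set<string>(banned);
    // Split on anything that is not a lowercase letter. The range A-z would
    // also treat the characters [ \ ] ^ _ ` as letters, so a-z is used instead.
    for (const word of s.split(/[^a-z]/)) {
        // Consecutive separators produce empty strings; skip them and banned words.
        if (word === '' || set.has(word)) {
            continue;
        }
        map.set(word, (map.get(word) ?? 0) + 1);
    }
    // Keep the entry with the largest count.
    return [...map.entries()].reduce(
        (r, v) => (v[1] > r[1] ? v : r),
        ['', 0],
    )[0];
}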
use std::collections::{HashMap, HashSet};
impl Solution {
    pub fn most_common_word(mut paragraph: String, banned: Vec<String>) -> String {
        paragraph.make_ascii_lowercase();
        let banned: HashSet<&str> = banned.iter().map(String::as_str).collect();
        let mut map = HashMap::new();
        // Split on every character that is not a lowercase letter.
        for word in paragraph.split(|c| !matches!(c, 'a'..='z')) {
            if word.is_empty() || banned.contains(word) {
                continue;
            }
            let val = map.get(&word).unwrap_or(&0) + 1;
            map.insert(word, val);
        }
        // Return the word with the largest count.
        map.into_iter()
            .max_by_key(|&(_, v)| v)
            .unwrap()
            .0
            .to_string()
    }
}
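#include <cctype>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    string mostCommonWord(string paragraph, vector<string>& banned) {
        unordered_set<string> s(banned.begin(), banned.end());
        unordered_map<string, int> counter;
        string ans;
        for (int i = 0, mx = 0, n = paragraph.size(); i < n;) {
            // Skip separators (spaces and punctuation).
            if (!isalpha((unsigned char) paragraph[i])) {
                ++i;
                continue;
            }
            // [i, j) spans one word; build its lowercase form.
            int j = i;
            string word;
            while (j < n && isalpha((unsigned char) paragraph[j])) {
                word.push_back(tolower((unsigned char) paragraph[j]));
                ++j;
            }
            // paragraph[j] is a separator or the end, so resume after it.
            i = j + 1;
            if (s.count(word)) continue;
            // Track the most frequent word while counting.
            ++counter[word];
            if (counter[word] > mx) {
                ans = word;
                mx = counter[word];
            }
        }
        return ans;
    }
};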
import "unicode"

func mostCommonWord(paragraph string, banned []string) string {
	s := make(map[string]bool)
	for _, w := range banned {
		s[w] = true
	}
	counter := make(map[string]int)
	var ans string
	for i, mx, n := 0, 0, len(paragraph); i < n; {
		// Skip separators (spaces and punctuation).
		if !unicode.IsLetter(rune(paragraph[i])) {
			i++
			continue
		}
		// [i, j) spans one word; build its lowercase form.
		j := i
		var word []byte
		for j < n && unicode.IsLetter(rune(paragraph[j])) {
			word = append(word, byte(unicode.ToLower(rune(paragraph[j]))))
			j++
		}
		// paragraph[j] is a separator or the end, so resume after it.
		i = j + 1
		t := string(word)
		if s[t] {
			continue
		}
		// Track the most frequent word while counting.
		counter[t]++
		if counter[t] > mx {
			ans = t
			mx = counter[t]
		}
	}
	return ans
}