[POJ 1200] Crazy Search [String Hash]

  • 2018-01-07
  • 0
  • 0

Problem:

Time Limit: 1000MS Memory Limit: 65536K

Description

Many people like to solve hard puzzles some of which may lead them to madness. One such puzzle could be finding a hidden prime number in a given text. Such number could be the number of different substrings of a given size that exist in the text. As you soon will discover, you really need the help of a computer and a good algorithm to solve such a puzzle.
Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text. As an example, consider N=3, NC=4 and the text "daababac". The different substrings of size 3 that can be found in this text are: "daa"; "aab"; "aba"; "bab"; "bac". Therefore, the answer should be 5.

Input

The first line of input consists of two numbers, N and NC, separated by exactly one space. This is followed by the text where the search takes place. You may assume that the maximum number of substrings formed by the possible set of characters does not exceed 16 million.

Output

The program should output just an integer corresponding to the number of different substrings of size N found in the given text.

Sample Input

3 4
daababac

Sample Output

5

Hint

Huge input, scanf is recommended.

Source

Southwestern Europe 2002

Solution:

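This is a fairly basic string-hashing problem. The only annoyance is that the statement leaves the data range unclear, so the arrays simply have to be declared generously.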

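First, assign each of the NC distinct characters a weight id[] in the range 0 to NC - 1. Then treat every substring of length N as an N-digit number in base NC and convert it to an ordinary integer, which is used to detect duplicates. Two different substrings always map to different integers, and the statement's bound of 16 million possible substrings means the integer can index a boolean array directly.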
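The encoding step can be sketched in isolation as follows. The helper names assignDigits and encode are hypothetical and do not appear in the solution below; the sketch assumes the text contains at most nc distinct characters, as the statement promises.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: give each character a digit 0, 1, 2, ...
// in order of first appearance; -1 marks "not seen yet".
std::vector<int> assignDigits(const std::string& text) {
    std::vector<int> digit(256, -1);
    int next = 0;
    for (unsigned char c : text)
        if (digit[c] < 0) digit[c] = next++;
    return digit;
}

// Hypothetical helper: read text[pos .. pos+n-1] as an n-digit number in base nc.
int encode(const std::string& text, const std::vector<int>& digit,
           int pos, int n, int nc) {
    int value = 0;
    for (int i = 0; i < n; i++)
        value = value * nc + digit[(unsigned char)text[pos + i]];
    return value;
}
```

For the sample text "daababac" with NC = 4, the digits come out as d = 0, a = 1, b = 2, c = 3, so "daa" encodes to 0 * 16 + 1 * 4 + 1 = 5 and "bac" to 2 * 16 + 1 * 4 + 3 = 39.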

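An effective optimization is the Rabin-Karp algorithm, also known as rolling hash. The main idea is to record T = NC^(N-1), the place value of the highest digit of an N-digit base-NC number. Instead of redoing the base conversion for every window, subtract T * id[leading character] from the previous hash, which quickly gives the hash of the remaining N - 1 digits; then multiply by NC and add id[new character] to obtain the hash of the current window.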
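As a rough illustration of the rolling update, the sketch below lists the hash of every length-n window. The function name windowHashes is hypothetical, and the caller is assumed to pass a digit table whose entries for the characters in the text lie in 0 to nc - 1.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: hashes of all length-n windows of text, left to right.
// digit[c] is assumed to hold the base-nc digit of character c.
std::vector<int> windowHashes(const std::string& text,
                              const std::vector<int>& digit, int n, int nc) {
    std::vector<int> out;
    int len = (int)text.size();
    if (n <= 0 || len < n) return out;   // no window of length n exists

    int top = 1;                         // T = nc^(n-1), weight of the leading digit
    for (int i = 1; i < n; i++) top *= nc;

    int h = 0;                           // hash of the first window, computed directly
    for (int i = 0; i < n; i++)
        h = h * nc + digit[(unsigned char)text[i]];
    out.push_back(h);

    for (int i = n; i < len; i++) {
        h -= top * digit[(unsigned char)text[i - n]];  // drop the leading digit
        h = h * nc + digit[(unsigned char)text[i]];    // shift and append the new digit
        out.push_back(h);
    }
    return out;
}
```

Each update costs O(1), so all windows are hashed in O(L) total instead of the O(L * N) needed when every window is converted from scratch.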

Code: O(L), where L is the length of the string [7884K, 32MS]

#include<cstdio>
#include<cstring>
using namespace std;

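int N, NC, ans;
char str[16000005];
int id[256], topid = 0;  // id[c]: digit assigned to character c, in 0..NC-1
bool seen[256];          // seen[c]: whether c has already been given a digit
bool exi[16000005];      // exi[h]: whether a substring with hash h has appeared
// The statement bounds the number of possible substrings by 16 million, which
// sizes exi[]; the text length is not stated, so the size of str[] is a guess.

// Integer power by repeated squaring; long long keeps the last squaring from overflowing
inline int fastpow(long long bas, int ex){
	long long res = 1;
	while(ex){
		if(ex & 1) res *= bas;
		ex >>= 1, bas *= bas;
	}
	return (int)res;
}

int main(){
	if(scanf("%d%d%s", &N, &NC, str) != 3) return 0;
	int len = strlen(str), topHash = fastpow(NC, N - 1);
	// len: length of the text; topHash: weight NC^(N-1) of the top digit of an N-digit number
	if(len < N){ printf("0\n"); return 0; }  // No substring of length N exists
	for(int i = 0; i < len; i++){
		unsigned char c = str[i];
		// Assign digits in order of first appearance; a separate seen[] flag is
		// needed because 0 is itself a valid digit
		if(!seen[c]) seen[c] = 1, id[c] = topid++;
	}
	int curHash = 0;
	for(int i = 0; i < N; i++) curHash = curHash * NC + id[(unsigned char)str[i]];
	exi[curHash] = 1, ans = 1;
	for(int i = N; i < len; i++){
		// Drop the leading digit, shift by one digit, append the new character
		curHash = (curHash - topHash * id[(unsigned char)str[i - N]]) * NC + id[(unsigned char)str[i]];
		if(!exi[curHash]) exi[curHash] = 1, ans++;
	}
	printf("%d\n", ans);
	return 0;
}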
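
For checking the solution on small inputs, a brute-force counter can serve as a reference. This is only a sketch for local testing, not part of the submitted code; the name countDistinctBrute is hypothetical, and it stores every substring in a std::set, which is far too slow and memory-hungry for the real data.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical reference: count distinct length-n substrings by collecting them in a set.
int countDistinctBrute(const std::string& text, int n) {
    std::set<std::string> distinct;
    for (std::size_t i = 0; i + n <= text.size(); i++)
        distinct.insert(text.substr(i, n));
    return (int)distinct.size();
}
```

Comparing its result with the hashed solution on random short strings is a quick way to catch mistakes in the digit assignment or the rolling update.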
