软件开发面试C++问题 2010-11-08 14:19:27 当年每次经过计算机系走廊,到处听到的是老美响亮的“PLUS PLUS”的声音--都在讨论呢。那时遇到问题时的“圣经”是ARM (C++ Annotated Reference Manual)。过了10多年,C++已经实现了标准化,标准文件就有上千页。C++水平往往是HACKER水平高低的标志。 有次,有人把一个C++面试的问题发给我。我一看,现在面试问题越来越刁难了嘛。题目是: 有10万个文本文件,输入一个词组,如何迅速找出含这些词的文件。把题发过来后,开始计时,必须60-90分钟之内完成,将答案寄回 。 我一看,这不是要构造一个简单的搜索引擎嘛。 时间紧迫,我立刻打开VI,开始迅速TYPE,并进行了测试。然后将代码与测试的截屏在40多分钟的时候EMAIL了回去。下面将题及我的解答贴出,各位遇到类似的问题可以作为参考。 /////////////////////////////////面试问题////////////////////////////////////////////// /************* * * Copyright (C) D. Yue, March 4, 2009 * */ #include iostream #include iterator #include map #include vector #include list #include set #include algorithm #include boost/tokenizer.hpp /******************************** Problem 1 Time cut-off: 60 - 90 min. 1.Data: a set of 100,000 ASCII files (strings). 2.Each file contains 1 or more words. 3.A query is entered from stdin (may contain multiple words). 4.Code in C++: find files in (1) that partially matche to query input. 5.Assume: the function will be invoked repeatedly. 6.Optimize for time. ********************************************************************************************/ /************************** Solution to Problem 1 Algorithm datastructure *) Assign a unique integer ID to each of the N ASCII strings, so one can retrieve them by their ID, call these IDs StringIDs; *) Store the strings in some container, which allows easy retrieval by IDs, a hashtable is approriate. *) Break each string into words (tokenization) *) In a hash table with the words as keys, store the StringIDs of the strings that contain a key; *) On user input search the keys in the hashtable, once a match is found, get the StringIDs, then print the corresponding string *) take care of the requirement that multiple words must all match ******************************/ using namespace std; using namespace boost; struct StringWithID { StringWithID() {} StringWithID(int i, const string s): id(i), value(s) {}; int id; string value; static int getNextID() { return NextID++; } // used when assigning a new ID to a new string private: static int NextID; }; int StringWithID::NextID =1; //initialize class Search{ typedef mapint, struct StringWithID IdToString; // given an ID find the string typedef mapstring, int StringToId; // given a string, find its ID typedef mapstring, listint WordToIds; // given a word, find the IDs of the strings that contains the word public: //Initialize the StringSet and WordsToIDs map with the ASCII strings bool initialize( const vectorstring strs) { using namespace std; using namespace boost; for(vectorstring::const_iterator p=strs.begin(); p != strs.end(); p++ ) { if(str2id.find(*p) != str2id.end()) continue; // string already there int new_id = StringWithID::getNextID(); // assign this ID to the new guy id2str = StringWithID(new_id, *p); // stored str2id = new_id; //reverse lookup stored tokenizer tok(*p); for(tokenizer::iterator i=tok.begin(); i!=tok.end();i++){ cout"Got token " *i endl; w2id .push_back(new_id); } } return true; } //pattern is a space separated list of words //we iterate through the w2id map to check if bool find_match(const string pattern) { //first we tokenize the pattern into words liststring pats; tokenizer tok(pattern); for(tokenizer::iterator i=tok.begin(); i!=tok.end();i++){ pats.push_back(*i); } size_t pat_cnt = pats.size(); mapstring, setint matched_sets; for(liststring::iterator pi = pats.begin(); pi != pats.end(); pi++) { for(WordToIds::iterator witer = w2id.begin(); witer != w2id.end(); witer++) {//iteratate through the words string word = witer-first; if(word.find(*pi) != string::npos) { // the word matched the pattern cout"Found: " *pi " in: " wordendl; copy((witer-second).begin(), (witer-second).end(), inserter(matched_sets , matched_sets .begin())); for(setint::iterator i = matched_sets .begin(); i != matched_sets .end(); i++) { int str_id = *i; string str = id2str .value; cout"Found sub-pattern "*pi " in: "strendl; } break; } } } //at this point we have sets of IDs for the individual sub input patterns, we must find the ones that in all of the sets (intersection) setint good_ids; bool first_run = true; for(mapstring, setint ::iterator i = matched_sets.begin(); i!= matched_sets.end(); i++) { if(first_run) { good_ids = i-second; first_run = false; continue; } setint old_good_ids = good_ids; setint int_set = i-second; good_ids.clear(); set_intersection(int_set.begin(), int_set.end(), old_good_ids.begin(), old_good_ids.end(), inserter(good_ids, good_ids.begin())); } //now print out the strings if(good_ids.empty() ) { cout"No match found!"endl; }else { for(setint::iterator i = good_ids.begin(); i != good_ids.end(); i++) { int str_id = *i; string str = id2str .value; cout"Matched string: "strendl; } } return !good_ids.empty(); } protected: IdToString id2str; StringToId str2id; WordToIds w2id; }; //test program int main() { vectorstring strings; strings.push_back("angry brad pitt"); strings.push_back("pitt likes jolie"); Search search; search.initialize(strings); char buf ; while(std::cin.getline(buf, 255)) { search.find_match(buf); } }