Each of us has been faced with the problem of searching for information more than once. Irregardless of the data source we are using (Internet, file system on our hard drive, data base or a global information system of a big company) the problems can be multiple and include the physical volume of the data base searched, the information being unstructured, different file types and also the complexity of accurately wording the search query. We have already reached the stage when the amount of data on one single PC is comparable to the amount of text data stored in a proper library. And as to the unstructured data flows, in future they are only going to increase, and at a very rapid tempo. If for an average user this might be just a minor misfortune, for a big company absence of control over information can mean significant problems. So the necessity to create search systems and technologies simplifying and accelerating access to the necessary information, originated long ago. Such systems are numerous and moreover not every one of them is based on a unique technology. And the task of choosing the right one depends directly on the specific tasks to be solved in the future. While the demand for the perfect data searching and processing tools is steadily growing let’s consider the state of affairs with the supply side.
Not going deeply into the various peculiarities of the technology, all the searching programs and systems can be divided into three groups. These are: global Internet systems, turnkey business solutions (corporate data searching and processing technologies) and simple phrasal or file search on a local computer. Different directions presumably mean different solutions.
Local search gps tracker
Everything is clear about search on a local PC. It’s not remarkable for any particular functionality features accept for the choice of file type (media, text etc.) and the search destination. Just enter the name of the searched file (or part of text, for example in the Word format) and that’s it. The speed and result depend fully on the text entered into the query line. There is zero intellectuality in this: simply looking through the available files to define their relevance. This is in its sense explicable: what’s the use of creating a sophisticated system for such uncomplicated needs.
Global search technologies
Matters stand totally different with the search systems operating in the global network. One can’t rely simply on looking through the available data. Huge volume (Yandex for instance can boast the indexing capacity of more than 11 terabyte of data) of the global chaos of unstructured information will make the simple search not only ineffective but also long and labor-consuming. That’s why lately the focus has shifted towards optimizing and improving quality characteristics of search. But the scheme is still very simple (except for the secret innovations of every separate system) – the phrasal search through the indexed data base with proper consideration for morphology and synonyms. Undoubtedly, such an approach works but doesn’t solve the problem completely. Reading dozens of various articles dedicated to improving search with the help of Google or Yandex, one can drive at the conclusion that without knowing the hidden opportunities of these systems finding a relevant document by the query is a matter of more than a minute, and sometimes more than an hour. The problem is that such a realization of search is very dependent on the query word or phrase, entered by the user. The more indistinct the query the worse is the search. This has become an axiom, or dogma, whichever you prefer.
Of course, intelligently using the key functions of the search systems and properly defining the phrase by which the documents and sites are searched, it is possible to get acceptable results. But this would be the result of painstaking mental work and time wasted on looking through irrelevant information with a hope to at least find some clues on how to upgrade the search query. In general, the scheme is the following: enter the phrase, look through several results, making sure that the query was not the right one, enter a new phrase and the stages are repeated till the relevancy of results achieves the highest possible level. But even in that case the chances to find the right document are still few. No average user will voluntary go for the sophistication of “advanced search” (although it is equipped with a number of very useful functions such as the choice of language, file format etc.). The best would be to simply insert the word or phrase and get a ready answer, without particular concern for the means of getting it. Let the horse think – it has a big head. Maybe this is not exactly up to the point, but one of the Google search functions is called “I am feeling lucky!” characterizes very well the existent searching technologies. Nevertheless, the technology works, not ideally and not always justifying the hopes, but if you allow for the complexity of searching through the chaos of Internet data volume, it could be acceptable.
The third on the list are the turnkey solutions based on the searching technologies. They are meant for serious companies and corporations, possessing really large data bases and staffed with all sorts of information systems and documents. In principle, the technologies themselves can also be used for home needs. For example, a programmer working remotely from the office will make good use of the search to access randomly located on his hard drive program source codes. But these are particulars. The main application of the technology is still solving the problem of quickly and accurately searching through large data volumes and working with various information sources. Such systems usually operate by a very simple scheme (although there are undoubtedly numerous unique methods of indexing and processing queries underneath the surface): phrasal search, with proper consideration for all the stem forms, synonyms etc. which once again leads us to the problem of human resource. When using such technology the user should first word the query phrases which are going to be the search criteria and presumably met in the necessary documents to be retrieved. But there is no guarantee that the user will be able to independently choose or remember the correct phrase and furthermore, that the search by this phrase will be satisfactory.
One more key moment is the speed of processing a query. Of course, when using the whole document instead of a couple of words, the accuracy of search increases manifold. But up to date, such an opportunity has not been used because of the high capacity drain of such a process. The point is that search by words or phrases will not provide us with a highly relevant similarity of results. And the search by phrase equal in its length the whole document consumes much time and computer resources. Here is an example: while processing the query by one word there is no considerable difference in speed: whether it’s 0,1 or 0,001 second is not of crucial importance to the user. But when you take an average size document which contains about 2000 unique words, then the search with consideration for morphology (stem forms) and thesaurus (synonyms), as well as generating a relevant list of results in case of search by key words will take several dozens of minutes (which is unacceptable for a user