# I. INTRODUCTION

String matching has been one of the most extensively studied problems in computer engineering, since it performs important tasks in many applications such as information retrieval systems (IRS), web search engines, error correction and several other fields [1][2][3][4][5][6][7][8][9][10][11][12]. Especially since the introduction of search engines dealing with the tremendous amount of textual information on the World Wide Web, the problem deserves special attention, and any improvement that speeds up the process will benefit these important applications. As free textual databases grow almost exponentially with time, it becomes impractical to solve the string matching problem with even the fastest sequential algorithms on a conventional sequential computer system. To improve the performance of searching on large text collections, researchers have developed parallel algorithms that distribute the database comparison over general-purpose parallel computers, where each processor performs a number of comparisons independently. In parallel processing, the text string T and pattern P are assumed to have already been allocated to the processors in such a way that each processor stores a single text symbol, and some processors additionally store a single pattern symbol. The input words are stored symbol by symbol in consecutive processors numbered according to snake-like row-major indexing: the processors in the odd-numbered rows 1, 3, 5, ... are numbered from left to right, and those in the even-numbered rows from right to left. (The first symbols of T and P are in processor 1, the next in processor 2, and so on.) This allocation scheme places symbols that are adjacent in the text or pattern in adjacent processors.
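To make the allocation concrete, the snake-like row-major numbering can be sketched in a few lines of Python (a minimal illustration; the grid width and helper name are ours, not from the original):

```python
def snake_position(i, width):
    """Physical (row, col) of the processor holding snake index i
    in a grid with `width` columns; even (0-based) rows run left
    to right, odd rows right to left."""
    row, offset = divmod(i, width)
    col = offset if row % 2 == 0 else width - 1 - offset
    return row, col

# Consecutive snake indices always occupy neighbouring grid cells,
# which is exactly the adjacency property the allocation relies on.
for i in range(11):
    r1, c1 = snake_position(i, 4)
    r2, c2 = snake_position(i + 1, 4)
    assert abs(r1 - r2) + abs(c1 - c2) == 1
print([snake_position(i, 4) for i in range(8)])
```

The assertion checks the Manhattan distance between consecutive indices, confirming that symbols adjacent in the text land on physically adjacent processors.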
The output of the string matching algorithm marks each processor as either being a starting position of an occurrence of P in T or not. In this paper we propose a parallel string matching technique based on the butterfly model. The main contributions of this work are summarized as follows. It offers a comprehensive study, with results, of typical parallel string matching algorithms from various aspects and their application on butterfly computing models. It suggests the most efficient algorithmic butterfly models and demonstrates the performance gain on both synthetic and real data. The rest of this work is organized as follows: we review typical algorithms and algorithmic models, and finally conclude the study.

# II. RECENT ADVANCEMENTS AND GLOBAL RESEARCH

The first optimal parallel string matching algorithm was proposed by Galil [13]. On the SIMD-CRCW model, this algorithm requires n / log n processors with time complexity O(log n); on the SIMD-CREW model, it requires n / log^2 n processors with time complexity O(log^2 n). Vishkin [14] improved this algorithm to ensure it remains optimal when the alphabet size is not fixed. In [15], an algorithm using O(n × m) processors was presented, with computation time O(log log n). A parallel KMP string matching algorithm on a distributed memory machine was proposed by Chen [16]. The algorithm is efficient and scalable in the distributed memory environment; its computation complexity is O(n / p + m), where p is the number of processors. S. V. Raju et al. [17] present a new method for exact string matching based on a layered architecture and a two-dimensional array, with applications such as string databases and computational biology. The main use of this method is to reduce the time spent on comparisons by distributing the data among processors, which achieves a linear speedup and requires a layered architecture and additionally p*# processors.
Bi Kun et al. [18] proposed an improved distributed string matching algorithm, together with an improved single-string matching algorithm based on a variant of the Boyer-Moore algorithm. They implement the algorithm on the above architecture, and their experiments show that it is practical and efficient on a distributed memory machine. Its computation complexity is O(n/p + m), where n is the length of the text, m is the length of the pattern, and p is the number of processors. They show that this distributed architecture is suitable for parallelizing multi-pattern string matching algorithms and approximate string matching algorithms. Hsi-Chieh Lee et al. [19] present three algorithms for string matching on reconfigurable mesh architectures. Given a text T of length n and a pattern P of length m, the first algorithm finds the exact matches between T and P in O(1) time on a 2-dimensional RMESH of size (n - m + 1) × m. The second algorithm finds the approximate matches between T and P in O(k) time on a 2D RMESH, where k is the maximum edit distance between T and P. The third algorithm allows only the replacement operation in the calculation of the edit distance and finds approximate matches between T and P in constant time on a 3D RMESH. From this paper we note that this simpler model would be sufficient to run the proposed algorithms without increasing the reported time complexities. S. V. Raju et al. [20] consider the string matching problem on a two-dimensional mesh, with applications such as string databases, cellular automata and computational biology. The main use of this method is to reduce the time spent on comparisons in string matching by using a mesh-connected network, which achieves constant time for detecting a mismatch in a text string. This is the first known optimal-time algorithm for pattern matching on meshes. The proposed strategy uses knowledge from the given algorithm and the mesh structure.
Its'hak Dinstein et al. [21] propose a parallel computation approach to two-dimensional shape recognition. This approach uses parallel techniques for contour extraction, parallel computation of normalized contour-based feature strings independent of scale and orientation, and parallel string matching algorithms. The implementation on the EREW PRAM architecture is discussed, but it can be adapted to other parallel architectures. Jin Hwan Park et al. [22] present efficient dataflow schemes for parallel string matching, covering two subproblems known as the exact matching and the k-mismatches problems. Three parallel algorithms based on multiple input (and output) streams are presented. The time complexities of these parallel algorithms are O((n/d) + a), 0 ≤ a ≤ m, where n and m represent the lengths of the reference and pattern strings (n >> m) and d represents the number of streams used (the degree of parallelism). They show that the degree of parallelism can be controlled by using a variable number d of input (and output) streams. Their approaches solve the exact matching and the k-mismatches problems with time complexity O((n/d) + a), where a = log m for the hierarchical scheme, a = m for the linear scheme, and a = 0 for the broadcasting scheme. The time required to process a length-n reference string is reduced by a factor of d by using d identical computation parts in parallel. With a linear systolic array architecture, m PEs are needed for the serial design and d*m PEs for the parallel design, where m is the pattern size and d is the controllable degree of parallelism (i.e. the number of streams used). S. V. Raju et al. [23] consider the exact string matching problem on a two-dimensional array, with applications such as string databases, cellular automata and computational biology.
The main use of this method is to reduce the time spent on comparisons in string matching by finding common characters in the pattern string, which achieves constant time O(1) for locating the pattern string in a text string. This reduces many calls across the backend interface. Chuanpeng Chen et al. [24] propose a high-throughput configurable string matching architecture based on the Aho-Corasick algorithm. The architecture can be realized with random-access memory (RAM) and basic logic elements instead of designing new dedicated chips. The bit-split technique is used to reduce the RAM size, and the byte-parallel technique is used to boost the throughput of the architecture. Through careful design and comprehensive experiments with 100 MHz RAM chips, one instance of the architecture can achieve a throughput of up to 1.6 Gbps with 2-byte-parallel input, and the throughput can be boosted further by using multiple parallel instances. Prasanna et al. [25] propose a multi-core architecture on FPGA to address these challenges. They adopt the popular Aho-Corasick (AC-opt) algorithm for their string matching engine. Exploiting the data access pattern of this algorithm, they design a specialized BRAM buffer for the cores to exploit the data reuse present in such applications. Several design optimization techniques are used to realize a simple design with a high clock rate for the string matching engine. An implementation of a 2-core system with one shared BRAM buffer on a Virtex-5 LX155 achieves up to 3.2 Gbps throughput on a 64 MB state transition table stored in DRAM. The performance of systems with more cores is also evaluated for this architecture, and a throughput of over 5.5 Gbps can be obtained in some application scenarios. S. Muthukrishnan et al. [26] present an algorithm on the CRCW PRAM that checks whether there exists a false match in O(1) time using O(n) processors. This algorithm does not require preprocessing the pattern.
They use this simple checking algorithm to convert the Karp-Rabin Monte Carlo string-matching algorithm into a Las Vegas algorithm without asymptotic loss in complexity. Finally, they present an efficient algorithm for identifying all the false matches and, as a consequence, show that string-matching algorithms take Ω(log log m) time even given the flexibility to output a few false matches. S. V. Raju et al. [27] present a new approach to parallel string matching. Some known parallel string matching algorithms based on duels by witness are considered, focusing on the strengths and weaknesses of the currently known methods. A new 'divide and conquer' approach, called the W-period, has been introduced for parallel string matching; it is used for parallel preprocessing of the pattern and has optimal implementations in various models of computation. The idea, common to every parallel string matching algorithm, is slightly different from that of sequential algorithms such as Knuth-Morris-Pratt or Boyer-Moore.

# III. COMMUNICATION NETWORK RELATION TO COMPUTER SYSTEM COMPONENTS

A typical distributed system is shown in Figure 1. Each computer has a memory-processing unit, and the computers are connected by a communication network. Figure 2 shows the relationships of the software components that run on each of the computers and use the local operating system and network protocol stack for functioning. The distributed software is also termed middleware. A distributed execution is the execution of processes across the distributed system to collaboratively achieve a common goal; an execution is also sometimes termed a computation or a run. The distributed system uses a layered architecture to break down the complexity of system design. The middleware is the distributed software that drives the distributed system while providing transparency of heterogeneity at the platform level [28].
Figure 2 schematically shows the interaction of this software with these system components at each processor.

# IV. TEXT PARTITIONING

The exact string-matching problem can achieve data parallelism with a data partitioning technique. We decompose the text into r subtexts, where each subtext contains (T/p) + m - 1 successive characters of the complete text. There is an overlap of m - 1 characters between successive subtexts, i.e. a redundancy of r(m - 1) characters. Alternatively, it could be assumed that the database of an information retrieval system contains r independent documents. Therefore, in both cases the partitions yield a number of independent tasks, each comprising some data (i.e. a string and a large subtext) and a sequential string matching procedure that operates on that data. Each task completes its string matching operation on its local data and returns the number of occurrences [29][30][31]. Finally, we can observe that there are no communication requirements among the tasks; only global (or collective) communication is required. The main issue to be addressed is how the several tasks (or r subtexts) can be mapped or distributed to multiple processors for concurrent execution. In [29][30][31] different ways of distributing the database across a multicomputer network were discussed. Let p be the number of processors in the network and r the number of subtexts in the whole collection; the text partition is then defined as follows: if r = p, then each subtext contains T/p + m - 1 characters. This is called static allocation of subtexts, as shown in Fig. 3. In the next section we present the parallel algorithm that is based on static allocation of subtexts using the MPI library. A significant contribution of this paper is a demonstration of the maximum-size buffer with 2^k processors for the implementation of string matching, capable of accepting a character from r subtexts, where k = 8 bits.
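The static partitioning described above can be sketched as follows (a hedged Python illustration; the function names and toy text are ours, and the MPI distribution is replaced by a plain loop):

```python
def partition_text(text, p, m):
    """Split `text` into p subtexts of about T/p + m - 1 characters,
    with an overlap of m - 1 characters between successive parts so
    that no occurrence of an m-length pattern is lost at a boundary."""
    n = len(text)
    base = n // p                               # T/p characters per processor
    parts = []
    for i in range(p):
        start = i * base
        end = min(start + base + m - 1, n)      # m - 1 overlap with the next part
        parts.append(text[start:end])
    return parts

def count_occurrences(sub, pattern):
    """The sequential matching procedure each independent task runs
    on its local subtext."""
    return sum(1 for i in range(len(sub) - len(pattern) + 1)
               if sub[i:i + len(pattern)] == pattern)

text, pattern = "abcabcabcabc", "abc"
parts = partition_text(text, 4, len(pattern))
total = sum(count_occurrences(s, pattern) for s in parts)
print(parts, total)
```

Each part exposes exactly T/p starting positions, so the overlap adds no double counting; summing the per-task counts is the only (collective) communication needed.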
This architecture enables a buffered string matching system implementing a KMP-like precomputation algorithm. In the above mapping {a1, a2, ..., ar} is the input string, where r represents the number of subtexts, and {p1, p2, ..., pk} are the processors for the given input string (k = 8). The given input string is allocated to each processor as shown in Fig. 3.

# V. METHODOLOGY

Multistage interconnection networks (MINs) consist of more than one stage of small interconnection elements called switching elements (SEs) and the links interconnecting them. MINs are used in multiprocessing systems to provide cost-effective, high-bandwidth communication between processors and/or memory modules. A MIN normally connects N inputs to N outputs and is referred to as an N × N MIN; the parameter N is called the size of the network [32][33]. The popularity of MINs stems from both the operational features they deliver - e.g. their ability to route multiple communication tasks concurrently - and the appealing cost/performance ratio they achieve. MINs with the Banyan property [34], e.g. Omega networks [35], Delta networks [36], and Generalized Cube networks [37], are more widely adopted, since non-Banyan MINs generally have higher cost and complexity. In the context of both parallel and distributed systems, the performance of the communication network interconnecting the system elements (nodes, processors, memory modules, etc.) is recognized as a critical factor for overall system performance. Consequently, the need for communication infrastructure performance prediction and evaluation has arisen, and numerous research efforts have targeted this area, employing either analytical models (mainly based on Markov models and Petri nets) or simulation techniques. Several different multistage interconnection networks have been proposed and studied in the literature.
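To make the omega topology concrete, the following sketch traces the unique path through an N × N omega network using standard destination-tag routing over the perfect-shuffle wiring (an illustrative Python model, not code from the paper):

```python
def perfect_shuffle(i, n_bits):
    """Left-rotate the n_bits-bit address: the shuffle wiring
    between stages of an omega network."""
    msb = (i >> (n_bits - 1)) & 1
    return ((i << 1) & ((1 << n_bits) - 1)) | msb

def omega_route(src, dst, n_bits):
    """Trace the unique path from input src to output dst in an
    N x N omega network (N = 2**n_bits): after each shuffle, the
    2x2 switch's exchange setting is the next destination bit,
    most significant bit first."""
    path, cur = [src], src
    for stage in range(n_bits):
        cur = perfect_shuffle(cur, n_bits)
        bit = (dst >> (n_bits - 1 - stage)) & 1
        cur = (cur & ~1) | bit          # switch output chosen by dest bit
        path.append(cur)
    return path

print(omega_route(0b101, 0b011, 3))
```

Because every input-output pair has exactly one such path, the failure of any switch on it disconnects that pair, which is why the terminal reliability later in the paper multiplies the reliabilities of the log2 N switches in series.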
Figure 4 illustrates the structure of a multistage interconnection network, representative of a general class of networks. It shows the connection between p inputs and b outputs via n stages. A multistage interconnection network is a compromise between a crossbar and a shared bus; its goals are to:

- Attempt to reduce cost
- Attempt to decrease the path length

In a multistage interconnection network, as in a crossbar, switching elements are distinct from processors; messages instead pass through a series of switch stages. The network can be constructed from unidirectional or bidirectional switches and links. In a unidirectional MIN, all messages must traverse the same number of wires, so the cost of sending a message is independent of processor location; in effect, all processors are equidistant. In a bidirectional MIN, the number of wires traversed depends to some extent on processor location, although to a lesser extent than in a mesh or hypercube.

# VI. OMEGA NETWORK

An omega network connecting P processors to P memory banks is shown in Fig. 5. In general, the underlying butterfly consists of p = (q + 1)2^q processors, organized as q + 1 ranks of 2^q processors each. Optionally we may identify the rightmost and leftmost ranks, so that there is no rank q and the processors on ranks 0 and q - 1 are connected directly. Let us denote processor i on rank r by P(i, r), 0 ≤ i < 2^q, 0 ≤ r ≤ q. Then processor P(i, r+1) is connected to the two processors P(i, r) and P(i^r, r), and processor P(i^r, r+1) is connected to the two processors P(i, r) and P(i^r, r), where i^r denotes the index obtained from i by complementing bit r of its binary representation i_{q-1} ... i_r ... i_0. These four connections form a "butterfly" pattern, from which the name of the network is derived. The hypercube is actually the butterfly with the rows collapsed.
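The straight/cross connection rule can be checked with a few lines of Python (the helper name is ours; i^r is computed by complementing bit r):

```python
def butterfly_neighbors(i, r):
    """Rank-r processors that P(i, r+1) connects to in a butterfly:
    the 'straight' edge keeps index i, the 'cross' edge goes to
    i^r, i.e. i with bit r complemented."""
    i_cross = i ^ (1 << r)      # complement bit r of i
    return (i, i_cross)

# For q = 3, processor P(0b010, 2) connects down to P(2, 1) and P(0, 1):
print(butterfly_neighbors(0b010, 1))
```

Collapsing the ranks identifies the cross edge (i, i^r) with the dimension-r hypercube link, which is the correspondence the next paragraph describes.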
The communication link in the hypercube between processors P(i) and P(i^r) is identified with the communication links in the butterfly between P(i, r+1) and P(i^r, r), and between P(i^r, r+1) and P(i, r). Let r be the probability of a switch being operational. As the omega is a unique-path MIN, the failure of any switch causes system failure, so from the reliability point of view there are log2 N SEs in series on each terminal path. Hence, the terminal reliability of an N × N omega network is

R_t(Omega) = r^(log2 N)

As there is only a single path between a particular input switch Si, i = 1, 2, 3, 4, and an output in an 8×8 omega network, the terminal reliability is

R_t(Omega) = r^3

# VII. PROPOSED SYSTEM STRUCTURE

Here we propose a system for parallel processing with the omega model. It is a shared-memory, expandable MIMD parallel computer. The computer gets its name from the omega switch which it uses for interprocessor communication. The switch supports a processor-to-processor bandwidth of 32 Mbits/second. Figure 7 illustrates an 8-input, 8-output omega switch.

# VIII. PROGRAMMING THE OMEGA PARALLEL PROCESSOR

The omega parallel processor is programmed exclusively in high-level programming languages. Searching, IRS, editing, compiling and linking, downloading, running and debugging of programs are done from a UNIX front-end. A window manager enables rapid switching between the front-end and the Butterfly system environments. Two distinct approaches to programming the omega machine have seen widespread use: message passing and shared memory. When using the message passing paradigm, the programmer decomposes the application into a moderately sized collection of loosely coupled processes which from time to time exchange control signals or data. This approach is similar to programming a multitasking application for a uniprocessor. In the shared memory approach, a task is usually some small procedure to be applied to a subset of the shared memory.
A task, therefore, can be represented simply as an index, or a range of indices, into the shared memory together with an operation to be performed on that memory. This style is particularly effective for applications containing a few frequently repeated tasks. Memory and processor management are used to keep all memories and processors equally busy.

# IX. PARALLEL STRING MATCHING ALGORITHM ON OMEGA MODEL

In this model the data on the processors are organized so that they represent the m sets of length n - m + 1 of the text string, i.e. an m × (n - m + 1) matrix; in addition, the first processor of each row segment, which holds the first element of each set, also carries an element of the pattern. The process is repeated in the same way for the remaining m - 1 rows. We first show how to find the occurrences of pattern P in text string T on the omega model with m × (n - m + 1) processors in constant time O(1).

Lemma 1: Each step in the above algorithm runs in constant time. Thus we have the following theorem.

Theorem 1.1: There is a constant-time string matching algorithm on an omega model that finds the occurrences of the pattern in the text using m × (n - m + 1) processors.

Example: Text string T (n = 8) = GOKARAJU and pattern string P (m = 4) = RAJU, so each row holds a set of n - m + 1 = 5 characters: L1 = {GOKAR}, L2 = {OKARA}, L3 = {KARAJ}, L4 = {ARAJU}.

Step 1 places one pattern character at the head of each row:

R | G O K A R
A | O K A R A
J | K A R A J
U | A R A J U

As per the given example, after step 4 the values M(m+1, j) in the matrix are used to decide whether the match is an exact match or an approximate match with k mismatches.

Lemma 2: The string matching is completely scalable, and we obtain the following theorem.

Theorem 1.2: Given two strings, a text of size n and a pattern of size m, the occurrences of the pattern in the text can be found on the butterfly model in a completely scalable way. The algorithm runs in O(m(n - m + 1)/P) time, where P is the number of processors and 1 ≤ P ≤ m(n - m + 1).

Frame work for Parallel String Matching - A Computational Approach with Omega Model. K. Butchi Raju, Chinta Someswara Rao & Dr. S. Viswanadha Raju.

Figure 1: A distributed system connects processors by a communication network (extent of distributed protocols).
Figure 2: Interaction of the software components at each processor.
Figure 3: Framework for a pool of processors.
Figure 4: A multistage interconnection network (MIN).
Figure 5: Omega network.
Figure 6: Switching reliability.
Figure 7: Omega network with 8 i/o.

Terminology: Terminal reliability is defined as the probability of successful communication between an input-output pair. Here the terminal reliability of the omega network is evaluated. The omega is a unique-path MIN that has N input switches, N output switches and n stages, where n = log2 N. For an 8×8 omega network (n = 3), the terminal reliability R_t = r^3 as a function of the switching reliability r is:

| Switching reliability r | Terminal reliability r^3 |
|---|---|
| 0.99 | 0.970299 |
| 0.98 | 0.941192 |
| 0.96 | 0.884736 |
| 0.95 | 0.857375 |
| 0.94 | 0.830584 |
| 0.92 | 0.778688 |
| 0.90 | 0.729 |

© 2013 Global Journals Inc. (US)

# X. PERFORMANCE EVALUATION

In order to evaluate the overall performance of a multi-priority (N×N) MIN consisting of (2×2) SEs, we use the following metrics. Let T be a relatively large time period divided into u discrete time intervals (τ1, τ2, ..., τu). The average throughput Th_avg (or bandwidth) is the average number of packets accepted by all destinations per network cycle. Formally,

Th_avg = (1/u) * Σ_{k=1..u} n(k)

where n(k) denotes the number of packets that reach their destinations during the kth time interval.
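As a small illustration of the average-throughput metric just defined (the per-cycle counts below are hypothetical, not measured data):

```python
def throughput_metrics(n_k, N):
    """Average and normalized throughput from per-cycle delivery
    counts n_k (the n(k) of the text) for a MIN with N outputs."""
    th_avg = sum(n_k) / len(n_k)    # packets accepted per network cycle
    return th_avg, th_avg / N       # Th_avg and Th = Th_avg / N

# Hypothetical counts for an illustrative 16-output network over u = 8 cycles.
th_avg, th = throughput_metrics([12, 15, 9, 14, 10, 13, 11, 16], N=16)
print(th_avg, th)   # 12.5 0.78125
```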
The normalized throughput Th is the ratio of the average throughput Th_avg to the number of network outputs N:

Th = Th_avg / N

which reflects how effectively the network capacity is used. The relative normalized throughput RTh(i) of i-class priority traffic, where i = 1..p, is the normalized throughput Th(i) of i-class priority packets divided by the corresponding-class offered load λ(i):

RTh(i) = Th(i) / λ(i)

The average packet delay Davg(i) of i-class priority traffic, where i = 1..p, is the average time a corresponding-class priority packet spends passing through the network. Formally,

Davg(i) = (1/n(u)) * Σ_{k=1..n(u)} td(k)

where n(u) denotes the total number of corresponding-class priority packets accepted within u time intervals and td(k) represents the total delay of the kth such packet. We consider td(k) = tw(k) + ttr(k), where tw(k) denotes the total queuing delay of the kth packet waiting at each stage for the availability of a corresponding-class empty buffer in the next stage's queue. The second term ttr(k) denotes the total transmission delay of the kth packet across the stages of the network, which is simply n * nc, where n = log2 N is the number of stages and nc is the network cycle.

# XI. CONCLUSIONS

In this paper we concentrate on parallel algorithms for string matching on computing models, especially the omega model. We simulate the parallel algorithms for the implementation of high-speed string matching; the implementation uses fine-grained parallelism and performs matching of a search string by splitting the string into a set of substrings and then matching all of the substrings simultaneously. We also observe that this implementation can be optimized in terms of resource utilization.
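The fine-grained scheme summarized in the conclusions, splitting the search into all alignments and matching them simultaneously, can be sketched as follows (an illustrative sketch of ours, with threads standing in for the omega-connected processors, not the paper's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def match_at(args):
    """One fine-grained task: does `pattern` occur at offset j of `text`?"""
    text, pattern, j = args
    return j if text[j:j + len(pattern)] == pattern else None

def parallel_match(text, pattern, workers=4):
    """Check every alignment of the pattern simultaneously and return
    the starting positions of all occurrences."""
    tasks = [(text, pattern, j) for j in range(len(text) - len(pattern) + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [j for j in pool.map(match_at, tasks) if j is not None]

print(parallel_match("GOKARAJU", "RAJU"))   # [4]
```

On the running example T = GOKARAJU, P = RAJU, the n - m + 1 = 5 alignment tasks run concurrently and only offset 4 reports a match.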
## References Références Referencias

* Published by Foundation of Computer Science, New York, USA, 2013.
* Chinta Someswararao, "Parallel String Matching with Multi Core Processors-A Comparative Study for Gene Sequences", Global Journal of Computer Science and Technology, 13(1), 2013.
* K. Fredriksson, S. Grabowski, "Average-Optimal String Matching", Journal of Discrete Algorithms, 2009.
* Luis Russo, G. Navarro, A. Oliveira, P. Morales, "Approximate String Matching with Compressed Indexes", Algorithms, 2009.
* L. Ilie, G. Navarro, L. Tinta, "The Longest Common Extension Problem, Revisited and Applications to Approximate String Searching", Journal of Discrete Algorithms, 2010.
* K. Fredriksson, S. Grabowski, "Average-Optimal String Matching", Journal of Discrete Algorithms, 2009.
* Z. Galil, "Optimal parallel algorithms for string matching", Proc. 16th Annu. ACM Symposium on Theory of Computing, 1984.
* U. Vishkin, "Optimal parallel matching in strings", Information and Control, 67, 1985.
* Y. Takefuji, T. Tanaka, K. C. Lee, "A parallel string search algorithm", IEEE Trans. Systems, Man and Cybernetics, 22, March-April 1992.
* Chen Guo-Liang, Gu Lin-Jie, Nai-Jie, "Design and analysis of string matching algorithm on distributed memory machine", Journal of Software, 11, 2000.
* S. Viswanadha Raju, A. Vinaya Babu, M. Mrudula, "Backend Engine for Parallel String Matching Using Boolean Matrix", IEEE PARELEC, 2006.
* Bi Kun, Gu Nai-Jie, Tu Kun, Liu Xiao-hu, Liu Gang, "A Practical Distributed String Matching Algorithm Architecture and Implementation", World Academy of Science, Engineering and Technology, 2005.
* Hsi-Chieh Lee, Fikret Ercal, "RMESH Algorithms For Parallel String Matching", IEEE, 1997.
* S. Viswanadha Raju, A. Vinaya Babu, "Optimal Parallel Algorithm for String Matching on Mesh Network Structure", 2006.
* Its'hak Dinstein, M. Landau, "Using Parallel String Matching Algorithms for Contour Based 2-D Shape Recognition", IEEE, 1990.
* Jin Hwan Park, K. M. George, "Parallel String Matching Algorithms Based on Dataflow", IEEE System Sciences, 1999.
* S. Viswanadha Raju, S. R. Mantena, A. Vinaya Babu, G. V. S. Raju, "Efficient Parallel Pattern Matching using Partition Method", 2006.
* Chuanpeng Chen, Zhongping Qin, "A Bit-split Byte-parallel String Matching Architecture", 2009.
* Wang, Viktor K. Prasanna, "Multi-Core Architecture on FPGA for Large Dictionary String Matching", IEEE Field Programmable Custom Computing Machines, 2009.
* S. Muthukrishnan, "Detecting False Matches in String-Matching Algorithms", Algorithmica, Springer-Verlag New York Inc., 1997.
* S. Viswanadha Raju, A. Vinaya Babu, G. V. S. Raju, K. R. Madhavi, "W-Period Technique for Parallel String Matching", 2007.
* Ajay D. Kshemkalyani, Mukesh Singhal, Distributed Computing: Principles, Algorithms, and Systems, Cambridge.
* S. Viswanadha Raju, A. Vinaya Babu, "Performance in the design of Parallel Programming", Proc. ObComAPC-2004, Allied Publications, 2004.
* S. Viswanadha Raju, A. Vinaya Babu, S. P. Yanaiah, G. V. S. Raju, "Parallel Approach for K String Matching", Proc. NCIMDiL-2006, Indian Institute of Technology Kharagpur, 2006.
* J. Garofalakis, E. Stergiou, "An analytical performance model for multistage interconnection networks with blocking", Procs. of CNSR, May 2008.
* Josep Torrellas, Zheng Zhang, "The Performance of the Cedar Multistage Switching Network", IEEE Transactions on Parallel and Distributed Systems, 8(4), 1997.
* S. K. Bhogavilli, H. Abu-Amara, "Design and Analysis of High-performance Multistage Interconnection Networks", IEEE Transactions on Computers, 46(1), January 1997.
* G. F. Goke, G. J. Lipovski, "Banyan Networks for Partitioning Multiprocessor Systems", Procs. of 1st Annual Symposium on Computer Architecture, 1973.
* D. A. Lawrie, "Access and alignment of data in an array processor", IEEE Transactions on Computers, C-24(12), 1145-1155, Dec. 1975.
* J. H. Patel, "Processor-memory interconnections for multiprocessors", Procs. of 6th Annual Symposium on Computer Architecture, New York, 1979.
* G. B. Adams, H. J. Siegel, "The extra stage cube: A fault-tolerant interconnection network for supersystems", IEEE Trans. on Computers, 31(4), May 1982.