
Improving the Performance of a High-Speed Network TCP/IP Traffic Analyzer

by Vukasin Pejovic, Gabriel Caffarena and Slobodan Bojanic, Department of Electronic Engineering, Technical University of Madrid (U.P.M.), Spain

A common way to speed up networking technologies is to use hardware acceleration devices to eliminate system bottlenecks and boost overall performance. Within this framework, we present a modular hardware architecture for network intrusion detection systems (NIDS) on high-speed data networks, together with performance results from an early FPGA implementation. The system processes TCP/IP data at a throughput of 52 Gbps, making it suitable for next-generation data networks.

Network Intrusion Detection

Hardware acceleration plays a significant role in NIDS. It is mostly used to increase the capability of signature-based NIDS [1] by performing the computationally demanding tasks involved in deep packet inspection of TCP/IP traffic. Signature-based (or misuse-based) NIDS rely on a compound set of rules that prevent traffic matching certain descriptions from entering the protected network, because such traffic is considered harmful. The rules are written and maintained by human experts, which carries the usual risks of human involvement. This is especially significant given the complexity of the problem space the experts are dealing with. Broadly, that complexity stems from the wide spectrum of attacks spanning all communication layers. To this must be added the natural diversity of the application layer, the top communication layer, together with the constant mutation of existing attacks. The attack space can, however, be split so that concurrent hardware modules process the resulting subspaces, which greatly simplifies the problem and yields good partial solutions.

Modules that understand the mechanisms characterising a subspace, such as TCP/IP packet fragmentation and defragmentation, can detect and prevent the existing attacks within that subspace. Furthermore, they are likely to be capable of defending against future attacks that abuse the same mechanisms the modules supervise, for example attacks that arise once new commercially available operating systems (here referring to the MS 64-bit OS) become widespread. The underlying need for parallelism and high speed makes hardware platforms the only reasonable implementation option.

Architecture

The proposed modular architecture is shown in Fig. 1, where the concurrent processing and parallelism are readily apparent. An input source feeds the input buffer, which all modules sample to obtain the values at the positions relevant to their own independent, concurrent processing. The number of modules is limited by the physical constraints of the implementation: the available area and the desired performance. Each module processes the input data and supplies a single-bit output. These outputs are then encoded into a word of the desired width, presumably 32 bits, which can be forwarded to a higher-level instance, if needed, as the system output.
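The output stage can be illustrated in software. The following is only a minimal sketch, not the FPGA design itself: the names `encode_outputs` and `run_modules` are hypothetical, and the mapping of module i to bit i of the word is an assumption, since the article does not specify the bit order.

```python
from typing import Callable, Sequence

# A "module" here is any function that inspects the buffered data and
# returns a single alarm bit (hypothetical software stand-in for a
# hardware module).
Module = Callable[[bytes], bool]


def encode_outputs(flags: Sequence[bool], width: int = 32) -> int:
    """Pack single-bit module outputs into one word of the given width.

    Assumption: module i drives bit i (least significant bit first).
    """
    if len(flags) > width:
        raise ValueError("more module outputs than bits in the output word")
    word = 0
    for position, flag in enumerate(flags):
        if flag:
            word |= 1 << position
    return word


def run_modules(modules: Sequence[Module], data: bytes, width: int = 32) -> int:
    """Let every module sample the same input data and encode the results.

    In hardware the modules run concurrently; this loop only models the
    combined result, not the timing.
    """
    return encode_outputs([module(data) for module in modules], width)
```

A higher-level instance could then test individual bits of the returned word to see which module raised its output.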

Fig. 1. Block diagram of the architecture

As illustrated, the modularity of the architecture supports the scalability that every design needs as its functionality evolves. This is one of the main virtues of the system. However, scalability still has to be studied in more detail, in particular the potential for automatically expanding the architectural concept described here.

The target platform for the described architecture was the Xilinx Virtex-4 FX12 FPGA on the Xilinx ML403 board, which was used to test the design. The design was implemented according to Fig. 1, with three separate modules in the initial phase. They perform IP header checksum calculation and verification, monitoring of the IP fragmentation mechanism, and monitoring of the TCP header with its corresponding flag set. All modules were designed for IPv4. The first module sets its output if the checksum is incorrect, the second sets its output if the fragmentation is incorrect, and the third sets its output if all the TCP flags are set or none are, which does not occur in regular traffic. These three modules were selected as a minimal set that provides enough functionality for the proof-of-concept experiments on the platform. The first module implements an operation typically performed by TCP/IP network devices, and it provides the performance required for high-speed networks. The second module eliminates the Teardrop attack, the Unnamed Attack, the Ping of Death and similar attacks. Finally, the third module prevents OS detection by malicious users.
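The three checks can be sketched in software as follows. This is an illustrative Python model under stated assumptions, not the hardware implementation: the function names are hypothetical, each function expects a complete header of at least 20 bytes, the fragmentation check is a stateless approximation (the article does not define what counts as incorrect fragmentation, and detecting overlapping fragments as in Teardrop would need state across packets), and "all flags" is taken to mean the six classic TCP flags (URG, ACK, PSH, RST, SYN, FIN).

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum used by the IPv4 header checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum_alarm(ip_header: bytes) -> bool:
    """True if the IPv4 header checksum is wrong.

    Summing a valid header, checksum field included, gives 0xFFFF.
    """
    return ones_complement_sum(ip_header) != 0xFFFF


def fragmentation_alarm(ip_header: bytes) -> bool:
    """True if the fragmentation fields look inconsistent (assumed rules)."""
    ihl_bytes = (ip_header[0] & 0x0F) * 4
    (total_length,) = struct.unpack_from("!H", ip_header, 2)
    (flags_offset,) = struct.unpack_from("!H", ip_header, 6)
    dont_fragment = bool(flags_offset & 0x4000)
    more_fragments = bool(flags_offset & 0x2000)
    offset_bytes = (flags_offset & 0x1FFF) * 8  # offset is in 8-byte units
    payload = total_length - ihl_bytes
    if dont_fragment and (more_fragments or offset_bytes > 0):
        return True  # "don't fragment" contradicts being a fragment
    if more_fragments and payload % 8 != 0:
        return True  # non-final fragments carry a multiple of 8 bytes
    # The reassembled datagram would exceed the 65535-byte IPv4 limit
    # (the Ping of Death pattern).
    return offset_bytes + total_length > 65535


def tcp_flags_alarm(tcp_header: bytes) -> bool:
    """True if none or all of the six classic TCP flags are set."""
    flags = tcp_header[13] & 0x3F
    return flags == 0x00 or flags == 0x3F
```

Each function returns the single bit that the corresponding module would drive on its output.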

Results

The performance results are shown in Fig. 2. The values in the figure were obtained with two RTL synthesis tools, one of which is Synplicity’s Synplify Pro® 8.6.2. The three implemented modules process input data from the IP header only, so the throughput is 192 bits per clock cycle multiplied by the worst-case clock rate of 276 MHz (using the Synplify Pro tool), giving 52.99 Gbps. During implementation we adapted the architecture to the target platform and used prefabricated logic blocks such as DSP48s. We also developed a test environment that allowed us to test the architecture, implemented on the Xilinx ML403 board, as a node in our department's network.
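The throughput figure follows directly from the two numbers quoted above. As a quick check, assuming one 192-bit word is consumed on every clock cycle (the function name `throughput_gbps` is hypothetical):

```python
def throughput_gbps(bits_per_cycle: int, clock_mhz: float) -> float:
    """Throughput in Gbps for a design that consumes bits_per_cycle
    bits on every clock cycle at the given clock rate in MHz."""
    bits_per_second = bits_per_cycle * clock_mhz * 1e6
    return bits_per_second / 1e9


# 192 bits per cycle at 276 MHz, the values quoted in the text
print(f"{throughput_gbps(192, 276):.2f} Gbps")  # prints "52.99 Gbps"
```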

Fig. 2. Performance limits for the 3 implemented modules

The Synplify Pro tool yielded up to 40% better throughput than the other synthesis tool. This is especially significant given that the next generation of high-speed networks, embodied in 10 Gigabit Ethernet, is just around the corner and every throughput advance matters. Other significant benefits of the Synplify Pro tool were its RTL and Technology views, which allowed excellent design visualisation and tracking. Constant use of both let us adapt the coding style to achieve the synthesis results we wanted. The tool's critical path analysis utility helped significantly in performance optimisation: it let us target the critical paths and thus rapidly boost the overall performance of the modules.

References

[1] V. Pejović, S. Bojanić and C. Carreras, "Structural Framework for High Speed Intrusion Detection/Prevention Signature Based System", International Journal of Computer Science and Network Security, vol. 6, no. 9b, Sept. 2006, pp. 175-181.