1. The Problem
Detecting insider threats, and specifically data-exfiltration over the network, is a challenge. Existing perimeter security solutions and end-system security are not designed to handle insider attacks and data-exfiltration effectively. Issues such as misconfiguration, vulnerable points, and covert network channels lead to data-exfiltration. Hence, building an effective system for detecting insider attacks remains an open challenge. A dedicated behavior-based model that uses parameters from both the system and the network is essential to detect data-exfiltration over the network.
2. Motivation
Thousands of companies and organizations have been subject to cyber-attacks. Many of these attacks are carried out by hackers outside the company or organization who break into its information systems. Increasingly, however, insiders, that is, attackers inside the company or organization, are becoming a serious problem, and insiders are often overlooked as potential sources of cyber-attacks.
Usually we apply security policies and access control policies that are meant to protect our data, business policies, product information, secret information, etc. These measures still sometimes fail, because a person who is trusted and given full access to the network, hosts, etc. may abuse that access to leak data or information. This is why we say that insider attacks are often more damaging and costly: the insider has knowledge of, and access to, the information system. Surveys conducted by several institutions and researchers in 2010 and 2011 indicate that more than a quarter of cyber-attacks were carried out by insiders.
3. Insider Threat
An insider threat is a malicious hacker (also called a cracker or a black hat) who is an employee or officer of a business, institution, or agency. Insider threats are often disgruntled employees or ex-employees who believe that the business, institution, or agency has “done them wrong” and feel justified in seeking revenge.
Difficult to detect and prevent, attacks by people with legitimate access to an organization’s computers and networks represent a growing problem in our digital world. These insider threats frustrate employers who lack the resources to identify them and monitor their behavior. Insiders are not just employees: today they can include contractors, business partners, auditors, and even an alumnus with a valid email address.
We define an insider as a trusted entity that is given the power to violate a security policy. An insider is therefore determined with reference to an established security policy. The insider threat lies in the access and ability to do two things: violate a security policy using legitimate access, and violate an access control policy by obtaining unauthorized access.
4. Insider Type
There are three types of insiders whose actions result in information leakage: inadvertent, intentional, and malicious. An inadvertent insider is a trusted person with access to sensitive information who discloses it unintentionally. An intentional insider is a trusted person who knowingly discloses sensitive information and is aware of the security controls being purposefully bypassed. This person may try to manipulate the content or use overt communication that preserves privacy to avoid detection. A malicious insider is also a trusted person who knowingly discloses sensitive information. However, in addition to manipulating the content or using overt communication like an intentional insider, a malicious insider usually employs tunneled or covert communication to avoid detection.
5. Data-Exfiltration
In everyday usage, to exfiltrate means to surreptitiously move personnel or material out of an area under enemy control. In computer science, data-exfiltration is the unauthorized removal of data from a network, e.g. the leakage of archives, passwords, additional malware and utilities, personally identifiable information, financial data, trade secrets, source code, or intellectual property. For an attacker, it is easy to move data once it is packed into a single container such as a RAR, ZIP, or CAB file.
Data-exfiltration over a network medium is the most common form, and outbound FTP and HTTPS are the most common channels these days. FTP, HTTP, SMTP, SSH, instant messengers, rootkits, botnets, spyware, covert channels, phishing, pharming, and man-in-the-middle (MITM) attacks are all means by which data-exfiltration is quite simple for an insider. Exploits, privilege escalation, DNS poisoning, and directory traversal are common attacks carried out by an insider.
Physical media also play a major role in data-exfiltration. For example, printing devices, CDs, DVDs, disks, USB drives, and laptops are common media used by an insider for data-exfiltration.
6. The Solution to Detecting Data-Exfiltration over the Network
We present a behavior-based approach built on a Chi-Square model to detect insider attacks, specifically data-exfiltration. The approach covers both technical and behavioral aspects. First, during the learning phase, each host in the network is profiled and Chi-Square values are computed individually for the system and network parameters. Second, during the detection phase, Chi-Square values are computed for the identified parameters, and the current Chi-Square values are then superimposed on the learned Chi-Square values to detect data-exfiltration over the network.
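The two phases can be illustrated with a minimal Python sketch. It is not the exact model described here: the text does not give the formula or the threshold rule, so the sketch assumes the standard Pearson form, the sum of (observed - expected)^2 / expected over the parameters, and an assumed threshold of mean plus k standard deviations of the learned Chi-Square values. The function names, the parameter k, and the sample measurements are all hypothetical.

```python
from statistics import mean, pstdev


def chi_square(observed, expected):
    """Pearson-style statistic: sum of (O - E)^2 / E over all parameters."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def learn_profile(windows, k=3.0):
    """Learning phase for one host.

    windows: equal-length parameter vectors recorded during normal activity.
    Returns the expected value of each parameter and a threshold derived
    from the Chi-Square values seen during learning (assumed rule:
    mean + k * standard deviation).
    """
    n_params = len(windows[0])
    expected = [mean(w[i] for w in windows) for i in range(n_params)]
    scores = [chi_square(w, expected) for w in windows]
    threshold = mean(scores) + k * pstdev(scores)
    return expected, threshold


def is_anomalous(window, expected, threshold):
    """Detection phase: compare the live Chi-Square value with the threshold."""
    return chi_square(window, expected) > threshold


# Hypothetical per-interval measurements: (outgoing packets, outgoing bytes).
normal = [(100, 50_000), (110, 52_000), (95, 49_000), (105, 51_000)]
expected, threshold = learn_profile(normal)

live = (400, 900_000)  # a burst of outgoing traffic
print(is_anomalous(live, expected, threshold))
```

In this sketch each host gets its own `expected` vector and `threshold`, which mirrors the idea of profiling hosts individually. A real deployment would need far more learning data than four intervals and a threshold rule validated against that data.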
The aim is to develop a method that identifies abnormal cyber behavior by establishing user profiles from the cyber data collected while a user is active on a computer. To do this, learning algorithms are used to establish “normal” behavioral profiles for users, and users are then monitored for anomalies in their behavior patterns. It is important to state that our goal is to identify these abnormalities. For example, the method can identify that a user is uploading an abnormally large number of files from a directory that they do not typically access.
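The upload example can be sketched as a simple per-user profile. This is an illustration under stated assumptions rather than the learning algorithm used here: the `UserProfile` class, its fields, the mean plus k standard deviations limit, and the sample paths and sizes are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class UserProfile:
    """Hypothetical "normal" profile built from one user's past uploads."""
    known_dirs: set = field(default_factory=set)
    upload_sizes: list = field(default_factory=list)

    def learn(self, directory, upload_bytes):
        """Record one upload observed during normal activity."""
        self.known_dirs.add(directory)
        self.upload_sizes.append(upload_bytes)

    def anomalies(self, directory, upload_bytes, k=3.0):
        """Return the reasons an upload deviates from the learned profile.

        Requires at least one learned upload. The size limit is an assumed
        rule: mean + k * standard deviation of the learned upload sizes.
        """
        reasons = []
        if directory not in self.known_dirs:
            reasons.append("unusual directory")
        limit = mean(self.upload_sizes) + k * pstdev(self.upload_sizes)
        if upload_bytes > limit:
            reasons.append("unusually large upload")
        return reasons


profile = UserProfile()
for directory, size in [
    ("/home/alice/docs", 2_000),
    ("/home/alice/docs", 3_000),
    ("/home/alice/src", 2_500),
]:
    profile.learn(directory, size)

print(profile.anomalies("/srv/finance", 5_000_000))
print(profile.anomalies("/home/alice/docs", 2_400))
```

The first call reports both deviations, while the second returns an empty list because the directory and the size fall within the learned profile.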
Fig. 1: Superimposed Chi-Square values of outgoing packets of live data (abnormal profile data) and Threshold_1, Threshold_2 and Threshold_3 of learned data (normal profile data) for user type 1, 2 and 3 respectively.
Fig. 2: Superimposed Chi-Square values of outgoing Bytes of live data (abnormal profile data) and Threshold_1, Threshold_2 and Threshold_3 of learned data (normal profile data) for user type 1, 2 and 3 respectively.