Data Loss Prevention (DLP)




For any business in the modern economy, the digital data held in corporate IT systems has become a critical intangible asset for growth, sustainability, and competitiveness. Such information includes intellectual property, customer data, company financials and trade secrets, Personally Identifiable Information (PII) and Protected Health Information (PHI) of clients and employees, technology “know-how”, competitive intelligence, and many other types of meaningful knowledge. Data is very much the “lifeblood” of corporate IT, and just as blood loss is deadly for a living organism, a leak of data from the corporate environment and its users can be deadly for a business.

A data breach is one of the biggest fears that organizations face today. Data Loss Prevention (DLP) first hit the market in 2006 and gained popularity in the early part of 2007. Just as we have witnessed the growth of firewalls, intrusion detection systems (IDS), and numerous other security products, DLP has already improved considerably and is beginning to influence the security industry.

Data loss prevention (DLP) is a strategy for making sure that end users do not send sensitive or critical information outside the corporate network. The term is also used to describe software products that help a network administrator control what data end users can transfer. DLP is typically defined as any solution or process that identifies confidential data, tracks that data as it moves through and out of the enterprise and prevents unauthorized disclosure of data by creating and enforcing disclosure policies.

Data loss prevention (DLP), per Gartner, may be defined as technologies which perform both content inspection and contextual analysis of data sent via messaging applications such as email and instant messaging, in motion over the network, in use on a managed endpoint device, and at rest in on-premises file servers or in cloud applications and cloud storage. These solutions execute responses based on policy and rules defined to address the risk of inadvertent or accidental leaks, or exposure of sensitive data outside authorized channels.

DLP solutions protect sensitive data and provide insight into how content is used within the enterprise. Few enterprises classify data beyond a simple split between what is public and everything else. DLP helps organizations better understand their data and improves their ability to classify and manage content.

Since confidential data can reside on a variety of computing devices (physical servers, virtual servers, databases, file servers, PCs, point-of-sale devices, flash drives and mobile devices) and move through a variety of network access points (wireline, wireless, VPNs, etc.), there are a variety of solutions that are tackling the problem of data loss, data recovery and data leaks. As the number of internet-connected devices skyrockets into the billions, data loss prevention is an increasingly important part of any organization’s ability to manage and protect critical and confidential information. Examples of critical and confidential data types include:
  • Intellectual Property: source code, product design documents, process documentation, internal price lists
  • Corporate Data: Financial documents, strategic planning documents, due diligence research for mergers and acquisitions, employee information
  • Customer Data: Social Security numbers, credit card numbers, medical records, financial statements

DLP Features vs. DLP Solutions

The DLP market is also split between DLP as a feature, and DLP as a solution. A number of products, particularly email security solutions, provide basic DLP functions, but aren't complete DLP solutions. The difference is:
        A DLP Product includes centralized management, policy creation, and enforcement workflow, dedicated to the monitoring and protection of content and data. The user interface and functionality are dedicated to solving the business and technical problems of protecting content through content awareness.
        DLP Features include some of the detection and enforcement capabilities of DLP products, but are not dedicated to the task of protecting content and data.

This distinction is important because DLP products solve a specific business problem that may or may not be managed by the same business unit or administrator responsible for other security functions. We often see non-technical users such as legal or compliance officers responsible for the protection of content. Even human resources is often involved with the disposition of DLP alerts. Some organizations find that the DLP policies themselves are highly sensitive or need to be managed by business unit leaders outside of security, which also may argue for a dedicated solution. Because DLP is dedicated to a clear business problem (protect my content) that is differentiated from other security problems (protect my PC or protect my network) most of you should look for dedicated DLP solutions.

The last thing to remember about DLP is that it is highly effective against bad business processes (FTP exchange of unencrypted medical records with your insurance company, for example) and mistakes. While DLP offers some protection against malicious activity, we're at least a few years away from these tools protecting against knowledgeable attackers.

How DLP works: Standalone vs. integrated

DLP products are designed to detect sensitive information as it is accessed by endpoint devices like desktops and mobile devices, as it lies dormant on a file server in forgotten documents, and as it moves through an organization's networks using any number of protocols. DLP tools address the problems of sensitive data usage, movement and storage based on an organization's understanding of what it wants to protect and where the data is allowed at any moment.
Standalone DLP products can reside on specialized appliances or can be sold as software to be installed on the enterprise's own hardware. They are specialized and only address data loss prevention. A full soup-to-nuts DLP product monitors data at rest using a file scanning engine. It also features a network appliance to monitor data in transit over a company’s network on many network protocols.
An endpoint agent detects sensitive information in memory, during printing attempts, copying to portable media or exiting through network protocols. The agents may also be able to detect sensitive information at rest by scanning files found on endpoint logical drives.
Standalone DLP products also provide some manner of management console, a report generator, a policy manager, a database to store significant events and a quarantine server or folder to store captured sensitive data. There is also usually a method to build custom detection policies.
Integrated DLP features, by contrast to standalone DLP, are usually found on perimeter security gateways such as Web or email security gateways, intrusion detection systems/intrusion prevention systems, endpoint security suites and unified threat management products. Depending on their main functions, these products are most useful at detecting sensitive data in motion and sensitive data in use. Vulnerability scanners, for example, usually have DLP plug-ins to detect sensitive data at rest, such as Social Security numbers.
Unlike the convenience of having a standalone DLP product, security products with integrated DLP from different vendors do not share the same management consoles, policy management engines and data storage. That means an organization's DLP capability may end up being scattered among several different types of security products. Quarantine functions, if they exist, are handled through different management interfaces as well. Any attempt to correlate DLP events will have to be handled through a security information management (SIEM) system or a separate data correlation engine.

Content vs. Context

We need to distinguish content from context. One of the defining characteristics of DLP solutions is their content awareness. This is the ability of products to analyze deep content using a variety of techniques, and is very different from analyzing context. It's easiest to think of content as a letter, and context as the envelope and environment around it. Context includes things like source, destination, size, recipients, sender, header information, metadata, time, format, and anything else short of the content of the letter itself. Context is highly useful and any DLP solution should include contextual analysis as part of an overall solution.

A more advanced version of contextual analysis is business context analysis, which involves deeper analysis of the content, its environment at the time of analysis, and the use of the content at that time.

Content Analysis

The first step in content analysis is capturing the envelope and opening it. The engine then needs to parse the context (we'll need that for the analysis) and dig into it. For a plain text email this is easy, but when you want to look inside binary files it gets a little more complicated. All DLP solutions solve this using file cracking. File cracking is the technology used to read and understand the file, even if the content is buried multiple levels down. For example, it's not unusual for the cracker to read an Excel spreadsheet embedded in a Word file that's zipped. The product needs to unzip the file, read the Word doc, analyze it, find the Excel data, read that, and analyze it. Some tools support analysis of encrypted data if enterprise encryption is used with recovery keys, and most tools can identify standard encryption and use that as a contextual rule to block/quarantine content.
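As a rough illustration of file cracking, here is a minimal sketch in Python (standard library only, not any vendor's implementation). It recursively opens ZIP containers — a .zip archive, or a .docx/.xlsx file, which is itself a ZIP package of XML — strips the markup, and hands the recovered text to a content analysis callback. The file names and the analyze() callback are hypothetical; a real cracker handles many more formats (PDF, legacy Office binaries, deeply nested archives).

```python
import io
import re
import zipfile

TAG_RE = re.compile(rb"<[^>]+>")  # crude XML/HTML tag stripper

def crack(data: bytes, name: str, analyze) -> None:
    """Recursively extract readable text from a blob and pass it to analyze()."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        # .zip archives and OOXML files (.docx, .xlsx) are ZIP containers.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for member in zf.namelist():
                crack(zf.read(member), f"{name}/{member}", analyze)
    elif name.endswith((".xml", ".html", ".htm")):
        analyze(name, TAG_RE.sub(b" ", data).decode("utf-8", errors="replace"))
    else:
        # Fall back to treating the blob as text; binary noise is ignored downstream.
        analyze(name, data.decode("utf-8", errors="replace"))

# Example (hypothetical file): run every recovered text fragment through a detector.
# crack(open("report.zip", "rb").read(), "report.zip",
#       lambda n, text: print(n, len(text)))
```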

Content Analysis Techniques


Once the content is accessed, there are seven major analysis techniques used to find policy violations, each with its own strengths and weaknesses.

1. Rule-Based/Regular Expressions: This is the most common analysis technique available in both DLP products and other tools with DLP features. It analyzes the content against specific rules — such as 16-digit numbers that meet credit card checksum requirements, medical billing codes, or other textual patterns. Most DLP solutions enhance basic regular expressions with their own additional analysis rules (e.g., a name in proximity to an address near a credit card number).

What it's best for: As a first-pass filter, or for detecting easily identified pieces of structured data like credit card numbers, social security numbers, and healthcare codes/records.

Strengths: Rules process quickly and can be easily configured. Most products ship with initial rule sets. The technology is well understood and easy to incorporate into a variety of products.

Weaknesses: Prone to high false positive rates. Offers very little protection for unstructured content like sensitive intellectual property.
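To make the technique concrete, here is a minimal sketch of a rule-based detector in Python: a regular expression finds candidate 16-digit numbers and the Luhn checksum discards most random digit strings. Real products layer proximity rules and many more patterns on top of this.

```python
import re

CARD_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")  # 16 digits, optional space/dash separators

def luhn_valid(number: str) -> bool:
    """Credit card checksum: double every second digit from the right, sum, mod 10."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    return [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]

# find_card_numbers("card 4111 1111 1111 1111 exp 09/27") -> ['4111 1111 1111 1111']
```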

2. Database Fingerprinting: Sometimes called Exact Data Matching. This technique takes either a database dump or live data (via an ODBC connection) and looks only for exact matches. For example, you could generate a policy to look only for credit card numbers in your customer base, thus ignoring your own employees buying online. More advanced tools look for combinations of information, such as first name or initial, plus last name, plus credit card or Social Security number — the combination that triggers a disclosure under California SB 1386 (the state's breach notification law, which amended Civil Code sections 1798.29, 1798.82, and 1798.84 governing the privacy of personal information). Make sure you understand the performance and security implications of nightly extracts vs. live database connections.

What it's best for: Structured data from databases.

Strengths: Very low false positives (close to 0). Allows you to protect customer/sensitive data while ignoring other, similar, data used by employees (like their personal credit cards for online orders).

Weaknesses: Nightly dumps won't contain transaction data since the last extract. Live connections can affect database performance. Large databases affect product performance.
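A minimal sketch of the idea, assuming a SQLite extract with a hypothetical customers table and card_number column: values from the database are stored only as hashes, and candidate strings found in outbound content are hashed the same way and checked for an exact match. Production Exact Data Matching also handles multi-column combinations and incremental updates.

```python
import hashlib
import sqlite3

def fingerprint(value: str) -> str:
    # Normalize, then hash, so raw values never leave the database extract.
    return hashlib.sha256(value.strip().lower().encode()).hexdigest()

def load_fingerprints(db_path: str) -> set:
    # Hypothetical schema: a customers table with a card_number column.
    conn = sqlite3.connect(db_path)
    prints = {fingerprint(row[0]) for row in conn.execute("SELECT card_number FROM customers")}
    conn.close()
    return prints

def exact_match(candidate: str, prints: set) -> bool:
    """True only if the candidate exactly matches a registered customer value."""
    return fingerprint(candidate) in prints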

3. Exact File Matching: With this technique you take a hash of a file and monitor for any files that match that exact fingerprint. Some consider this to be a contextual analysis technique since the file contents themselves are not analyzed.

What it's best for: Media files and other binaries where textual analysis isn't necessarily possible.

Strengths: Works on any file type, low false positives with a large enough hash value (effectively none).

Weaknesses: Trivial to evade. Worthless for content that's edited, such as standard office documents and edited media files.
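Exact file matching reduces to comparing cryptographic hashes, as in this small sketch (the registered file name is hypothetical). The weakness noted above is visible immediately: changing a single byte of the file produces a completely different hash.

```python
import hashlib
from pathlib import Path

def file_hash(path: Path, algo: str = "sha256") -> str:
    h = hashlib.new(algo)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MB chunks
            h.update(chunk)
    return h.hexdigest()

# Hypothetical registered (protected) file.
registered = {file_hash(Path("confidential_video.mp4"))}

def is_registered(path: Path) -> bool:
    return file_hash(path) in registered
```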

4. Partial Document Matching: This technique looks for a complete or partial match on protected content. Thus you could build a policy to protect a sensitive document, and the DLP solution will look for either the complete text of the document, or even excerpts as small as a few sentences. For example, you could load up a business plan for a new product and the DLP solution would alert if an employee pasted a single paragraph into an Instant Message. Most solutions are based on a technique known as cyclical hashing, where you take a hash of a portion of the content, offset a predetermined number of characters, then take another hash, and keep going until the document is completely loaded as a series of overlapping hash values. Outbound content is run through the same hash technique, and the hash values compared for matches. Many products use cyclical hashing as a base, then add more advanced linguistic analysis.

What it's best for: Protecting sensitive documents, or similar content with text such as CAD files (with text labels) and source code. Unstructured content that's known to be sensitive.

Strengths: Ability to protect unstructured data. Generally low false positives (some vendors will say zero false positives, but any common sentence/text in a protected document can trigger alerts). Doesn't rely on complete matching of large documents; can find policy violations on even a partial match.

Weaknesses: Performance limitations on the total volume of content that can be protected. Common phrases/verbiage in a protected document may trigger false positives. Must know exactly which documents you want to protect. Trivial to avoid (ROT 1 encryption is sufficient for evasion).
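The sketch below illustrates the cyclical-hashing idea under simple assumptions: overlapping windows of normalized text are hashed, and outbound content is flagged when enough of its window hashes overlap with those registered for a protected document. The window size, step, and alert threshold are arbitrary illustrative values; commercial implementations add linguistic analysis on top.

```python
import hashlib

def shingle_hashes(text: str, window: int = 50, step: int = 10) -> set:
    """Hash overlapping windows of normalized text (cyclical hashing)."""
    normalized = " ".join(text.split()).lower()   # so reformatting alone doesn't evade matching
    hashes = set()
    for start in range(0, max(len(normalized) - window + 1, 1), step):
        chunk = normalized[start:start + window]
        hashes.add(hashlib.sha1(chunk.encode()).hexdigest())
    return hashes

def overlap_ratio(outbound_text: str, protected_hashes: set) -> float:
    out = shingle_hashes(outbound_text)
    return len(out & protected_hashes) / max(len(out), 1)

# protected = shingle_hashes(open("business_plan.txt").read())   # hypothetical document
# if overlap_ratio(message_body, protected) > 0.1:               # illustrative threshold
#     raise_alert()
```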

5. Statistical Analysis: Use of machine learning, Bayesian analysis, and other statistical techniques to analyze a corpus of content and find policy violations in content that resembles the protected content. This category includes a wide range of statistical techniques which vary greatly in implementation and effectiveness. Some techniques are very similar to those used to block spam.

What it's best for: Unstructured content where a deterministic technique, like partial document matching, would be ineffective. For example, a repository of engineering plans that's impractical to load for partial document matching due to high volatility or massive volume.

Strengths: Can work with more nebulous content where you may not be able to isolate exact documents for matching. Can enforce policies such as "alert on anything outbound that resembles the documents in this directory".

Weaknesses: Prone to false positives and false negatives. Requires a large corpus of source content — the bigger the better.
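As an illustration of the statistical approach, the following sketch uses scikit-learn's Naive Bayes text classifier — much like a spam filter — trained on a hypothetical corpus of sensitive and non-sensitive documents; anything scoring above a threshold would be flagged for review. The directory layout and threshold are assumptions for the example, not part of any particular product.

```python
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def load_corpus(root: str):
    """Hypothetical layout: root/sensitive/*.txt and root/public/*.txt."""
    texts, labels = [], []
    for label in ("sensitive", "public"):
        for path in Path(root, label).glob("*.txt"):
            texts.append(path.read_text(errors="replace"))
            labels.append(label)
    return texts, labels

texts, labels = load_corpus("training_corpus")
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(texts, labels)

def resembles_sensitive(outbound_text: str, threshold: float = 0.8) -> bool:
    # predict_proba returns class probabilities in the order of model.classes_
    proba = model.predict_proba([outbound_text])[0]
    sensitive_index = list(model.classes_).index("sensitive")
    return proba[sensitive_index] >= threshold
```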

6. Conceptual/Lexicon: This technique uses a combination of dictionaries, rules, and other analyses to protect nebulous content that resembles an "idea". It's easier to give an example — a policy that alerts on traffic that resembles insider trading, which uses key phrases, word counts, and positions to find violations. Other examples are sexual harassment, running a private business from a work account, and job hunting.

What it's best for: Completely unstructured ideas that defy simple categorization based on matching known documents, databases, or other registered sources.

Strengths: Not all corporate policies or content can be described using specific examples; Conceptual analysis can find loosely defined policy violations other techniques can't even think of monitoring for.

Weaknesses: In most cases these are not user-definable and the rule sets must be built by the DLP vendor with significant effort (costing more). Because of the loose nature of the rules, this technique is very prone to false positives and false negatives.
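A heavily simplified sketch of lexicon-based conceptual analysis, using an invented weighted phrase list for a "job hunting from a work account" policy. Real conceptual rules are vendor-built and combine dictionaries with word counts and positions, which is exactly why they take significant effort to produce.

```python
# Hypothetical lexicon for a "job hunting from a work account" concept policy.
JOB_HUNT_TERMS = {
    "resume attached": 3, "curriculum vitae": 3, "salary expectations": 2,
    "notice period": 2, "interview availability": 2, "recruiter": 1,
}
THRESHOLD = 5   # illustrative score at which an alert is raised

def concept_score(text: str) -> int:
    lowered = " ".join(text.lower().split())
    return sum(weight for phrase, weight in JOB_HUNT_TERMS.items() if phrase in lowered)

def violates_concept(text: str) -> bool:
    return concept_score(text) >= THRESHOLD
```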

7. Categories: Pre-built categories with rules and dictionaries for common types of sensitive data, such as credit card numbers/PCI protection, HIPAA, etc.

What it's best for: Anything that neatly fits a provided category. Typically easy to describe content related to privacy, regulations, or industry-specific guidelines.

Strengths: Extremely simple to configure. Saves significant policy generation time. Category policies can form the basis for more advanced, enterprise specific policies. For many organizations, categories can meet a large percentage of their data protection needs.

Weaknesses: One size fits all might not work. Only good for easily categorized rules and content.

These 7 techniques form the basis for most of the DLP products on the market. Not all products include all techniques, and there can be significant differences between implementations. Most products can also chain techniques — building complex policies from combinations of content and contextual analysis techniques.
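To show what chaining looks like in practice, the sketch below composes content and contextual checks into a single "PCI-style" policy: it fires only when a Luhn-valid card number appears alongside personal-data keywords and the destination is outside an allowed domain list. It reuses the hypothetical find_card_numbers() detector from the regular-expression sketch above; the domain list, keywords, and Message fields are illustrative assumptions.

```python
from dataclasses import dataclass

ALLOWED_DOMAINS = {"example-payments.com"}                 # hypothetical sanctioned partner
PII_KEYWORDS = ("ssn", "social security", "dob", "date of birth")

@dataclass
class Message:
    sender: str
    recipient_domain: str   # context: where the content is going
    body: str               # content: what is actually being sent

def violates_card_policy(msg: Message) -> bool:
    # Content checks (chained techniques).
    has_card = bool(find_card_numbers(msg.body))           # regex + Luhn, defined earlier
    has_pii_context = any(k in msg.body.lower() for k in PII_KEYWORDS)
    # Contextual check: unauthorized destination.
    unauthorized = msg.recipient_domain not in ALLOWED_DOMAINS
    return has_card and has_pii_context and unauthorized
```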

Protecting Data in Motion, At Rest, and In Use

The goal of DLP is to protect content throughout its lifecycle. In terms of DLP, this includes three major aspects:
  Data at Rest: includes scanning of storage and other content repositories to identify where sensitive content is located. We call this content discovery. For example, you can use a DLP product to scan your servers and identify documents with credit card numbers. If the server isn't authorized for that kind of data, the file can be encrypted or removed, or a warning sent to the file owner.
  Data in Motion: is sniffing of traffic on the network (passively or inline via proxy) to identify content being sent across specific communications channels. For example, this includes sniffing emails, instant messages, and web traffic for snippets of sensitive source code. In motion tools can often block based on central policies, depending on the type of traffic.
  Data in Use: is typically addressed by endpoint solutions that monitor data as the user interacts with it. For example, they can identify when you attempt to transfer a sensitive document to a USB drive and block it (as opposed to blocking use of the USB drive entirely). Data in use tools can also detect things like copy and paste, or use of sensitive data in an unapproved application (such as someone attempting to encrypt data to sneak it past the sensors).

Data in Motion

Many organizations first enter the world of DLP with network based products that provide broad protection for managed and unmanaged systems. It’s typically easier to start a deployment with network products to gain broad coverage quickly. Early products limited themselves to basic monitoring and alerting, but all current products include advanced capabilities to integrate with existing network infrastructure and provide protective, not just detective, controls.

Network Monitor


At the heart of most DLP solutions lies a passive network monitor. The network monitoring component is typically deployed at or near the gateway on a SPAN port (or a similar tap). It performs full packet capture, session reconstruction, and content analysis in real time. Performance is more complex and subtle than vendors normally discuss. You might have to choose between pre-filtering (and thus missing non-standard traffic) or buying more boxes and load balancing. Also, some products lock monitoring into pre-defined port and protocol combinations, rather than using service/channel identification based on packet content. Even if full application channel identification is included, you want to make sure it's enabled. Otherwise, you might miss non-standard communications such as connecting over an unusual port. Most of the network monitors are dedicated general-purpose server hardware with DLP software installed. A few vendors deploy true specialized appliances. While some products have their management, workflow, and reporting built into the network monitor, this is often offloaded to a separate server or appliance.

Email Integration


The next major component is email integration. Since email is store and forward you can gain a lot of capabilities, including quarantine, encryption integration, and filtering, without the same hurdles to avoid blocking synchronous traffic. Most products embed an MTA (Mail Transport Agent) into the product, allowing you to just add it as another hop in the email chain. Quite a few also integrate with some of the major existing MTAs/email security solutions directly for better performance. One weakness of this approach is it doesn't give you access to internal email. If you're on an Exchange server, internal messages never make it through the external MTA since there's no reason to send that traffic out. To monitor internal mail you'll need direct Exchange/Lotus integration, which is surprisingly rare in the market. Full integration is different from just scanning logs/libraries after the fact, which is what some companies call internal mail support. Good email integration is absolutely critical if you ever want to do any filtering, as opposed to just monitoring.
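As a toy illustration of the embedded-MTA approach (not any vendor's actual implementation), the sketch below uses the Python aiosmtpd library to act as an extra SMTP hop: each message is run through a hypothetical inspect() content-analysis function, and messages that violate policy are rejected with a 5xx response instead of being relayed onward. A real deployment would also handle MIME parsing, quarantine, encryption hand-off, and onward relay.

```python
from aiosmtpd.controller import Controller

def inspect(text: str) -> bool:
    """Hypothetical hook into the DLP content analysis engine.
    Returns True when the message violates policy."""
    return "confidential" in text.lower()   # placeholder rule

class DLPHandler:
    async def handle_DATA(self, server, session, envelope):
        body = envelope.content.decode("utf-8", errors="replace")
        if inspect(body):
            # Reject so the sending MTA generates a bounce; a real product
            # might instead quarantine or hand off to an encryption gateway.
            return "550 Message blocked by data loss prevention policy"
        # In a real hop the message would now be relayed to the next MTA.
        print(f"Accepted mail from {envelope.mail_from} to {envelope.rcpt_tos}")
        return "250 Message accepted for delivery"

if __name__ == "__main__":
    controller = Controller(DLPHandler(), hostname="0.0.0.0", port=8025)
    controller.start()            # serves in a background thread
    input("DLP SMTP hop listening on port 8025; press Enter to stop\n")
    controller.stop()
```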

Filtering/Blocking and Proxy Integration

Nearly anyone deploying a DLP solution will eventually want to start blocking traffic. There's only so long you can take watching all your juicy sensitive data running to the nether regions of the Internet before you start taking some action. But blocking isn't the easiest thing in the world, especially since we're trying to allow good traffic, only block bad traffic, and make the decision using real-time content analysis. Email, as we just mentioned, is fairly straightforward to filter. It's not quite real-time and is proxied by its very nature. Adding one more analysis hop is a manageable problem in even the most complex environments. Outside of email most of our communications traffic is synchronous — everything runs in real time. Thus if we want to filter it we either need to bridge the traffic, proxy it, or poison it from the outside.

Bridge
With a bridge we just have a system with two network cards which performs content analysis in the middle. If we see something bad, the bridge breaks the connection for that session. Bridging isn't the best approach for DLP since it might not stop all the bad traffic before it leaks out. It's like sitting in a doorway watching everything go past with a magnifying glass; by the time you get enough traffic to make an intelligent decision, you may have missed the really good stuff. Very few products take this approach, although it does have the advantage of being protocol agnostic.

Proxy
In simplified terms, a proxy is protocol/application specific and queues up traffic before passing it on, allowing for deeper analysis. We see gateway proxies mostly for HTTP, FTP, and IM protocols. Few DLP solutions include their own proxies; they tend to integrate with existing gateway/proxy vendors since most customers prefer integration with these existing tools. Integration for web gateways is typically through the iCAP protocol, allowing the proxy to grab the traffic, send it to the DLP product for analysis, and cut communications if there's a violation. This means you don't have to add another piece of hardware in front of your network traffic and the DLP vendors can avoid the difficulties of building dedicated network hardware for inline analysis. If the gateway includes a reverse SSL proxy you can also sniff SSL connections. You will need to make changes on your endpoints to deal with all the certificate alerts, but you can now peer into encrypted traffic. For Instant Messaging you'll need an IM proxy and a DLP product that specifically supports whatever IM protocol you're using.
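Commercial products usually hand this allow/block decision to the gateway over ICAP, which is wire-protocol plumbing beyond a short example. As a stand-in, the sketch below shows the same decision implemented as an addon for the open-source mitmproxy proxy (an assumption chosen for illustration, not how any specific DLP vendor integrates; assumes a recent mitmproxy release). The looks_sensitive() check is a placeholder for the full content analysis engine.

```python
"""Run with:  mitmproxy -s dlp_filter.py"""
from mitmproxy import http

def looks_sensitive(text: str) -> bool:
    # Placeholder hook into the DLP content analysis engine.
    return "ssn:" in text.lower()

def request(flow: http.HTTPFlow) -> None:
    # Inspect outbound web traffic (form posts, uploads) before it leaves.
    body = flow.request.get_text(strict=False) or ""
    if looks_sensitive(body):
        # Short-circuit the request with a block page instead of forwarding it.
        flow.response = http.Response.make(
            403,
            b"Blocked by data loss prevention policy",
            {"Content-Type": "text/plain"},
        )
```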

TCP Poisoning
The last method of filtering is TCP poisoning. You monitor the traffic and when you see something bad, you inject a TCP reset packet to kill the connection. This works on every TCP protocol but isn't very efficient. For one thing, some protocols will keep trying to get the traffic through. If you TCP poison a single email message, the server will keep trying to send it for 3 days, as often as every 15 minutes. The other problem is the same as bridging — since you don't queue the traffic at all, by the time you notice something bad it might be too late. It's a good stop-gap to cover nonstandard protocols, but you'll want to proxy as much as possible.
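A rough sketch of TCP poisoning using the scapy packet library: traffic is sniffed passively, and when a payload trips the (placeholder) content check, a forged RST is injected toward the destination to tear the session down. The interface name is an assumption and the sequence-number handling is simplified; the race against the traffic you are watching is exactly the limitation described above.

```python
from scapy.all import IP, TCP, Raw, send, sniff   # requires root privileges

def violates_policy(payload: bytes) -> bool:
    # Placeholder for the real content analysis engine.
    return b"CONFIDENTIAL" in payload

def poison(pkt) -> None:
    if not (pkt.haslayer(TCP) and pkt.haslayer(Raw)):
        return
    payload = bytes(pkt[Raw].load)
    if not violates_policy(payload):
        return
    # Forge a reset that appears to come from the original sender,
    # sequenced just after the offending segment.
    rst = IP(src=pkt[IP].src, dst=pkt[IP].dst) / TCP(
        sport=pkt[TCP].sport,
        dport=pkt[TCP].dport,
        flags="R",
        seq=pkt[TCP].seq + len(payload),
    )
    send(rst, verbose=False)

sniff(iface="eth0", filter="tcp", prn=poison, store=False)
```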

Internal Networks

Although technically capable of monitoring internal networks, DLP is rarely used on internal traffic other than email. Gateways provide convenient choke points; internal monitoring is a daunting prospect from cost, performance, and policy management/false positive standpoints. A few DLP vendors have partnerships for internal monitoring but this is a lower priority feature for most organizations.

Distributed and Hierarchical Deployments

All medium to large enterprises, and many smaller organizations, have multiple locations and web gateways. A DLP solution should support multiple monitoring points, including a mix of passive network monitoring, proxy points, email servers, and remote locations. While processing/analysis can be offloaded to remote enforcement points, they should send all events back to a central management server for workflow, reporting, investigations, and archiving. Remote offices are usually easy to support since you can just push policies down and reporting back, but not every product has this capability. The more advanced products support hierarchical deployments for organizations that want to manage DLP differently in multiple geographic locations, or by business unit. International companies often need this to meet legal monitoring requirements which vary by country. Hierarchical management supports coordinated local policies and enforcement in different regions, running on their own management servers, communicating back to a central management server. Early products only supported one management server but now we have options to deal with these distributed situations, with a mix of corporate/regional/business unit policies, reporting, and workflow.

Data at Rest

While catching leaks on the network is fairly powerful, it's only one small part of the problem. Many customers are finding that it's just as valuable, if not more valuable, to figure out where all that data is stored in the first place. We call this content discovery. The biggest advantage of content discovery in a DLP tool is that it allows you to take a single policy and apply it across data no matter where it's stored, how it's shared, or how it's used. For example, you can define a policy that requires credit card numbers to only be emailed when encrypted, never be shared via HTTP or HTTPS, only be stored on approved servers, and only be stored on workstations/laptops by employees on the accounting team. All of this can be specified in a single policy on the DLP management server.
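A bare-bones illustration of content discovery for data at rest: walk a file share, run each file through a detector (here, the same candidate-card-number check sketched earlier, repeated inline), and report files that hold data where they shouldn't. The share path is hypothetical; real discovery engines add file cracking, incremental scanning, and the enforcement options discussed below.

```python
import re
from pathlib import Path

CARD_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")   # candidate 16-digit numbers

def scan_share(root: str):
    """Yield (path, match_count) for files that appear to contain card numbers."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue                      # unreadable file: skip and move on
        hits = CARD_RE.findall(text)
        if hits:
            yield path, len(hits)

# Hypothetical unauthorized share; each hit would become an incident
# in the central management server.
for path, count in scan_share("/mnt/finance_share"):
    print(f"{path}: {count} possible card numbers")
```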


Content discovery consists of three components:
1.    Endpoint Discovery: scanning workstations and laptops for content.
2.    Storage Discovery: scanning mass storage, including file servers, SAN, and NAS.
3.    Server Discovery: application-specific scanning of stored data on email servers, document management systems, and databases (not currently a feature of most DLP products, but beginning to appear in some Database Activity Monitoring products).

Content Discovery Techniques

There are three basic techniques for content discovery:
1.    Remote Scanning: a connection is made to the server or device using a file sharing or application protocol, and scanning is performed remotely. This is essentially mounting a remote drive and scanning it from a server that takes policies from, and sends results to, the central policy server. For some vendors this is an appliance, for others it's a commodity server, and for smaller deployments it's integrated into the central management server.
2.    Agent-Based Scanning: an agent is installed on the system (server) to be scanned and scanning is performed locally. Agents are platform specific, and use local CPU cycles, but can potentially perform significantly faster than remote scanning, especially for large repositories. For endpoints, this should be a feature of the same agent used for enforcing Data in Use controls.
3.    Memory-Resident Agent Scanning: Rather than deploying a full-time agent, a memory-resident agent is installed, performs a scan, then exits without leaving anything running or stored on the local system. This offers the performance of agent-based scanning in situations where you don't want an agent running all the time.

Any of these technologies can work for any of the modes, and enterprises will typically deploy a mix depending on policy and infrastructure requirements. We currently see technology limitations with each approach which guide deployment:
     Remote scanning can significantly increase network traffic, and its performance is limited by network bandwidth and by the network performance of both the target and the scanner. Because of these practical limits, some solutions can only scan on the order of gigabytes per day per server (sometimes hundreds of gigabytes, but not terabytes), which may be inadequate for very large storage repositories.
     Agents, temporal or permanent, are limited by the processing power and memory of the target system, which often translates into restrictions on the number of policies that can be enforced and the types of content analysis that can be used. For example, most endpoint agents are not capable of partial document matching or database fingerprinting against large data sets; endpoint agents are the most constrained in this respect.
     Agents don't support all platforms.

Data at Rest Enforcement

Once a policy violation is discovered, the DLP tool can take a variety of actions:
     Alert/Report: create an incident in the central management server just like a network violation.
     Warn: notify the user via email that they may be in violation of policy.
     Quarantine/Notify: move the file to the central management server and leave a text file with instructions on how to request recovery of the file.
     Quarantine/Encrypt: encrypt the file in place, usually leaving a plain text file describing how to request decryption.
     Quarantine/Access Control: change access controls to restrict access to the file.
     Remove/Delete: either transfer the file to the central server without notification, or just delete it.
The combination of different deployment architectures, discovery techniques, and enforcement options is powerful for protecting data at rest and supporting compliance initiatives. For example, we're starting to see increasing deployments of DLP (sometimes under its older name, Content Monitoring and Filtering, or CMF) to support PCI compliance — more for the ability to ensure (and report) that no cardholder data is stored in violation of PCI than to protect email or web traffic.

Data in Use

Network monitoring is non-intrusive (unless you have to crack SSL) and offers visibility into any system connected to the network, managed or unmanaged, server or workstation. But it doesn't protect data when someone walks out the door with a laptop, and it can't even prevent people from copying data to portable storage like USB drives. To move from a "leak prevention" solution to a "content protection" solution, products need to expand not only to stored data, but to the endpoints where data is used.

Adding an endpoint agent to a DLP solution not only gives you the ability to discover stored content, but also potentially to protect systems that are no longer on the network, or even to protect data as it is actively used. While extremely powerful, this has been problematic to implement. Agents need to perform within the resource constraints of a standard laptop while maintaining content awareness. That can be difficult with large policies such as "protect all 10 million credit card numbers from our database", as opposed to something simpler like "protect any credit card number", which will generate false positives every time an employee visits Amazon.com.

Key Capabilities

Existing products vary widely in functionality, but we can break out three key capabilities:
1. Monitoring and enforcement within the network stack: This allows enforcement of network rules without a network appliance. The product should be able to enforce the same rules as if the system were on the managed network, as well as separate rules designed only for use on unmanaged networks.
2. Monitoring and enforcement within the system kernel: By plugging directly into the operating system kernel you can monitor user activity, such as copying and pasting sensitive content. This can also allow products to detect (and block) policy violations when the user is taking sensitive content and attempting to hide it from detection, perhaps by encrypting it or modifying source documents.
3. Monitoring and enforcement within the file system: This allows monitoring and enforcement based on where data is stored. For example, you can perform local discovery and/or restrict transfer of sensitive content to unencrypted USB devices.

These options are simplified, and most early products focus on 1 and 3 to solve the portable storage problem and protect devices on unmanaged networks. System/kernel integration is much more complex and there are a variety of approaches to gaining this functionality.
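As a sketch of the third capability (file system monitoring), the example below uses the Python watchdog library to watch a removable-media mount point and react when a file containing sensitive content lands there. The mount path, the detector, and the crude "delete after the fact" response are all assumptions for illustration; real endpoint agents hook the file system at the driver level and enforce policy before the write completes.

```python
import time
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

REMOVABLE_MOUNT = "/media/usb0"          # hypothetical USB mount point

def contains_sensitive(path: Path) -> bool:
    # Placeholder for the endpoint agent's content analysis.
    try:
        return "CONFIDENTIAL" in path.read_text(errors="replace")
    except OSError:
        return False

class USBPolicyHandler(FileSystemEventHandler):
    def on_created(self, event):
        if event.is_directory:
            return
        path = Path(event.src_path)
        if contains_sensitive(path):
            # After-the-fact response for illustration only; a kernel-level
            # agent would block the copy before data reaches the device.
            path.unlink(missing_ok=True)
            print(f"Blocked copy of sensitive file to removable media: {path}")

observer = Observer()
observer.schedule(USBPolicyHandler(), REMOVABLE_MOUNT, recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
```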

Use Cases

Endpoint DLP is evolving to support a few critical use cases:
     Enforcing network rules off the managed network, or modifying rules for more hostile networks.
     Restricting sensitive content from portable storage, including USB drives, CD/DVD drives, home storage, and devices like smartphones and PDAs.
     Restricting copy and paste of sensitive content.
     Restricting applications allowed to use sensitive content — e.g., only allowing encryption with an approved enterprise solution, not tools downloaded online that don't allow enterprise data recovery.
     Integration with Enterprise Digital Rights Management to automatically apply access control to documents based on the included content.
     Auditing use of sensitive content for compliance reporting.

Additional Endpoint Capabilities

The following features are highly desirable when deploying DLP at the endpoint:
  Endpoint agents and rules should be centrally managed by the same DLP management server that controls data in motion and data at rest (network and discovery).
  Policy creation and management should be fully integrated with other DLP policies in a single interface.
  Incidents should be reported to, and managed by, a central management server.
  Endpoint agent should use the same content analysis techniques and rules as the network servers/appliances.
  Rules (policies) should adjust based on where the endpoint is located (on or off the network). When the endpoint is on a managed network with gateway DLP, redundant local rules should be skipped to improve performance.
  Agent deployment should integrate with existing enterprise software deployment tools.
  Policy updates should offer options for secure management via the DLP management server, or existing enterprise software update tools.

Endpoint Limitations

Realistically, the performance and storage limitations of the endpoint will restrict the types of content analysis supported and the number and type of policies that are enforced locally. For some enterprises this might not matter, depending on the kinds of policies to be enforced, but in many cases endpoints impose significant constraints on data in use policies.

Data Loss/Data Leak Technology Categories

DLP technologies are broadly divided into two categories – Enterprise DLP and Integrated DLP.
Enterprise DLP solutions are comprehensive and packaged in agent software for desktops and servers, physical and virtual appliances for monitoring networks and email traffic, or soft appliances for data discovery.
Integrated DLP is limited to secure web gateways (SWGs), secure email gateways (SEGs), email encryption products, enterprise content management (ECM) platforms, data classification tools, data discovery tools and cloud access security brokers (CASBs).
DLP solutions can also be classified by where they operate:
Network-based data loss prevention (DLP) solutions are focused on protecting data while it is in motion. These data loss prevention solutions are installed at the "perimeter" of enterprise networks. They monitor network traffic to detect sensitive data that is being leaked or sent out of the enterprise. Solutions may investigate email traffic, instant messaging, social media interactions, web 2.0 applications, SSL traffic and more. Their analysis engines are looking for violations of predefined information disclosure policies, such as data leaks.
Datacenter or storage-based data loss prevention (DLP) solutions focus on protecting data at rest within an organization’s datacenter infrastructure, such as file servers, SharePoint and databases. These data loss prevention solutions discover where confidential data resides and enable users to determine if it's being stored securely. When confidential information resides on insecure platforms, it is usually an indication of problematic business processes or poorly executed data retention policies.
End-point based data loss prevention (DLP) solutions focus on monitoring PC-based systems (laptops, tablets, POS, etc.) for all actions such as print or transfer to CD/DVD, webmail, social media, USB and more. End-point based solutions are typically event driven in that the agent resident on the end-point is monitoring for specific user actions, such as sending an email, copying a file to a USB, leaking data or printing a file. These solutions can be configured for passive monitoring mode or to actively block specific types of activities. 

Content-aware data loss prevention (DLP) tools address the risk of accidental exposure of sensitive data outside authorized channels, using monitoring, blocking and remediation functionality. These tools enable the enforcement of company policies based on the classification of content. Data leak prevention technologies are being increasingly leveraged for data discovery and classification purposes.

Application Security and Your Data Loss Prevention Strategy

Use this checklist as a reference tool when making data loss prevention purchase decisions:
  • Develop clear data loss prevention strategies with concrete requirements before evaluating products.
  • Understand the limitations of data leak prevention. As an example, data loss prevention is a data-centric control and does not have any understanding of SQL.
  • Applications protect your data. Test the security quality of your applications. Use application security testing as a way of protecting data.
  • Create data loss prevention policies and procedures for mobile devices as they interact with sensitive corporate data.

Considerations before DLP deployment

Before you start looking at any tools, you need to understand why you might need DLP, how you plan on using it, and the business processes around creating policies and managing incidents.

1.    Identify business units that need to be involved and create a selection committee: We tend to include two kinds of business units in the DLP selection process — content owners with sensitive data to protect, and content protectors with the responsibility for enforcing controls over the data. Content owners include those business units that hold and use the data. Content protectors tend to include departments like Human Resources, IT Security, corporate Legal, Compliance, and Risk Management. Once you identify the major stakeholders, you'll want to bring them together for the next few steps.
2.    Define what you want to protect: Start by listing out the kinds of data, as specifically as possible, that you plan on using DLP to protect. We typically break content out into three categories — personally identifiable information (PII, including healthcare, financial, and other data), corporate financial data, and intellectual property. The first two tend to be more structured and will drive you towards certain solutions, while IP tends to be less structured, with different content analysis requirements. Even if you want to protect all kinds of content, use this process to specify and prioritize, preferably “on paper”.
3.    Decide how you want to protect it and set expectations: In this step you will answer two key questions. First, in what channels/phases do you want to protect the data? This is where you decide if you just want basic email monitoring, or if you want comprehensive data in motion, data at rest, and data in use protection. You should be extremely specific, listing out major network channels, data stores, and endpoint requirements. The second question is what kind of enforcement do you plan on implementing? Monitoring and alerting only? Email filtering? Automatic encryption? You'll get a little more specific in the formal requirements later, but you should have a good idea of your expectations at this point. Also, don't forget that needs change over time, so we recommend you break requirements into short term (within 6 months of deployment), mid-term (12-18 months after deployment), and long-term (up to 3 years after deployment).


4.    Outline process workflow: One of the biggest stumbling blocks for DLP deployments is failure to prepare the enterprise. In this stage you define your expected workflows for creating new protection policies and handling incidents involving insiders and external attackers. Which business units are allowed to request protection of data? Who is responsible for building the policies? When a policy is violated, what's the workflow to remediate it? When is HR notified? Legal? Who handles day-to-day policy violations? Is it a technical security role, or non-technical, such as a compliance officer? The answers to these kinds of questions will guide you towards different solutions to meet your workflow needs.

By the completion of this phase you should have defined key stakeholders, convened a selection team, prioritized the data you want to protect, determined where you want to protect it, and roughed out workflow requirements for building policies and remediating incidents.


Internal Testing

Make sure you test the products as thoroughly as possible. A few key aspects to test, if you can, are:

  Policy creation and content analysis. Violate policies and try to evade or overwhelm the tool to learn where its limits are.
  Email integration.
  Incident workflow — Review the working interface with those employees who will be responsible for enforcement.
  Directory integration.
  Storage integration on major platforms to test performance and compatibility for data at rest protection.
  Endpoint functionality on your standard image.
  Network performance — not just bandwidth, but any requirements to integrate the product on your network and tune it. Do you need to pre-filter traffic? Do you need to specify port and protocol combinations?
  Network gateway integration.
  Enforcement actions.

Conclusion

Data Loss Prevention is a confusing market, but by understanding the capabilities of DLP tools and using a structured selection process you can choose an appropriate tool for your requirements. DLP is a very effective tool for preventing accidental disclosures and ending bad business processes around the use of sensitive data. While it offers some protection against malicious attacks, the market is still a couple of years away from stopping knowledgeable bad guys.

DLP products may be adolescent, but they provide very high value for organizations that plan properly and understand how to take full advantage of them. Focus on those features that are most important to you as an organization, paying particular attention to policy creation and workflow, and work with key business units early in the process.

Most organizations that deploy DLP see significant risk reductions with few false positives and business interruptions.





