How to ensure data tokenization is truly secure

Data is everywhere. Alongside technology, data is helping solve the most complex, diverse, and dynamic problems in finance, manufacturing, healthcare, education, climate change, sustainability, economic growth, and business resilience. No industry today can afford to ignore the exponential value of data.

At the same time, data poses several security challenges. The fact that data is everywhere also means that the threat of an information breach is never far away. Large financial institutions, healthcare providers, and public sector organizations, along with small businesses and government and citizen services, all generate vast pools of data. As businesses register users for their applications, their databases continuously capture sensitive user details.

Governments have tightened data protection regulations, but the response from companies themselves has been inconsistent. While financial companies are guided by the Payment Card Industry Data Security Standard (PCI DSS) to protect cardholders' personal details from misuse, and healthcare organizations are bound by HIPAA rules, smaller enterprises across many industries often lack the guidance of a recognized standard for data security and privacy.

Companies cannot afford to ignore the financial losses from high-profile data breaches, which can run well into the millions of dollars. In today's data-driven world, any information leak (malicious or unintentional) can hamper operations, cause financial losses, go viral on social media, and do irreparable damage to customers' trust and the company's reputation. This is where data tokenization comes into the picture.

A non-sensitive token for every piece of sensitive data

By definition, tokenization is the replacement of confidential or sensitive data with a non-sensitive token that is mapped back to the actual value. Because the mapping is held only by the tokenization system, the sensitive data is never exposed to the public or to any party that has no legitimate need for the actual value. For example, a token can stand in for a credit card number or Social Security number (SSN).
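
To make the idea concrete, here is a minimal Python sketch of vault-style tokenization. The class and method names are illustrative, not taken from any particular product; a production vault would sit behind a hardened, access-controlled database rather than an in-memory dictionary.

```python
# Minimal sketch of vault-based tokenization (illustrative names only).
import secrets

class TokenVault:
    """Maps random, non-sensitive tokens to the original sensitive values."""

    def __init__(self):
        self._store = {}  # token -> original value (a real vault would be a hardened DB)

    def tokenize(self, sensitive_value: str) -> str:
        token = "tok_" + secrets.token_urlsafe(16)  # random, reveals nothing about the input
        self._store[token] = sensitive_value
        return token

    def detokenize(self, token: str) -> str:
        # Only the tokenization system itself can resolve a token back to the value.
        return self._store[token]

vault = TokenVault()
token = vault.tokenize("4111 1111 1111 1111")  # a sample card number
print(token)  # e.g. tok_Jc3... -- safe to pass around in place of the card number
```

The key property is that the token itself reveals nothing about the value it stands for; only the vault can resolve it.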

Tokenization is particularly useful in transactions such as credit card processing, where using the actual data could threaten privacy or expose sensitive details to unauthorized parties. It removes any connection between a transaction and the underlying sensitive data, limiting exposure in the event of a breach. For example, when you make a credit card payment using PayTM, you enter only the card expiry date at transaction time; part of the card detail is masked, and just enough is revealed to assure you that the right card is being used by an authorized person.
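
A rough sketch of how this looks in practice, with hypothetical field names: the transaction record carries only a token, while the customer-facing display shows a masked card number.

```python
def mask_card_number(pan: str) -> str:
    """Show only the last four digits to the cardholder."""
    digits = pan.replace(" ", "")
    return "**** **** **** " + digits[-4:]

# The transaction record carries the token, never the real card number.
transaction = {
    "amount": "499.00",
    "currency": "INR",
    "card_token": "tok_Jc3k9",  # placeholder token issued by the tokenization system
    "display_card": mask_card_number("4111 1111 1111 1111"),
}
print(transaction["display_card"])  # **** **** **** 1111
```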

Types of data tokenization

At present, companies can tokenize in two ways. They might choose reversible tokenization, where the token can be converted back to its original form using the encryption algorithm and keys.

Alternatively, they can use irreversible tokenization, based on hashing or one-way encryption mechanisms, where the original details cannot be retrieved. The approach used depends on the scenarios and use cases for tokenization. Hashing algorithms (MD5 being a common, if now dated, example) convert the actual value into a short, fixed-length digest that cannot be reversed to recover the original value: a 1GB file, for instance, is reduced to a small digest that cannot be converted back into that 1GB file. When an entity needs the token for a particular request and response, it queries the tokenization server rather than the underlying data. Irreversible tokenization is particularly applicable when the information has to be exchanged with a third party (say, a partner) who doesn't care about the information contained in the token itself but wants to know whether the token is valid and associated with a particular user. The partner only needs high-level details of the token.
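
The following sketch illustrates irreversible tokenization with a keyed hash (HMAC-SHA256 is used here rather than MD5, which is no longer considered secure); the key and variable names are illustrative. The partner can recognize that the same token belongs to the same user across requests, but cannot recover the underlying value.

```python
# Minimal sketch of irreversible tokenization with a keyed hash (HMAC-SHA256).
import hmac
import hashlib

TOKENIZATION_KEY = b"keep-this-secret-server-side"  # hypothetical key, never shared

def irreversible_token(sensitive_value: str) -> str:
    digest = hmac.new(TOKENIZATION_KEY, sensitive_value.encode(), hashlib.sha256)
    return digest.hexdigest()

ssn_token = irreversible_token("123-45-6789")
# The partner can see that the same token appears for the same user across requests,
# but cannot feasibly recover the SSN from the token, and there is no decrypt step.
print(ssn_token)
```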

What’s better – reversible or irreversible data tokenization?

Experts recommend irreversible tokenization as the go-to approach because only the entity implementing the tokenization holds the associated value, while the entity or partner consuming the data doesn't really need to see the actual details embedded in the token. This reduces the attack surface. On the other hand, reversible tokens may be preferable in scenarios where the partner needs to verify the data and therefore must see the actual value associated with the token. In that case the encryption key has to be shared with the third party, and transactions between the two entities (sender and receiver) involve accessing the encryption keys, which poses an additional security risk.
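
By contrast, a reversible scheme might look like the following sketch, which assumes the third-party `cryptography` package and uses symmetric encryption purely for illustration. Note that the same key that creates the token can also reverse it, which is exactly the risk described above.

```python
# Minimal sketch of reversible tokenization using symmetric encryption
# (assumes the third-party `cryptography` package; names are illustrative).
from cryptography.fernet import Fernet

shared_key = Fernet.generate_key()  # this key must be shared with the partner
fernet = Fernet(shared_key)

token = fernet.encrypt(b"4111 1111 1111 1111")  # reversible token
original = fernet.decrypt(token)                # the partner can recover the real value

# The extra risk: both sender and receiver now hold key material, so a compromise
# on either side exposes every value ever tokenized with that key.
print(original)
```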

The rising value of tokenization

Unsurprisingly, tokenization solutions that do not rely on encryption keys (which can themselves become a point of vulnerability) are becoming a popular strategy for companies looking to stem the threat of security breaches. Because highly sensitive data is replaced with a unique value or numeric sequence in the form of a token, attackers cannot uncover the original data even if they get their hands on the tokens or transactional data. This goes a long way towards minimizing the attack surface, limiting the damage caused by data breaches, and deterring future attacks.

However, companies could face challenges if tokenization is not properly implemented.

Typically, tokens act as an alias for original data that is stored in a database, which is itself a point of vulnerability. If the source of the actual data is not secured to industry norms, it becomes an easy target: attackers can extract the original values if the database isn't encrypted or properly isolated from the internet, or if its servers are not secure. Such a situation renders tokenization ineffective.

Top five expert-recommended practices to ensure tokenization implementation is effective

It’s important that companies look at tokenization in a strategic, planned manner that takes care of various contingencies. Listed below are the top five expert-recommended practices to ensure tokenization implementation is truly effective:

1. Implement mechanisms that make it extremely difficult for an attacker to identify or get to the source:

·   Make the path to the source data as abstract as possible, hiding extremely sensitive data in line with industry or user privacy requirements.

·   Build tokenization solutions so that tokens are exposed instead of the actual sensitive data.

·   Logically isolate the tokenization system from the data processing systems and applications that process or store the actual sensitive data.

·   Ensure original values are encrypted in the database, and implement standard data protection and server hardening protocols.

Tokens are located in a hybrid environment alongside data centers and private, public and community clouds, while the actual data exists on only one particular instance among the many environments an organization runs. An attacker has to identify that instance, get past every layer in between, and find an exploitable vulnerability, which makes the attack both tedious and complex. This is the real impact of tokenization in preventing data leaks and reducing the attack surface.
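
A minimal sketch of how these recommendations might be combined, again with illustrative names and assuming the third-party `cryptography` package: the vault stores only encrypted originals, and the only operations exposed to other systems return tokens or masked values, never the raw data.

```python
# Illustrative only: vault stores ciphertext, callers get tokens or masked values.
import secrets
from cryptography.fernet import Fernet  # assumes the third-party `cryptography` package

class IsolatedVault:
    def __init__(self, key: bytes):
        self._fernet = Fernet(key)
        self._store = {}  # token -> ciphertext only; plaintext is never stored

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_urlsafe(16)
        self._store[token] = self._fernet.encrypt(value.encode())
        return token

    def masked(self, token: str) -> str:
        # The outward-facing operation: decrypt internally, expose only the last four digits.
        value = self._fernet.decrypt(self._store[token]).decode()
        return "****" + value[-4:]

vault = IsolatedVault(Fernet.generate_key())
t = vault.tokenize("378282246310005")
print(t, vault.masked(t))
```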

2. Be secure by design:

This aligns with OWASP's latest checklist of standards, which recommends design-level activities, and it is now an integral part of penetration testing engagements. Payment apps set a good example of being secure by design: even cardholders don't see the complete cardholder information, only a masked value. Based on the context and the request, servers abstract the data and serve only the required content without exposing it entirely. Accessing a token is also made far more difficult for attackers, who must present authentication, contextual information, and an approved source IP address before the server will respond.
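
As a rough illustration of this request path (the checks and helper below are hypothetical, not any specific product's API), the server refuses to respond unless authentication, context, and source IP all check out, and even then returns only a masked value.

```python
# Hypothetical secure-by-design response path: authenticate, check context, then mask.
ALLOWED_SOURCE_IPS = {"10.0.4.21"}  # illustrative allow-list

def lookup_last4(token: str) -> str:
    return "1111"  # stand-in for a token-vault lookup in this sketch

def serve_card_details(token: str, auth_ok: bool, source_ip: str, purpose: str) -> str:
    if not auth_ok or source_ip not in ALLOWED_SOURCE_IPS or purpose != "payment_review":
        raise PermissionError("request rejected: missing authentication or wrong context")
    # Even a valid caller never sees the full PAN, only a masked value.
    return "**** **** **** " + lookup_last4(token)

print(serve_card_details("tok_Jc3k9", auth_ok=True,
                         source_ip="10.0.4.21", purpose="payment_review"))
```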

3. Implement a centralized solution that is easy to scale:

As organizations scale their tokenization efforts, a centralized solution saves them the trouble of defining each and every pattern separately.

Cloud access security brokers (CASBs) are one area where companies have matured in their use of tokenization, particularly where teams are required to share personally identifiable information. Highly confidential data that should not be shared with an anonymous user is picked up by the centrally located CASB solution, and patterns of data sensitivity are identified to enable the right decisions. This requires proper configuration by admin teams: while some predefined parameters such as PAN and CVV are obviously sensitive, the solution may not always recognize the significance of a particular value or parameter. Admin teams must identify parameters and establish rules defining the boundaries or regions within which that information may be exchanged between specific IP addresses. If there is an attempt to transfer this data outside those boundaries, tokenization is enforced as standard policy.

These solutions also give companies improved visibility into the data being exchanged, where it is being transferred to, and where it is being received from. These inputs, along with risk patterns, are captured on a dashboard to help companies analyze their situation and take the necessary actions in real time.
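
A simplified sketch of what such centrally defined rules might look like; the patterns, allow-list, and function names are illustrative rather than a real CASB configuration. Any PAN or SSN heading outside the allowed boundary is tokenized before transfer.

```python
# Illustrative central policy: detect sensitive patterns, tokenize on outbound transfer.
import re

SENSITIVE_PATTERNS = {
    "PAN": re.compile(r"\b\d{13,16}\b"),          # crude card-number pattern
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN pattern
}
ALLOWED_DESTINATIONS = {"10.0.4.21", "10.0.4.22"}  # hypothetical in-boundary hosts

def enforce_policy(payload: str, destination: str, tokenize) -> str:
    """If the destination is outside the allowed boundary, tokenize matched values."""
    if destination in ALLOWED_DESTINATIONS:
        return payload
    for name, pattern in SENSITIVE_PATTERNS.items():
        payload = pattern.sub(lambda m: tokenize(m.group()), payload)
    return payload

# Example: a card number leaving the boundary is replaced before transfer.
print(enforce_policy("card 4111111111111111", "203.0.113.7", lambda v: "tok_redacted"))
```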

4. Get expert help for implementing tokenization:

While large enterprises usually outsource this work to a highly mature security vendor, small and medium businesses may find it harder to follow suit: tokenization is a relatively new service, and many will feel forced to implement it themselves due to budget constraints. However, the likelihood of errors is then substantially higher, as internal teams may not fully understand tokenization and may deliver implementations that fall short of expected security standards.

5. Plan regular security audits and penetration testing engagements:

Tokenization is best left to cryptography experts for several reasons. They bring deep knowledge and extensive experience of security, along with an understanding of when to apply specific encryption or hashing algorithms. Even if tokenization is executed internally, enlisting highly skilled experts will bridge any gaps; a standard developer may not have the wherewithal to verify the tokenization mechanism and determine its suitability for each particular scenario. During penetration testing, it's important to understand whether the auditing teams can decipher the tokens.
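
As a rough illustration of one such audit check (the scheme and values below are hypothetical), an auditor can test whether tokens built as unsalted hashes of low-entropy data can be reversed by simple enumeration.

```python
# Illustrative audit check: unsalted hashes of low-entropy values (like card numbers)
# can be reversed by brute force, which is exactly what a penetration test should flag.
import hashlib

def naive_token(pan: str) -> str:  # a weak scheme the audit is probing
    return hashlib.sha256(pan.encode()).hexdigest()

captured_token = naive_token("4111111111111111")  # pretend this was seen in traffic

# The auditor enumerates candidate values (card numbers have limited structure)
# and checks whether any of them reproduces the captured token.
candidates = ["4111111111111111", "5500005555555559", "340000000000009"]
recovered = [c for c in candidates if naive_token(c) == captured_token]
print("token deciphered:", recovered)  # a non-empty result means the scheme failed
```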

As an MSSP and auditor, Entersoft has a team of cryptography experts who can play a crucial role in enhancing the effectiveness of security strategies such as tokenization. Our penetration testing engagement chiefly focuses on identifying where data is stored and retrieved; it is a conversation-oriented activity during the design-level review stage that helps us define the test cases. We check whether classification and prioritization of data, as well as security implementations, controls and practices for the actual data, are in place to counter potential attacks. Approaching it from a design perspective, we identify gaps in implementation, recommend tokenization solutions, and, when required, prioritize data and apply specific controls to data of high sensitivity.