API3:2019 Excessive Data Exposure

Introduction

Threat agents/attack vectors: The core of this vulnerability is that an API should never expose data that the user is not supposed to see. If it does, a malicious actor can sniff the traffic from our APIs, for example by performing a MitM attack, and abuse the data found in the responses. The examples later in this article make this more concrete.

Security weakness: This vulnerability is prevalent (third place in the top 10) because it is hard to detect. Automation is nearly useless here, because automated tools cannot distinguish legitimate data from sensitive data unless they are told exactly how the application is supposed to work. This matters because APIs are often implemented in a generic way: they return all data and expect the client to filter it.

Impacts: The business impact depends on which data is exposed. Because the exposed data is sensitive by definition, the impact is almost always severe.

What is Excessive Data Exposure?

An API is supposed to return only the data that its front-end clients require. Sometimes, however, developers make a mistake or take the easy route and implement generic APIs that return all data to the client. When an API returns more data than the client needs, we speak of Excessive Data Exposure.

Example Attack Scenarios

A simple example is an application that makes a call to fetch credit card details. The user never sees the CVV because the front-end client filters it out, but the API still returns it in the response.

Example:

GET /api/v1/cards?id=0

[
  {
    "CVV": "677", 
    "creditCard": "1234567901234", 
    "id": 0, 
    "user": "API", 
    "validUntil": "1992"
  }
]

As you can see, the call returns the credit card details including the CVV. The end user might never see the CVV in the interface, but because the API returns it, this is a case of Excessive Data Exposure.
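One way to close this gap is to build the response on the server from an explicit allowlist of fields. The following is a minimal sketch, not the implementation of any particular framework: the record mirrors the example response above, while `CARD_PUBLIC_FIELDS` and `card_response` are hypothetical names, and the choice of which fields the client needs is an assumption.

```python
# Stored record, mirroring the example response above.
CARD_RECORD = {
    "CVV": "677",
    "creditCard": "1234567901234",
    "id": 0,
    "user": "API",
    "validUntil": "1992",
}

# Assumption: the front-end only needs these fields to render the card.
CARD_PUBLIC_FIELDS = ("id", "user", "creditCard", "validUntil")


def card_response(record: dict) -> dict:
    """Build the response from an allowlist instead of returning the record."""
    return {field: record[field] for field in CARD_PUBLIC_FIELDS}


response = card_response(CARD_RECORD)
print("CVV" in response)  # False: the CVV never leaves the server
```

Because the allowlist names what may be returned, a field added to the stored record later stays private until someone deliberately adds it to the list.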

Let's add another example to make things clearer. In this scenario, a mobile application sends a request to /api/articles/{articleId}/comments/{commentId} and receives the comment together with metadata about it, including the author. When an attacker sniffs this traffic, they can also see PII belonging to the author.

GET /api/articles/5/comments/0

[
  {
    "comment": "1234567901234", 
    "id": 0, 
    "user": "testUser", 
    "user address": "testlane, testing - 340043 testing in testland", 
    "user email": "test@bla.com"
  }
]
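The comment example above can be guarded with a dedicated response type that declares only the fields the client is meant to receive. This is a sketch under assumptions: `CommentView` and `to_comment_view` are hypothetical names, and the idea that the mobile client needs only `id`, `comment`, and `user` is an assumption for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CommentView:
    """Fields the (hypothetical) mobile client is assumed to need."""

    id: int
    comment: str
    user: str


def to_comment_view(row: dict) -> dict:
    """Copy only the declared fields out of the stored row."""
    view = CommentView(id=row["id"], comment=row["comment"], user=row["user"])
    return asdict(view)


# Stored row, mirroring the example response above (PII included).
row = {
    "comment": "1234567901234",
    "id": 0,
    "user": "testUser",
    "user address": "testlane, testing - 340043 testing in testland",
    "user email": "test@bla.com",
}

print(to_comment_view(row))
# {'id': 0, 'comment': '1234567901234', 'user': 'testUser'}
```

The address and email stay in the stored row but never reach the serialized response, so there is nothing for the client to filter.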

Preventive Measures Against Excessive Data Exposure

We should never rely on the client to filter out data.

We should review all the responses coming from the back-end to see if they include sensitive data.

When exposing a new API endpoint, engineers should always ask who the consumers of that data will be and exactly what data they need.

Be careful with generic methods such as to_json() and to_string(). They return all the data of the object passed to them, which can expose more than intended. We should return only specific properties of an object and never pass the full object to a to_json() or to_string() function.

All PII your application works with should be classified, and that classification should be revisited on a regular basis. You should review all API call responses and confirm that they do not contain any of this data without a reason.

As an extra layer of security, we can implement schema-based response validation. This validation needs to define and enforce all the data that is returned by the API.
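The response validation described in the last measure can be sketched as a check that runs just before a response leaves the server. This is a simplified, standard-library-only illustration rather than a real schema language: `COMMENT_SCHEMA`, `ResponseSchemaError`, and `enforce_schema` are hypothetical names, and a production system would more likely use an established schema validator.

```python
# Hypothetical response schema: field name -> expected Python type.
COMMENT_SCHEMA = {"id": int, "comment": str, "user": str}


class ResponseSchemaError(ValueError):
    """Raised when an outgoing response does not match its schema."""


def enforce_schema(payload: dict, schema: dict) -> dict:
    """Return the payload unchanged if it matches the schema, else raise."""
    unexpected = sorted(set(payload) - set(schema))
    if unexpected:
        raise ResponseSchemaError(f"unexpected fields: {unexpected}")
    missing = sorted(set(schema) - set(payload))
    if missing:
        raise ResponseSchemaError(f"missing fields: {missing}")
    for field, expected_type in schema.items():
        if not isinstance(payload[field], expected_type):
            raise ResponseSchemaError(f"wrong type for field: {field}")
    return payload


# A response that matches the schema passes through untouched.
ok = enforce_schema({"id": 0, "comment": "hi", "user": "testUser"}, COMMENT_SCHEMA)

# A response that leaks an extra field is rejected before it is sent.
leaky = {"id": 0, "comment": "hi", "user": "testUser", "user email": "test@bla.com"}
try:
    enforce_schema(leaky, COMMENT_SCHEMA)
except ResponseSchemaError as exc:
    print(exc)  # unexpected fields: ['user email']
```

Rejecting unknown fields, rather than silently dropping them, makes an accidental leak fail loudly during testing instead of going unnoticed.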

Conclusion

The deceptively simple nature of this vulnerability makes it very easy to overlook, and automation is unlikely to pick up this issue type either, so it easily slips under the radar. It is highly recommended that you judge all data leaving your APIs by its sensitivity and consider whether that data should be filtered on the client side or the server side.