Understanding XML and JSON Injection

Web applications that process XML or JSON data often pass values from those documents into database queries. Without proper safeguards, attackers can place malicious payloads in individual fields, manipulating the resulting SQL to steal data, bypass authentication, or execute unauthorized commands. XML and JSON are only the carrier here: the underlying flaw is a query assembled from unvalidated field values, which makes parameterized queries, input validation, and secure coding practices the critical defenses.

How Injection Attacks Work

The Attack Flow

Malicious Input Submission: Attackers craft payloads containing SQL fragments or special characters.
Unsanitized Parsing: The application processes XML/JSON data without validation or escaping.
Query Manipulation: The injected data alters the intended SQL query structure.
Unauthorized Execution: The database executes the modified query, leading to breaches.

Real-World Example

Consider a login system that builds SQL queries from JSON input:

{
  "username": "admin' OR '1'='1--",
  "password": "anything"
}

Resulting Query (Vulnerable):

SELECT * FROM users WHERE username = 'admin' OR '1'='1--' AND password = 'anything'

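The injected quote closes the username string early. Note that the trailing -- ends up inside the string literal '1--', so it is not treated as a comment, and '1'='1--' evaluates to false. The bypass works anyway because AND binds more tightly than OR: the condition reduces to username = 'admin' OR (false AND ...), which matches the admin row whatever password is supplied.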
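To make the example concrete, here is a minimal sketch that runs the payload against a throwaway in-memory SQLite database. The table layout, the sample rows, and the `vulnerable_login` helper are assumptions made for illustration, and the plaintext passwords are there only to keep the demo short.

```python
import json
import sqlite3

# Hypothetical in-memory table standing in for a real users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("admin", "s3cret"), ("alice", "hunter2")],
)

def vulnerable_login(raw_json):
    """Build the query by string concatenation (the pattern to avoid)."""
    data = json.loads(raw_json)
    query = (
        "SELECT username FROM users WHERE username = '" + data["username"]
        + "' AND password = '" + data["password"] + "'"
    )
    return [row[0] for row in conn.execute(query)]

payload = json.dumps({"username": "admin' OR '1'='1--", "password": "anything"})
matched = vulnerable_login(payload)  # ["admin"], although the password is wrong
```

A plain wrong password without the injected quote returns no rows, so the difference comes entirely from the attacker-controlled quote characters changing the structure of the query.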

Prevention Strategies

Core Defenses

Technique	Implementation Example	Effectiveness
Parameterized Queries	`query = "SELECT * FROM users WHERE username = ?"`	High
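Input Validation	Allow-list expected formats (length, character set); rejecting only `'` or `--` is easy to bypass	Medium
Escaping Special Chars	Double `'` to `''` with the driver's escaping function as a last resort (`\'` works only in some dialects)	Medium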
Schema Validation	Enforce strict JSON/XML schemas	High
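As a rough illustration of the first row, the sketch below uses Python's built-in `sqlite3` module, whose `?` placeholders send values separately from the SQL text. The table, the sample row, and the `find_user` helper are hypothetical; a real system would also compare password hashes rather than plaintext.

```python
import sqlite3

# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("admin", "s3cret"))

def find_user(username, password):
    # The placeholders keep both values out of the SQL text entirely,
    # so quote characters in the input cannot change the query structure.
    cur = conn.execute(
        "SELECT username FROM users WHERE username = ? AND password = ?",
        (username, password),
    )
    return cur.fetchone()

injected = find_user("admin' OR '1'='1--", "anything")  # None: treated as a literal name
legit = find_user("admin", "s3cret")                    # ("admin",)
```

The injected string is compared as an ordinary (and nonexistent) username, which is why parameterization is rated above validation and escaping.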

Implementation Checklist

For XML:
- Use libraries with built-in protection (e.g., defusedxml in Python, or lxml with `resolve_entities=False`)
- Disable external entity processing (XXE prevention)
For JSON:
- Parse with a standards-compliant parser in its strictest mode; never `eval` the payload
- Validate against a predefined schema
For SQL:
- Always use prepared statements
- Limit database user permissions
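The JSON items in the checklist could be sketched with the standard library alone, as below. `LOGIN_SCHEMA`, `parse_login`, and the length limit are assumptions for illustration; a production service would more likely use a full JSON Schema validator. Schema checks narrow what reaches the query layer but do not replace prepared statements, since a string containing quotes is still a valid string.

```python
import json

# Hypothetical schema for a login request: field name -> required type.
LOGIN_SCHEMA = {"username": str, "password": str}

def reject_duplicate_keys(pairs):
    """Fail on duplicate keys instead of silently keeping the last one."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate key in JSON object")
    return dict(pairs)

def parse_login(raw_json, max_len=128):
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed input.
    data = json.loads(raw_json, object_pairs_hook=reject_duplicate_keys)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(LOGIN_SCHEMA):
        raise ValueError("unexpected or missing fields")
    for field, expected_type in LOGIN_SCHEMA.items():
        value = data[field]
        if not isinstance(value, expected_type) or not 0 < len(value) <= max_len:
            raise ValueError(f"invalid value for {field!r}")
    return data
```

Every rejection path raises `ValueError`, so a caller can map all of them to a single "bad request" response without leaking which check failed.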

Common Attack Vectors

XML-Specific Risks

XXE (XML External Entity): Injects external entity references to access files or network resources.
```
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<user>&xxe;</user>
```
XPath Injection: Manipulates XPath queries in XML-based authentication systems.
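For the XXE payload shown above, one conservative defense is to refuse any document that carries a DOCTYPE at all, which is the idea behind the DTD-forbidding option in hardened parsers. The sketch below does this with Python's standard `xml.parsers.expat` module; `DTDForbidden` and `parse_flat_xml` are hypothetical names, and the flat tag-to-text result is a simplification that suits only small, non-nested documents.

```python
from xml.parsers import expat

class DTDForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""

def parse_flat_xml(xml_text):
    """Return {tag: text} for a small, flat document, refusing any DTD."""
    parser = expat.ParserCreate()
    result = {}
    stack = []

    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        # Entities can only be declared inside a DTD, so rejecting the
        # DOCTYPE stops the parse before any entity is defined.
        raise DTDForbidden("DOCTYPE declarations are not accepted")

    def start(tag, attrs):
        stack.append(tag)
        result.setdefault(tag, "")

    def end(tag):
        stack.pop()

    def chars(data):
        if stack:
            result[stack[-1]] += data

    parser.StartDoctypeDeclHandler = forbid_doctype
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return result
```

Calling `parse_flat_xml("<user>alice</user>")` yields `{"user": "alice"}`, while the XXE document above raises `DTDForbidden` before the entity is ever declared.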

JSON-Specific Risks

Mass Assignment: Overwrites sensitive fields via JSON payloads.
```
{
  "username": "attacker",
  "isAdmin": true
}
```
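JSON Hijacking: A legacy attack in which a malicious page loads a JSON array response through a script tag and overrides JavaScript array constructors or setters to read it. Modern browsers block this, but returning a top-level object rather than a bare array remains a common precaution.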
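For the mass-assignment case, a common mitigation is an explicit allow-list of fields that clients may set. The sketch below is one way to express that; the `User` dataclass, `CLIENT_WRITABLE`, and `apply_update` are hypothetical names, and which fields count as writable is an assumed policy.

```python
import json
from dataclasses import dataclass

@dataclass
class User:
    username: str
    email: str = ""
    is_admin: bool = False

# Only these fields may be set from a client payload (an assumed policy).
CLIENT_WRITABLE = {"username", "email"}

def apply_update(user, raw_json):
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unexpected = set(data) - CLIENT_WRITABLE
    if unexpected:
        # Reject rather than silently drop, so probing attempts are visible.
        raise ValueError(f"fields not writable by clients: {sorted(unexpected)}")
    for name, value in data.items():
        setattr(user, name, value)
    return user
```

With this check in place, the `isAdmin` payload above is rejected outright, and privileged fields such as `is_admin` can only change through a separate, authorized code path.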

Secure Coding Practices

Do’s and Don’ts

✅ Do:

Treat all user input as untrusted
Use ORM frameworks (e.g., SQLAlchemy, Hibernate), and keep user input out of any raw SQL passed through them
Log and monitor suspicious input patterns

❌ Don’t:

Concatenate user input directly into queries
Rely solely on client-side validation
Assume APIs or microservices are safe from injection

Code Comparison

Vulnerable (PHP):

// User input is concatenated straight into the SQL string.
$query = "SELECT * FROM users WHERE username = '" . $_POST['username'] . "'";

Secure (PHP with PDO):

// The named placeholder is bound at execution time, so the input is never parsed as SQL.
$stmt = $pdo->prepare("SELECT * FROM users WHERE username = :username");
$stmt->execute(['username' => $_POST['username']]);

Tools and Resources

Detection Tools

Static Analysis: SonarQube, Semgrep
Dynamic Testing: OWASP ZAP, Burp Suite
Schema Validators: JSON Schema Validator, XML Schema (XSD)
