API Integration Guide
Integrate PDF cleaning into your application with our REST API
Quick Start
1. Get your API token from the Admin Panel → Open API tab
2. Send a POST request with your PDF file:
curl -X POST https://your-domain.com/api/v1/clean \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-F "file=@document.pdf"
3. Poll the task status until it reports completed (or failed):
curl https://your-domain.com/api/v1/task/{task_id} \
-H "Authorization: Bearer YOUR_API_TOKEN"
4. Download the cleaned PDF:
curl -o cleaned.pdf https://your-domain.com/api/v1/task/{task_id}/download \
-H "Authorization: Bearer YOUR_API_TOKEN"
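The -F flags in the cURL commands above tell curl to send a multipart/form-data body. If your HTTP stack has no multipart helper (the Java example builds one by hand), the following is a minimal, stdlib-only Python sketch of that encoding. The function name encode_multipart is hypothetical, and it does not escape quotes or line breaks in field names or file names.

```python
import uuid


def encode_multipart(fields: dict[str, str], file_field: str,
                     file_name: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body; return (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex  # random boundary, vanishingly unlikely to occur in a PDF
    parts = []
    # Plain text fields, e.g. remove_watermark=true
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The PDF itself, sent as a file part (names are not escaped in this sketch)
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{file_name}"\r\n'
        f'Content-Type: application/pdf\r\n\r\n'.encode()
        + file_bytes + b'\r\n'
    )
    # Closing boundary ends the body
    parts.append(f'--{boundary}--\r\n'.encode())
    return b''.join(parts), f'multipart/form-data; boundary={boundary}'
```

To send it, pass the body as data to urllib.request.Request(..., method="POST") and use the returned string as the Content-Type header alongside the Authorization header.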
Authentication
All API requests require a valid API token. Include it in one of these headers:
| Method | Header | Example |
|---|---|---|
| Bearer Token | Authorization | Bearer pk_live_abc123... |
| API Key | X-API-Key | pk_live_abc123... |
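Only one of the two headers is needed per request. As a small sketch, a client can build either form from the same token; the helper name auth_headers and its scheme argument are hypothetical.

```python
def auth_headers(token: str, scheme: str = "bearer") -> dict[str, str]:
    """Return the header that carries the API token, in either accepted form."""
    if scheme == "bearer":
        return {"Authorization": f"Bearer {token}"}
    if scheme == "api_key":
        return {"X-API-Key": token}
    raise ValueError(f"unknown auth scheme: {scheme!r}")
```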
API Endpoints
POST /api/v1/clean
Upload a PDF file for cleaning.
Content-Type: multipart/form-data
| Parameter | Type | Default | Description |
|---|---|---|---|
| file | File | required | PDF file to process |
| remove_password | bool | true | Remove password protection |
| remove_watermark | bool | true | Remove watermarks |
| remove_ad_text | bool | true | Remove advertising text |
| remove_ad_images | bool | true | Remove advertising images |
| remove_header_footer | bool | true | Remove ad headers/footers |
| remove_background | bool | true | Remove background watermarks |
| remove_first_last_ad_pages | bool | true | Remove first/last ad pages |
| use_ocr | bool | true | Enable OCR recognition |
| use_ai_detection | bool | false | Enable AI-powered detection |
| process_mode | string | generate_new | generate_new or edit_original |
| extra_ad_keywords | string | null | Comma-separated extra ad keywords (appended to global config) |
| extra_watermark_patterns | string | null | Comma-separated extra watermark patterns |
| custom_rules | string | null | Comma-separated rule names to apply (empty = all active) |
| callback_url | string | null | Webhook URL for completion notification |
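Every parameter travels as a form field, so client values have to be turned into strings. The following is a sketch under two assumptions: that the server accepts lowercase "true"/"false" (as the cURL example sends remove_watermark=true), and that omitting a field makes the documented default apply. The helper build_clean_fields is hypothetical.

```python
VALID_MODES = {"generate_new", "edit_original"}


def build_clean_fields(**options) -> dict[str, str]:
    """Convert Python values into the string form fields for /api/v1/clean.

    Booleans become "true"/"false", lists become comma-separated strings,
    and None values are dropped so the server-side default applies (assumed).
    """
    fields = {}
    for name, value in options.items():
        if value is None:
            continue  # omit the field entirely
        if name == "process_mode" and value not in VALID_MODES:
            raise ValueError(f"process_mode must be one of {sorted(VALID_MODES)}")
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            fields[name] = ",".join(str(v).strip() for v in value)
        else:
            fields[name] = str(value)
    return fields
```

For example, build_clean_fields(extra_ad_keywords=["sponsor", "advertisement"]) yields the same extra_ad_keywords=sponsor,advertisement field the cURL example sends.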
POST /api/v1/clean-url
Clean a PDF fetched from a URL.
Accepts the same parameters as /clean, but takes a url field in place of file.
curl -X POST https://your-domain.com/api/v1/clean-url \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "url=https://example.com/document.pdf"
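A malformed URL costs a round trip that ends in 400 invalid_url. The server's exact validation rules are not documented, so the following sketch only rejects input that is obviously not an absolute http(s) URL before building the form fields. The helper clean_url_fields is hypothetical.

```python
from urllib.parse import urlparse


def clean_url_fields(pdf_url: str, **options: str) -> dict[str, str]:
    """Form fields for /api/v1/clean-url: the /clean options plus `url`."""
    parsed = urlparse(pdf_url)
    # Local sanity check only; the server may still reject the URL or fail to fetch it
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {pdf_url!r}")
    return {"url": pdf_url, **options}
```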
GET /api/v1/task/{task_id}
Get the status of a task.
The response includes download_url once the status is completed. The status field is one of pending, analyzing, processing, completed, or failed.
{
"task_id": "abc123...",
"status": "completed",
"progress": 100,
"file_name": "document.pdf",
"file_size": 1024000,
"download_url": "https://your-domain.com/api/v1/task/abc123/download",
"error_message": null
}
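The code examples poll until status is completed or failed, but loop forever if a task stalls. Here is a sketch of the same loop with an overall deadline. The name wait_for_task is hypothetical, fetch_status stands for any callable that performs GET /api/v1/task/{task_id} and returns the parsed JSON, and the 300-second timeout is an arbitrary client-side choice, not an API limit.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}


def wait_for_task(fetch_status: Callable[[], dict], interval: float = 2.0,
                  timeout: float = 300.0,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status() until the task reaches a terminal status or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status()
        if task["status"] in TERMINAL:
            return task  # caller still has to check completed vs failed
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task.get('task_id')} still {task['status']} after {timeout}s")
        sleep(interval)
```

Injecting fetch_status and sleep keeps the loop independent of the HTTP library and easy to test.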
GET /api/v1/task/{task_id}/download
Download the cleaned PDF.
Returns the cleaned PDF as a binary file.
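Because the response is a raw PDF, it can be streamed straight to disk rather than held in memory. The following stdlib-only sketch does that; the function names are hypothetical. urlopen raises urllib.error.HTTPError for error responses such as 404 not_found.

```python
import shutil
import urllib.request


def build_download_request(api_url: str, task_id: str, token: str) -> urllib.request.Request:
    """GET request for the cleaned PDF, authenticated with the bearer token."""
    url = f"{api_url.rstrip('/')}/api/v1/task/{task_id}/download"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def download_cleaned_pdf(api_url: str, task_id: str, token: str, dest: str) -> None:
    """Copy the response body to dest in chunks instead of reading it all at once."""
    req = build_download_request(api_url, task_id, token)
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```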
Code Examples
Python
import requests, time
API_URL = "https://your-domain.com"
TOKEN = "pk_live_your_token_here"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
# Upload and clean a PDF file
with open("document.pdf", "rb") as f:
    resp = requests.post(f"{API_URL}/api/v1/clean", headers=HEADERS, files={"file": f})
resp.raise_for_status()  # stop on 4xx/5xx (see Error Codes)
task = resp.json()
print(f"Task created: {task['task_id']}")
# Poll every 2 seconds until the task finishes
while task.get("status") not in ("completed", "failed"):
    time.sleep(2)
    resp = requests.get(f"{API_URL}/api/v1/task/{task['task_id']}", headers=HEADERS)
    resp.raise_for_status()
    task = resp.json()
    print(f"  Status: {task['status']}  Progress: {task['progress']}%")
# Download result
if task["status"] == "completed":
    pdf = requests.get(task["download_url"], headers=HEADERS)
    pdf.raise_for_status()
    with open("cleaned.pdf", "wb") as f:
        f.write(pdf.content)
    print("Cleaned PDF saved!")
else:
    print(f"Task failed: {task['error_message']}")
# --- Or clean from URL ---
resp = requests.post(f"{API_URL}/api/v1/clean-url", headers=HEADERS,
                     data={"url": "https://example.com/document.pdf"})
resp.raise_for_status()
JavaScript (Node.js)
const fs = require('fs');
const FormData = require('form-data');
const axios = require('axios');
const API_URL = 'https://your-domain.com';
const TOKEN = 'pk_live_your_token_here';
const headers = { 'Authorization': `Bearer ${TOKEN}` };
async function cleanPDF(filePath) {
  // Upload
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  const { data: task } = await axios.post(`${API_URL}/api/v1/clean`, form,
    { headers: { ...headers, ...form.getHeaders() } });
  console.log('Task:', task.task_id);
  // Poll every 2 seconds until the task finishes
  let status = task;
  while (!['completed', 'failed'].includes(status.status)) {
    await new Promise(r => setTimeout(r, 2000));
    status = (await axios.get(`${API_URL}/api/v1/task/${task.task_id}`, { headers })).data;
  }
  // Download
  if (status.status === 'completed') {
    const pdf = await axios.get(status.download_url, { headers, responseType: 'arraybuffer' });
    fs.writeFileSync('cleaned.pdf', pdf.data);
  } else {
    console.error('Task failed:', status.error_message);
  }
}
cleanPDF('document.pdf').catch(err => console.error(err.message));
cURL
# Upload file
curl -X POST https://your-domain.com/api/v1/clean \
-H "Authorization: Bearer pk_live_your_token" \
-F "file=@document.pdf" \
-F "remove_watermark=true" \
-F "extra_ad_keywords=sponsor,advertisement"
# Check status
curl https://your-domain.com/api/v1/task/TASK_ID \
-H "Authorization: Bearer pk_live_your_token"
# Download result
curl -o cleaned.pdf https://your-domain.com/api/v1/task/TASK_ID/download \
-H "Authorization: Bearer pk_live_your_token"
# Clean from URL
curl -X POST https://your-domain.com/api/v1/clean-url \
-H "Authorization: Bearer pk_live_your_token" \
-F "url=https://example.com/document.pdf"
PHP
<?php
$apiUrl = 'https://your-domain.com';
$token = 'pk_live_your_token_here';
// Upload
$ch = curl_init("$apiUrl/api/v1/clean");
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => ["Authorization: Bearer $token"],
    CURLOPT_POSTFIELDS => ['file' => new CURLFile('document.pdf')],
]);
$task = json_decode(curl_exec($ch), true);
curl_close($ch);
// Poll every 2 seconds until done
do {
    sleep(2);
    $ch = curl_init("$apiUrl/api/v1/task/{$task['task_id']}");
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => ["Authorization: Bearer $token"],
    ]);
    $task = json_decode(curl_exec($ch), true);
    curl_close($ch);
} while (!in_array($task['status'], ['completed', 'failed']));
// Download
if ($task['status'] === 'completed') {
    $context = stream_context_create(['http' => ['header' => "Authorization: Bearer $token"]]);
    file_put_contents('cleaned.pdf', file_get_contents($task['download_url'], false, $context));
}
Java
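import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
public class CleanPdf {
    public static void main(String[] args) throws Exception {
        var client = HttpClient.newHttpClient();
        var token = "pk_live_your_token_here";
        var apiUrl = "https://your-domain.com";
        // Build the multipart/form-data body by hand: part header, file bytes, closing boundary
        var boundary = "----Boundary" + System.currentTimeMillis();
        var out = new ByteArrayOutputStream();
        out.write(("--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"document.pdf\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n").getBytes(StandardCharsets.UTF_8));
        out.write(Files.readAllBytes(Path.of("document.pdf")));
        out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        byte[] fullBody = out.toByteArray();
        // Upload file
        var req = HttpRequest.newBuilder()
                .uri(URI.create(apiUrl + "/api/v1/clean"))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(fullBody))
                .build();
        var resp = client.send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
        // Parse the JSON response (e.g. with Jackson or Gson) for task_id,
        // then poll GET /api/v1/task/{task_id} and download the result
    }
}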
Go
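package main
import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"mime/multipart"
	"net/http"
	"os"
	"time"
)
// send attaches the bearer token, performs the request, and aborts on any error
func send(req *http.Request, token string) *http.Response {
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	if resp.StatusCode >= 400 {
		log.Fatalf("%s %s: %s", req.Method, req.URL, resp.Status)
	}
	return resp
}
func main() {
	token := "pk_live_your_token_here"
	apiURL := "https://your-domain.com"
	// Upload: build a multipart body containing the PDF
	file, err := os.Open("document.pdf")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	body := &bytes.Buffer{}
	writer := multipart.NewWriter(body)
	part, _ := writer.CreateFormFile("file", "document.pdf")
	io.Copy(part, file)
	writer.Close()
	req, _ := http.NewRequest("POST", apiURL+"/api/v1/clean", body)
	req.Header.Set("Content-Type", writer.FormDataContentType())
	resp := send(req, token)
	var task map[string]interface{}
	json.NewDecoder(resp.Body).Decode(&task)
	resp.Body.Close()
	taskID := fmt.Sprint(task["task_id"])
	fmt.Println("Task:", taskID)
	// Poll every 2 seconds until the task is completed or failed
	for task["status"] != "completed" && task["status"] != "failed" {
		time.Sleep(2 * time.Second)
		req, _ := http.NewRequest("GET", apiURL+"/api/v1/task/"+taskID, nil)
		r := send(req, token)
		json.NewDecoder(r.Body).Decode(&task)
		r.Body.Close()
	}
	// Download the cleaned PDF
	if task["status"] == "completed" {
		req, _ := http.NewRequest("GET", apiURL+"/api/v1/task/"+taskID+"/download", nil)
		r := send(req, token)
		data, _ := io.ReadAll(r.Body)
		r.Body.Close()
		if err := os.WriteFile("cleaned.pdf", data, 0o644); err != nil {
			log.Fatal(err)
		}
	}
}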
Error Codes
| HTTP Code | Error | Description |
|---|---|---|
| 400 | invalid_file | Only PDF files accepted |
| 400 | file_too_large | File exceeds size limit |
| 400 | invalid_url | Invalid URL provided |
| 400 | download_failed | Failed to download PDF from URL |
| 401 | missing_token | No API token provided |
| 401 | token_error | Invalid API token |
| 403 | token_error | Token disabled or expired |
| 404 | not_found | Task not found |
| 429 | token_error | Daily rate limit exceeded |
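Because token_error appears with 401, 403 and 429, a client should branch on the HTTP status rather than on the error string alone. The following sketch turns an error response into an exception carrying a hint derived from the table above. It assumes, without confirmation from this guide, that error bodies are JSON objects with an "error" field holding the code; PdfCleanerError, HINTS and raise_for_api_error are hypothetical names.

```python
import json
from typing import Optional

# Hints keyed by HTTP status, summarizing the error table
HINTS = {
    400: "check the file type, file size, or source URL",
    401: "send a valid token in Authorization: Bearer ... or X-API-Key",
    403: "the token is disabled or expired; check it in the Admin Panel",
    404: "the task_id is unknown",
    429: "the daily limit is used up; wait before retrying",
}


class PdfCleanerError(Exception):
    def __init__(self, status: int, code: Optional[str], hint: str):
        super().__init__(f"HTTP {status} ({code or 'no error code'}): {hint}")
        self.status, self.code, self.hint = status, code, hint


def raise_for_api_error(status: int, body: bytes) -> None:
    """Raise PdfCleanerError for 4xx/5xx responses; do nothing otherwise."""
    if status < 400:
        return
    try:
        # Assumption: the error body is JSON with an "error" field holding the code
        code = json.loads(body).get("error")
    except (ValueError, AttributeError):
        code = None  # body is not JSON, or not a JSON object
    raise PdfCleanerError(status, code, HINTS.get(status, "unexpected server error"))
```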
Rate Limits & Token Management
Daily Limits
Each token can have a daily request limit (0 = unlimited). Once the limit is exceeded, requests return HTTP 429.
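If you know a token's limit, a client can stop before the server starts returning 429. The following is a sketch of a client-side counter that mirrors the setting, including 0 = unlimited. Two things are assumptions not stated in this guide: that the server's day starts at midnight UTC, and which requests count (if status polls count too, polling less often stretches the budget). DailyQuota is a hypothetical name.

```python
import datetime as dt
from typing import Optional


class DailyQuota:
    """Client-side mirror of a token's daily request limit (0 = unlimited)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.day: Optional[dt.date] = None
        self.used = 0

    def try_acquire(self, today: Optional[dt.date] = None) -> bool:
        """Count one request; return False if it would exceed today's limit."""
        # Assumption: the server's day starts at midnight UTC (not documented)
        today = today or dt.datetime.now(dt.timezone.utc).date()
        if today != self.day:
            self.day, self.used = today, 0  # new day: start counting again
        if self.limit and self.used >= self.limit:
            return False
        self.used += 1
        return True
```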
Token Expiration
Tokens can have an expiration date. Expired tokens return HTTP 403.
Origin Restrictions
Tokens can be restricted to specific domains for additional security.
Key Regeneration
If a token is compromised, regenerate its key from the admin panel. The old key is invalidated immediately.
MCP Server (AI Integration)
PDF Cleaner Pro provides a Model Context Protocol (MCP) server for AI assistants to call PDF cleaning tools directly.
Configuration
Add the following to your MCP client configuration (for example, claude_desktop_config.json for Claude Desktop):
{
"mcpServers": {
"pdf-cleaner": {
"command": "python",
"args": ["-m", "app.mcp_server"],
"env": {
"PDF_CLEANER_API_URL": "https://your-domain.com",
"PDF_CLEANER_API_TOKEN": "pk_live_your_token_here"
}
}
}
}
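Pasting the snippet by hand can clobber other servers already listed under mcpServers. The following sketch merges the pdf-cleaner entry into an existing config file and leaves other entries untouched. The helper add_pdf_cleaner_server is hypothetical, and the config file's location depends on your client and operating system.

```python
import json
from pathlib import Path


def add_pdf_cleaner_server(config_path: Path, api_url: str, token: str) -> dict:
    """Add or replace the pdf-cleaner entry while keeping other MCP servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as the snippet above, with the URL and token filled in
    servers["pdf-cleaner"] = {
        "command": "python",
        "args": ["-m", "app.mcp_server"],
        "env": {"PDF_CLEANER_API_URL": api_url, "PDF_CLEANER_API_TOKEN": token},
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```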
Available Tools
| Tool | Description |
|---|---|
| clean_pdf_file | Clean a local PDF file |
| clean_pdf_url | Clean a PDF from URL |
| check_task_status | Check processing status |
| download_cleaned_pdf | Download cleaned result |