API Integration Guide

Integrate PDF cleaning into your application with our REST API

Quick Start

1. Get your API token from the Admin Panel → Open API tab

2. Send a POST request with your PDF file:

curl -X POST https://your-domain.com/api/v1/clean \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@document.pdf"

3. Poll the task status, using the task_id returned in step 2, until it is completed:

curl https://your-domain.com/api/v1/task/{task_id} \
  -H "Authorization: Bearer YOUR_API_TOKEN"

4. Download the cleaned PDF:

curl -o cleaned.pdf https://your-domain.com/api/v1/task/{task_id}/download \
  -H "Authorization: Bearer YOUR_API_TOKEN"

Authentication

All API requests require a valid API token. Include it in one of these headers:

Method | Header | Example
Bearer Token | Authorization | Bearer pk_live_abc123...
API Key | X-API-Key | pk_live_abc123...
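Both headers carry the same token, so a client needs only one of them. Here is a minimal Python sketch that builds the header dict for either method. The helper name auth_headers and its scheme argument are hypothetical.

```python
def auth_headers(token, scheme="bearer"):
    """Build request headers for one of the two documented auth methods."""
    if scheme == "bearer":
        # Authorization: Bearer <token>
        return {"Authorization": f"Bearer {token}"}
    if scheme == "api_key":
        # X-API-Key: <token>
        return {"X-API-Key": token}
    raise ValueError(f"unknown scheme {scheme!r}; use 'bearer' or 'api_key'")
```

Send the returned dict as the headers of every request, including the status and download calls.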

API Endpoints

POST /api/v1/clean — Upload a PDF file for cleaning

Content-Type: multipart/form-data

Parameter | Type | Default | Description
file | File | required | PDF file to process
remove_password | bool | true | Remove password protection
remove_watermark | bool | true | Remove watermarks
remove_ad_text | bool | true | Remove advertising text
remove_ad_images | bool | true | Remove advertising images
remove_header_footer | bool | true | Remove ad headers/footers
remove_background | bool | true | Remove background watermarks
remove_first_last_ad_pages | bool | true | Remove first/last ad pages
use_ocr | bool | true | Enable OCR recognition
use_ai_detection | bool | false | Enable AI-powered detection
process_mode | string | generate_new | generate_new or edit_original
extra_ad_keywords | string | null | Comma-separated extra ad keywords (appended to the global configuration)
extra_watermark_patterns | string | null | Comma-separated extra watermark patterns
custom_rules | string | null | Comma-separated rule names to apply (empty = all active rules)
callback_url | string | null | Webhook URL for completion notification
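Because the endpoint takes multipart/form-data, every option travels as a string: the cURL example under Code Examples sends remove_watermark=true and a comma-separated extra_ad_keywords. The minimal Python sketch below turns native values into those fields. build_clean_fields is a hypothetical helper, and it assumes that leaving a field out makes the server apply the default shown in the table.

```python
def build_clean_fields(**options):
    """Convert Python values into form fields for POST /api/v1/clean.

    Assumptions: booleans are sent as "true"/"false" (as in the cURL example),
    list values are joined with commas, and None means "omit the field so the
    server default applies".
    """
    fields = {}
    for name, value in options.items():
        if value is None:
            continue  # let the server use its documented default
        if name == "process_mode" and value not in ("generate_new", "edit_original"):
            raise ValueError(f"process_mode must be generate_new or edit_original, got {value!r}")
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            fields[name] = ",".join(str(v).strip() for v in value)
        else:
            fields[name] = str(value)
    return fields
```

With the requests library, pass the result as data= alongside files= and both are sent in a single multipart body.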

POST /api/v1/clean-url — Clean a PDF from URL

Accepts the same parameters as /clean, but takes a url field instead of file.

curl -X POST https://your-domain.com/api/v1/clean-url \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "url=https://example.com/document.pdf"
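A cheap client-side check can catch obviously malformed input before it costs a request. The sketch below assumes the server can only fetch absolute http(s) URLs; its exact validation rules are not documented here, so passing this check does not rule out invalid_url or download_failed. is_fetchable_url is a hypothetical helper.

```python
from urllib.parse import urlparse

def is_fetchable_url(url):
    """Return True if url is an absolute http(s) URL with a host.

    Assumption: /api/v1/clean-url only downloads over http or https.
    """
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```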

GET /api/v1/task/{task_id} — Get task status

The response includes download_url once status is completed. Possible status values are pending, analyzing, processing, completed, and failed.

{
  "task_id": "abc123...",
  "status": "completed",
  "progress": 100,
  "file_name": "document.pdf",
  "file_size": 1024000,
  "download_url": "https://your-domain.com/api/v1/task/abc123/download",
  "error_message": null
}

GET /api/v1/task/{task_id}/download — Download cleaned PDF

Returns the cleaned PDF as a binary file download.
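Only completed and failed are terminal states, so a client polls the status endpoint until it sees one of them. The language examples below poll with no upper bound. This minimal sketch adds a timeout. wait_for_task and its parameters are hypothetical, and the status fetch is passed in as a callable so the loop can be tested without a server.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}

def wait_for_task(fetch_status, interval=2.0, timeout=300.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the task reaches a terminal status.

    fetch_status is any zero-argument callable returning the parsed JSON of
    GET /api/v1/task/{task_id}. Raises TimeoutError if the task is still
    running when the timeout (in seconds) expires.
    """
    deadline = clock() + timeout
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task still {task.get('status')!r} after {timeout}s")
        sleep(interval)
```

With requests, fetch_status could be lambda: requests.get(f"{API_URL}/api/v1/task/{task_id}", headers=HEADERS).json(), using the names from the Python example.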

Code Examples

Python

import requests, time

API_URL = "https://your-domain.com"
TOKEN = "pk_live_your_token_here"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Upload and clean a PDF file
with open("document.pdf", "rb") as f:
    resp = requests.post(f"{API_URL}/api/v1/clean", headers=HEADERS, files={"file": f})
resp.raise_for_status()  # raise on 4xx/5xx (see Error Codes)
task = resp.json()
task_id = task["task_id"]
print(f"Task created: {task_id}")

# Poll until complete (.get() tolerates a response without "status")
while task.get("status") not in ("completed", "failed"):
    time.sleep(2)
    resp = requests.get(f"{API_URL}/api/v1/task/{task_id}", headers=HEADERS)
    resp.raise_for_status()
    task = resp.json()
    print(f"  Status: {task['status']} Progress: {task.get('progress', 0)}%")

# Download result
if task["status"] == "completed":
    pdf = requests.get(task["download_url"], headers=HEADERS)
    pdf.raise_for_status()
    with open("cleaned.pdf", "wb") as f:
        f.write(pdf.content)
    print("Cleaned PDF saved!")
else:
    print(f"Task failed: {task.get('error_message')}")

# --- Or clean from URL ---
resp = requests.post(f"{API_URL}/api/v1/clean-url", headers=HEADERS,
                     data={"url": "https://example.com/document.pdf"})
resp.raise_for_status()
print(f"Task created: {resp.json()['task_id']}")

JavaScript (Node.js)

const fs = require('fs');
const FormData = require('form-data');
const axios = require('axios');

const API_URL = 'https://your-domain.com';
const TOKEN = 'pk_live_your_token_here';
const headers = { 'Authorization': `Bearer ${TOKEN}` };

async function cleanPDF(filePath) {
  // Upload
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  const { data: task } = await axios.post(`${API_URL}/api/v1/clean`, form,
    { headers: { ...headers, ...form.getHeaders() } });
  console.log('Task:', task.task_id);

  // Poll
  let status = task;
  while (!['completed', 'failed'].includes(status.status)) {
    await new Promise(r => setTimeout(r, 2000));
    status = (await axios.get(`${API_URL}/api/v1/task/${task.task_id}`, { headers })).data;
  }

  // Download
  if (status.status === 'completed') {
    const pdf = await axios.get(status.download_url, { headers, responseType: 'arraybuffer' });
    fs.writeFileSync('cleaned.pdf', pdf.data);
  }
}
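// axios rejects on non-2xx responses; report the error instead of leaving it unhandled
cleanPDF('document.pdf').catch(err => {
  console.error(err.message);
  process.exitCode = 1;
});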

cURL

# Upload file
curl -X POST https://your-domain.com/api/v1/clean \
  -H "Authorization: Bearer pk_live_your_token" \
  -F "file=@document.pdf" \
  -F "remove_watermark=true" \
  -F "extra_ad_keywords=sponsor,advertisement"

# Check status
curl https://your-domain.com/api/v1/task/TASK_ID \
  -H "Authorization: Bearer pk_live_your_token"

# Download result
curl -o cleaned.pdf https://your-domain.com/api/v1/task/TASK_ID/download \
  -H "Authorization: Bearer pk_live_your_token"

# Clean from URL
curl -X POST https://your-domain.com/api/v1/clean-url \
  -H "Authorization: Bearer pk_live_your_token" \
  -F "url=https://example.com/document.pdf"

PHP

<?php
$apiUrl = 'https://your-domain.com';
$token = 'pk_live_your_token_here';

// Upload
$ch = curl_init("$apiUrl/api/v1/clean");
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => ["Authorization: Bearer $token"],
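    CURLOPT_POSTFIELDS => ['file' => new CURLFile('document.pdf', 'application/pdf')],
]);
$task = json_decode(curl_exec($ch), true);
curl_close($ch);
// Stop early if the upload was rejected (see Error Codes)
if (!isset($task['task_id'])) {
    exit('Upload failed: ' . json_encode($task) . "\n");
}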

// Poll until done
do {
    sleep(2);
    $ch = curl_init("$apiUrl/api/v1/task/{$task['task_id']}");
    curl_setopt_array($ch, [CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => ["Authorization: Bearer $token"]]);
    $task = json_decode(curl_exec($ch), true);
    curl_close($ch);
} while (!in_array($task['status'] ?? 'failed', ['completed', 'failed'])); // a missing status ends the loop

// Download
if ($task['status'] === 'completed') {
    file_put_contents('cleaned.pdf',
        file_get_contents($task['download_url'], false,
            stream_context_create(['http' => ['header' => "Authorization: Bearer $token"]])));
}

Java

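import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CleanPdf {
    public static void main(String[] args) throws Exception {
        var client = HttpClient.newHttpClient();
        var token = "pk_live_your_token_here";
        var apiUrl = "https://your-domain.com";

        // Build a multipart/form-data body with a single "file" part
        var boundary = "----Boundary" + System.currentTimeMillis();
        var body = new ByteArrayOutputStream();
        body.write(("--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"file\"; filename=\"document.pdf\"\r\n"
            + "Content-Type: application/pdf\r\n\r\n").getBytes(StandardCharsets.UTF_8));
        body.write(Files.readAllBytes(Path.of("document.pdf")));
        body.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));

        var req = HttpRequest.newBuilder()
            .uri(URI.create(apiUrl + "/api/v1/clean"))
            .header("Authorization", "Bearer " + token)
            .header("Content-Type", "multipart/form-data; boundary=" + boundary)
            .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
            .build();
        var resp = client.send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + resp.statusCode() + ": " + resp.body());
        // Parse task_id from the JSON body with your JSON library, poll
        // GET /api/v1/task/{task_id}, then download the result
    }
}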

Go

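package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "mime/multipart"
    "net/http"
    "os"
    "time"
)

func main() {
    token := "pk_live_your_token_here"
    apiURL := "https://your-domain.com"

    // send adds the Bearer token and aborts on transport or HTTP errors
    send := func(req *http.Request) *http.Response {
        req.Header.Set("Authorization", "Bearer "+token)
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        } else if resp.StatusCode >= 400 {
            log.Fatalf("HTTP %d from %s", resp.StatusCode, req.URL)
        }
        return resp
    }

    // Upload: multipart body with a single "file" part
    body := &bytes.Buffer{}
    writer := multipart.NewWriter(body)
    file, err := os.Open("document.pdf")
    if err != nil {
        log.Fatal(err)
    }
    part, _ := writer.CreateFormFile("file", "document.pdf")
    io.Copy(part, file)
    file.Close()
    writer.Close()
    req, _ := http.NewRequest("POST", apiURL+"/api/v1/clean", body)
    req.Header.Set("Content-Type", writer.FormDataContentType())
    resp := send(req)
    var task map[string]interface{}
    json.NewDecoder(resp.Body).Decode(&task)
    resp.Body.Close()
    taskID := fmt.Sprint(task["task_id"])
    fmt.Println("Task:", taskID)

    // Poll every 2 seconds until the task is completed or failed
    for task["status"] != "completed" && task["status"] != "failed" {
        time.Sleep(2 * time.Second)
        req, _ = http.NewRequest("GET", apiURL+"/api/v1/task/"+taskID, nil)
        resp = send(req)
        json.NewDecoder(resp.Body).Decode(&task)
        resp.Body.Close()
    }

    // Download the cleaned PDF
    if task["status"] == "completed" {
        req, _ = http.NewRequest("GET", apiURL+"/api/v1/task/"+taskID+"/download", nil)
        resp = send(req)
        defer resp.Body.Close()
        out, err := os.Create("cleaned.pdf")
        if err != nil {
            log.Fatal(err)
        }
        defer out.Close()
        io.Copy(out, resp.Body)
    }
}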

Error Codes

HTTP Code | Error | Description
400 | invalid_file | Only PDF files accepted
400 | file_too_large | File exceeds size limit
400 | invalid_url | Invalid URL provided
400 | download_failed | Failed to download PDF from URL
401 | missing_token | No API token provided
401 | token_error | Invalid API token
403 | token_error | Token disabled or expired
404 | not_found | Task not found
429 | token_error | Daily rate limit exceeded
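The same error code (token_error) appears under 401, 403 and 429, so the HTTP status is what tells a client how to react. The Python sketch below maps a response to typed exceptions. The class names and raise_for_api_error are hypothetical, and so are the response field names: the guide does not show the error body, so the code assumes the table's error code arrives in body["error"] and any human-readable text in body["message"].

```python
class PdfCleanerError(Exception):
    """Base class for API errors (hypothetical client-side type)."""

    def __init__(self, status, code, message=""):
        text = f"HTTP {status} {code}" + (f": {message}" if message else "")
        super().__init__(text)
        self.status = status
        self.code = code

class RequestError(PdfCleanerError):
    """400/404: fix the file, URL or task_id before retrying."""

class AuthError(PdfCleanerError):
    """401/403: the token is missing, invalid, disabled or expired."""

class RateLimitError(PdfCleanerError):
    """429: daily limit used up; retrying before it resets will not help."""

def raise_for_api_error(status, body):
    """Raise a typed exception for an error response; do nothing on success.

    Assumption: body["error"] holds the code from the table and
    body["message"] optional text; confirm against a real response.
    """
    if status < 400:
        return
    body = body if isinstance(body, dict) else {}
    code = body.get("error", "unknown")
    message = body.get("message", "")
    if status in (400, 404):
        cls = RequestError
    elif status in (401, 403):
        cls = AuthError
    elif status == 429:
        cls = RateLimitError
    else:
        cls = PdfCleanerError  # e.g. 5xx, not listed in the table
    raise cls(status, code, message)
```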

Rate Limits & Token Management

Daily Limits

Each token can have a daily request limit (0 means unlimited). Once the limit is exceeded, further requests return HTTP 429.
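A client that shares one token across jobs may prefer to stop before it reaches the limit rather than handle 429 afterwards. This minimal sketch is a client-side counter that mirrors the "0 means unlimited" rule. DailyRequestBudget is hypothetical, and it assumes the server's day resets at UTC midnight, which this guide does not state. The server remains the source of truth, so a 429 should still be handled.

```python
from datetime import datetime, timezone

class DailyRequestBudget:
    """Client-side mirror of a token's daily request limit (0 = unlimited).

    Assumption: the server's counter resets at UTC midnight.
    """

    def __init__(self, limit, today=lambda: datetime.now(timezone.utc).date()):
        self.limit = limit
        self._today = today  # injectable clock for testing
        self._day = today()
        self.used = 0

    def try_acquire(self):
        """Count one request and return True if it fits in today's budget."""
        day = self._today()
        if day != self._day:  # a new day started: reset the counter
            self._day, self.used = day, 0
        if self.limit and self.used >= self.limit:
            return False  # the server would answer HTTP 429
        self.used += 1
        return True
```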

Token Expiration

Tokens can have an expiration date. Requests made with an expired token return HTTP 403.

Origin Restrictions

Tokens can be restricted to specific domains for additional security.

Key Regeneration

If a token is compromised, regenerate its key from the admin panel. The old key is invalidated immediately.

MCP Server (AI Integration)

PDF Cleaner Pro provides a Model Context Protocol (MCP) server that lets AI assistants call the PDF cleaning tools directly.

Configuration

Add the following to your MCP client configuration (for example, claude_desktop_config.json for Claude Desktop):

{
  "mcpServers": {
    "pdf-cleaner": {
      "command": "python",
      "args": ["-m", "app.mcp_server"],
      "env": {
        "PDF_CLEANER_API_URL": "https://your-domain.com",
        "PDF_CLEANER_API_TOKEN": "pk_live_your_token_here"
      }
    }
  }
}
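If the client configuration already lists other servers, pasting the block by hand can overwrite them. The minimal Python sketch below merges the pdf-cleaner entry into an existing config dict and leaves other entries untouched. add_pdf_cleaner_server is a hypothetical helper. Where the config file lives depends on your client.

```python
import copy

def add_pdf_cleaner_server(config, api_url, api_token, python="python"):
    """Return a copy of an MCP client config with the pdf-cleaner entry set.

    Other entries under "mcpServers" are kept; an existing pdf-cleaner
    entry is replaced. The input dict is not modified.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["pdf-cleaner"] = {
        "command": python,
        "args": ["-m", "app.mcp_server"],
        "env": {
            "PDF_CLEANER_API_URL": api_url,
            "PDF_CLEANER_API_TOKEN": api_token,
        },
    }
    return merged
```

Typical use: json.load the existing file, pass the dict through the function, and json.dump the result back with indent=2.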

Available Tools

Tool | Description
clean_pdf_file | Clean a local PDF file
clean_pdf_url | Clean a PDF from URL
check_task_status | Check processing status
download_cleaned_pdf | Download cleaned result