OpenLineage Proxy Server

A Next.js-based OpenLineage proxy server that receives OpenLineage events and stores each one as an individual JSON file for debugging and analysis.

Test the API

Send a test OpenLineage event to verify that the proxy is working:

POST /api/v1/lineage

Overview

This project serves as a debugging proxy for OpenLineage events emitted by various data integration and transformation tools:

  • 🔧 dbt: Data Build Tool
  • ⚡ Apache Spark: Big Data Processing
  • 🌊 Apache Airflow: Workflow Management

Features

  • 📡 REST API Endpoint: Receives OpenLineage events via HTTP POST requests.
  • 💾 File-based Storage: Saves each event as a separate, uniquely named JSON file.
  • 🔒 Thread-safe Operations: Uses file locking to handle concurrent requests.
  • 🏷️ Unique File Naming: Counter + UUID format for easy tracking.
  • 📝 Pretty-printed JSON: Formatted JSON output for easy reading and debugging.
  • ⚠️ Error Handling: Comprehensive error handling and logging.
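The file-locking feature can be pictured with a small sketch. This is not the project's actual code: `withFileLock` and the sidecar lock file are hypothetical, and the approach shown (an exclusive-create lock file using Node's `wx` open flag, with simple retries) is only one possible way to serialize concurrent writes.

```typescript
import { open, unlink } from "node:fs/promises";

// Hypothetical helper: run `task` while holding an exclusive lock file.
// The "wx" flag makes open() fail with EEXIST when the file already exists,
// so only one caller at a time can create the lock.
async function withFileLock<T>(
  lockPath: string,
  task: () => Promise<T>,
  retries = 50,
  delayMs = 20,
): Promise<T> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const handle = await open(lockPath, "wx").catch(
      (err: NodeJS.ErrnoException) => {
        if (err.code === "EEXIST") return null; // lock is held by another request
        throw err;
      },
    );
    if (handle === null) {
      // Wait briefly, then try to acquire the lock again.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      continue;
    }
    try {
      return await task();
    } finally {
      // Always release the lock, even if the task throws.
      await handle.close();
      await unlink(lockPath);
    }
  }
  throw new Error(`Could not acquire lock: ${lockPath}`);
}
```

A request handler could wrap its "read counter, write file" step in `withFileLock` so that two simultaneous requests never pick the same counter value. A stale lock left behind by a crashed process would need separate cleanup, which this sketch does not attempt.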

Installation

1. Clone the repository

bash
git clone https://github.com/senthilsweb/open-lineage-proxy.git
cd open-lineage-proxy

2. Install dependencies

bash
npm install

3. Start the development server

bash
npm run dev

Configuration

Configure your data integration tools to send OpenLineage events to this proxy:

bash
OPENLINEAGE_URL=http://localhost:3000
OPENLINEAGE_NAMESPACE=dev

Usage Examples

🔧 dbt with OpenLineage

bash
# Install the OpenLineage dbt integration (provides the dbt-ol wrapper)
pip install openlineage-dbt

# Set environment variables
export OPENLINEAGE_URL=http://localhost:3000
export OPENLINEAGE_NAMESPACE=dev

# Run dbt with OpenLineage
dbt-ol run

⚡ Apache Spark with OpenLineage

bash
spark-submit \
  --packages io.openlineage:openlineage-spark:0.21.1 \
  --conf spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener \
  --conf spark.openlineage.transport.type=http \
  --conf spark.openlineage.transport.url=http://localhost:3000 \
  --conf spark.openlineage.namespace=dev \
  your_spark_job.py

🧪 Manual Testing with cURL

bash
curl -X POST http://localhost:3000/api/v1/lineage \
  -H "Content-Type: application/json" \
  -d '{
    "eventType": "START",
    "eventTime": "2024-01-20T10:00:00.000Z",
    "run": {"runId": "12345678-1234-1234-1234-123456789012"},
    "job": {"namespace": "dev", "name": "test_job"},
    "inputs": [], "outputs": []
  }'
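The same request can be sent from a script. The sketch below mirrors the cURL payload; `buildTestEvent` and `sendEvent` are hypothetical helper names, and it assumes Node 18 or later, where `fetch` is available globally.

```typescript
// Shape of the minimal event used in the cURL example (not the full OpenLineage schema).
interface LineageEvent {
  eventType: string;
  eventTime: string;
  run: { runId: string };
  job: { namespace: string; name: string };
  inputs: unknown[];
  outputs: unknown[];
}

// Hypothetical helper: build the same START event as the cURL example,
// with the event time taken from `now`.
function buildTestEvent(now: Date = new Date()): LineageEvent {
  return {
    eventType: "START",
    eventTime: now.toISOString(),
    run: { runId: "12345678-1234-1234-1234-123456789012" },
    job: { namespace: "dev", name: "test_job" },
    inputs: [],
    outputs: [],
  };
}

// Hypothetical helper: POST the event to the proxy and return the HTTP status code.
async function sendEvent(
  event: LineageEvent,
  baseUrl = "http://localhost:3000",
): Promise<number> {
  const response = await fetch(`${baseUrl}/api/v1/lineage`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  });
  return response.status;
}
```

With the development server running, `sendEvent(buildTestEvent())` should resolve to the status code returned by the proxy.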

API Endpoints

POST /api/v1/lineage

Receives OpenLineage events and saves them as JSON files.

✅ 200 OK: "Payload saved successfully"
❌ 500 Error: "Error writing to file"

GET /api/status

Returns current status and statistics of the API server.

File Output

Each OpenLineage event is saved as a JSON file with the following naming convention:

{counter}_lineage_data_{uuid}.json

Example Filenames:

  • 001_lineage_data_a1b2c3d4-e5f6-7890-abcd-ef1234567890.json
  • 002_lineage_data_b2c3d4e5-f6a7-8901-bcde-f12345678901.json
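The naming convention can be reproduced in a few lines. In this sketch `buildFileName` and `nextCounter` are hypothetical helpers; the three-digit zero padding is inferred from the example filenames, and deriving the counter from existing filenames is an assumption about how the counter might be maintained.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: build "{counter}_lineage_data_{uuid}.json".
// The counter is zero-padded to three digits, as in the example filenames.
function buildFileName(counter: number, uuid: string = randomUUID()): string {
  const prefix = String(counter).padStart(3, "0");
  return `${prefix}_lineage_data_${uuid}.json`;
}

// Hypothetical helper: derive the next counter from filenames already on disk.
// Names that do not match the convention are ignored.
function nextCounter(existing: string[]): number {
  const used = existing
    .map((name) => /^(\d+)_lineage_data_/.exec(name))
    .filter((match): match is RegExpExecArray => match !== null)
    .map((match) => Number(match[1]));
  return used.length === 0 ? 1 : Math.max(...used) + 1;
}
```

Because the counter prefix sorts lexicographically, listing the directory shows events in arrival order, while the UUID keeps names unique even if two requests were ever to receive the same counter.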

Storage Location:

/pages/api/v1/lineage/