

Batch Webhook Ingestion: NDJSON and CSV on the Webhook Source

May 12, 2026

[Illustration: a single inbound data stream splitting into many individual event packets flowing onward]

If you have ever tried to push a batch of records into a webhook endpoint that only accepts one event per request, you know the workaround. You loop the array client-side and fire N HTTPS calls, one per record. A nightly export of 5000 rows turns into 5000 round trips, 5000 TLS handshakes, and a rate-limit problem nobody wanted.

The ProxyHook Webhook source now accepts batches directly. A single POST can carry up to 1000 events, encoded as either NDJSON or CSV, and ProxyHook splits the body into individual events on the way in. Every downstream filter, transformation, and destination sees one event at a time, exactly as if you had sent 1000 separate single-event requests.

What's new on the Webhook source

The Webhook source endpoint (the one that looks like https://go.proxyhook.com/A817GH after you create a Webhook source in the dashboard) now accepts three content types instead of one:

  • application/json: a single JSON object. The original behavior.
  • application/x-ndjson: one JSON object per line, separated by \n. Up to 1000 lines per request.
  • text/csv: a header row followed by data rows. Up to 1000 rows per request.

The format you pick is controlled by the Content-Type header on your POST. The body is whatever that format expects. Nothing else about the endpoint changes: the URL is the same, the response is still a 200 regardless of subscription status, and per-event delivery to destinations works the way it always has.

NDJSON: when the source already has typed data

NDJSON (newline-delimited JSON) is the right pick when whatever system is sending the batch already has typed values: numbers as numbers, booleans as booleans, nested objects intact. Each line is a complete JSON object. The endpoint splits on \n, skips empty lines, and treats every remaining line as one event.

curl -X POST https://go.proxyhook.com/WEBHOOK_ID \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @events.ndjson

Where events.ndjson looks like:

{"email":"[email protected]","plan":"pro","mrr":99,"trial":false}
{"email":"[email protected]","plan":"free","mrr":0,"trial":true}
{"email":"[email protected]","plan":"enterprise","mrr":1200,"trial":false}

Three lines, three events. Downstream, a Postgres destination sees mrr as an integer and trial as a boolean, because NDJSON preserves the JSON type system per line. A nested object on one line stays nested.

Two rules to remember. First, each line must be a complete, valid JSON object, with no line breaks inside. If you pretty-print one of your records across multiple lines, the parser will treat each line as its own event and reject the malformed ones. Second, anything past line 1000 is dropped silently. If you have 5000 records to push, send five batches of 1000, not one batch of 5000.
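Both rules are easy to satisfy if you serialize and chunk programmatically rather than hand-assembling the body. Here is a minimal sketch in Python; the record shape is a placeholder, and the 1000-line limit mirrors the documented cap:

```python
import json

BATCH_LIMIT = 1000  # lines past 1000 are dropped by the endpoint

def to_ndjson_batches(records, limit=BATCH_LIMIT):
    """Serialize dicts into NDJSON bodies of at most `limit` lines each.

    json.dumps never emits raw newlines, so every record lands on
    exactly one line (rule 1); chunking enforces the cap (rule 2).
    """
    for start in range(0, len(records), limit):
        chunk = records[start:start + limit]
        yield "\n".join(json.dumps(r) for r in chunk) + "\n"

# Example: 2500 records become three bodies of 1000, 1000, and 500 lines.
records = [{"email": f"user{i}@example.com", "mrr": i, "trial": False}
           for i in range(2500)]
bodies = list(to_ndjson_batches(records))
```

Each yielded body would then be POSTed as-is with `Content-Type: application/x-ndjson`, exactly like the curl example above.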

CSV: when the source is a spreadsheet or a SQL export

CSV is the right pick when the data is coming out of a system that speaks CSV natively: a SELECT ... INTO OUTFILE from MySQL, a Google Sheets export, a CRM that drops a daily report, a finance team that lives in Excel. Instead of teaching that system to emit JSON, you POST the CSV file as-is.

curl -X POST https://go.proxyhook.com/WEBHOOK_ID \
  -H "Content-Type: text/csv" \
  --data-binary @events.csv

Where events.csv looks like:

email,plan,mrr,trial
[email protected],pro,99,false
[email protected],free,0,true
[email protected],enterprise,1200,false
"[email protected]","pro, annual",1188,false

The header row becomes the field names on every event. Each data row becomes one event. The parser handles the standard CSV escaping rules: wrap values containing commas, newlines, or quotes in double quotes, and double-up any inner quotes ("").

The one trap with CSV is types. CSV has no native type system, so every field arrives as a string. 99 becomes "99". false becomes "false". If your destination cares about types (a Postgres column declared as integer, a Datadog metric, a numeric filter rule), you have two options: send NDJSON instead, or insert a custom transformation step in the Automation to cast the strings to the right types before the destination receives them. The docs call this out directly, and the choice usually comes down to what's easier for the system on the sending side.
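Transformation steps are configured in the ProxyHook dashboard, so the snippet below is not ProxyHook code; it only illustrates, in Python, the casting a transformation would perform on one CSV-derived event, using the field names from the example above:

```python
def cast_event(event):
    """Cast the string fields of a CSV-derived event to the types a
    typed destination (e.g. a Postgres integer column) expects.
    The casting rules are hypothetical, matched to the example schema."""
    return {
        "email": event["email"],
        "plan": event["plan"],
        "mrr": int(event["mrr"]),           # "99"    -> 99
        "trial": event["trial"] == "true",  # "false" -> False
    }

raw = {"email": "[email protected]", "plan": "pro", "mrr": "99", "trial": "false"}
typed = cast_event(raw)  # mrr is now an int, trial a bool
```

NDJSON skips this step entirely, which is why it is the better pick whenever the sending system can produce it.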

What happens after the batch hits the endpoint

The key thing to understand is that batching is purely an ingestion-time concern. Once a 1000-row NDJSON request is parsed, ProxyHook fans it out into 1000 individual events, and every step after that operates on one event at a time. Concretely:

  • Filters run per row. If you have a filter on the Automation that drops events where plan == "free", the filter sees each of the 1000 events independently and drops the ones that match. The batch has no special status. Filters available include Payload Contents (key-value lookups), City, Country, IP, Host, User Agent, and Referer.
  • Transformations run per row. A custom transformation step that casts mrr from a string to an integer runs 1000 times, once per row, with full access to that row's payload.
  • Destinations receive per-row deliveries. A Postgres destination gets 1000 row-level inserts, not one bulk insert of an array. A Slack destination would post 1000 messages (which is rarely what you want; see below). A Datadog destination receives 1000 individual events.
  • Logs sample per row. Source logs and Automation logs sample at their configured rate across the expanded event stream, so a 1000-row batch with a 10% sample rate produces roughly 100 log entries, not 1.

This matters because it means the new batch formats slot into everything you already built. You don't need to redesign your Automations, change your filters, or update your destinations. The same routing that worked for one-event-per-request works for batch ingestion, just with a higher event count per inbound HTTPS call.
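Conceptually, the ingestion-time fan-out plus a per-row filter behaves like the sketch below. This is a simplified model for intuition, not ProxyHook's actual internals; the split-on-`\n`, skip-empty-lines, 1000-line-cap behavior mirrors what the docs describe:

```python
import json

def fan_out_ndjson(body):
    """Split an NDJSON body into individual events: split on \\n,
    skip empty lines, keep at most the first 1000 lines."""
    lines = [ln for ln in body.split("\n") if ln.strip()]
    return [json.loads(ln) for ln in lines[:1000]]

def keep_paid_plans(event):
    """A per-row filter, like a Payload Contents filter on `plan`."""
    return event.get("plan") != "free"

body = '{"email":"[email protected]","plan":"pro"}\n{"email":"[email protected]","plan":"free"}\n'
events = fan_out_ndjson(body)                          # 2 independent events
delivered = [e for e in events if keep_paid_plans(e)]  # 1 event survives
```

The filter never sees the batch, only individual events, which is why no downstream configuration has to change.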

A realistic use case: nightly CRM export into Postgres

Suppose your CRM does not have a real-time webhook for contact updates, but it does drop a CSV of changed contacts every night at 2am to an SFTP server. You want those changes in Postgres so your analytics queries are current by 8am.

Before batching, you would write a small worker that downloads the CSV, parses it, loops the rows, and POSTs one JSON event per contact to a Webhook source. For a 20,000-row CSV, that's 20,000 POSTs.

With CSV ingestion, the worker becomes a one-liner: split the CSV into 1000-row chunks and POST each chunk as text/csv to the same Webhook source. Twenty POSTs total. On the ProxyHook side, the Automation routes each row through a transformation (to cast numeric fields out of strings), then into a Postgres destination configured to upsert on contact_id. The destination sees 20,000 individual upserts, one per row, just as it would have before. Nothing about the Automation changes. Only the ingestion side gets simpler, and the request volume drops by roughly 1000x.
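A sketch of that worker's chunking step, in Python. The 1000-row chunk size comes from the documented cap; repeating the header row is required so each chunk is a valid standalone CSV. The POST itself (shown in the comment) is an assumption about how you would wire it up:

```python
import csv
import io

def csv_chunks(text, rows_per_chunk=1000):
    """Split a CSV into bodies of at most `rows_per_chunk` data rows,
    repeating the header row so each chunk parses on its own."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    for start in range(0, len(data), rows_per_chunk):
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(data[start:start + rows_per_chunk])
        yield buf.getvalue()

# Each chunk would then be POSTed to the Webhook source, e.g. with requests:
# requests.post("https://go.proxyhook.com/WEBHOOK_ID", data=chunk,
#               headers={"Content-Type": "text/csv"})
```

Going through `csv.reader`/`csv.writer` rather than splitting on raw newlines keeps quoted fields containing embedded newlines intact across chunk boundaries.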

When not to use batch ingestion

Batches are great for bulk loads and offline exports. They are the wrong tool for two cases.

Real-time events. If your source is already emitting one event per occurrence (a Stripe payment, a Shopify order, a form submission), keep sending them as single-event JSON. Batching adds latency for no reason, because you would be waiting to accumulate rows before sending.

Fan-out to human destinations. A 1000-row batch routed to Slack creates 1000 messages. That's almost never what you want. If you're sending to a destination that a human reads (Slack, Discord, Microsoft Teams), add a filter on the Automation to drop everything except the rows worth notifying on, or skip those destinations for batch sources entirely.

Try it

Create a Webhook source from the dashboard if you don't have one already, then curl a small NDJSON or CSV body at the endpoint. The Webhook source docs have the full examples and the exact escaping rules for CSV. Existing single-event JSON posts keep working unchanged, so there's nothing to migrate.
