Show HN: Inngest – Developer platform for background jobs and workflows

submitted by danfar+(OP) on 2023-06-20 12:24:08 | 87 points 25 comments

Hi HN! We’re Dan and Tony - founders of Inngest (https://www.inngest.com/). Inngest is a developer platform and toolchain for developing, testing, and running background jobs and workflows. Inngest invokes your jobs via HTTP, wherever you want to deploy your code.

Shipping reliable background jobs and workflows is a time suck for any software team. They’re painful to develop locally, and getting them into production is a tedious experience of configuring infra. When you want to add scheduling, orchestrate multi-step workflows, or handle concurrency or idempotency, you spend even more time building bespoke systems - not your actual product.

Software engineers spend a ton of duplicated effort building and rebuilding this at every company. It shouldn’t be this way.

We’ve drawn on our experience building and scaling reliable, secure queueing systems across healthcare, B2B SaaS, and developer infra companies. With Inngest, we set out to create a single platform and set of developer tools to unburden the developer.

- You write functions alongside your API, in your existing codebase, with our simple SDK. We invoke your functions via HTTPS, so there are no additional worker services to set up.

- End-to-end local development, with one command. Our dev server runs Inngest on any machine with a web interface to visualize, debug, and test your functions with zero additional dependencies.

- Our serverless queue calls you, so you can run your code anywhere - serverless, servers or edge.

- Inngest manages state across functions and long-running workflows for you. We handle retries, concurrency, idempotency, and coordinating parallel and sequential workloads out-of-the-box.
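To make "retries and idempotency" concrete, here is a minimal sketch of the general idea: retry a failing handler up to a limit, and skip duplicate deliveries that carry an idempotency key already seen. Every name here (`IdempotentRunner`, `run`, the key format) is hypothetical and chosen for illustration; this is not Inngest's SDK or how the platform implements it.

```typescript
// Hypothetical names throughout; not Inngest's SDK or implementation.
type Handler<T> = (payload: T) => void;

class IdempotentRunner<T> {
  private completed = new Set<string>(); // idempotency keys already processed

  constructor(private handler: Handler<T>, private maxAttempts: number) {}

  // Returns how many attempts were made (0 if the key was already processed).
  run(idempotencyKey: string, payload: T): number {
    if (this.completed.has(idempotencyKey)) return 0; // duplicate delivery: skip
    let lastError: unknown;
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        this.handler(payload);
        this.completed.add(idempotencyKey);
        return attempt;
      } catch (err) {
        lastError = err; // a real system would back off before retrying
      }
    }
    throw lastError; // retry limit reached
  }
}

// Usage: a handler that fails once, then succeeds.
let calls = 0;
const sent: string[] = [];
const runner = new IdempotentRunner<string>((email) => {
  calls++;
  if (calls === 1) throw new Error("transient failure");
  sent.push(email);
}, 3);

const firstRun = runner.run("signup-42", "a@example.com"); // retried once
const secondRun = runner.run("signup-42", "a@example.com"); // deduplicated
```

The second call with the same key does no work, which is what lets a queue deliver "at least once" without the job's side effects happening twice.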

We’ve helped users like:

- Snaplet.dev uses Inngest to handle the lifecycle of managing preview databases for their developer platform.

- Ocoya.com rebuilt their e-commerce and social media scheduling workflows in days while dramatically simplifying their infra to run solely with Inngest + serverless functions.

- Secta.ai uses Inngest to run all of their AI image generation models on GPU-optimized instances.

Today, we have a TypeScript SDK and we will expand to other languages soon (Go is next). We’re building in the open on GitHub and we offer usage-based plans with a generous free tier.

We’re excited to share this with HN and we’re eager for your feedback! What are your experiences building systems for background jobs and workflows?

replies(10): >>thakob+R >>dimitr+o2 >>tomred+F4 >>machia+Ue >>zenoro+3i >>GGO+xA >>imsh4y+0X >>frant1+UY >>mirzap+U91 >>devty+uX3

>>danfar+(OP)
Building reliable background jobs and engineering workflows has almost always been challenging at every company I've worked for, and I’m glad there is now a company trying to excel at the DX aspect of this problem.

replies(1): >>danfar+11

>>thakob+R
Thanks for the comment! What were some of the most painful parts of this at the companies that you worked for?

replies(1): >>mindvi+G2

>>danfar+(OP)
why only TypeScript?

I realize you can't please everyone at all times but I'd love to have a Rust or Zig SDK option. Go is a good start in that direction I guess..

replies(2): >>danfar+A4 >>darwin+pt1

>>danfar+11
Schema management!

replies(1): >>danfar+A5

>>dimitr+o2
We started with TypeScript because it's not well supported by current solutions, and none of them support serverless. We wanted to solve serverless first, as doing so has made supporting long-running servers easy.

A lot of folks in the TS/JS community also don't often build distributed systems and it's easy to get wrong. So we think they're hungry for something like Inngest that they don't need to manage or spend weeks learning some complex system. Plus, TS gives us typing for all events/messages.

We already have a working Go SDK that we use internally and we have a test harness that will enable us to add other languages like Rust or Zig more easily. We even have a community member building a PoC for Elixir.

>>danfar+(OP)
The timing of this is pretty awesome for me. I’m building a product that requires fairly heavy, scheduled background services. Originally, I had built these services intermingled with my client and API, but it was not ideal from a development or deployment perspective. Plus we had rolled our own monitoring which was itself a PITA to maintain. With Inngest, I moved all of our background services and processing to a separate sub-repo, and we can develop, deploy and monitor entirely independently from the rest of our product, which has really sped things up. Love it. Would recommend for anything event-based!

What sealed it for me was that the few times I ran into issues, often due to my own mistakes, their support was nearly real-time and worked with me to either help me solve the problem or dig in on their end to see where the issue was. Honestly, more than anything, the support gives me the confidence to fully commit to this and use it across all my production apps.

Anyway, great stuff all, you’ve built something awesome here.

replies(1): >>danfar+b6

>>mindvi+G2
This is a great one. I've experienced this myself, especially when you change an event/message and then you need to handle that change in your job/workflow. Things can break pretty easily so you need to have versioning for both.

This is why we've built event schema versioning and function versioning into the platform. We have big plans for the schema management side of things that bring concepts of data governance to engineering teams. It shouldn't just be for data teams. As a bonus, we can then also easily generate language types from schemas.
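As a rough illustration of why versioning events and jobs together matters, here is a sketch that tags each event with a version and upgrades older shapes before the job logic runs. The event name, the `v` field, and the `upgrade` helper are all assumptions made for this example; they are not Inngest's schema format.

```typescript
// Hypothetical event shapes; the "v" field and all names are assumptions.
type UserSignedUpV1 = { name: "user.signed_up"; v: 1; data: { email: string } };
type UserSignedUpV2 = {
  name: "user.signed_up";
  v: 2;
  data: { email: string; plan: "free" | "pro" };
};
type UserSignedUp = UserSignedUpV1 | UserSignedUpV2;

// Upgrade any older version to the latest shape before the job logic runs.
function upgrade(evt: UserSignedUp): UserSignedUpV2 {
  switch (evt.v) {
    case 1:
      // v1 had no plan field, so default it.
      return { name: evt.name, v: 2, data: { email: evt.data.email, plan: "free" } };
    case 2:
      return evt;
  }
}

// The job only ever sees the latest version.
function welcomeMessage(evt: UserSignedUp): string {
  const latest = upgrade(evt);
  return `Welcome ${latest.data.email} (${latest.data.plan})`;
}

const oldEvent: UserSignedUp = {
  name: "user.signed_up",
  v: 1,
  data: { email: "a@example.com" },
};
const message = welcomeMessage(oldEvent);
```

Because the union is discriminated on `v`, adding a `v: 3` shape makes the compiler flag every `switch` that has not handled it yet.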

What else about schema management is a pain? What have you used for this?

replies(1): >>tianzh+IN

>>tomred+F4
> we had rolled our own monitoring which was itself a PITA to maintain

Thanks! What type of monitoring were you looking for? We have some basic metrics now, but know we need to improve this. What metrics, alerting, observability are important for you?

replies(1): >>distra+pa

>>danfar+b6
Not the original commenter but I manage a similar system:

1. Wait timings for jobs.

2. Run timings for jobs.

3. Timeout occurrences and stdout/stderr logs of those runs

4. Retry metrics, and if there is a retry limit, then metrics on jobs that were abandoned.

One thing that is easy to overlook is giving users the ability to define a specific “urgency” for their jobs which would allow for different alerting thresholds on things like running time or waiting.
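A small sketch of the "urgency" idea, assuming each run records when it was enqueued, started, and finished: wait and run timings fall out of the timestamps, and the alert thresholds are looked up per urgency level. The record shape and threshold values are hypothetical.

```typescript
// Hypothetical record shape and thresholds; all times are in milliseconds.
type Urgency = "low" | "high";

interface JobRun {
  id: string;
  urgency: Urgency;
  enqueuedAt: number;
  startedAt: number;
  finishedAt: number;
  attempts: number;
}

// Per-urgency alert thresholds (assumed values).
const thresholds: Record<Urgency, { maxWaitMs: number; maxRunMs: number }> = {
  high: { maxWaitMs: 1000, maxRunMs: 5000 },
  low: { maxWaitMs: 60000, maxRunMs: 300000 },
};

function alertsFor(run: JobRun): string[] {
  const waitMs = run.startedAt - run.enqueuedAt; // time spent queued
  const runMs = run.finishedAt - run.startedAt; // time spent executing
  const limit = thresholds[run.urgency];
  const alerts: string[] = [];
  if (waitMs > limit.maxWaitMs) alerts.push(`${run.id}: waited ${waitMs}ms`);
  if (runMs > limit.maxRunMs) alerts.push(`${run.id}: ran ${runMs}ms`);
  return alerts;
}

// The same timings alert for a high-urgency job but not a low-urgency one.
const sample = { enqueuedAt: 0, startedAt: 2000, finishedAt: 3000, attempts: 1 };
const highAlerts = alertsFor({ id: "a", urgency: "high", ...sample });
const lowAlerts = alertsFor({ id: "b", urgency: "low", ...sample });
```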

replies(1): >>danfar+Hc

>>distra+pa
This is great - we do capture all logs for each run including any retries, so you can see errors and general successes. All of these other metrics we have internally, but need to expose to our users!

Observability is super key for background work even more so since it's not always tied to a specific user action, so you need to have a trail to understand issues.

> One thing that is easy to overlook is giving users the ability to define a specific “urgency” for their jobs which would allow for different alerting thresholds on things like running time or waiting.

We are adding prioritization for functions soon, so this is helpful for thinking about telemetry for jobs with different priority or urgency.

re: timeouts - managing timeouts usually means managing dead-letter queues and our goal is to remove the need to think about DLQs at all and build metrics and smarter retry/replay logic right into the Inngest platform.

replies(1): >>jtwebm+Bf

>>danfar+(OP)
Seems like a similar API/use case to Temporal. Do I get it right that your system is similar but easier to use, as it basically exposes similar functionality via HTTP and a higher-level API?

Do I get it right that the difference between this and, for example, ActiveJob in Rails is that you handle multi-step workflows well where there's a need to coordinate and wait for some event/thing to finish (or just sleep)? And the benefit is that it's easy to read the whole flow as it's an async function?

replies(1): >>danfar+ii

>>danfar+Hc
Sorry DLQs make it easier to do those alerts where a human needs to look asap at something. Not sure they can be gotten rid of, but maybe you call them something else.

replies(1): >>goodol+ak

>>danfar+(OP)
Background jobs are the kind of problem that every company has, but there’s still room to find the best DX possible. I’m glad there are people tackling this problem.

>>machia+Ue
Exactly - we have many users that have come over after using Temporal. We designed our SDK to be more lightweight and flexible. We want it to feel more just like writing normal code, not a new coding paradigm. For example, you can define steps right within your function, not as separate "activities."

Being HTTP based (push vs. pull), it's easier to manage and works natively with serverless and servers.

Inngest is also event-driven, so you can fan-out and do things like have your workflow wait for another event. Our `step.waitForEvent()` allows you to pause a function until another event is received, creating dynamic jobs that can wait for additional actions or input. Also, using events allows us to replay failures super easily.
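To illustrate the pause-until-event idea in isolation, here is a minimal in-memory model: a run is parked with the event name it is waiting for, and an incoming event resumes only the runs that match. This is a conceptual sketch with hypothetical names (`Dispatcher`, `PausedRun`, `AppEvent`); it does not use or reproduce the real `step.waitForEvent()` signature.

```typescript
// Hypothetical in-memory model; not Inngest's implementation or SDK API.
type AppEvent = { name: string; data: { userId: string } };

type PausedRun = {
  runId: string;
  waitingFor: string; // event name the run is paused on
  userId: string; // field used to match the incoming event
  resume: (evt: AppEvent) => void;
};

class Dispatcher {
  private paused: PausedRun[] = [];

  // Park a run until a matching event arrives.
  waitForEvent(run: PausedRun): void {
    this.paused.push(run);
  }

  // Deliver an event: resume every matching run, keep the rest parked.
  send(evt: AppEvent): number {
    const matches = this.paused.filter(
      (r) => r.waitingFor === evt.name && r.userId === evt.data.userId
    );
    this.paused = this.paused.filter((r) => matches.indexOf(r) === -1);
    matches.forEach((r) => r.resume(evt));
    return matches.length;
  }
}

const dispatcher = new Dispatcher();
const log: string[] = [];
dispatcher.waitForEvent({
  runId: "run-1",
  waitingFor: "invoice.paid",
  userId: "u1",
  resume: (evt) => log.push(`run-1 resumed by ${evt.name}`),
});

// An event for another user leaves the run parked; the matching one resumes it.
const unrelated = dispatcher.send({ name: "invoice.paid", data: { userId: "u2" } });
const matched = dispatcher.send({ name: "invoice.paid", data: { userId: "u1" } });
```

A production system would persist the paused runs and add a timeout so a run does not wait forever.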

re: ActiveJob - Yeah, multi-step workflows are a huge difference. We manage step retries and the function state for you. That makes things like sleep and coordinating between events easy. As you mentioned, it leads to simpler function definition so it means that almost any engineer can write workflows quickly and easily read the code in a single place, reducing bugs due to disconnected jobs.

>>jtwebm+Bf
Inngest engineer here!

Agreed that alerting is important! We alert on job failures, plus we integrate with observability tools like Sentry.

For DLQs, you're right that they have value. We aren't killing DLQs but rather rethinking them with better ergonomics. Instead of having a dumping ground for unacked messages, we're developing a "replay" feature that lets you retry failed jobs over a period of time. Our planned replay feature will run failures in a separate queue, which can be cancelled at any time. The replay itself can be retried as well if there's still a problem.
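A rough sketch of what replaying failures from a time window could look like: select the failures in the window, run them through a separate queue that can be cancelled, and return whatever still fails so the replay can be retried. The feature is described above as planned, so the types and functions here (`FailedJob`, `ReplayQueue`, `replayWindow`) are purely hypothetical.

```typescript
// Hypothetical sketch; the replay feature above is planned, so none of this is a real API.
type FailedJob = { id: string; failedAt: number; payload: string };

class ReplayQueue {
  private cancelled = false;

  constructor(private jobs: FailedJob[]) {}

  // Stop processing; jobs not yet replayed are left untouched.
  cancel(): void {
    this.cancelled = true;
  }

  // Re-run each job; return the ids that still fail so the replay can be retried.
  run(handler: (payload: string) => void): string[] {
    const stillFailing: string[] = [];
    for (const job of this.jobs) {
      if (this.cancelled) break;
      try {
        handler(job.payload);
      } catch {
        stillFailing.push(job.id);
      }
    }
    return stillFailing;
  }
}

// Build a replay from the failures that occurred within [from, to].
function replayWindow(failures: FailedJob[], from: number, to: number): ReplayQueue {
  return new ReplayQueue(failures.filter((j) => j.failedAt >= from && j.failedAt <= to));
}

const failures: FailedJob[] = [
  { id: "a", failedAt: 10, payload: "ok" },
  { id: "b", failedAt: 20, payload: "bad" },
  { id: "c", failedAt: 99, payload: "ok" }, // outside the window below
];
const replay = replayWindow(failures, 0, 50);
const remaining = replay.run((p) => {
  if (p === "bad") throw new Error("still broken");
});
```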

>>danfar+(OP)
Can you elaborate on why you chose to go with the SSPL license? I want to open source a project and have been deciding between SSPL and AGPL. I am held back by OSI stating that the SSPL does not comply with its Open Source Definition because it discriminates against specific fields of endeavor, describing it as a "fauxpen" source license.

replies(1): >>danfar+uD

>>GGO+xA
Good question. This was a hard question for us last year, and we chose SSPL for the time being, as an early-stage startup, to offer some protection. AGPL allows anyone to deploy your system and re-sell it, but SSPL requires the person to open source the additions that they make for their platform, which benefits the project itself.

*Caveat*: This is super nuanced and hotly debated, so this is high level and there's no perfect answer here.

Mid term, we plan to move from SSPL to a more open license as we further develop our open source project.

>>danfar+A5
Congrats on the launch. Building a reliable background job / workflow infra is hard. Temporal has lifted the bar significantly, glad to see new development.

As for the schema management part, we at bytebase.com have also built an OSS product to tackle this specifically.

>>danfar+(OP)
Having an event-driven infrastructure is a big missing piece in the serverless world and I'm glad to see you guys stepping in and filling this gap!

>>danfar+(OP)
We've been using Inngest at Secta.ai for the last ~6 months, happy to answer any questions!

DX is great! Writing the jobs feels very natural, much much simpler than Temporal. The development server is neat and makes debugging jobs very easy. TypeScript SDK is idiomatic, the types are properly inferred & propagated throughout the whole app.

The nice thing about writing step functions for Inngest vs regular "async worker queues" is that we can express logic, e.g. "if X then wait for event Y", with a layer of caching/retries on top.

replies(1): >>dimitr+Gf6

>>danfar+(OP)
Looks interesting and promising. Is it Open Source? Can it be self-hosted?

replies(1): >>tonyhb+uz1

>>dimitr+o2
no promises here but I'd love to look into Rust in the future. there're essentially no background job systems for Rust iirc.

>>mirzap+U91
The executor, queue, state, drivers, etc. are all on GitHub (https://github.com/inngest/inngest).

Over the last year we've been iterating on the internals a lot to build things like:

- Concurrency (shared nothing, auto-scalable)

- Batching (have one fn run with 100 events, vs 1:1 mapping)

- Prioritization

- Replay

- Parallelization

- Branch deploys

- Rate limiting
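For the batching item, a minimal sketch of the general technique: buffer incoming events and invoke the function once per full batch instead of once per event. The `Batcher` class and the batch size are hypothetical and only show the 100:1 mapping; they say nothing about how Inngest implements it.

```typescript
// Hypothetical batcher: buffers events and calls the handler once per batch.
class Batcher<T> {
  private buffer: T[] = [];

  constructor(private maxSize: number, private handler: (batch: T[]) => void) {}

  add(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxSize) this.flush();
  }

  // A real system would also flush on a timeout so small batches are not stuck.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.handler(batch);
  }
}

// 250 events with a batch size of 100 yield three function runs.
const batchSizes: number[] = [];
const batcher = new Batcher<number>(100, (batch) => batchSizes.push(batch.length));
for (let i = 0; i < 250; i++) batcher.add(i);
batcher.flush(); // flush the remaining partial batch
```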

The changes have been heavy, and it would be really hard for self-hosted people to handle the migrations necessary for these. Now that this is slowing, self hosting is realistically something that's possible soon. We'd prefer to offer self hosting when it's easy and ready, vs something that's a burden to operate.
