CURSOR shifted to a more agentic approach even for chat requests to reduce input tokens.
Previously, they used the good old RAG pattern with code dumps: request with user-added files -> retrieval (when Codebase was enabled) -> LLM request with the combined context from the user and the retrieval step.
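A minimal sketch of that RAG flow, assuming toy helper names (`retrieve_chunks`, `build_rag_prompt`) that are illustrative only, not Cursor's actual code: everything gets concatenated into one big prompt up front, which is where the input-token cost comes from.

```python
# Toy sketch of the "code dump" RAG pattern: user-added files plus
# retrieved chunks are all stuffed into a single prompt.
# All names here are hypothetical, for illustration only.

def retrieve_chunks(query: str, index: dict[str, str], k: int = 2) -> list[str]:
    # Toy retrieval: rank files by naive keyword overlap with the query.
    # A real system would use embeddings over a chunked codebase index.
    scored = sorted(index.items(),
                    key=lambda kv: -sum(w in kv[1] for w in query.split()))
    return [text for _, text in scored[:k]]

def build_rag_prompt(query: str, user_files: dict[str, str],
                     index: dict[str, str]) -> str:
    # The whole context (user files + retrieval hits) is dumped up front,
    # so input tokens grow with the amount of attached code.
    context = list(user_files.values()) + retrieve_chunks(query, index)
    return "\n\n".join(context) + "\n\nQuestion: " + query

index = {"auth.py": "def login(user): ...", "db.py": "def connect(): ..."}
prompt = build_rag_prompt("fix login", {"main.py": "print('hi')"}, index)
```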
Now they seem to be doing something like this: request -> LLM with tools to search the codebase and/or the user-added files
I get constant search tool calls even for user-added files. Big reduction in input tokens, but I think performance suffers as well.
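The agentic flow described above can be sketched like this, with a hardcoded stand-in for the model's tool choice (names like `search_codebase` and `run_agent` are assumptions, not Cursor's API): the request starts with almost no context, and code only enters the conversation when the model asks for it via a tool call.

```python
# Toy sketch of the tool-driven flow: instead of dumping files into the
# prompt up front, the model is handed a search tool and pulls in only
# what it asks for. All names are hypothetical.

def search_codebase(query: str, index: dict[str, str]) -> str:
    # Tool the model can call: return matching files, not the whole repo.
    hits = [f"{name}:\n{text}" for name, text in index.items() if query in text]
    return "\n".join(hits) or "no results"

def run_agent(question: str, index: dict[str, str]) -> list[str]:
    transcript = [f"user: {question}"]
    # Pretend model turn: it emits a tool call rather than answering
    # directly (a real loop would parse the LLM's tool-call output).
    tool_query = "login"  # stand-in for the search term the model chose
    transcript.append(f"tool_call: search_codebase({tool_query!r})")
    transcript.append("tool_result: " + search_codebase(tool_query, index))
    transcript.append("assistant: <answer grounded in tool results>")
    return transcript

log = run_agent("fix the login bug", {"auth.py": "def login(user): ..."})
```

The trade-off falls out of the shape of the loop: the initial request is tiny, but every piece of context now costs an extra round trip, and the answer is only as good as the search queries the model decides to issue.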
WINDSURF is still willing to dump code into the context, which gives it an edge in some cases (presumably at the cost of higher input-token spend).
Windsurf is willing to spend to acquire customers (lower subscription cost, higher expenses for LLM calls). Cursor has a huge customer base and is working on making it sustainable by a) reducing costs (see above) and b) increasing revenue (e.g. "Pro" requests for $0.05 with more input and output tokens).