I have not settled on a model yet; it could be donation-based, or a limited free tier with optional premium access if demand justifies it. It all depends on how useful people find it here and whether it gains traction.
If you don't mind me asking (and feel free not to answer): what kind of running costs are we looking at here?

Wow! With 256GB of RAM, you should be able to process queries almost instantly, depending on your CPU strength. I tried running a DeepSeek clone on Xeon processors with only 32GB of RAM, and it took 17 minutes to process a query. But since you'd only be able to hold a small subset of what the model was trained on, I think you are probably running on a third-party LLM service. Would I be right?
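For what it's worth, here's roughly what a local setup like the one I tried looks like. This is a minimal sketch assuming a GGUF-quantized model with the llama-cpp-python bindings; the model path, prompt, and settings are placeholders, not anything you've confirmed about your setup:

```python
import time
from llama_cpp import Llama

# Load a quantized model from disk; n_ctx is the context window size.
# The path is hypothetical -- substitute whatever model file you actually use.
llm = Llama(model_path="models/deepseek-7b-q4.gguf", n_ctx=2048)

start = time.time()
out = llm("Summarize the last week of forum posts.", max_tokens=256)
print(out["choices"][0]["text"])
print(f"Query took {time.time() - start:.1f}s")
```

On my 32GB Xeon box, the load and inference steps are where all that time went; with 256GB you could keep the whole model resident, which is why I'd expect much faster responses.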
Regarding indexes, I'm not sure how they work in AI queries, or even whether they are needed when you can store the entire database in memory, but I think that's where you'll find the best improvement potential. AI query analyzers must exist; run one on your dataset and it should tell you how to store your data, if not simply one record per post. A sketch of what I mean follows.
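To make the "one record per post" idea concrete, this is the shape I'd expect such an index to take. It's a sketch assuming you embed each post and search with FAISS; the embedding model name and the sample posts are made up for illustration:

```python
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

# One record per post: each post becomes one embedding vector in the index.
posts = ["How do I tune my Xeon box?", "Best RAM for a home server?"]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(posts)  # shape: (num_posts, dim)

# Flat (exact) L2 index; for large datasets an IVF or HNSW index
# trades a little accuracy for much faster search.
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype=np.float32))

query = model.encode(["server memory advice"])
distances, ids = index.search(np.asarray(query, dtype=np.float32), 2)
print(ids[0])  # indices of the closest posts
```

Even with everything in memory, an index like this narrows a query down to the few relevant posts instead of scanning the whole dataset, which is where I'd guess your improvement potential is.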
Finally, keep in mind that the big AI teams are releasing new models every week, followed quickly by many smaller companies offering these "one plan fits all" AI services.