You can even skip the data layer altogether. The only thing a local LLM is missing is knowledge of current medications by name, if you want to just tell it whatever prescription you're following.
For sure, context rot is a problem, but it's also the easiest thing to control for in this case. If sensor data is relevant to you, having some code that processes and reduces it to a dashboard you can read is always a good idea, regardless of whether an LLM is in the loop.
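A minimal sketch of that reduction step, assuming hypothetical sensor fields (heart rate and glucose here; adapt to whatever your devices actually export). The point is that a handful of summary numbers replaces thousands of raw samples, whether a human or an LLM reads them:

```python
# Reduce raw sensor readings to a compact daily summary.
# Field names ("hr", "glucose") are illustrative assumptions.
from statistics import mean

readings = [
    {"t": "08:00", "hr": 62, "glucose": 95},
    {"t": "12:00", "hr": 78, "glucose": 140},
    {"t": "18:00", "hr": 70, "glucose": 110},
]

summary = {
    "hr_avg": round(mean(r["hr"] for r in readings)),
    "hr_max": max(r["hr"] for r in readings),
    "glucose_avg": round(mean(r["glucose"] for r in readings)),
    "glucose_max": max(r["glucose"] for r in readings),
}
print(summary)  # a few numbers instead of a wall of raw samples
```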
This becomes more complicated with data you can't really interpret yourself, like blood test results. But maybe you just don't summarize any of that.
I believe right now it's also reasonable to ditch NVIDIA at a certain budget. Let's see what can be done with large unified memory; maybe things will be different by the end of the year.
For some weird reason, in my country it's easier to order a Beelink or a Framework than an HP. They'll sell you everything except the one thing you actually want to buy.
That's a good point, but it seems there are several ways to make models fit in smaller-memory hardware. There aren't many ways, though, to compensate for missing the ML data types that let NVIDIA be something like 8x faster in some workloads.
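The "fit in smaller memory" part is mostly arithmetic on bits per weight. A rough sketch, counting weights only (KV cache and activations add more on top, so treat these as lower bounds):

```python
# Back-of-the-envelope memory for model weights at different quantization levels.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (decimal), ignoring KV cache/activations."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B model at {bits}-bit: ~{weight_gb(70, bits):.0f} GB")
# 16-bit needs a multi-GPU rig; 4-bit fits in a 48GB+ unified-memory box.
```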
For image generation, you don't need that much memory. That's the trade-off, I believe: get an NVIDIA card with 16GB of VRAM to run Flux plus something like 96GB of RAM for GPT-OSS 120B, or give up on fast image generation and go with the AMD Ryzen AI Max+ 395 like you said, or Apple Silicon.
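A quick sanity check on that split build. The sizes are rough assumptions on my part (Flux at fp8 around 12 GB, gpt-oss-120b's MXFP4 weights around 63 GB), but the shape of the trade-off holds:

```python
# Does each model fit its memory pool in the 16GB VRAM + 96GB RAM build?
# Model sizes below are assumed ballpark figures, not measured numbers.
vram_gb, ram_gb = 16, 96
flux_gb, gptoss_gb = 12, 63

print("Flux fits in VRAM:", flux_gb <= vram_gb)
headroom = ram_gb - gptoss_gb
print(f"RAM left over for KV cache and context: ~{headroom} GB")
```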
I'm aware of it, and it seems cool. But I don't think AMD fully supports the ML data types that can be used in diffusion, so it ends up slower than NVIDIA.