1. llama.cpp can run on both Android and iOS devices.
2. For non-transformer models, Mamba and RWKV are also good options.
3. Additionally, it helps to have a solid understanding of embeddings, tokens, and the structure of transformers.
And on Apple Silicon chips, MLX is often the best choice.
I'm trying to create something that runs against an online service when connected, falls back to a local model when offline, and dynamically downloads the relevant files while online so it is ready for offline use.
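A minimal sketch of that online/offline dispatch in Python. All names here (`MODEL_URL`, `MODEL_PATH`, `call_remote`, `call_local`) are hypothetical placeholders, not a real API; in practice `call_local` would wrap an on-device runtime such as llama.cpp or MLX, and the connectivity check here is just a cheap TCP probe.

```python
import os
import socket
import urllib.request

# Hypothetical placeholders for illustration only.
MODEL_URL = "https://example.com/model.gguf"
MODEL_PATH = "model.gguf"

def is_online(host="8.8.8.8", port=53, timeout=1.0):
    """Cheap connectivity check: try opening a TCP socket to a public DNS server."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def ensure_model_cached():
    """While online, fetch the local model file so offline mode is ready later."""
    if not os.path.exists(MODEL_PATH):
        try:
            urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
        except OSError:
            pass  # download failure is non-fatal; retry on a later call

def call_remote(prompt):
    # Placeholder: in practice this would call a hosted LLM API.
    return f"[remote] {prompt}"

def call_local(prompt):
    # Placeholder: in practice this would run an on-device model
    # (e.g. llama.cpp bindings, or MLX on Apple Silicon).
    return f"[local] {prompt}"

def generate(prompt):
    """Prefer the online service; opportunistically cache the model for offline use."""
    if is_online():
        ensure_model_cached()
        return call_remote(prompt)
    return call_local(prompt)
```

The key design choice is that caching happens opportunistically on the online path, so by the time connectivity drops the local fallback already has its files.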