    // Add advanced options such as temperature (lower values give more deterministic output)
    OptionsBuilder optionsBuilder = new OptionsBuilder();
    optionsBuilder.setTemperature(0.7f); // float literal
    builder.withOptions(optionsBuilder.build());
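The snippet above sets the temperature through a Java client's `OptionsBuilder`. Its API resembles ollama4j, though the source does not name the library. If you call Ollama's REST API directly, the same parameter goes in the `options` object of the request body.

The sketch below makes a few assumptions:
- It targets the `/api/generate` endpoint.
- The model is tagged `llama3` and has already been pulled.
- `GenerateRequestBody` is a hypothetical helper, not part of any library.

```java
// Hypothetical helper: builds the JSON body for Ollama's POST /api/generate,
// carrying sampling parameters in the "options" object.
final class GenerateRequestBody {

    private GenerateRequestBody() {}

    // Wraps s in quotes and escapes the characters JSON forbids inside a string literal.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    // Non-streaming request with an explicit temperature in "options".
    static String build(String model, String prompt, double temperature) {
        return "{\"model\":" + jsonString(model)
                + ",\"prompt\":" + jsonString(prompt)
                + ",\"stream\":false"
                + ",\"options\":{\"temperature\":" + temperature + "}}";
    }
}
```

For example, `GenerateRequestBody.build("llama3", "Why is the sky blue?", 0.7)` yields a body you could POST to `http://localhost:11434/api/generate`. Other sampling parameters would sit alongside `temperature` in the same object.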
    import org.springframework.ai.ollama.OllamaChatModel;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class AiController {

        private final OllamaChatModel chatModel;

        // Constructor injection of the OllamaChatModel bean
        public AiController(OllamaChatModel chatModel) {
            this.chatModel = chatModel;
        }

        // GET /ai/generate?message=... sends the message to the local model and returns its reply
        @GetMapping("/ai/generate")
        public String generate(@RequestParam(value = "message") String message) {
            return chatModel.call(message);
        }
    }
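To exercise this controller from another Java program, the message must be URL-encoded into the query string. Below is a minimal sketch using the JDK's `java.net.http` client. It assumes the Spring Boot application runs on its default port 8080, and `AiEndpointClient` is a hypothetical name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the /ai/generate endpoint exposed by AiController.
final class AiEndpointClient {

    private AiEndpointClient() {}

    // Percent-encodes the message (spaces become '+') so it survives the query string.
    static URI generateUri(String baseUrl, String message) {
        String encoded = URLEncoder.encode(message, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/ai/generate?message=" + encoded);
    }

    // Sends a GET request and returns the controller's response body as text.
    static String ask(HttpClient client, String baseUrl, String message)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(generateUri(baseUrl, message)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

For example, `AiEndpointClient.ask(HttpClient.newHttpClient(), "http://localhost:8080", "Why is the sky blue?")` returns whatever text `chatModel.call(message)` produced on the server.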
    // history: a List<Message> holding the conversation so far
    Flux<String> responseStream = chatModel.stream(new Prompt(history))
            .flatMap(response -> Flux.fromIterable(response.getResults()))
            .map(result -> result.getOutput().getText()); // getContent() in pre-1.0 Spring AI milestones
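The Flux pipeline above depends on Spring AI and Reactor. Without a framework, you can consume the same kind of stream from Ollama's REST API. With streaming enabled, `/api/generate` returns newline-delimited JSON: one object per line, each carrying a text fragment in its `response` field.

The sketch below uses a hypothetical `StreamAssembler` class. It extracts that field with a regular expression instead of a JSON library, so it assumes well-formed input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: joins the text fragments of a streamed /api/generate reply.
// Each line of the body is a standalone JSON object with a "response" fragment.
final class StreamAssembler {

    // Matches "response": "<string>", allowing escaped characters inside the string.
    private static final Pattern RESPONSE =
            Pattern.compile("\"response\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    private StreamAssembler() {}

    static String assemble(Stream<String> ndjsonLines) {
        return ndjsonLines
                .map(String::strip)
                .filter(line -> !line.isEmpty())
                .map(StreamAssembler::fragment)
                .collect(Collectors.joining());
    }

    // Returns the unescaped "response" value of one line, or "" if the line has none.
    static String fragment(String line) {
        Matcher m = RESPONSE.matcher(line);
        return m.find() ? unescape(m.group(1)) : "";
    }

    // Reverses JSON string escaping.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char next = s.charAt(++i); // the regex guarantees a character follows
            switch (next) {
                case 'n' -> out.append('\n');
                case 't' -> out.append('\t');
                case 'r' -> out.append('\r');
                case 'b' -> out.append('\b');
                case 'f' -> out.append('\f');
                case 'u' -> {
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                }
                default -> out.append(next); // quote, backslash and forward slash
            }
        }
        return out.toString();
    }
}
```

`HttpResponse.BodyHandlers.ofLines()` exposes the response body as a `Stream<String>`. Paired with it, you can print fragments as they arrive with `lines.forEach(l -> System.out.print(StreamAssembler.fragment(l)))`, or join them with `assemble` once the stream completes.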
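    wget https://ollama.com/download/ollama-linux-amd64.tgz
    sudo tar -C /usr -xzf ollama-linux-amd64.tgz   # installs the ollama binary under /usr
    ollama serve                                    # starts the server; leave it running
    ollama -v                                       # in a second terminal: prints the installed version
    # Alternatively, the official install script: curl -fsSL https://ollama.com/install.sh | sh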
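Once `ollama serve` is running, a Java application can confirm the server is reachable before sending prompts. This minimal sketch assumes Ollama's default address, `http://localhost:11434`, where a plain `GET /` returns a short status message. `OllamaHealth` is a hypothetical name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical check: returns true if a server answers HTTP 200 at baseUrl.
final class OllamaHealth {

    private OllamaHealth() {}

    static boolean isRunning(String baseUrl) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false; // connection refused, timeout, and similar network failures
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

`OllamaHealth.isRunning("http://localhost:11434")` returns `false` instead of throwing when nothing is listening, which makes it convenient as a startup check.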
Local inference can be slow, depending on your hardware; Apple Silicon (M1/M2/M3) machines, for example, typically respond much faster than CPU-only systems. Extend the read timeout of your Java HTTP client so that long generations do not fail prematurely.
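With the JDK's built-in `java.net.http` client, there are two separate timeouts:
- The connect timeout is set on the `HttpClient`.
- The wait for a response is governed by a per-request timeout. If none is set, the request waits indefinitely.

The sketch below uses a hypothetical `LocalLlmHttp` factory and assumes Ollama's default port, 11434. It keeps connecting strict but gives generation several minutes. The exact durations are assumptions to tune for your hardware and model size.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical factory: client and request with timeouts suited to slow local inference.
final class LocalLlmHttp {

    private LocalLlmHttp() {}

    // Connecting to a local server should be quick; fail fast if it is down.
    static HttpClient client() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
    }

    // The request timeout bounds the wait for the response, so allow minutes, not seconds.
    static HttpRequest generateRequest(String jsonBody, Duration timeout) {
        return HttpRequest.newBuilder(URI.create("http://localhost:11434/api/generate"))
                .timeout(timeout)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

If the limit is exceeded, `HttpClient.send` throws `java.net.http.HttpTimeoutException`. You can catch it to show a friendlier message or retry with a smaller prompt.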
A local model does not keep state between calls. To build a chatbot that remembers previous turns, you must maintain the conversation history yourself.
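With Ollama's `/api/chat` endpoint, one straightforward approach is to keep every turn in a list and send the entire list with each request. The model's reply is appended once it arrives.

This minimal sketch uses a hypothetical `ConversationHistory` class and an assumed model tag, `llama3`. It only builds the request body, which you would POST to `http://localhost:11434/api/chat`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps chat turns in memory and renders the "messages" array
// that Ollama's POST /api/chat expects on every call.
final class ConversationHistory {

    private record Turn(String role, String content) {}

    private final List<Turn> turns = new ArrayList<>();

    ConversationHistory(String systemPrompt) {
        turns.add(new Turn("system", systemPrompt));
    }

    void addUser(String content) { turns.add(new Turn("user", content)); }

    void addAssistant(String content) { turns.add(new Turn("assistant", content)); }

    int size() { return turns.size(); }

    // The whole history is resent each time because the model itself remembers nothing.
    String toChatRequest(String model) {
        StringBuilder sb = new StringBuilder("{\"model\":").append(json(model))
                .append(",\"stream\":false,\"messages\":[");
        for (int i = 0; i < turns.size(); i++) {
            Turn t = turns.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"role\":").append(json(t.role()))
              .append(",\"content\":").append(json(t.content())).append('}');
        }
        return sb.append("]}").toString();
    }

    // Quotes a string as a JSON string literal.
    private static String json(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Because the whole list is resent every time, requests grow with the conversation. Long chats eventually exceed the model's context window, so real applications typically trim or summarize older turns.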
Artificial intelligence is evolving rapidly, and one approach has become increasingly practical: running Large Language Models (LLMs) entirely on your own hardware. Compared with cloud-based AI services, it offers stronger data privacy, predictable latency, zero API costs, and the ability to operate in air-gapped or offline environments. For the large community of Java developers, who build enterprise systems, Android applications, and Big Data infrastructure, integrating these local AI capabilities is becoming an essential skill.
Calling the Ollama REST API directly is straightforward and transparent: you handle the network calls and JSON parsing yourself, which gives you complete control over every request and response.
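Below is a minimal sketch of this approach using the JDK's `java.net.http` client. It rests on a few assumptions:
- Ollama runs at its default address.
- The call is a non-streaming `/api/generate` request.
- `RawOllamaClient` and its methods are hypothetical names.
- The regex-based parsing assumes well-formed JSON from the server.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal client: one non-streaming call to Ollama's /api/generate
// with hand-written JSON handling (a library such as Jackson is sturdier).
final class RawOllamaClient {

    private static final Pattern RESPONSE =
            Pattern.compile("\"response\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    private final HttpClient http = HttpClient.newHttpClient();
    private final URI endpoint;

    RawOllamaClient(String baseUrl) {
        this.endpoint = URI.create(baseUrl + "/api/generate");
    }

    String generate(String model, String prompt) throws IOException, InterruptedException {
        String body = "{\"model\":" + quote(model) + ",\"prompt\":" + quote(prompt)
                + ",\"stream\":false}";
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofMinutes(5)) // local generation can be slow
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + ": " + response.body());
        }
        return extractResponse(response.body());
    }

    // Finds the "response" string in the reply and undoes its JSON escaping.
    static String extractResponse(String json) {
        Matcher m = RESPONSE.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("No response field in: " + json);
        }
        return unescape(m.group(1));
    }

    // Quotes a Java string as a JSON string literal.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Reverses JSON string escaping.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char next = s.charAt(++i); // the regex guarantees a character follows
            switch (next) {
                case 'n' -> out.append('\n');
                case 't' -> out.append('\t');
                case 'r' -> out.append('\r');
                case 'b' -> out.append('\b');
                case 'f' -> out.append('\f');
                case 'u' -> {
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                }
                default -> out.append(next); // quote, backslash and forward slash
            }
        }
        return out.toString();
    }
}
```

For example, `new RawOllamaClient("http://localhost:11434").generate("llama3", "Why is the sky blue?")` blocks until the full answer is ready and returns it as a string. The `llama3` tag is an assumption; use any model you have pulled.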
Running models locally eliminates pay-per-token cloud billing: once the hardware is in place, each additional request costs little beyond electricity, which makes local inference cost-effective for high-volume processing.
To verify that the server is running and the model is loaded, you can use curl to send a test request: