{"id":9012,"date":"2024-01-30T13:06:23","date_gmt":"2024-01-30T13:06:23","guid":{"rendered":"https:\/\/cheapwindowsvps.com\/blog\/exploring-lm-studio-running-local-ai-on-your-desktop-or-server\/"},"modified":"2025-01-16T11:21:51","modified_gmt":"2025-01-16T11:21:51","slug":"exploring-lm-studio-running-local-ai-on-your-desktop-or-server","status":"publish","type":"post","link":"https:\/\/cheapwindowsvps.com\/blog\/exploring-lm-studio-running-local-ai-on-your-desktop-or-server\/","title":{"rendered":"Exploring LM Studio: Running Local AI on Your Desktop or Server"},"content":{"rendered":"<p><div>LM Studio is a free tool that lets you run open-source Large Language Models (LLMs) locally on your own computer. It includes a browser for searching and downloading LLMs from Hugging Face, a built-in chat UI, and a local server runtime compatible with the <a href=\"https:\/\/4sysops.com\/archives\/install-the-python-openai-sdk-on-windows-and-macos\/\" rel=\"nofollow noopener\" target=\"_blank\">OpenAI API<\/a>. You can use this server to set up a development environment before deploying a full LLM system, or to run your own version of ChatGPT without handing your business data to a third party.<\/div>\n<\/p>\n<p><h2>Setting up LM Studio<\/h2>\n<\/p>\n<p><p>To run LM Studio, you need either a Mac with an M1 or later CPU, or a Windows system with a CPU that supports the AVX2 instruction set. RAM and disk requirements depend on the model you choose: some models are only a few GB, while larger ones consume substantial storage and memory. GPU acceleration is still experimental, but it is not required to run most LLMs. 
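On Linux or WSL, you can check for AVX2 yourself by looking at the CPU flags the kernel exposes in /proc/cpuinfo; a minimal sketch (the has_avx2 helper is my own illustration, not part of LM Studio):

```python
# Sketch: detect AVX2 support by scanning the CPU flags that the Linux
# kernel exposes in /proc/cpuinfo. On platforms without that file we
# simply report False and defer to the installer's own hardware check.
def has_avx2(cpuinfo_path='/proc/cpuinfo'):
    try:
        with open(cpuinfo_path) as f:
            return 'avx2' in f.read()
    except OSError:
        return False

print(has_avx2())
```

On macOS and Windows this sketch always prints False; there, rely on the installer to verify your hardware.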
The latest Mac and Windows installers can be downloaded from <a href=\"https:\/\/lmstudio.ai\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/lmstudio.ai\/<\/a>, and a Linux version is currently in beta.<\/p>\n<\/p>\n<p><h2>Procuring a model<\/h2>\n<\/p>\n<p><p>LM Studio has a built-in search function that uses the <a href=\"https:\/\/huggingface.co\/\" rel=\"nofollow noopener\" target=\"_blank\">Hugging Face AI community<\/a> API to find models. For demonstration purposes, I will use Meta\u2019s Llama 2 model. Note that model availability and performance change frequently. Enter the keywords <em>llama gguf<\/em> in the search bar. Because Hugging Face also hosts models that are incompatible with LM Studio, I added <em>gguf<\/em> to the query; GPT-Generated Unified Format (GGUF) is the file format LM Studio loads LLMs from.<\/p>\n<\/p>\n<p><a href=\"https:\/\/4sysops.com\/wp-content\/uploads\/2024\/01\/LM&#039;s-model-search-interface.png\" rel=\"nofollow noopener\" target=\"_blank\">LM Studio&#8217;s model search interface<\/a><\/p>\n<p><h2>Searching LLMs in LM Studio<\/h2>\n<\/p>\n<p><p>The models you see are filtered by likely compatibility with your system and ordered by popularity.<\/p>\n<\/p>\n<p><p>The most popular option at the time of writing is <em>TheBloke\/Llama-2-7B-Chat-GGUF<\/em>, a GGUF conversion of Meta&#8217;s model. Details about the model can be found in its Model Card, the documentation provided by the model&#8217;s developer. The model is available in multiple quantization variants; quantization reduces a model&#8217;s resource consumption by lowering the numerical precision of its weights.<\/p>\n<\/p>\n<p><p>I will opt for Q5_K_M, which requires approximately 8 GB of RAM and 5 GB of disk space. Select the option that best fits your resources. 
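The idea behind quantization can be pictured with a tiny sketch (purely illustrative; the real GGUF K-quant schemes are more sophisticated): mapping each weight onto one of 2^5 evenly spaced levels keeps values close to the originals while needing only about 5 bits apiece, which is roughly what the 5 in Q5_K_M refers to.

```python
# Toy quantization: snap each float weight onto 2**bits evenly spaced
# levels between the minimum and maximum weight. Storage per weight drops
# from 32 bits to `bits`, at the cost of a small rounding error.

def quantize(weights, bits):
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels
    return [round((w - lo) / scale) for w in weights], lo, scale

def dequantize(qs, lo, scale):
    # Reverse the mapping; results are close to, not equal to, the originals.
    return [q * scale + lo for q in qs]

weights = [0.013, -0.542, 0.871, 0.204]
qs, lo, scale = quantize(weights, 5)   # each entry now fits in 5 bits (0..31)
restored = dequantize(qs, lo, scale)
```

The restored values differ from the originals by at most half a quantization step, which is why heavily quantized models lose some precision.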
After selecting a variant, click <strong>Download<\/strong>.<\/p>\n<\/p>\n<p><a href=\"https:\/\/4sysops.com\/wp-content\/uploads\/2024\/01\/Inspect-the-Model-Card-button-to-find-out-more-about-the-model.png\" rel=\"nofollow noopener\" target=\"_blank\">Inspect the Model Card button to find out more about the model<\/a><\/p>\n<p><h2>Running an LLM on your desktop<\/h2>\n<\/p>\n<p><p>Click the <strong>AI Chat<\/strong> icon in the navigation panel on the left side. At the top, <strong>select a model to load<\/strong> and click the <strong>llama 2 chat<\/strong> option. The model takes a few seconds to load.<\/p>\n<\/p>\n<p><p>LM Studio may ask whether to override its default prompt with the prompt suggested by the model\u2019s developer. Such a system prompt assigns the model a role, intent, and limitations, e.g., \u201cYou are a helpful coding AI assistant,\u201d which the model then uses as context when answering your prompts. Note that accepting a prompt (input) from a third party may have unintended side effects; you can review the prompt in the model settings.<\/p>\n<\/p>\n<p><p>On the right side, you will find various model-specific settings that let you fine-tune both performance and resource consumption, including experimental GPU support. You can now converse with the downloaded model.<\/p>\n<\/p>\n<p><a href=\"https:\/\/4sysops.com\/wp-content\/uploads\/2024\/01\/Chatting-with-Llama-2.png\" rel=\"nofollow noopener\" target=\"_blank\">Chatting with Llama 2<\/a><\/p>\n<p><h2>Running a local LLM server<\/h2>\n<\/p>\n<p><p>LM Studio can also serve a downloaded model over HTTP, which is useful for developing and testing your code. 
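Incidentally, the system prompt discussed above is also how you pass context when talking to the server: it becomes the first, system-role entry in an OpenAI-style message list. A short sketch with placeholder contents:

```python
# OpenAI-style chat history: the system message sets the role, intent,
# and limitations, while user and assistant messages carry the
# conversation itself.
messages = [
    {'role': 'system', 'content': 'You are a helpful coding AI assistant.'},
    {'role': 'user', 'content': 'Write a haiku about local LLMs.'},
]
print(len(messages))
```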
The server is compatible with the <a href=\"https:\/\/4sysops.com\/archives\/openai-api-example-building-a-simple-gpt-chatbot-with-the-chat-completions-api\/\" rel=\"nofollow noopener\" target=\"_blank\">OpenAI API<\/a>, currently the most popular API for LLMs.<\/p>\n<\/p>\n<p><p>Click the <strong>Local Server<\/strong> option in the navigation bar. Note that starting the server disables the Chat option. Once again, you can adjust various settings or use a preset, but the defaults should work fine. Click <strong>Start Server<\/strong>.<\/p>\n<\/p>\n<p><p>The LM Studio server listens on port 1234 by default, and you access the API via HTTP.<\/p>\n<\/p>\n<p><div><a href=\"https:\/\/4sysops.com\/wp-content\/uploads\/2024\/01\/Running-a-local-LLM-server.png\" rel=\"nofollow noopener\" target=\"_blank\">Running a local LLM server<\/a><\/div>\n<\/p>\n<p><p>If you <a href=\"https:\/\/4sysops.com\/archives\/install-the-python-openai-sdk-on-windows-and-macos\/\" rel=\"nofollow noopener\" target=\"_blank\">installed the OpenAI SDK for Python<\/a>, you can access the local server by simply replacing <em>client = OpenAI()<\/em> with <em>client = OpenAI(base_url=&quot;http:\/\/localhost:1234\/v1&quot;, api_key=&quot;not-needed&quot;)<\/em>. The best part: you don&#8217;t need an OpenAI API key.<\/p>\n<\/p>\n<p><div><a href=\"https:\/\/4sysops.com\/wp-content\/uploads\/2024\/01\/Accessing-the-OpenAI-API-of-LM-Studio.png\" rel=\"nofollow noopener\" target=\"_blank\">Accessing the OpenAI API of LM Studio<\/a><\/div>\n<\/p>\n<p><p>Instead of conversing with one of the OpenAI models, you will access a model installed on your desktop. 
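The same endpoint can also be reached without the SDK; a minimal sketch using only the Python standard library (the request under __main__ assumes the server from the previous step is running, and the model field is a placeholder, since LM Studio answers with whichever model is currently loaded):

```python
import json
import urllib.request

BASE_URL = 'http://localhost:1234/v1'  # LM Studio's default port

def build_request(question):
    # Same JSON body the OpenAI Chat Completions endpoint expects.
    payload = {
        'model': 'local-model',  # ignored by LM Studio
        'messages': [{'role': 'user', 'content': question}],
        'temperature': 0.7,
    }
    return urllib.request.Request(
        BASE_URL + '/chat/completions',
        data=json.dumps(payload).encode(),
        headers={'Content-Type': 'application/json'},
    )

if __name__ == '__main__':
    # Requires the LM Studio server to be running locally.
    with urllib.request.urlopen(build_request('Explain GGUF in one sentence.')) as resp:
        reply = json.load(resp)
        print(reply['choices'][0]['message']['content'])
```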
In our case, this is Meta&#8217;s Llama 2.<\/p>\n<\/p>\n<p><div><a href=\"https:\/\/4sysops.com\/wp-content\/uploads\/2024\/01\/Monitoring-the-model-response-in-the-Server-Logs.png\" rel=\"nofollow noopener\" target=\"_blank\">Monitoring the model response in the Server Logs<\/a><\/div>\n<p>You can view the model&#8217;s response in the LM Studio Server Logs.<\/p>\n<h2>Conclusion<\/h2>\n<p>LM Studio makes it easy to download open-source LLMs and run them locally, either interactively through its chat UI or as an OpenAI-compatible server for development and testing, all without sending your data to a third party.<\/p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>LM Studio is a cost-free tool that enables you to operate an AI on your computer using open-source Large Language Models (LLMs) that are installed locally. It comes with a browser for searching and downloading LLMs from Hugging Face, a built-in Chat UI, and a local server runtime that is compatible with the OpenAI API. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":9013,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[153,92,160,161],"tags":[],"class_list":["post-9012","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-articles","category-llama-2","category-openai"],"_links":{"self":[{"href":"https:\/\/cheapwindowsvps.com\/blog\/wp-json\/wp\/v2\/posts\/9012","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cheapwindowsvps.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cheapwindowsvps.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cheapwindowsvps.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cheapwindowsvps.com\/blog\/wp-json\/wp\/v2\/comments?post=9012"}],"version-history":[{"count":1,"href":"https:\/\/cheapwindowsvps.com\/blog\/wp-json\/wp\/v2\/posts\/9012\/revisions"}],"predecessor-version":[{"id":10282,"href":"https:\/\/cheapwindowsvps.com\/blog\/wp-json\/wp\/v2\/posts\/9012\/revisions\/10282
"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cheapwindowsvps.com\/blog\/wp-json\/wp\/v2\/media\/9013"}],"wp:attachment":[{"href":"https:\/\/cheapwindowsvps.com\/blog\/wp-json\/wp\/v2\/media?parent=9012"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cheapwindowsvps.com\/blog\/wp-json\/wp\/v2\/categories?post=9012"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cheapwindowsvps.com\/blog\/wp-json\/wp\/v2\/tags?post=9012"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}