Why Deploy DeepSeek R1 Locally?#
- Data Privacy and Security
  - Sensitive Data Protection: When handling sensitive data such as medical, financial, or governmental information, local deployment ensures that data never leaves the internal network, avoiding the risks of cloud transmission or third-party storage leaks.
  - Compliance Requirements: Certain regulations (such as GDPR and HIPAA) mandate that data be stored locally or in specific regions; local deployment satisfies these requirements directly.
- Performance and Low Latency
  - Real-time Requirements: Scenarios such as quality inspection in manufacturing and real-time decision-making require millisecond-level responses, and local servers reduce network latency.
  - High-Bandwidth Data Processing: For high-frequency trading or video analysis, local deployment avoids the bandwidth bottlenecks of uploading to the cloud.
- Customization and System Integration
  - Deep Adaptation to Business: Model parameters, interfaces, or output formats can be adjusted to fit unique enterprise processes (e.g., integration with internal ERP or BI tools).
  - Private Feature Development: Supports adding industry-specific modules (e.g., legal clause analysis, industrial fault diagnosis) while protecting intellectual property.
- Cost Control (Long-term)
  - Economical at Scale: With a high volume of long-term calls, the one-time investment in local hardware may cost less than ongoing cloud subscription fees.
  - Reuse of Existing Infrastructure: When enterprises already have server/GPU resources, deployment costs drop further.
- Network and Stability
  - Offline Operation: In environments with unstable or no network, such as mines or ocean-going vessels, local deployment ensures service continuity.
  - Avoiding Cloud Outage Risk: Does not depend on the availability of third-party cloud vendors (e.g., occasional AWS/Azure failures).
- Complete Autonomy and Control
  - Self-determined Upgrades and Maintenance: Decide when to update model versions, avoiding business interruptions caused by forced cloud-side upgrades.
  - Audit and Supervision: Full control over system logs and access records facilitates internal audits and responses to regulatory inspections.
What Configuration is Needed to Install DeepSeek R1?#
DeepSeek Model Windows Configuration Requirements:

| Model Name | Parameters (Billion) | Model File Size | Memory Requirement (Runtime) | Minimum Windows Configuration |
|---|---|---|---|---|
| deepseek-r1:1.5b | 1.5 | 1.1 GB | 2~3 GB | CPU: 4+ cores, RAM: 8 GB+, Disk: 3 GB+; supports pure-CPU inference |
| deepseek-r1:7b | 7 | 4.7 GB | 5~7 GB | CPU: 8+ cores, RAM: 16 GB+, GPU: RTX 3070/4060 (8 GB+ VRAM) |
| deepseek-r1:8b | 8 | 4.9 GB | 6~8 GB | CPU: 8+ cores, RAM: 16 GB+, GPU: RTX 3070/4060 (8 GB+ VRAM) |
| deepseek-r1:14b | 14 | 9 GB | 10~14 GB | CPU: 12+ cores, RAM: 32 GB+, GPU: RTX 4090 (16 GB+ VRAM) |
| deepseek-r1:32b | 32 | 20 GB | 22~25 GB | CPU: i9/Ryzen 9, RAM: 64 GB+, GPU: A100 (24 GB+ VRAM) |
| deepseek-r1:70b | 70 | 43 GB | >45 GB | Server-grade: 32-core CPU, 128 GB RAM, multiple GPUs in parallel (e.g., 4x RTX 4090) |
DeepSeek Model Mac Configuration Requirements:

| Model Name | Parameters (Billion) | Model File Size | Unified Memory Requirement (Runtime) | Minimum Mac Configuration |
|---|---|---|---|---|
| deepseek-r1:1.5b | 1.5 | 1.1 GB | 2~3 GB | MacBook Air (M2/M3 chip, ≥8 GB memory) |
| deepseek-r1:7b | 7 | 4.7 GB | 5~7 GB | MacBook Air or Mac mini (M2/M3/M4 chip, ≥16 GB memory) |
| deepseek-r1:8b | 8 | 4.9 GB | 6~8 GB | MacBook Air or Mac mini (M2/M3/M4 chip, ≥16 GB memory) |
| deepseek-r1:14b | 14 | 9 GB | 10~14 GB | MacBook Pro (M2/M3/M4 Pro chip, ≥32 GB memory) |
| deepseek-r1:32b | 32 | 20 GB | 22~25 GB | Mac Studio (M2 Max/Ultra) or MacBook Pro (M2/M3/M4 Max, ≥48 GB memory) |
| deepseek-r1:70b | 70 | 43 GB | >45 GB | Mac Studio (M2 Max/Ultra) or MacBook Pro (M2/M3/M4 Max, ≥64 GB memory) |
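As a sanity check on the two tables, here is a back-of-envelope sketch in Python. The constants are my assumptions, not official figures: Ollama's default downloads are typically ~4-bit quantized, so file size ≈ parameters × ~5 effective bits / 8, and runtime memory adds roughly a couple of GB of KV cache and runtime overhead on top of the weights.

```python
# Back-of-envelope model-size estimate. Assumed constants: ~5.0 effective
# bits per weight for a ~4-bit quantized model file (including metadata),
# plus ~2 GB of KV-cache/runtime overhead.

def estimate_memory_gb(params_billion: float,
                       bits_per_weight: float = 5.0,
                       overhead_gb: float = 2.0) -> tuple[float, float]:
    """Return (approx. file size in GB, approx. runtime memory in GB)."""
    file_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9  # decimal GB
    return round(file_gb, 1), round(file_gb + overhead_gb, 1)

for p in (1.5, 7, 8, 14, 32, 70):
    file_gb, runtime_gb = estimate_memory_gb(p)
    print(f"{p:>4}B -> file ~{file_gb} GB, runtime ~{runtime_gb} GB")
```

Under these assumptions the estimates land close to the file sizes in the tables, which is why the runtime-memory column roughly tracks the file size plus a few GB.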
How to Deploy DeepSeek R1 Locally?#
Note: I am using a Mac mini (M4); deployment on Windows is similar to Mac.
- Two tools need to be downloaded:
  - Ollama
  - AnythingLLM
- Installation flowchart
1. Ollama#
- Mainly used to install and run various large models locally, including DeepSeek.
- Ollama is a free, open-source tool for conveniently deploying and running LLMs on a local machine, letting users load, run, and interact with various models without needing to understand the complex underlying technology.
- Features of Ollama (a small sketch of talking to its local API follows this list):
  - Local Deployment: Does not rely on cloud services; users run models on their own devices, protecting data privacy.
  - Multi-OS Support: Installs easily on Mac, Linux, or Windows.
  - Multi-Model Support: Supports various popular LLMs, such as Llama and Falcon, including Meta's recently open-sourced Llama 3.1 405B, so users can choose different models to suit their needs and run them with one click.
  - Ease of Use: Provides an intuitive command-line interface that is simple to operate and easy to pick up.
  - Extensibility: Supports custom configuration, so users can optimize for their hardware environment and model requirements.
  - Open Source: The code is fully open for anyone to view, modify, and distribute freely (though few users will actually modify it).
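To make "ease of use" concrete: once installed, Ollama serves a local HTTP API, by default on port 11434. A minimal Python sketch, assuming that default address, that checks the server is running and lists installed models via the /api/tags endpoint:

```python
# List the models the local Ollama server has installed.
# Assumes Ollama's default address http://localhost:11434.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])  # e.g. "deepseek-r1:7b" once pulled (see below)
```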
2. DeepSeek R1#
- Find deepseek-r1 on the Ollama website and install it from the Mac terminal.
- Installation
  - Go back to the Ollama website, select Models, and choose deepseek-r1.
  - Keep the default selection of the 7b-parameter model; we will use this recommended size.
    https://ollama.com/library/deepseek-r1
  - Open the Mac terminal and run this command:
    ollama run deepseek-r1:7b
  - If the download slows down or stalls, press Control+C and re-run the command; you may be pleasantly surprised to see the speed recover, since downloads resume where they left off.
  - If you see success at the bottom, the installation succeeded.
  - Now we can type any question we like into this terminal window (to call the model from code instead, see the sketch below).
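If you would rather call the model from code than type in the terminal, here is a minimal sketch using Ollama's local /api/generate endpoint (default port 11434 assumed); with `"stream": False` it returns one complete JSON response instead of a token stream.

```python
# Query the locally installed deepseek-r1:7b through Ollama's HTTP API.
import requests

payload = {
    "model": "deepseek-r1:7b",
    "prompt": "In one sentence, why does local LLM deployment help data privacy?",
    "stream": False,  # return a single JSON object instead of a token stream
}
resp = requests.post("http://localhost:11434/api/generate",
                     json=payload, timeout=300)
resp.raise_for_status()
# R1 is a reasoning model: the reply may include its chain of thought
# wrapped in <think>...</think> tags before the final answer.
print(resp.json()["response"])
```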
3. Embedding Models#
- Explanation
  - Embedding models convert high-dimensional data such as text and images into low-dimensional vectors, capturing semantic information so machines can process it more easily.
  - Embedding models are the "translators" of AI, converting complex data into vectors that machines can understand, powering applications built on semantic understanding.
- Common Types and Features
| Type | Example Models | Features |
|---|---|---|
| Word Embedding | Word2Vec, GloVe | Maps words to vectors, capturing semantic relationships (e.g., "king - man + woman ≈ queen") |
| Contextual Embedding | BERT, GPT | Generates dynamic vectors based on context (e.g., "apple" means different things in "eating an apple" and "Apple phone") |
| Sentence/Document Embedding | Sentence-BERT | Represents whole sentences or paragraphs as vectors for similarity calculation, clustering, etc. |
| Multimodal Embedding | CLIP | Jointly processes text and images/audio, supporting cross-modal retrieval (e.g., searching for images with text) |
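To make the word-analogy row concrete, here is a toy Python illustration with hand-made 2D vectors (invented for this example; real embeddings have hundreds or thousands of dimensions):

```python
# Toy 2D "embeddings" with invented dimensions (royalty, gender),
# illustrating king - man + woman ≈ queen.
vecs = {
    "king":  [0.9, -0.8],
    "man":   [0.1, -0.9],
    "woman": [0.1,  0.9],
    "queen": [0.9,  0.8],
}
result = [k - m + w for k, m, w in
          zip(vecs["king"], vecs["man"], vecs["woman"])]
print(result)  # ~[0.9, 1.0] -- closest of the four vectors to "queen"
```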
- We will use the BGE-M3 embedding model.
- Explanation of BGE-M3:
  - Language Versatility
    - Supports over 100 languages, allowing precise matching when, say, searching for English materials in Chinese or for Spanish news in Japanese.
  - Dual Search Modes
    - Semantic understanding: for example, searching for "pets" also finds content about "cats and dogs."
    - Keyword matching: for example, strictly matching articles containing "AI" or "artificial intelligence" without missing results.
  - Long Articles Without Fragmentation
    - When reading long texts such as papers or contracts, it keeps the overall content in view instead of forgetting earlier parts the way ordinary tools do.
  - Resource Efficient
    - A compact variant is available that runs on phones or small websites without lag.
- Download bge-m3
  - Open the Mac terminal and enter:
    ollama pull bge-m3
  - If you see success, the installation succeeded (a quick similarity test with this model follows below).
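As a quick test of the model we just pulled, here is a minimal Python sketch that embeds a few phrases through Ollama's /api/embeddings endpoint (newer Ollama versions also offer /api/embed) and compares them with cosine similarity, mirroring the "pets" vs. "cats and dogs" example above:

```python
# Embed phrases with bge-m3 via the local Ollama server and compare them.
import math
import requests

def embed(text: str) -> list[float]:
    resp = requests.post("http://localhost:11434/api/embeddings",
                         json={"model": "bge-m3", "prompt": text}, timeout=60)
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

pets = embed("pets")
related = embed("cats and dogs are popular pets")
unrelated = embed("tomorrow's weather will be sunny")
print("pets vs cats/dogs:", round(cosine(pets, related), 3))    # expect higher
print("pets vs weather:  ", round(cosine(pets, unrelated), 3))  # expect lower
```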
4. AnythingLLM#
- Explanation
  - AnythingLLM replaces the terminal window with a simple UI.
  - AnythingLLM helps us build a personal local knowledge base.
  - AnythingLLM supports multiple input types, including text, images, and audio, and can segment and vectorize documents in formats such as PDF, TXT, and DOCX, using RAG (Retrieval-Augmented Generation) to reference document content in conversations with the LLM (a minimal sketch of the RAG idea follows the feature list below).
- Main Features:
  - Multi-user Management and Permission Control: Makes team collaboration easier, allowing everyone to use the LLM securely.
  - AI Agent Support: Ships with a capable AI agent that can perform complex tasks such as web browsing and code execution, increasing automation.
  - Embeddable Chat Window: Integrates easily into your website or application, giving users an AI-driven conversational experience.
  - Wide File Format Support: Supports document types such as PDF, TXT, and DOCX, covering different scenarios.
  - Vector Database Management: Provides an easy-to-use interface for managing documents in the vector database, simplifying knowledge management.
  - Flexible Conversation Modes: Supports both chat and query modes to suit different scenarios.
  - Information Source Tracking: Cites the referenced document content during conversations, making sources traceable and results more credible.
  - Multiple Deployment Options: Supports both fully cloud-hosted and local deployment.
  - Customizable LLM Models: Lets you plug in your own LLM models for higher customization and personalized needs.
  - Efficient Handling of Large Documents: Compared with other document-chat solutions, AnythingLLM claims to handle large documents more efficiently and at lower cost, saving up to 90%.
  - Developer Friendly: Provides a complete developer API for custom integration and greater extensibility.
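AnythingLLM automates this entire RAG pipeline for you. To show the idea behind it, here is a deliberately minimal, hand-rolled sketch (my illustration, not AnythingLLM's actual code): embed document chunks with bge-m3, retrieve the chunk most relevant to a question, and pass it to deepseek-r1:7b as context, all through Ollama's local API. The sample chunks and question are invented for the example.

```python
# Minimal RAG sketch: embed -> retrieve -> generate. Illustrative toy only.
import math
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "bge-m3", "prompt": text}, timeout=60)
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# 1) "Upload documents": here just a hand-made list of text chunks.
chunks = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 6pm.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2) Retrieve: pick the chunk most similar to the question.
question = "How long do I have to return a product?"
q_vec = embed(question)
best_chunk = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# 3) Generate: answer using only the retrieved context (like query mode).
prompt = (f"Answer using only this context:\n{best_chunk}\n\n"
          f"Question: {question}")
r = requests.post(f"{OLLAMA}/api/generate",
                  json={"model": "deepseek-r1:7b", "prompt": prompt,
                        "stream": False}, timeout=300)
r.raise_for_status()
print(r.json()["response"])
```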
- Download, Install, Configure
  - Download
    - Find the official website: https://anythingllm.com/
  - Installation
    - Click to start.
    - Select Ollama.
    - Click next.
    - Skip the survey.
    - Enter any workspace name; let's call it Little Fishing Assistant for now.
    - Once you see Workspace created successfully, the installation is complete.
  - Configuration
    - Click the 🔧 in the lower-left corner, find Customization → Display Language, and select Chinese.
    - Select Embedder Preferences.
    - For Embedding Engine Provider, select Ollama.
    - For Ollama Embedding Model, select the recently downloaded bge-m3.
    - Save changes.
- Workspace
  - Purpose Explanation:
    - Categorization
      - Create different "rooms" for different tasks: for example, one room handles customer-service Q&A while another analyzes contract documents, avoiding interference and data mixing.
    - Feeding Information to AI
      - Upload documents, web pages, or notes to the workspace (like "preparing lessons" for the AI), so it learns your private knowledge base.
    - Trial and Error
      - Ask questions directly in the workspace (e.g., simulating customer inquiries), watch the AI respond in real time, and adjust its instructions as needed.
  - Settings
    - Click the ⚙️ in the workspace.
    - General Settings
      - Here you can delete the workspace.
    - Chat Settings
      - Set the chat mode to query (it will then answer only from the context of retrieved documents).
      - Chat Prompt: the system prompt sent with every conversation; you can keep the default or tailor it to your assistant's role.
- Building a Personal Knowledge Base
  - Click the Little Fishing Assistant ⏫ button.
  - Upload the prepared documents to the knowledge base on the left, then move them to the Little Fishing Assistant on the right and click Save.