Why Deploy DeepSeek R1 Locally?#
- Data Privacy and Security
  - Sensitive Data Protection: When handling sensitive data such as medical, financial, or governmental information, local deployment ensures that data never leaves the internal network, avoiding the risks of cloud transmission or third-party storage leaks.
  - Compliance Requirements: Certain regulations (such as GDPR and HIPAA) mandate that data be stored locally or in specific regions; local deployment satisfies these requirements directly.
- Performance and Low Latency
  - Real-time Requirements: Scenarios such as quality inspection in manufacturing and real-time decision-making require millisecond-level responses, and local servers reduce network latency.
  - High-Bandwidth Data Processing: For high-frequency trading or video analysis, local deployment avoids the bandwidth bottlenecks of uploading to the cloud.
- Customization and System Integration
  - Deep Adaptation to Business: Model parameters, interfaces, or output formats can be adjusted to fit unique enterprise processes (e.g., integration with internal ERP or BI tools).
  - Private Feature Development: Supports adding industry-specific modules (e.g., legal clause analysis, industrial fault diagnosis) while protecting intellectual property.
- Cost Control (Long-term)
  - Economical at Scale: With a high volume of long-term calls, the one-time investment in local hardware may cost less than ongoing cloud subscription fees.
  - Reuse of Existing Infrastructure: When enterprises already have server/GPU resources, deployment costs drop further.
- Network and Stability
  - Offline Operation: In environments with unstable or no network, such as mines or ocean-going vessels, local deployment ensures service continuity.
  - Avoiding Cloud Outage Risk: Does not depend on the availability of third-party cloud vendors (e.g., occasional AWS/Azure failures).
- Complete Autonomy and Control
  - Self-determined Upgrades and Maintenance: Decide when to update model versions, avoiding business interruptions caused by forced cloud-side upgrades.
  - Audit and Supervision: Full control over system logs and access records facilitates internal audits and responses to regulatory inspections.
What Configuration is Needed to Install DeepSeek R1?#
DeepSeek Model Windows Configuration Requirements:

| Model Name | Parameters (Billion) | Model File Size | Memory Requirement (Runtime) | Minimum Windows Configuration |
|---|---|---|---|---|
| deepseek-r1:1.5b | 1.5 | 1.1 GB | 2~3 GB | CPU: 4+ cores, RAM: 8 GB+, Disk: 3 GB+; supports pure-CPU inference |
| deepseek-r1:7b | 7 | 4.7 GB | 5~7 GB | CPU: 8+ cores, RAM: 16 GB+, GPU: RTX 3070/4060 (8 GB+ VRAM) |
| deepseek-r1:8b | 8 | 4.9 GB | 6~8 GB | CPU: 8+ cores, RAM: 16 GB+, GPU: RTX 3070/4060 (8 GB+ VRAM) |
| deepseek-r1:14b | 14 | 9 GB | 10~14 GB | CPU: 12+ cores, RAM: 32 GB+, GPU: RTX 4090 (16 GB+ VRAM) |
| deepseek-r1:32b | 32 | 20 GB | 22~25 GB | CPU: i9/Ryzen 9, RAM: 64 GB+, GPU: A100 (24 GB+ VRAM) |
| deepseek-r1:70b | 70 | 43 GB | >45 GB | Server-grade: 32-core CPU, 128 GB RAM, multiple GPUs in parallel (e.g., 4x RTX 4090) |
DeepSeek Model Mac Configuration Requirements:

| Model Name | Parameters (Billion) | Model File Size | Unified Memory Requirement (Runtime) | Minimum Mac Configuration |
|---|---|---|---|---|
| deepseek-r1:1.5b | 1.5 | 1.1 GB | 2~3 GB | MacBook Air (M2/M3 chip, ≥8 GB memory) |
| deepseek-r1:7b | 7 | 4.7 GB | 5~7 GB | MacBook Air or Mac mini (M2/M3/M4 chip, ≥16 GB memory) |
| deepseek-r1:8b | 8 | 4.9 GB | 6~8 GB | MacBook Air or Mac mini (M2/M3/M4 chip, ≥16 GB memory) |
| deepseek-r1:14b | 14 | 9 GB | 10~14 GB | MacBook Pro (M2/M3/M4 Pro chip, ≥32 GB memory) |
| deepseek-r1:32b | 32 | 20 GB | 22~25 GB | Mac Studio (M2 Max/Ultra) or MacBook Pro (M2/M3/M4 Max, ≥48 GB memory) |
| deepseek-r1:70b | 70 | 43 GB | >45 GB | Mac Studio (M2 Max/Ultra) or MacBook Pro (M2/M3/M4 Max, ≥64 GB memory) |
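As a sanity check on the two tables, here is a back-of-envelope sketch in Python. The constants are my assumptions, not official figures: Ollama's default downloads are typically ~4-bit quantized, so file size ≈ parameters × ~5 effective bits / 8, and runtime memory adds roughly a couple of GB of KV cache and runtime overhead on top of the weights.

```python
# Back-of-envelope model-size estimate. Assumed constants: ~5.0 effective
# bits per weight for a ~4-bit quantized model file (including metadata),
# plus ~2 GB of KV-cache/runtime overhead.

def estimate_memory_gb(params_billion: float,
                       bits_per_weight: float = 5.0,
                       overhead_gb: float = 2.0) -> tuple[float, float]:
    """Return (approx. file size in GB, approx. runtime memory in GB)."""
    file_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9  # decimal GB
    return round(file_gb, 1), round(file_gb + overhead_gb, 1)

for p in (1.5, 7, 8, 14, 32, 70):
    file_gb, runtime_gb = estimate_memory_gb(p)
    print(f"{p:>4}B -> file ~{file_gb} GB, runtime ~{runtime_gb} GB")
```

Under these assumptions the estimates land close to the file sizes in the tables, which is why the runtime-memory column roughly tracks the file size plus a few GB.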
How to Deploy DeepSeek R1 Locally?#
Note: I am using a Mac mini (M4); deployment on Windows is similar to Mac.
- Two tools need to be downloaded:
  - Ollama
  - AnythingLLM
- Installation flowchart
1. Ollama#
- Mainly used to install and run various large models locally, including DeepSeek.
- Ollama is a free, open-source tool for conveniently deploying and running LLMs on a local machine, letting users load, run, and interact with various models without needing to understand the complex underlying technology.
- Features of Ollama (a small sketch of talking to its local API follows this list):
  - Local Deployment: Does not rely on cloud services; users run models on their own devices, protecting data privacy.
  - Multi-OS Support: Installs easily on Mac, Linux, or Windows.
  - Multi-Model Support: Supports various popular LLMs, such as Llama and Falcon, including Meta's recently open-sourced Llama 3.1 405B, so users can choose different models to suit their needs and run them with one click.
  - Ease of Use: Provides an intuitive command-line interface that is simple to operate and easy to pick up.
  - Extensibility: Supports custom configuration, so users can optimize for their hardware environment and model requirements.
  - Open Source: The code is fully open for anyone to view, modify, and distribute freely (though few users will actually modify it).
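To make "ease of use" concrete: once installed, Ollama serves a local HTTP API, by default on port 11434. A minimal Python sketch, assuming that default address, that checks the server is running and lists installed models via the /api/tags endpoint:

```python
# List the models the local Ollama server has installed.
# Assumes Ollama's default address http://localhost:11434.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])  # e.g. "deepseek-r1:7b" once pulled (see below)
```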
2. DeepSeek R1#
- Find deepseek-r1 on the Ollama website and install it from the Mac terminal.
- Installation
  - Go back to the Ollama website, select Models, and choose deepseek-r1.
  - Keep the default selection of the 7b-parameter model; we will use this recommended size.
    https://ollama.com/library/deepseek-r1
  - Open the Mac terminal and run this command:
    ollama run deepseek-r1:7b
  - If the download slows down or stalls, press Control+C and re-run the command; you may be pleasantly surprised to see the speed recover, since downloads resume where they left off.
  - If you see success at the bottom, the installation succeeded.
  - Now we can type any question we like into this terminal window (to call the model from code instead, see the sketch below).
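If you would rather call the model from code than type in the terminal, here is a minimal sketch using Ollama's local /api/generate endpoint (default port 11434 assumed); with `"stream": False` it returns one complete JSON response instead of a token stream.

```python
# Query the locally installed deepseek-r1:7b through Ollama's HTTP API.
import requests

payload = {
    "model": "deepseek-r1:7b",
    "prompt": "In one sentence, why does local LLM deployment help data privacy?",
    "stream": False,  # return a single JSON object instead of a token stream
}
resp = requests.post("http://localhost:11434/api/generate",
                     json=payload, timeout=300)
resp.raise_for_status()
# R1 is a reasoning model: the reply may include its chain of thought
# wrapped in <think>...</think> tags before the final answer.
print(resp.json()["response"])
```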
3. Embedding Models#
- Explanation
  - Embedding models convert high-dimensional data such as text and images into low-dimensional vectors, capturing semantic information so machines can process it more easily.
  - Embedding models are the "translators" of AI, converting complex data into vectors that machines can understand, powering applications built on semantic understanding.
- Common Types and Features
| Type | Example Models | Features |
|---|---|---|
| Word Embedding | Word2Vec, GloVe | Maps words to vectors, capturing semantic relationships (e.g., "king - man + woman ≈ queen") |
| Contextual Embedding | BERT, GPT | Generates dynamic vectors based on context (e.g., "apple" means different things in "eating an apple" and "Apple phone") |
| Sentence/Document Embedding | Sentence-BERT | Represents whole sentences or paragraphs as vectors for similarity calculation, clustering, etc. |
| Multimodal Embedding | CLIP | Jointly processes text and images/audio, supporting cross-modal retrieval (e.g., searching for images with text) |
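To make the word-analogy row concrete, here is a toy Python illustration with hand-made 2D vectors (invented for this example; real embeddings have hundreds or thousands of dimensions):

```python
# Toy 2D "embeddings" with invented dimensions (royalty, gender),
# illustrating king - man + woman ≈ queen.
vecs = {
    "king":  [0.9, -0.8],
    "man":   [0.1, -0.9],
    "woman": [0.1,  0.9],
    "queen": [0.9,  0.8],
}
result = [k - m + w for k, m, w in
          zip(vecs["king"], vecs["man"], vecs["woman"])]
print(result)  # ~[0.9, 1.0] -- closest of the four vectors to "queen"
```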
- We will use the BGE-M3 embedding model.
- Explanation of BGE-M3:
  - Language Versatility
    - Supports over 100 languages, allowing precise matching when, say, searching for English materials in Chinese or for Spanish news in Japanese.
  - Dual Search Modes
    - Semantic understanding: for example, searching for "pets" also finds content about "cats and dogs."
    - Keyword matching: for example, strictly matching articles containing "AI" or "artificial intelligence" without missing results.
  - Long Articles Without Fragmentation
    - When reading long texts such as papers or contracts, it keeps the overall content in view instead of forgetting earlier parts the way ordinary tools do.
  - Resource Efficient
    - A compact variant is available that runs on phones or small websites without lag.
- Download bge-m3
  - Open the Mac terminal and enter:
    ollama pull bge-m3
  - If you see success, the installation succeeded (a quick similarity test with this model follows below).
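As a quick test of the model we just pulled, here is a minimal Python sketch that embeds a few phrases through Ollama's /api/embeddings endpoint (newer Ollama versions also offer /api/embed) and compares them with cosine similarity, mirroring the "pets" vs. "cats and dogs" example above:

```python
# Embed phrases with bge-m3 via the local Ollama server and compare them.
import math
import requests

def embed(text: str) -> list[float]:
    resp = requests.post("http://localhost:11434/api/embeddings",
                         json={"model": "bge-m3", "prompt": text}, timeout=60)
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

pets = embed("pets")
related = embed("cats and dogs are popular pets")
unrelated = embed("tomorrow's weather will be sunny")
print("pets vs cats/dogs:", round(cosine(pets, related), 3))    # expect higher
print("pets vs weather:  ", round(cosine(pets, unrelated), 3))  # expect lower
```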
4. AnythingLLM#
- Explanation
  - AnythingLLM replaces the terminal window with a simple UI.
  - AnythingLLM helps us build a personal local knowledge base.
  - AnythingLLM supports multiple input types, including text, images, and audio, and can segment and vectorize documents in formats such as PDF, TXT, and DOCX, using RAG (Retrieval-Augmented Generation) to reference document content in conversations with the LLM (a minimal sketch of the RAG idea follows the feature list below).
- Main Features:
  - Multi-user Management and Permission Control: Makes team collaboration easier, allowing everyone to use the LLM securely.
  - AI Agent Support: Ships with a capable AI agent that can perform complex tasks such as web browsing and code execution, increasing automation.
  - Embeddable Chat Window: Integrates easily into your website or application, giving users an AI-driven conversational experience.
  - Wide File Format Support: Supports document types such as PDF, TXT, and DOCX, covering different scenarios.
  - Vector Database Management: Provides an easy-to-use interface for managing documents in the vector database, simplifying knowledge management.
  - Flexible Conversation Modes: Supports both chat and query modes to suit different scenarios.
  - Information Source Tracking: Cites the referenced document content during conversations, making sources traceable and results more credible.
  - Multiple Deployment Options: Supports both fully cloud-hosted and local deployment.
  - Customizable LLM Models: Lets you plug in your own LLM models for higher customization and personalized needs.
  - Efficient Handling of Large Documents: Compared with other document-chat solutions, AnythingLLM claims to handle large documents more efficiently and at lower cost, saving up to 90%.
  - Developer Friendly: Provides a complete developer API for custom integration and greater extensibility.
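AnythingLLM automates this entire RAG pipeline for you. To show the idea behind it, here is a deliberately minimal, hand-rolled sketch (my illustration, not AnythingLLM's actual code): embed document chunks with bge-m3, retrieve the chunk most relevant to a question, and pass it to deepseek-r1:7b as context, all through Ollama's local API. The sample chunks and question are invented for the example.

```python
# Minimal RAG sketch: embed -> retrieve -> generate. Illustrative toy only.
import math
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "bge-m3", "prompt": text}, timeout=60)
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# 1) "Upload documents": here just a hand-made list of text chunks.
chunks = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 6pm.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2) Retrieve: pick the chunk most similar to the question.
question = "How long do I have to return a product?"
q_vec = embed(question)
best_chunk = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# 3) Generate: answer using only the retrieved context (like query mode).
prompt = (f"Answer using only this context:\n{best_chunk}\n\n"
          f"Question: {question}")
r = requests.post(f"{OLLAMA}/api/generate",
                  json={"model": "deepseek-r1:7b", "prompt": prompt,
                        "stream": False}, timeout=300)
r.raise_for_status()
print(r.json()["response"])
```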
- Download, Install, Configure
  - Download
    - Find the official website: https://anythingllm.com/
  - Installation
    - Click to start.
    - Select Ollama.
    - Click next.
    - Skip the survey.
    - Enter any workspace name; let's call it Little Fishing Assistant for now.
    - Once you see Workspace created successfully, the installation is complete.
  - Configuration
    - Click the 🔧 in the lower-left corner, find Customization → Display Language, and select Chinese.
    - Select Embedder Preferences.
    - For Embedding Engine Provider, select Ollama.
    - For Ollama Embedding Model, select the recently downloaded bge-m3.
    - Save changes.
- Workspace
  - Purpose Explanation:
    - Categorization
      - Create different "rooms" for different tasks: for example, one room handles customer-service Q&A while another analyzes contract documents, avoiding interference and data mixing.
    - Feeding Information to AI
      - Upload documents, web pages, or notes to the workspace (like "preparing lessons" for the AI), so it learns your private knowledge base.
    - Trial and Error
      - Ask questions directly in the workspace (e.g., simulating customer inquiries), watch the AI respond in real time, and adjust its instructions as needed.
  - Settings
    - Click the ⚙️ in the workspace.
    - General Settings
      - Here you can delete the workspace.
    - Chat Settings
      - Set the chat mode to query (it will then answer only from the context of retrieved documents).
      - Chat Prompt: the system prompt sent with every conversation; you can keep the default or tailor it to your assistant's role.
- Building a Personal Knowledge Base
  - Click the Little Fishing Assistant ⏫ button.
  - Upload the prepared documents to the knowledge base on the left, then move them to the Little Fishing Assistant on the right and click Save.