小渔分享


【Article】DeepSeek R1 - Build a Personal Knowledge Base Locally


Why Deploy DeepSeek R1 Locally?#

  1. Data Privacy and Security

  • Sensitive Data Protection: When handling sensitive data such as medical, financial, or governmental information, local deployment ensures that data does not leave the internal network, avoiding the risks of cloud transmission or third-party storage leaks.
  • Compliance Requirements: Certain regulations (such as GDPR and HIPAA) mandate that data be stored locally or in specific regions; local deployment satisfies these requirements directly.
  2. Performance and Low Latency

  • Real-time Requirements: Scenarios such as quality inspection in manufacturing and real-time decision-making require millisecond-level responses, and local servers reduce network latency.
  • High Bandwidth Data Processing: For high-frequency trading or video analysis, local deployment avoids bandwidth bottlenecks caused by uploading to the cloud.
  3. Customization and System Integration

  • Deep Adaptation to Business: Model parameters, interfaces, or output formats can be adjusted to fit unique enterprise processes (e.g., integration with internal ERP or BI tools).
  • Privatized Function Development: Supports the addition of industry-specific modules (e.g., legal clause analysis, industrial fault diagnosis) while protecting intellectual property.
  4. Cost Control (Long-term)

  • Economical for Scalable Use: If there is a high volume of long-term calls, the investment in local hardware may be lower than the ongoing subscription fees for cloud services.
  • Reuse of Existing Infrastructure: When enterprises already have server/GPU resources, deployment costs are further reduced.
  5. Network and Stability

  • Offline Environment Operation: In scenarios with unstable or no network, such as mines or ocean-going vessels, local deployment ensures service continuity.
  • Avoiding Cloud Service Interruption Risks: Does not rely on the availability of third-party cloud vendors (e.g., occasional failures of AWS/Azure).
  6. Complete Autonomy and Control

  • Self-determined Upgrades and Maintenance: Decide when to update model versions, avoiding business interruptions caused by forced upgrades in the cloud.
  • Audit and Supervision: Full control over system logs and access records facilitates internal audits or responses to regulatory inspections.

What Configuration is Needed to Install DeepSeek R1?#

DeepSeek Model Windows Configuration Requirements:

| Model Name | Parameters | Model File Size | Memory Requirement (Runtime) | Minimum Windows Configuration |
| --- | --- | --- | --- | --- |
| deepseek-r1:1.5b | 1.5B | 1.1 GB | 2–3 GB | 4-core CPU, 8 GB RAM, 3 GB+ disk; supports pure-CPU inference |
| deepseek-r1:7b | 7B | 4.7 GB | 5–7 GB | 8-core CPU, 16 GB RAM, RTX 3070/4060 (8 GB+ VRAM) |
| deepseek-r1:8b | 8B | 4.9 GB | 6–8 GB | 8-core CPU, 16 GB RAM, RTX 3070/4060 (8 GB+ VRAM) |
| deepseek-r1:14b | 14B | 9 GB | 10–14 GB | 12-core CPU, 32 GB RAM, RTX 4090 (16 GB+ VRAM) |
| deepseek-r1:32b | 32B | 20 GB | 22–25 GB | i9/Ryzen 9 CPU, 64 GB RAM, A100 (24 GB+ VRAM) |
| deepseek-r1:70b | 70B | 43 GB | >45 GB | Server-grade: 32-core CPU, 128 GB RAM, multi-GPU setup (e.g., 4× RTX 4090) |

DeepSeek Model Mac Configuration Requirements:

| Model Name | Parameters | Model File Size | Unified Memory Requirement (Runtime) | Minimum Mac Configuration |
| --- | --- | --- | --- | --- |
| deepseek-r1:1.5b | 1.5B | 1.1 GB | 2–3 GB | MacBook Air (M2/M3, ≥8 GB memory) |
| deepseek-r1:7b | 7B | 4.7 GB | 5–7 GB | MacBook Air or Mac mini (M2/M3/M4, ≥16 GB memory) |
| deepseek-r1:8b | 8B | 4.9 GB | 6–8 GB | MacBook Air or Mac mini (M2/M3/M4, ≥16 GB memory) |
| deepseek-r1:14b | 14B | 9 GB | 10–14 GB | MacBook Pro (M2/M3/M4 Pro, ≥32 GB memory) |
| deepseek-r1:32b | 32B | 20 GB | 22–25 GB | Mac Studio (M2 Max/Ultra) or MacBook Pro (M2/M3/M4 Max, ≥48 GB memory) |
| deepseek-r1:70b | 70B | 43 GB | >45 GB | Mac Studio (M2 Max/Ultra) or MacBook Pro (M2/M3/M4 Max, ≥64 GB memory) |
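
Before picking a model size, it helps to check how much memory your machine actually has. A minimal sketch (the macOS command is shown since this guide uses a Mac; Windows equivalents are in the comments):

    # macOS: print total unified memory in GB
    sysctl -n hw.memsize | awk '{printf "%.0f GB\n", $1/1073741824}'

    # Windows equivalents, for reference:
    #   systeminfo | findstr /C:"Total Physical Memory"   (total RAM)
    #   nvidia-smi                                        (GPU VRAM, if an NVIDIA card is present)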

How to Deploy DeepSeek R1 Locally?#

Note: I am using a Mac mini (M4); deployment on Windows is similar to that on Mac.

  • Two tools need to be downloaded

    1. Ollama
    2. AnythingLLM
  • Installation Flowchart


1. Ollama#

  • Mainly used to install and run various large models locally, including DeepSeek.


    • Ollama is a free, open-source tool for conveniently deploying and running LLMs on a local machine, letting users load, run, and interact with various models without needing to understand the complex underlying technology.
    • Features of Ollama:
      • Local Deployment: Does not rely on cloud services, allowing users to run models on their own devices, protecting data privacy.
      • Multi-Operating System Support: Can be easily installed and used on Mac, Linux, or Windows.
      • Multi-Model Support: Ollama supports many popular LLMs, such as Llama and Falcon, including Meta's open-source Llama 3.1 405B, so users can pick a model that fits their needs and run it with a single command.
      • Ease of Use: Provides an intuitive command-line interface that is simple to operate and easy to get started.
      • Extensibility: Supports custom configurations, allowing users to optimize based on their hardware environment and model requirements.
      • Open Source: The code is fully open, allowing users to view, modify, and distribute freely (though not many will modify it).
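
  • In day-to-day use, a handful of standard Ollama CLI subcommands cover most of the workflow; a quick reference sketch:

      ollama pull <model>    # download a model without starting it
      ollama run <model>     # download if needed, then chat interactively
      ollama list            # show models installed locally
      ollama rm <model>      # delete a local model to free disk space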

2. DeepSeek R1#

  • Find deepseek-r1 on the Ollama website and install it in the Mac terminal.


  • Installation

    Install deepseek-r1 in Ollama:

    1. Go back to the Ollama website, select Models, and choose deepseek-r1.


    2. The page defaults to the 7b-parameter model, which is also the recommended one we will use.

      https://ollama.com/library/deepseek-r1


    3. Open the Mac terminal and copy this command line.

      ollama run deepseek-r1:7b
      
      • If the download slows down or stalls, press Control+C and re-run the command; you may be pleasantly surprised to see the speed recover, since Ollama resumes interrupted downloads.


    4. If you see success at the bottom, it means the installation was successful.

    5. Now we can freely input any questions we want to ask in this terminal window.

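  • Beyond chatting in the terminal, Ollama also exposes a local HTTP API on port 11434. Here is a minimal sketch of a one-shot, non-streaming request to the model we just installed, using Ollama's standard /api/generate endpoint:

      curl http://127.0.0.1:11434/api/generate -d '{
        "model": "deepseek-r1:7b",
        "prompt": "Explain retrieval-augmented generation in one sentence.",
        "stream": false
      }'

    The reply is a single JSON object whose response field holds the model's answer (for R1 models, including its <think> reasoning section).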

3. Embedding Models#

  • Explanation
    • Embedding models are techniques that convert high-dimensional data such as text and images into low-dimensional vectors, focusing on capturing semantic information for easier machine learning processing.
    • Embedding models are the "translators" of AI, converting complex data into vectors that machines can understand, driving applications related to semantic understanding.
    • Common Types and Features
| Type | Example Models | Features |
| --- | --- | --- |
| Word Embedding | Word2Vec, GloVe | Maps words to vectors, capturing semantic relationships (e.g., "king - man + woman ≈ queen") |
| Contextual Embedding | BERT, GPT | Generates dynamic vectors based on context (e.g., "apple" means different things in "eating an apple" and "Apple phone") |
| Sentence/Document Embedding | Sentence-BERT | Represents whole sentences or paragraphs as vectors for similarity calculation, clustering, etc. |
| Multimodal Embedding | CLIP | Jointly embeds text and images/audio, supporting cross-modal retrieval (e.g., searching images with text) |
  • We will use the BGE-M3 model from the embedding models.
    • Explanation of BGE-M3.
      • Language Versatility
        • Supports over 100 languages, allowing precise matching when searching for English materials in Chinese or Spanish news in Japanese.
      • Dual Search Modes
        • Understanding Meaning: For example, searching for "pets" can also find content related to "cats and dogs."
        • Keyword Matching: For example, strictly searching for articles containing "AI" or "artificial intelligence" without missing results.
      • Long Articles Without Fragmentation
        • When reading long texts such as papers or contracts, it can remember the overall content without forgetting earlier parts like ordinary tools.
      • Resource Efficient
        • Has a compact version (e.g., "mini version") that can be used on phones or small websites without lag.
    • Download bge-m3
      • Open the Mac terminal and enter

        ollama pull bge-m3
        
      • If you see success, the installation was successful. You can also open http://127.0.0.1:11434 in a browser; Ollama's local API answers there with "Ollama is running".
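
      • To see what an embedding actually looks like, ask Ollama to embed a sentence with bge-m3. A minimal sketch using Ollama's /api/embeddings endpoint (recent versions also offer a newer /api/embed endpoint); the reply is a JSON object whose embedding field is the vector, an array of floats:

        curl http://127.0.0.1:11434/api/embeddings -d '{
          "model": "bge-m3",
          "prompt": "Cats and dogs are common pets."
        }'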

4. AnythingLLM#

  1. Explanation

    • AnythingLLM replaces the terminal window with a simple graphical chat UI.
    • AnythingLLM helps us build a personal local knowledge base.
    • AnythingLLM supports multiple input methods, including text, images, and audio, and can segment and vectorize documents in formats such as PDF, TXT, DOCX, using RAG (Retrieval-Augmented Generation) technology to reference document content in conversations with LLMs.

    Main Features:

    • Multi-user Management and Permission Control: Makes team collaboration easier, allowing everyone to use LLM securely.
    • AI Agent Support: Comes with a powerful AI Agent that can perform complex tasks such as web browsing and code execution, increasing automation.
    • Embeddable Chat Window: Easily integrates into your website or application, providing users with an AI-driven conversational experience.
    • Wide File Format Support: Supports various document types such as PDF, TXT, and DOCX.
    • Vector Database Management: Provides an easy-to-use interface for managing documents in the vector database, facilitating knowledge management.
    • Flexible Conversation Modes: Supports both chat and query conversation modes, meeting different scenario needs.
    • Information Source Tracking: Provides referenced document content during conversations, making it easier to trace information sources and enhancing result credibility.
    • Multiple Deployment Options: Supports 100% cloud deployment and local deployment, meeting different user needs.
    • Customizable LLM Models: Allows you to use your own LLM models, offering higher customization to meet personalized needs.
    • Efficient Handling of Large Documents: Compared to other document chat solutions, AnythingLLM is more efficient and cost-effective in handling large documents, potentially saving up to 90% in costs.
    • Developer Friendly: Provides a complete set of developer APIs for custom integration and greater extensibility (see the sketch at the end of this section).
  2. Download, Install, Configure

    • Download
    • Installation
      • Click to start


      • Select Ollama


      • Click next


      • Skip the survey


      • Enter any workspace name, let's call it Little Fishing Assistant for now


      • Once you see Workspace created successfully, the installation is complete.


    • Configuration
      • Click the 🔧 in the lower left corner, find Customization → Display Language, and select Chinese

      • Select Embedder Preferences

      • Embedding Engine Provider, select Ollama

      • Ollama Embedding Model: select the recently downloaded bge-m3

      • Save changes

  3. Workspace

    • Purpose Explanation:
      • Categorization
        • Create different "rooms" for different tasks: for example, one room handles customer service Q&A, while another analyzes contract documents, avoiding interference and data mixing.
      • Feeding Information to AI
        • Upload documents, web pages, or notes to the workspace (like "preparing lessons" for AI), allowing it to learn your exclusive knowledge base.
      • Trial and Error
        • Directly ask questions in the workspace (e.g., simulating customer inquiries), watching AI responses in real-time and adjusting AI instructions as needed.
    • Settings
      • Click the ⚙️ in the workspace
      • General Settings
        • Here you can delete the workspace
      • Chat Settings
        • Set the chat mode to query (which will only provide answers based on the context of found documents)
        • Chat Prompt
  4. Building a Personal Knowledge Base

    • Click the Little Fishing Assistant ⏫ button


    • Upload the prepared documents to the left knowledge base, then move them to the right Little Fishing Assistant and click save.

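    • Once documents are embedded, the workspace can also be queried programmatically through the developer API mentioned earlier. A hedged sketch follows; the port, workspace slug, and payload fields are assumptions based on AnythingLLM's developer API documentation, so verify them against your version (the API key is generated in AnythingLLM's settings):

      # Hypothetical call: ask the workspace a question in query mode.
      # Port, slug, and fields are assumptions; adjust to your AnythingLLM instance.
      curl http://localhost:3001/api/v1/workspace/little-fishing-assistant/chat \
        -H "Authorization: Bearer YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"message": "What do my documents say about embeddings?", "mode": "query"}'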
