Sam's Dev Life

Exploring technology, documenting life

Chain-of-Thought (CoT)

Implement step-by-step reasoning to improve AI performance on complex problems

What is Chain-of-Thought?

Chain-of-Thought (CoT) prompting is a technique that enhances the reasoning capabilities of Large Language Models by generating intermediate reasoning steps. Instead of jumping directly to an answer, the AI is guided to “think out loud” through each step of the problem-solving process, leading to more accurate and explainable results.

This approach is particularly effective because LLMs often struggle with tasks requiring logical reasoning, mathematical calculations, or multi-step problem solving when they attempt to provide immediate answers.

Why Use Chain-of-Thought?

Advantages:

  • Improved Accuracy: Dramatically reduces errors on complex reasoning tasks
  • Low-Effort Implementation: Works with off-the-shelf LLMs without fine-tuning
  • Explainable AI: Users can follow and validate the reasoning process
  • Debugging Capability: Easy to identify where reasoning went wrong
  • Model Robustness: Performance remains consistent across different LLM versions
  • Versatile Applications: Effective for math, logic, code generation, and analysis

Trade-offs:

  • Higher Token Cost: More output tokens mean increased API costs
  • Slower Response Time: Additional reasoning steps take longer to generate
  • Verbosity: Responses are longer and may require post-processing

Zero-Shot vs Few-Shot CoT

Zero-Shot Chain-of-Thought

The simplest form of CoT uses trigger phrases like “Let’s think step by step” to encourage reasoning:

Zero-Shot CoT Example
---
provider: OpenAI
model: gpt-4o
temperature: 0.1
---

# Age Problem Solver

When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old. How old is my partner?

Let's think step by step.

Expected Output:

1. When I was 3 years old, my partner was 3 × 3 = 9 years old
2. The age difference between us is 9 - 3 = 6 years (partner is older)
3. This age difference remains constant over time
4. Now I am 20 years old, so my partner is 20 + 6 = 26 years old

Answer: My partner is 26 years old.

Few-Shot Chain-of-Thought

Providing examples of reasoning improves consistency and teaches the desired thinking pattern:

Few-Shot CoT Example
---
provider: OpenAI
model: gpt-4o
temperature: 0.1
---

# Age Problem Solver with Examples

Q: When my brother was 2 years old, I was double his age. Now I am 40 years old. How old is my brother? Let's think step by step.

A: When my brother was 2 years old, I was 2 × 2 = 4 years old. That's an age difference of 4 - 2 = 2 years, and I am older. Now I am 40 years old, so my brother is 40 - 2 = 38 years old. The answer is 38.

Q: When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old. How old is my partner? Let's think step by step.

A: [Let the AI complete this using the pattern from the example]
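The few-shot pattern above can be assembled programmatically from worked examples. A minimal sketch, assuming the plain-text Q/A format shown (the helper name is illustrative):

```python
def build_few_shot_cot_prompt(examples, question):
    """Format worked (question, reasoning) pairs as Q/A blocks,
    then pose the new question with a trailing 'A:' for the model."""
    blocks = [f"Q: {q} Let's think step by step.\n\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {question} Let's think step by step.\n\nA:")
    return "\n\n".join(blocks)

examples = [(
    "When my brother was 2 years old, I was double his age. "
    "Now I am 40 years old. How old is my brother?",
    "When my brother was 2 years old, I was 2 × 2 = 4 years old. "
    "That's an age difference of 4 - 2 = 2 years, and I am older. "
    "Now I am 40 years old, so my brother is 40 - 2 = 38 years old. "
    "The answer is 38.",
)]

prompt = build_few_shot_cot_prompt(
    examples,
    "When I was 3 years old, my partner was 3 times my age. "
    "Now, I am 20 years old. How old is my partner?",
)
print(prompt)
```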

Common Failure Patterns

Without CoT (Problematic):

Prompt: When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old. How old is my partner?
Output: 63 years old ❌

With CoT (Improved):

Prompt: [Same question] Let's think step by step.
Output: [Step-by-step reasoning leading to] 26 years old ✅

When to Use Chain-of-Thought

CoT is particularly effective for tasks that benefit from explicit reasoning:

Ideal Use Cases:

  • Mathematical Problems: Arithmetic, algebra, geometry calculations
  • Code Generation: Breaking down requirements into implementable steps
  • Logical Reasoning: Puzzles, deduction, inference problems
  • Synthetic Data Creation: Guided assumption-making and content generation
  • Complex Analysis: Multi-factor decision making, comparative analysis
  • Process Planning: Step-by-step procedure development

Decision Rule:

If you can explain the steps to solve the problem manually, CoT will likely improve AI performance.

Effective CoT Trigger Phrases

Different trigger phrases work better for different types of problems:

CoT Triggers
---
provider: OpenAI
model: gpt-4o
temperature: 0.1
---

# Mathematical/Logical Problems
"Let's think step by step."
"Let's work through this systematically."
"Let's break this down into steps."

# Analysis Tasks
"Let's analyze this carefully."
"Let's examine each component."
"Let's think through the implications."

# Creative/Planning Tasks
"Let's approach this methodically."
"Let's consider each aspect."
"Let's build this solution piece by piece."

# Code Generation
"Let's implement this step by step."
"Let's break down the requirements first."
"Let's design the solution systematically."

Problem: {{ user_problem }}

{{ trigger_phrase }}
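If the problem type is known up front, the trigger phrase can be picked automatically before the prompt is rendered. A small illustrative sketch (the category keys and the mapping are assumptions, not a Latitude feature):

```python
# Hypothetical mapping from problem category to a default trigger phrase,
# mirroring the groups in the prompt above.
TRIGGERS = {
    "math": "Let's think step by step.",
    "analysis": "Let's analyze this carefully.",
    "planning": "Let's approach this methodically.",
    "code": "Let's implement this step by step.",
}

def with_trigger(problem: str, kind: str) -> str:
    """Attach the trigger phrase that suits the problem type,
    falling back to the generic math trigger for unknown kinds."""
    return f"Problem: {problem}\n\n{TRIGGERS.get(kind, TRIGGERS['math'])}"

print(with_trigger("Sort a list of intervals by start time.", "code"))
```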

Practical CoT Examples

Synthetic Data Generation with CoT

Synthetic Data CoT
---
provider: OpenAI
model: gpt-4o
temperature: 0.3
---

# Product Description Generator

Product: {{ product_name }}

Let's create a compelling product description by thinking through this step by step:

Step 1: Analyze the product name
- What type of product does this suggest?
- What market segment would this target?
- What key features can we infer?

Step 2: Make reasonable assumptions
- Who is the target customer?
- What problems does this solve?
- What are the key selling points?

Step 3: Structure the description
- Opening hook to grab attention
- Key features and benefits
- Social proof or credibility elements
- Call to action

Step 4: Write the description
Based on my analysis and assumptions:

Mathematical Problem Solving

Advanced Math CoT
---
provider: OpenAI
model: gpt-4o
temperature: 0.1
---

# Word Problem Solver

Problem: {{ math_word_problem }}

Let me solve this step by step:

Step 1: Extract the key information
- What quantities are given?
- What relationships exist between them?
- What am I asked to find?

Step 2: Set up the mathematical model
- Define variables for unknown quantities
- Write equations based on the relationships
- Identify the mathematical operations needed

Step 3: Solve systematically
- Perform calculations in logical order
- Show each algebraic step
- Check intermediate results

Step 4: Verify and interpret
- Does the answer make logical sense?
- Does it satisfy the original constraints?
- Express the final answer clearly

Solution:

Advanced CoT with Latitude Chains

LLMs perform better when they can reason through complex problems step by step. Latitude's <step> blocks call the AI with only the content inside each block, so the model can focus on that specific part of the reasoning process. This makes the reasoning more structured and manageable.

This approach is more expensive than a single prompt: it makes N calls to the AI, where N is the number of <step> blocks. In exchange, it enables more complex reasoning and better results. The context from each step accumulates, so the AI can use everything produced in the previous steps.
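The `<step>` behavior described above can be reproduced in plain Python to see where the N calls and the accumulated context come from. A sketch with a stub in place of a real model call (`call_model` is hypothetical; a real version would hit a provider API):

```python
def call_model(messages):
    """Stub LLM call: a real implementation would send `messages`
    to a provider API. Here it just reports how much context it saw."""
    return f"[answer after seeing {len(messages)} messages]"

def run_chain(step_prompts):
    """Run each step as its own model call, accumulating all prior
    prompts and answers into the context of the next call."""
    messages, answers = [], []
    for prompt in step_prompts:
        messages.append({"role": "user", "content": prompt})
        answer = call_model(messages)
        messages.append({"role": "assistant", "content": answer})
        answers.append(answer)
    return answers

answers = run_chain(["Step 1: analyze", "Step 2: brainstorm", "Step 3: evaluate"])
print(answers[-1])  # [answer after seeing 5 messages]
```

Three steps mean three model calls, and the last call carries all five earlier messages, which is exactly why the chained form costs more tokens than a single prompt.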
Multi-Step CoT Chain
---
provider: OpenAI
model: gpt-4o
temperature: 0.2
---

<step>
# Step 1: Problem Analysis

Let's analyze this business scenario step by step: {{ business_scenario }}

## Initial Assessment:
1. **Key Stakeholders**: Who are the main parties involved?
2. **Core Problem**: What is the fundamental issue?
3. **Constraints**: What limitations do we need to consider?
4. **Success Metrics**: How will we measure success?

## Analysis:
</step>

<step>
# Step 2: Solution Brainstorming

Based on my analysis: {{ problem_analysis }}

Now let me generate potential solutions:

## Brainstorming Process:
1. **Traditional Approaches**: What are the conventional solutions?
2. **Innovative Options**: What creative alternatives exist?
3. **Resource Requirements**: What would each solution need?
4. **Risk Assessment**: What are the potential downsides?

## Potential Solutions:
</step>

<step>
# Step 3: Solution Evaluation

Given these potential solutions: {{ solution_brainstorming }}

Let me evaluate each option systematically:

## Evaluation Criteria:
1. **Feasibility** (1-10): How realistic is implementation?
2. **Impact** (1-10): How effective will this be?
3. **Cost** (1-10): How resource-efficient is this? (10 = low cost)
4. **Timeline** (1-10): How quickly can this be implemented? (10 = very fast)

## Solution Rankings:
</step>

<step>
# Step 4: Implementation Planning

Based on the evaluation: {{ solution_evaluation }}

The recommended solution is: [Top-ranked solution]

## Implementation Plan:
1. **Phase 1** (Weeks 1-2): [Initial steps]
2. **Phase 2** (Weeks 3-4): [Development phase]
3. **Phase 3** (Weeks 5-6): [Testing and refinement]
4. **Phase 4** (Weeks 7-8): [Full implementation]

## Risk Mitigation:
- **Risk 1**: [Potential issue] → **Mitigation**: [How to address]
- **Risk 2**: [Potential issue] → **Mitigation**: [How to address]

## Success Metrics:
- **Short-term** (1 month): [Immediate indicators]
- **Medium-term** (3 months): [Progress markers]
- **Long-term** (6+ months): [Ultimate success measures]
</step>

CoT for Different Domains

Scientific Analysis

Scientific CoT
---
provider: OpenAI
model: gpt-4o
temperature: 0.1
---

# Scientific Method with Chain-of-Thought

Apply the scientific method to analyze: {{ research_question }}

## Step 1: Observation and Question Formation
- **Observation**: What have we observed?
- **Research Question**: What specific question are we trying to answer?
- **Background**: What do we already know about this topic?

## Step 2: Hypothesis Development
- **Hypothesis**: What do we predict will happen?
- **Reasoning**: Why do we think this will occur?
- **Variables**: What factors might influence the outcome?

## Step 3: Experimental Design
- **Method**: How would we test this hypothesis?
- **Controls**: What variables need to be controlled?
- **Measurements**: What data would we collect?

## Step 4: Data Analysis Framework
- **Expected Results**: What patterns would support our hypothesis?
- **Alternative Explanations**: What other factors could explain results?
- **Statistical Considerations**: How would we ensure reliability?

## Step 5: Conclusion and Implications
- **Interpretation**: What would different results mean?
- **Limitations**: What are the constraints of this approach?
- **Next Steps**: How would this lead to further research?

## Analysis:
[Apply this framework to the given research question]
Legal Analysis

Legal CoT
---
provider: OpenAI
model: gpt-4o
temperature: 0.1
---

# Legal Analysis with Chain-of-Thought

Analyze this legal scenario step by step: {{ legal_scenario }}

## Step 1: Fact Pattern Analysis
- **Key Facts**: What are the essential facts?
- **Parties Involved**: Who are the relevant parties?
- **Timeline**: What is the sequence of events?
- **Jurisdiction**: What legal system applies?

## Step 2: Legal Issue Identification
- **Primary Issues**: What are the main legal questions?
- **Secondary Issues**: What related questions arise?
- **Precedent Relevance**: What similar cases might apply?

## Step 3: Rule Identification
- **Applicable Laws**: What statutes or regulations apply?
- **Case Law**: What precedents are relevant?
- **Legal Standards**: What tests or criteria apply?

## Step 4: Application of Law to Facts
- **Element Analysis**: How do the facts satisfy each legal element?
- **Counterarguments**: What opposing positions exist?
- **Distinguishing Cases**: How is this different from precedents?

## Step 5: Conclusion and Reasoning
- **Legal Conclusion**: What is the most likely outcome?
- **Strength of Position**: How strong is each side's case?
- **Risk Assessment**: What are the uncertainties?

## Analysis:
[Apply this legal reasoning framework]

CoT with Self-Correction

Self-Correcting CoT
---
provider: OpenAI
model: gpt-4o
temperature: 0.3
---

<step>
# Initial Reasoning Attempt

Problem: {{ complex_problem }}

Let me work through this step by step:

1. **Understanding**: [Break down the problem]
2. **Approach**: [Choose a method]
3. **Execution**: [Work through the solution]
4. **Result**: [State the initial answer]

Initial Solution:
</step>

<step>
# Self-Critique and Error Checking

Let me review my initial reasoning: {{ initial_reasoning }}

## Error Checking:
1. **Logic Verification**: Are my reasoning steps sound?
2. **Calculation Check**: Are my computations correct?
3. **Assumption Review**: What assumptions did I make?
4. **Alternative Approaches**: Could I solve this differently?

## Potential Issues Found:
- [List any problems identified]

## Confidence Level: [High/Medium/Low] because [reasoning]
</step>

<step>
# Revised Solution (if needed)

Based on my self-critique:

If the initial reasoning had issues, let me correct it:
## Corrections Made:
1. **Issue**: [Problem identified]
**Correction**: [How I fixed it]

## Revised Step-by-Step Solution:
[Work through the corrected solution]

## Final Answer: [Corrected result]

Otherwise, confirm the original reasoning:
## Confirmation:
My initial reasoning appears sound. The original answer stands.

## Final Answer: [Original result confirmed]
</step>

CoT with Multiple Perspectives

Multi-Perspective CoT
---
provider: OpenAI
model: gpt-4o
temperature: 0.4
type: agent
agents:
- agents/analyst_a
- agents/analyst_b
- agents/synthesizer
---

# Multi-Perspective Analysis

Analyze this complex issue: {{ complex_issue }}

Use multiple analytical perspectives and then synthesize the findings.

## Analysis Framework:

### Perspective A: {{ perspective_a_description }}
- Apply this analytical lens step by step
- Focus on {{ perspective_a_focus }}

### Perspective B: {{ perspective_b_description }}
- Apply this different analytical approach
- Emphasize {{ perspective_b_focus }}

### Synthesis:
- Compare and contrast the perspectives
- Identify points of agreement and disagreement
- Develop a comprehensive understanding

Coordinate the analysis across agents and provide a unified conclusion.
agents/analyst_a
---
provider: OpenAI
model: gpt-4o
temperature: 0.2
type: agent
---

# Perspective A Analysis: {{ perspective_a_description }}

I'll analyze the issue through this specific lens: {{ complex_issue }}

## Step-by-Step Analysis:

1. **Framework Application**: How does {{ perspective_a_description }} apply here?
2. **Key Factors**: What elements are most important from this perspective?
3. **Methodology**: What analytical tools should I use?
4. **Evidence Gathering**: What information supports this view?
5. **Reasoning Chain**: How do these factors connect?
6. **Conclusions**: What does this perspective suggest?

## Detailed Analysis:
[Work through each step systematically]

## Key Insights from Perspective A:
- [Primary findings]
- [Supporting evidence]
- [Implications]

Integration with Latitude Features

CoT with Dynamic Variables

Dynamic CoT
---
provider: OpenAI
model: gpt-4o
temperature: 0.2
---

# Adaptive Chain-of-Thought

The reasoning approach adapts based on the problem type: {{ problem_type }}

{{ if problem_type === "mathematical" }}
## Mathematical Problem-Solving Steps:
1. **Parse the Problem**: Extract numbers, operations, and relationships
2. **Identify the Method**: Choose appropriate mathematical approach
3. **Set Up Equations**: Translate word problem to mathematical expressions
4. **Solve Step-by-Step**: Show all algebraic manipulations
5. **Verify**: Check answer by substitution or alternative method
{{ endif }}


{{ if problem_type === "analytical" }}
## Analytical Reasoning Steps:
1. **Decompose**: Break complex issue into component parts
2. **Research**: Gather relevant information and context
3. **Framework**: Apply appropriate analytical model
4. **Synthesize**: Combine insights from different sources
5. **Conclude**: Draw evidence-based conclusions
{{ endif }}

{{ if problem_type === "creative" }}
## Creative Problem-Solving Steps:
1. **Understand**: Deeply comprehend the challenge
2. **Diverge**: Generate multiple creative options
3. **Combine**: Mix and match ideas innovatively
4. **Evaluate**: Assess feasibility and impact
5. **Refine**: Improve the most promising solutions
{{ endif }}

## Problem to Solve:
{{ user_problem }}

## Step-by-Step Solution:
[Apply the appropriate framework above]
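The same dispatch can be done in application code before the prompt is built. An illustrative sketch mirroring the three conditional blocks above (the dictionary and function names are assumptions):

```python
# Step names taken from the three conditional frameworks in the prompt above.
FRAMEWORKS = {
    "mathematical": ["Parse the Problem", "Identify the Method", "Set Up Equations",
                     "Solve Step-by-Step", "Verify"],
    "analytical": ["Decompose", "Research", "Framework", "Synthesize", "Conclude"],
    "creative": ["Understand", "Diverge", "Combine", "Evaluate", "Refine"],
}

def render_steps(problem_type: str) -> str:
    """Render the numbered reasoning steps for the given problem type,
    mirroring the {{ if problem_type === ... }} blocks in the prompt."""
    steps = FRAMEWORKS[problem_type]
    return "\n".join(f"{i}. {name}" for i, name in enumerate(steps, start=1))

print(render_steps("mathematical"))
```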

CoT with Tool Integration

CoT with Tools
---
provider: OpenAI
model: gpt-4o
temperature: 0.2
tools:
- latitude/search
- latitude/extract
---

# Research-Enhanced Chain-of-Thought

Let me solve this complex question step by step: {{ research_question }}

## Step 1: Information Gathering
First, I need to research the current facts:

## Step 2: Information Analysis
Based on the search results, let me analyze:
- **Key Facts**: [Extract relevant information]
- **Data Quality**: [Assess reliability of sources]
- **Gaps**: [Identify missing information]

## Step 3: Additional Research (if needed)
Extract specific data that is still unclear or missing.

## Step 4: Reasoning Chain
Now I'll work through the logic:
1. **Given Information**: [Summarize what we know]
2. **Logical Connections**: [Show how facts relate]
3. **Inference Steps**: [Build the argument]
4. **Supporting Evidence**: [Reference research findings]

## Step 5: Conclusion
Based on this systematic analysis:
[Present final answer with full reasoning]

Best Practices

Choosing the Right CoT Approach

  • Zero-Shot CoT: Use simple trigger phrases like “Let’s think step by step” for straightforward problems
  • Few-Shot CoT: Provide examples when you need consistent reasoning patterns or specific approaches
  • Multi-Step Chains: Use Latitude <step> blocks for complex problems requiring focused attention on each phase
  • Cost Consideration: Balance reasoning quality with token costs - more steps = better results but higher costs

Effective Prompt Design

  • Clear Step Labels: Use numbered steps or clear headers to guide reasoning
  • Logical Flow: Ensure each step builds logically on the previous one
  • Explicit Instructions: Always include trigger phrases to activate reasoning mode
  • Verification Steps: Include self-checking and validation mechanisms
  • Domain-Specific Language: Use terminology and approaches familiar to the problem domain

Optimizing Performance

  • Model Selection: Use GPT-4 or Claude for complex reasoning tasks
  • Temperature Settings: Lower temperature (0.1-0.3) for logical/mathematical problems
  • Token Management: Balance reasoning detail with cost efficiency
  • Error Handling: Include correction and retry mechanisms
  • Robustness: CoT helps maintain performance across different LLM versions

Domain-Specific Adaptations

  • Mathematical Problems: Focus on step-by-step calculations and verification
  • Code Generation: Break down requirements before implementation
  • Scientific Analysis: Emphasize hypothesis formation and testing
  • Business Decisions: Include stakeholder analysis and risk assessment
  • Creative Tasks: Allow for iterative refinement and exploration

Cost-Benefit Analysis

  • When CoT is Worth It: Complex reasoning, high-stakes decisions, mathematical problems
  • When to Avoid: Simple factual queries, high-volume/low-cost applications
  • Optimization: Use shorter reasoning chains for simpler problems
  • Monitoring: Track accuracy improvements vs. cost increases

Common Pitfalls

Critical Mistakes to Avoid:

Reasoning Errors:

  • Skipping Logical Steps: Don’t let the AI jump to conclusions without showing work
  • Unclear Transitions: Make connections between steps explicit and logical
  • Missing Verification: Always include checking mechanisms and validation steps
  • Assuming Expertise: Remember that LLMs can state incorrect mathematics with complete confidence

Implementation Issues:

  • Over-complexity: Keep steps manageable - too many steps can confuse the model
  • Inconsistent Patterns: When using few-shot, ensure examples follow the same reasoning structure
  • Wrong Trigger Phrases: Some phrases work better for different problem types
  • Ignoring Context: Make sure reasoning steps are appropriate for the problem domain

Cost Management:

  • Unnecessary Verbosity: Don’t use CoT for simple factual queries that don’t need reasoning
  • Excessive Steps: More steps aren’t always better - find the right balance
  • Poor Token Planning: Account for the 2-3x token increase when budgeting
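The 2-3x figure can be folded into a rough budgeting helper. A sketch under stated assumptions (the 2.5x midpoint and the per-token price are illustrative placeholders, not real provider quotes):

```python
def estimate_cot_cost(base_output_tokens: int, price_per_1k: float,
                      overhead: float = 2.5) -> float:
    """Rough cost estimate for a CoT response, assuming reasoning
    multiplies output tokens by ~2-3x (2.5 used as a midpoint).
    price_per_1k is an illustrative output-token price."""
    return base_output_tokens * overhead * price_per_1k / 1000

# A 200-token direct answer at a hypothetical $0.01 per 1K output tokens:
print(round(estimate_cot_cost(200, 0.01), 4))  # 0.005
```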

When NOT to Use CoT

CoT isn’t always the best approach. Avoid it for:

  • Simple Factual Queries: “What is the capital of France?” doesn’t need reasoning steps
  • High-Volume Applications: When processing thousands of requests where cost matters more than reasoning
  • Well-Defined Formats: When you need consistent, structured outputs without explanation
  • Time-Sensitive Tasks: When response speed is more important than reasoning quality
  • Retrieval Tasks: When the answer exists in a knowledge base and doesn’t require reasoning

Implementation Checklist

When implementing CoT in your prompts, use this checklist:

✅ Pre-Implementation

  • Confirm the task benefits from step-by-step reasoning
  • Choose appropriate CoT type (zero-shot vs few-shot vs multi-step)
  • Select effective trigger phrases for your domain
  • Plan for increased token costs (typically 2-3x)

✅ Prompt Design

  • Include clear step labels and logical flow
  • Add verification/checking steps
  • Provide examples if using few-shot approach
  • Test with edge cases and failure scenarios

✅ Optimization

  • Adjust temperature based on task type (lower for logic/math)
  • Monitor accuracy improvements vs cost increases
  • Iterate on step structure based on results
  • Consider using Latitude <step> blocks for complex reasoning

Key Takeaways

Chain-of-Thought prompting transforms how LLMs approach complex problems by making their reasoning explicit and systematic. Here are the essential points:

Core Benefits:

  • Dramatic accuracy improvements on reasoning tasks without model fine-tuning
  • Explainable results that allow debugging and validation
  • Robust performance across different LLM versions

Best Applications:

  • Mathematical and logical problems
  • Code generation with requirement breakdown
  • Complex analysis requiring multiple perspectives
  • Any task where you can explain the solution steps manually

Cost Considerations:

  • 2-3x more tokens means higher costs and slower responses
  • Use strategically for high-value, complex reasoning tasks
  • Consider simpler approaches for basic queries

Implementation Success Factors:

  • Choose the right CoT variant (zero-shot, few-shot, or multi-step)
  • Use domain-appropriate trigger phrases and terminology
  • Include verification steps to catch reasoning errors
  • Balance reasoning depth with practical constraints

Chain-of-Thought is a low-effort, high-impact technique that can significantly improve AI performance on complex tasks. The key is knowing when and how to apply it effectively.

Advanced CoT Patterns

CoT with Error Correction

Self-Correcting CoT
---
provider: OpenAI
model: gpt-4o
temperature: 0.2
---

# Problem Solving with Validation

Problem: {{ complex_problem }}

## Initial Reasoning
Let me work through this step by step:

1. **Understanding**: [Break down the problem]
2. **Approach**: [Choose methodology]
3. **Execution**: [Show work]
4. **Initial Answer**: [State result]

## Self-Validation
Now let me check my work:

1. **Logic Check**: Are my reasoning steps sound?
2. **Calculation Verification**: Let me double-check any math
3. **Sanity Test**: Does this result make intuitive sense?
4. **Alternative Approach**: Can I solve this differently to confirm?

## Final Answer
Based on validation: [Confirmed or corrected result]

CoT with Confidence Scoring

Confidence-Aware CoT
---
provider: OpenAI
model: gpt-4o
temperature: 0.1
---

# Reasoning with Confidence Assessment

Problem: {{ problem_statement }}

## Step-by-Step Analysis
[Standard CoT reasoning steps]

## Confidence Assessment
For each step, I'll rate my confidence (1-10):

- **Step 1 Confidence**: 9/10 - Clear factual information
- **Step 2 Confidence**: 7/10 - Some assumptions required
- **Step 3 Confidence**: 8/10 - Standard methodology applied
- **Overall Confidence**: 8/10

## Risk Factors
- **Potential Issues**: [What could go wrong]
- **Missing Information**: [What would improve confidence]
- **Alternative Scenarios**: [Other possible outcomes]

## Conclusion
Answer: [Result] (Confidence: X/10)
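Scores in this format are easy to pull back out for monitoring. A minimal sketch assuming the `N/10` convention shown above:

```python
import re

def parse_confidences(text: str) -> list[int]:
    """Extract all 'N/10' confidence ratings from a CoT response."""
    return [int(n) for n in re.findall(r"(\d+)/10", text)]

sample = (
    "- Step 1 Confidence: 9/10 - Clear factual information\n"
    "- Step 2 Confidence: 7/10 - Some assumptions required\n"
    "- Step 3 Confidence: 8/10 - Standard methodology applied\n"
)
scores = parse_confidences(sample)
print(scores, round(sum(scores) / len(scores), 1))  # [9, 7, 8] 8.0
```

Aggregating these per-step scores over many requests is one simple way to track where a prompt's reasoning is least reliable.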


Sharing Useful Video Tools

In daily work and life, we often need to record and edit videos. Today I'd like to share two practical tools/tips that can help you handle video content more efficiently.

QuickRecorder - A Simple, Easy-to-Use Screen Recorder

QuickRecorder is a lightweight screen recording tool. Its main features include:

  • Simple, intuitive operation
  • Low system resource usage
  • Records the full screen or a selected region
  • Captures both system audio and microphone input
  • Good output video quality

Photos App - A Video Cropping Trick

Many people may not know that the Photos app built into macOS can crop not only photos but also the frame of a video in exactly the same way, with no need to install other professional software.

Steps to crop a video

  1. Open the Photos app
  2. Import the video you want to crop
  3. Double-click to open the video
  4. Click the Edit button
  5. Select the Crop tool
  6. Drag the borders or corners to adjust the video frame
  7. Click Done to save the changes

This method is especially handy when you need to adjust a video's aspect ratio or trim away content at the edges of the frame. The workflow is identical to cropping a photo, so it is very intuitive.

Summary

These two tools/tips are simple but very practical in everyday use. QuickRecorder covers basic screen-recording needs, while the Photos crop feature lets you quickly crop a video's frame. I hope you find them useful!


Organizing My Blog with Cursor AI

Introduction

Recently, while fiddling with note-taking tools, I found that although Notion is great, its AI features are paid and come with quite a few limits. Then it hit me: doesn't Cursor, which I use every day, come with AI built in? It's my everyday IDE and can double as a note-taking assistant, so why not?

The Process

So I got started on the "move". First I exported my Notion notes to Markdown, then used Cursor to set up a Hexo project. Honestly, with AI helping, the process was remarkably smooth: even if you know nothing about Hexo, with the AI guiding you along, you can get it done easily.

With Cursor AI's help, I completed the following:

  • Normalized file names: converted Notion's long ID-style export names into semantic file names
  • Converted time formats: adjusted Notion's timestamps to match Hexo's format requirements
  • Built a tag system: added appropriate category tags to posts
  • Tuned the theme: customized the interface style to my taste
  • Implemented search: integrated site-wide search
  • Set up deployment: completed the full deployment configuration for the site

Experience

Along the way, I also ran into a few issues worth noting:

  1. The AI sometimes stalls: perhaps it's contemplating life
  2. It reacts slowly to file changes: this should improve as the MCP extensions get updated
  3. The AI is sometimes too eager: if you don't scope the task for it, it will happily rewrite the whole article

Summary

Organizing documents with Cursor AI turned out to be genuinely interesting. Although it took 5-6 hours altogether, it was worth it. For anyone who frequently needs to organize documents, this is definitely a good option.

If common operations could be packaged into one-click templates, it would be even better. But at the pace the open-source community moves, someone will probably build that soon.

Organizing Music Files with Claude: Trying Out MCP

Introduction

A few days ago while tidying up my computer, I found my music folder full of songs collected from all sorts of places. The file names were all over the map: some in Traditional Chinese mixed with English, some littered with special characters. It looked quite messy. Since I happened to be using Claude Desktop, it occurred to me that the AI might be able to handle this for me.

That attempt not only solved my problem perfectly, it also showed me more of what AI assistants can do. Today I'd like to share this fun experience.

What Claude Desktop Can Do

Many people, like me before this, think of AI assistants as something you chat with and ask questions. But Claude Desktop has filesystem access, which means it can:

  • Read and write files
  • Rename files
  • Create and manage directories
  • Search for files
  • Analyze file contents

In short, it is like an assistant who knows how to program and can carry out real file-management tasks for you.

Organizing the Music Files in Practice

The first step was to have Claude list all the files. Scanning the long list, I saw every naming style imaginable:

張震嶽 A-Yue【愛我別走 Love me,don't go】Official Music Video.mp3
周華健 Wakin Chau【其實不想走 I didn't intend to go】Official Music Video.mp3
Beyond 05: 光輝歲月.mp3

I told Claude I wanted a simple "Artist-Title" format. It immediately understood and proposed a complete plan:

  1. Use Simplified Chinese throughout
  2. Strip all extra information (such as "Official Music Video", lyric-video labels, and so on)
  3. Remove special characters
  4. Join artist names with "&" for duets
  5. Keep file extensions unchanged

During the actual run we hit some snags. Single quotes in the file names caused the rename operation to fail, and Claude tried several workarounds:

  • Using escape characters
  • Trying encodeURIComponent
  • Handling the path with a Buffer
  • Using wildcard matching

But because the filesystem MCP implementation is still immature, none of these worked. In the end I manually cleaned the special characters out of the file names, and only then did Claude complete the renames. The episode shows that today's AI tools still have room to grow when it comes to filesystem operations.

The processed files became:

张震岳-爱我别走.mp3
周华健-其实不想走.mp3
Beyond-光辉岁月.mp3

Better still, Claude could recognize special cases:

  • Naming medley tracks "金曲串烧" (greatest-hits medley)
  • Correctly handling group songs (e.g. "王馨平&王杰&王韵婵-祈祷.mp3")
  • Leaving band names such as Beyond unchanged

The whole process felt like working with a careful assistant: it clearly explained its thinking at every step, actively tried different solutions when problems came up, and was honest when it ran into its own limitations. That style of interaction made the problem solving more transparent and more fun.

Reflections

This experience made me realize that an AI assistant is not just a conversation tool; it can genuinely take part in real work. The key is learning how to describe the problem and how to interact with the assistant.

Interestingly, Claude often offers multiple solutions and explains the pros and cons of each. This kind of exchange not only solves the problem at hand but also teaches you something new.

More Interesting Use Cases

This music-file cleanup showed what AI assistants can do for file management. Although the current filesystem MCP is still basic, limited to operations such as renaming and moving files, it can already solve plenty of practical problems.

I look forward to it handling more complex file operations as the functionality grows. For now, though, we should be clear-eyed about its limitations; for example, it cannot read and analyze the metadata inside MP3 files.

Advice for Readers

Back up your files and start with small tasks. More importantly, learn to talk to the AI assistant and describe your needs clearly, so it can become a real helper for solving practical problems.

First Encounter with ChatTTS

Installation

  1. Get the code

git clone https://github.com/2noise/ChatTTS
cd ChatTTS

  2. Install the dependencies

pip install --upgrade -r requirements.txt

If you use conda, install the dependencies with:

conda create -n chattts
conda activate chattts
pip install -r requirements.txt

  3. Launch the WebUI (optional; you can also just use the code below)

python examples/web/webui.py

(In the WebUI, note that if you want a personalized voice, your recording must match the Sample Text.)

Experiments

Using the Built-in Voices

import torch
import torchaudio
import ChatTTS
from tools.audio import float_to_int16, has_ffmpeg_installed, load_audio

chat = ChatTTS.Chat()
chat.load(compile=False) # Set compile=True for optimized performance

rand_spk = chat.sample_random_speaker()
print('=================')
print(rand_spk) # save it for later timbre recovery

params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_6]',
)

texts = ["哈克和迪克这两个小东西看起来古灵精怪,其实是两个野心勃勃的危险分子——他们在牙齿上挖洞建房,不仅要修建自己的舒适小窝,还梦想着修建可以出租的豪华公寓……",
"就在他们的梦想快要实现的时候,一把大刷子带着很多警察出现在牙齿大街上。哈克和迪克贮藏的粮食几乎被一扫而空。更可怕的事情还在后面,一个巨大的钩子从天而降,伸向了哈克和迪克的家……",
"哈克和迪克的命运将会怎样呢?那些警察是从哪里来的呢?牙齿大街还能恢复往日的平静吗?",
"看完这本图画书之后,聪明的小朋友们肯定会知道,怎样做才能不让哈克和迪克这样的小东西在我们的嘴巴里干坏事!"]

refine_texts = chat.infer(
    texts,
    skip_refine_text=False,
    refine_text_only=True,
    params_refine_text=params_refine_text,
)
print('======refine===========')
for i in range(len(refine_texts)):
    print(refine_texts[i])

params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb=rand_spk,  # add the sampled speaker
    temperature=.3,    # use a custom temperature
    top_P=0.7,         # top-P decoding
    top_K=20,          # top-K decoding
)

print('======infer===========')
wavs = chat.infer(
    refine_texts,
    skip_refine_text=True,
    params_infer_code=params_infer_code,
)

for i in range(len(wavs)):
    # In some versions of torchaudio the first call works; in others, the second one does.
    try:
        torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wavs[i]).unsqueeze(0), 24000)
    except Exception:
        torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wavs[i]), 24000)

The key method here is chat.infer. Its refine_text_only parameter controls whether it only refines the text, i.e. splits it into sentences. The call returns a list of texts with break markers inserted.

On the second call to infer you can pass skip_refine_text=True, because the text has already been refined.

Also, the output of rand_spk is worth a closer look, because we will use the same idea for custom voices below, and seeing it helps understanding. It is the encoded voice timbre: it looks like a pile of garbled Chinese characters, but that is not an error, so don't panic.

Custom Voices

First, record a short voice clip by reading a sentence aloud. You can use the system voice recorder or a small recording tool I made. Use a sample rate of 24000 and record about 5-10 seconds.

If you don't know what to say (it's strange how your mind goes blank when you're suddenly asked to say a sentence, or at least mine does), I just used this line:

哈克和迪克这两个小东西看起来古灵精怪

Download the recording as a wav file, and double-check that the sample rate is right. Once it's ready, use the method below to generate that strange text, then copy and save it.

sample_audio = load_audio('/Users/sam/Music/sample-me.wav', 24000)
sample_spk_smp = chat.sample_audio_speaker(sample_audio)
print('================')
print(sample_spk_smp)

Then modify the params_infer_code parameters:


params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb=None,   # no sampled speaker; we use the recorded sample instead
    temperature=.3,
    top_P=0.7,
    top_K=20,
)

params_infer_code.txt_smp = sample_text
params_infer_code.spk_smp = sample_spk_smp
params_infer_code.spk_emb = None

Then call infer again to generate the wav files.

Experiment Results

[牙齿大街的新鲜事介绍.mp3](ChatTTS初遇 11c0ba99e0cb8072b8f3f286b815a0c2/牙齿大街的新鲜事介绍.mp3)

Full Code for You to Copy

import torch
import torchaudio
import ChatTTS
from tools.audio import float_to_int16, has_ffmpeg_installed, load_audio

chat = ChatTTS.Chat()
chat.load(compile=False) # Set compile=True for optimized performance

rand_spk = chat.sample_random_speaker()
print('=================')
print(rand_spk) # save it for later timbre recovery

params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb=None,      # add sampled speaker
    temperature=.3,    # using custom temperature
    top_P=0.7,         # top P decode
    top_K=20,          # top K decode
)

params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_6]',
)

sample_text = "哈克和迪克这两个小东西看起来古灵精怪"
sample_text = "哈克和迪克这两个小东西看起来古灵精怪,其实是两个野心勃勃危险分子——他们在牙齿上挖洞建房,不仅要修建自己的舒适小窝,还梦想着修建出可以出租的豪华公寓……"
sample_audio = load_audio('/Users/sam/Music/sample-me.wav', 24000)
sample_spk_smp = chat.sample_audio_speaker(sample_audio)
print('================')
print(sample_spk_smp)
sample_spk_smp = "伀嫠冀呯伟叐乸乐絁稃蒌棩芡籵囮潁贅缕匰卅窃檯泫懳潱盾簫旬跊恫挝弫熠哦襛劍県娑虅荙偂瑦尾絖诧扐硝件埇唲贸屗榃孥嗇圇岨师袋衟丑睽婢牦砑簹摱稞儎蓌渤吷厹片曌怸劸秴勊浍攙懄名檃脴獙畗弁跷璤嫥兘螵煻壕捻丏焒蚘愢墁冧奰庽脼侲忱莼弣啸礛姿早谯漉吞呙伏穞澭崰枑芮纟嫤桻祙腉谱痚蝱欇橇拂彼丿磜蠛胀柑搪珙嫩宇捶粣凱瞒嵼琅嶷綯橧觽焓諛僂焽貛崶熝烥糱蕪瓏欞觼杞蒏檟蠠敫佩旐趚葘濬劘濺趵俴焔咀螂曻裘巋艋蜆矣倾愆狨簞耺帐瓭懯椱囘恗奵徻圫拏炶犑漧琺璥獮砫綀灡嚂蔁挬滂嶷盁湉掂爗蜷賸胅萳洴捥楌猥圑此跂艳莞矴謦蝖妌為炇筷檷菏纳汥咙攘狣怊棽秔箚翹玑吼荣杮僉副薣浖蜺犡匛劀懫筩誰謙福椅縬孊日胶徇礹濯榤劽睺碀儦懄瞥蕳亄缁谉盰櫜吏句蠲栵倡奮叴枬搶篽全貛瓮攪掉摃媅孎勜垻篇篽岌藡稠猵暄犾葠妘俺菢謩硼虲撖槞撳枆猲岪怯穤誋杫狀紽曁秕蜲祷愡赽獽勦咄汄渀媭漘晬檻晕嶩令皑婪拂程姪棱赑琍縲撑梂趝岌脧琸曯殜嬊翖廏勱荪簝爄匱蛴悔兰爙讒孁谮熌劐诱竾帉搾瓚忪弒斪賾疹猘衻薎蜃绖匙蛧珱续噂峖数芛憓胠咮梻禨惇谲塩穸狃嗍苜磳粩檚去峅匛蟆侏襆赿纝勀捤觇澉炜兯哫拵誣畡舙斬本僡犔籲螘螸藆噎庬上樹瘞悫憥梺潛槵槢寒爬绥帪呮綳獴嗆裧杚滖蟱唳啶晘裫凮灥蝖江贉播景豫僞蓋覐蒤挰坽惪艒證珜蓧劫艒箁瀯垩簃溷藺婺氐熸抻籓洽瓃圷箼縍學澦蔉寋脶筠棳崿匊誅葽戜劀绀芻肿褁喕巤聗沼汯昩毂篨挴直戵耪珴瞭挬舅眠弹擗诈朡服眜瀬屡住啍倩罷眨炦硓憸梒剢嘇衅盔穸捬搩翅蜸煁箚渶嫭慱擡璩碧涺倅簨戒媮滏蕱擉蟼蕝詬詍悍碟妇倪生扫始砟艓埰堷悻莃孩捂環詆係喱儢晝哐塻櫚睡稼媦菤夂綏螫埲膹为緮筱板堿茆仾嚳典甞刴茥火彔堤砑芩熵諅惚襺胑暀昹庮攈粊痛樀㴅"

texts = ["哈克和迪克这两个小东西看起来古灵精怪,其实是两个野心勃勃的危险分子——他们在牙齿上挖洞建房,不仅要修建自己的舒适小窝,还梦想着修建可以出租的豪华公寓……",
"就在他们的梦想快要实现的时候,一把大刷子带着很多警察出现在牙齿大街上。哈克和迪克贮藏的粮食几乎被一扫而空。更可怕的事情还在后面,一个巨大的钩子从天而降,伸向了哈克和迪克的家……",
"哈克和迪克的命运将会怎样呢?那些警察是从哪里来的呢?牙齿大街还能恢复往日的平静吗?",
"看完这本图画书之后,聪明的小朋友们肯定会知道,怎样做才能不让哈克和迪克这样的小东西在我们的嘴巴里干坏事!"]

refine_texts = chat.infer(texts,
                          skip_refine_text=False,
                          refine_text_only=True,
                          params_refine_text=params_refine_text)
print('======refine===========')
for i in range(len(refine_texts)):
    print(refine_texts[i])

params_infer_code.txt_smp = sample_text
params_infer_code.spk_smp = sample_spk_smp
params_infer_code.spk_emb = None

print('======infer===========')
wavs = chat.infer(refine_texts,
                  skip_refine_text=True,
                  params_infer_code=params_infer_code)

for i in range(len(wavs)):
    # Depending on the torchaudio version, saving may need an extra channel
    # dimension; try the unsqueezed tensor first, then fall back.
    try:
        torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wavs[i]).unsqueeze(0), 24000)
    except Exception:
        torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wavs[i]), 24000)

Dify Pitfall Notes

First, thanks to Axton's video, which is very carefully made, though he seems to have set up some lead-generation gating and didn't share his working files.

Thanks also to the excellent open-source project translation-agent.

So I built it myself from scratch and stepped on a few landmines along the way; unless you're luckier than I was, you'll probably hit them too.

Workflow definition file

Installing and configuring Dify

git clone https://github.com/langgenius/dify
cd dify/docker
docker compose up -d
open http://localhost

That's all there is to it; apart from the time spent pulling images, nothing is difficult. On first open there's a setup step where you create the admin account and password.

Model setup

I tried ollama + llama3.1 and Tongyi Qianwen (通义千问); configuration is reached from the account settings menu in the top-right corner and went smoothly.

One reminder: if you use Ollama, since Dify runs inside Docker, Ollama must listen on 0.0.0.0 (see reference), and the address should be configured as http://host.docker.internal:11434.

The first workflow: reflective translation

The workflow has six steps: start, set defaults, initial translation, reflection, improved translation, and output. Steps 1, 2, and 6 are routine; the initial-translation, reflection, and improvement steps are the essence of Andrew Ng's approach.

The start node configures the input parameters: source language, target language, source text, and country/region (an optimization input).

The default-parameters node reads the start node's inputs and assigns a default to any variable without a value. It's implemented with a Python script; the rest is straightforward.

def main(source_lang: str, target_lang: str, contry: str) -> dict:
    # note: "contry" (sic) matches the variable name used in the workflow
    source_lang = source_lang or 'English'
    target_lang = target_lang or 'Chinese'
    contry = contry or 'China'
    return {
        "source_lang": source_lang,
        "target_lang": target_lang,
        "contry": contry,
    }

The initial-translation node simply sends the source text to the LLM and takes the result. [The parts in double curly braces are variables; each node's prompt has two parts, a system prompt and a user prompt; the same applies below.]

You are an expert linguist, specializing in translation
from {{#1727140378305.source_lang#}} to
{{#1727140378305.target_lang#}}.
This is an {{#1727140378305.source_lang#}} to {{#1727140378305.target_lang#}} translation, please provide the {{#1727140134125.target_lang#}} translation for this text. Do not provide any explanations or text apart from the translation.

{{#1727140378305.source_lang#}}: {{#1727140134125.source_text#}}

{{#1727140378305.target_lang#}}

The line "Do not provide any explanations or text apart from the translation." is the key: it tells the model to return only the translation.

**Reflection** is the interesting part: the model reviews the initial translation and produces improvement suggestions.

You are an expert linguist specializing in translation from {{#1727140378305.source_lang#}} to {{#1727140378305.target_lang#}}. 

You will be provided with a source text and its translation and your goal is to improve the translation.
Your task is to carefully read a source text and a translation from {{#1727140378305.source_lang#}} to {{#1727140378305.target_lang#}}, and then give constructive criticisms and helpful suggestions to improve the translation. 

The source text and initial translation, delimited by XML tags <SOURCE_TEXT></SOURCE_TEXT> and <TRANSLATION></TRANSLATION>, are as follows:

<SOURCE_TEXT>
{{#1727140134125.source_text#}}
</SOURCE_TEXT>

<TRANSLATION>
{{#1727142131868.text#}}
</TRANSLATION>

When writing suggestions, pay attention to whether there are ways to improve the translation's
(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text),
(ii) fluency (by applying {{#1727140378305.target_lang#}} grammar, spelling and punctuation rules, and ensuring there are no unnecessary repetitions),
(iii) style (by ensuring the translations reflect the style of the source text and take into account any cultural context),
(iv) terminology (by ensuring terminology use is consistent and reflects the source text domain; and by only ensuring you use equivalent idioms {{#1727140378305.target_lang#}}).

Write a list of specific, helpful and constructive suggestions for improving the translation.
Each suggestion should address one specific part of the translation.
Output only the suggestions and nothing else.

**Improved translation:** the model applies the reflection node's suggestions to produce the final version.

You are an expert linguist, specializing in translation editing from {{#1727140378305.source_lang#}} to {{#1727140378305.target_lang#}}.
Your task is to carefully read, then edit, a translation from {{#1727140378305.source_lang#}} to {{#1727140378305.target_lang#}}, taking into
account a list of expert suggestions and constructive criticisms.

The source text, the initial translation, and the expert linguist suggestions are delimited by XML tags <SOURCE_TEXT></SOURCE_TEXT>, <TRANSLATION></TRANSLATION> and <EXPERT_SUGGESTIONS></EXPERT_SUGGESTIONS>

as follows:

<SOURCE_TEXT>
{{#1727140134125.source_text#}}
</SOURCE_TEXT>

<TRANSLATION>
{{#1727142131868.text#}}
</TRANSLATION>

<EXPERT_SUGGESTIONS>
{{#1727142902026.text#}}
</EXPERT_SUGGESTIONS>

Please take into account the expert suggestions when editing the translation. Edit the translation by ensuring:

(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text),
(ii) fluency (by applying {{#1727140378305.target_lang#}} grammar, spelling and punctuation rules and ensuring there are no unnecessary repetitions),
(iii) style (by ensuring the translations reflect the style of the source text)
(iv) terminology (inappropriate for context, inconsistent use), or
(v) other errors.

Output only the new translation and nothing else.
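Outside Dify, these three LLM nodes boil down to a simple chain of prompts; a sketch with a generic llm callable standing in for whichever model you wire up (the prompts are abridged versions of the node prompts above):

```python
def reflective_translate(llm, text, source_lang="English", target_lang="Chinese"):
    # Step 1: initial translation, output only the translation
    initial = llm(
        f"Translate this {source_lang} text to {target_lang}. "
        f"Output only the translation.\n\n{text}"
    )
    # Step 2: reflection, a list of concrete improvement suggestions
    critique = llm(
        f"Give constructive suggestions to improve this {target_lang} translation "
        f"(accuracy, fluency, style, terminology).\n\n"
        f"Source: {text}\nTranslation: {initial}"
    )
    # Step 3: edit the translation using the suggestions
    return llm(
        f"Edit the translation applying these suggestions. "
        f"Output only the new translation.\n\n"
        f"Source: {text}\nTranslation: {initial}\nSuggestions: {critique}"
    )
```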

The output node simply configures what the workflow finally returns.

Experiment

I grabbed some random content off the web for a test.

I have listed out few of them below and based on your budget, resources, technical skills you could either choose to setup your own or get some commercial. Some of the commercial products might allow a free trial account.

Self-Managed:

Cuckoo – Cuckoo or modified cuckoo does good job covering different OS platforms.

https://drakvuf.com/ – Unlike cuckoo, this is agentless. The setup for this is quiet involved but the results are great.

Sandboxie

Noriben (not exactly a sandbox but does a decent job in Behavioural) – A python script which montiors via ProcMon. Simple easy to setup in a VM. Again not exactly a Sandbox and you would miss out on lot of memory related things.

Hosted/Commercial

Hybrid Analysis (Not sure for Student if they give a free base account)

app.any.run

VMRay (according to me one of the best Commercial Sandbox offering)

我已经在下面列出了其中的一些选项,根据您的预算、资源和技术技能,您可以选择自己设置或购买一些商业产品。一些商业产品可能允许免费试用帐户。

自我管理:

Cuckoo – Cuckoo 或修改后的 Cuckoo 在支持不同的操作系统平台方面表现出色。

https://drakvuf.com/ – 与 Cuckoo 不同,这是一个无代理的解决方案。设置过程虽然复杂,但结果非常出色。

Sandboxie

Noriben(不完全是沙盒,但在行为分析方面表现出色)– 一个通过 ProcMon 监控的 Python 脚本。在虚拟机中设置简单易行。然而,这不完全是沙盒,您可能会错过很多与内存相关的信息。

托管/商业

Hybrid Analysis(不确定学生是否可以获得免费基础账户)

app.any.run

VMRay(据我所知,这是最好的商业沙盒之一)

Overall, the translation quality is quite good.

Appendix

How to change the Ollama listen address

ollama serve --help
OLLAMA_HOST=0.0.0.0 ollama serve

or

launchctl setenv OLLAMA_HOST "0.0.0.0"

Workflow definition file

反思翻译.yml

Dify Pitfall Notes (Continued)

Previous post: Dify Pitfall Notes

In the previous post we built a reflective-translation workflow; now let's build long-text translation. The idea: since each LLM request and response is limited in size, a long article can't be translated in one shot. The fix is to split the long text into segments, translate them one by one, and then stitch the results back together.
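The splitting idea can be sketched in plain Python (a simplified stand-in for the tiktoken-counting RecursiveCharacterTextSplitter the workflow actually uses):

```python
def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole in this sketch.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then translated independently, and the outputs are concatenated in order.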

This time I won't walk through the workflow step by step; the workflow file is at the end of the article, and once you import it you can see every step. The overall flow looks like this.

What I want to focus on here are the two problems I ran into, and how I solved them.

Problem 1: the sandbox

Installing third-party libraries

The splitting node needs third-party Python libraries:

import tiktoken
from icecream import ic
from langchain_text_splitters import RecursiveCharacterTextSplitter

The sandbox image doesn't ship with the libraries we need, but the sandbox project provides a hook for this: just add them to dify/docker/volumes/sandbox/dependencies/python-requirements.txt.

tiktoken==0.6.0
langchain-text-splitters>=0.0.1
icecream==2.1.3

Sandbox permission problems

The sandbox restricts certain system calls, so even after installing the third-party libraries you can't use them. This cost me a good few hours; hopefully this saves you some. On GitHub, many people have hit the same problem (https://github.com/langgenius/dify/issues/4993 and https://github.com/langgenius/dify/issues/4344). I tried a pile of suggested fixes, such as the typical one below that adds a `security_opt` configuration, but none of them worked. If you run into this, don't waste more time on that route.

# The DifySandbox
sandbox:
  image: langgenius/dify-sandbox:0.2.8
  restart: always
  privileged: true
  user: "root"
  read_only: false
  cap_add:
    - SYS_ADMIN
  security_opt:
    - no-new-privileges:false
    - "seccomp=unconfined"
  environment:
    # The DifySandbox configurations
    # Make sure you are changing this key for your deployment with a strong key.
    # You can generate a strong key using `openssl rand -base64 42`.
    API_KEY: ${SANDBOX_API_KEY:-dify-sandbox}
    GIN_MODE: ${SANDBOX_GIN_MODE:-release}
    WORKER_TIMEOUT: ${SANDBOX_WORKER_TIMEOUT:-15}
    ENABLE_NETWORK: ${SANDBOX_ENABLE_NETWORK:-true}
    HTTP_PROXY: ${SANDBOX_HTTP_PROXY:-http://ssrf_proxy:3128}
    HTTPS_PROXY: ${SANDBOX_HTTPS_PROXY:-http://ssrf_proxy:3128}
    SANDBOX_PORT: ${SANDBOX_PORT:-8194}
  volumes:
    - ./volumes/sandbox/dependencies:/dependencies
  healthcheck:
    test: [ "CMD", "curl", "-f", "http://localhost:8194/health" ]
  networks:
    - ssrf_proxy_network

My solution was to modify the image, which is also the approach in the official FAQ: https://github.com/langgenius/dify-sandbox/blob/main/FAQ.md#2-my-python-code-returns-an-operation-not-permitted-error.

Download the dify-sandbox source code.

Open internal/static/python_syscall/syscalls_arm64.go and modify ALLOW_SYSCALLS.

Note: I couldn't get the FAQ's procedure for probing which syscalls are needed to work; if you know how, please share your experience. Since I didn't know exactly which permissions were missing, I simply allowed everything from 0 to 500.

Here is my modified code:

var ALLOW_SYSCALLS = []int{
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 101, 102, 103, 104, 105, 106, 107, 108, 109,
110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
130, 131, 132, 133, 134, 135, 136, 137, 138, 139,
140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
160, 161, 162, 163, 164, 165, 166, 167, 168, 169,
170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
180, 181, 182, 183, 184, 185, 186, 187, 188, 189,
190, 191, 192, 193, 194, 195, 196, 197, 198, 199,
200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
210, 211, 212, 213, 214, 215, 216, 217, 218, 219,
220, 221, 222, 223, 224, 225, 226, 227, 228, 229,
230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
240, 241, 242, 243, 244, 245, 246, 247, 248, 249,
250, 251, 252, 253, 254, 255, 256, 257, 258, 259,
260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
270, 271, 272, 273, 274, 275, 276, 277, 278, 279,
280, 281, 282, 283, 284, 285, 286, 287, 288, 289,
290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
300, 301, 302, 303, 304, 305, 306, 307, 308, 309,
310, 311, 312, 313, 314, 315, 316, 317, 318, 319,
320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
330, 331, 332, 333, 334, 335, 336, 337, 338, 339,
340, 341, 342, 343, 344, 345, 346, 347, 348, 349,
350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
360, 361, 362, 363, 364, 365, 366, 367, 368, 369,
370, 371, 372, 373, 374, 375, 376, 377, 378, 379,
380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
390, 391, 392, 393, 394, 395, 396, 397, 398, 399,
400, 401, 402, 403, 404, 405, 406, 407, 408, 409,
410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
420, 421, 422, 423, 424, 425, 426, 427, 428, 429,
430, 431, 432, 433, 434, 435, 436, 437, 438, 439,
440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
450, 451, 452, 453, 454, 455, 456, 457, 458, 459,
460, 461, 462, 463, 464, 465, 466, 467, 468, 469,
470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
480, 481, 482, 483, 484, 485, 486, 487, 488, 489,
490, 491, 492, 493, 494, 495, 496, 497, 498, 499,
500,
}

After the change, run install.sh and then ./build/build_[amd64|arm64].sh to complete the Go build:

./install.sh 
./build/build_arm64.sh

When the build finishes, two executables are produced: env and main.

The next step is to rebuild the image with Docker:

docker build  --progress=plain  -t mysandbox -f docker/arm64/dockerfile .

Then change the sandbox image in docker/docker-compose.yaml to the freshly built one:


# The DifySandbox
sandbox:
  image: mysandbox
  restart: always
  privileged: true
  cap_add:
    - SYS_ADMIN
  security_opt:
    - no-new-privileges:false
    - "seccomp=unconfined"
  environment:
    # The DifySandbox configurations
    # Make sure you are changing this key for your deployment with a strong key.
    # You can generate a strong key using `openssl rand -base64 42`.
    API_KEY: ${SANDBOX_API_KEY:-dify-sandbox}
    GIN_MODE: ${SANDBOX_GIN_MODE:-release}
    WORKER_TIMEOUT: ${SANDBOX_WORKER_TIMEOUT:-15}
    ENABLE_NETWORK: ${SANDBOX_ENABLE_NETWORK:-true}
    HTTP_PROXY: ${SANDBOX_HTTP_PROXY:-http://ssrf_proxy:3128}
    HTTPS_PROXY: ${SANDBOX_HTTPS_PROXY:-http://ssrf_proxy:3128}
    SANDBOX_PORT: ${SANDBOX_PORT:-8194}
  volumes:
    - ./volumes/sandbox/dependencies:/dependencies
  healthcheck:
    test: [ "CMD", "curl", "-f", "http://localhost:8194/health" ]
  networks:
    - ssrf_proxy_network

Run the new container and go back to the UI to test:

docker compose up -d

Temporary directory permissions

So far so good: the syscall permission problem is solved, but a new one appears right away. The error looks like this:

An error occurred: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/']

The cause is that the user the sandbox code runs as has no write permission on the tmp directories. The official blog post (https://dify.ai/blog/dify-ai-blog-introducing-difysandbox) explains, roughly, that each sandbox is isolated under /var/sandbox/. The code runs as the sandbox user, and for whatever reason the original image doesn't set the permissions up correctly for it.
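The error message itself comes straight from Python's tempfile module, which probes those candidate directories for writability; you can reproduce the probe from inside the sandbox:

```python
import tempfile

def tmp_writable() -> bool:
    # tempfile raises FileNotFoundError("No usable temporary directory found
    # in ['/tmp', '/var/tmp', ...]") when none of its candidates is writable
    try:
        with tempfile.NamedTemporaryFile():
            return True
    except OSError:
        return False
```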

So I fixed it manually inside the container:

cd /var/sandbox/
chmod 1777 -R *

Worker pool limit

Now you can go test the workflow, and you'll promptly hit yet another problem:

Max submit count 100 of workflow thread pool reached.

Why 100? This one is easier: a quick search of the community turns up the answer, and the fix is to patch the api container accordingly.

https://github.com/langgenius/dify/issues/8659#issuecomment-2367260559

After making the change, rebuild the Docker image and update the docker-compose file to use it. Note that two containers, api and worker, both use the api image.

Wrapping up

With that, the long-text translation workflow is finally configured. The whole exercise took me 10+ hours, 7 of which went into that permission problem. There are plenty of concepts I still don't really understand, seccomp for one; I mostly just muddled through.

That said, this is a genuinely useful tool, and I'm thinking about what to build with it next; if you have good ideas, I'd love to hear them.

Appendix

Workflow file

长文翻译.yml

My first app

The app itself is simple; the core is a reminder feature.

Users enter their prescription information, and the app automatically reminds them when to take which medication.
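The core scheduling logic is just deriving reminder times from the prescription; an illustrative sketch in Python (the app itself is React Native, so this is only the idea, not its actual code):

```python
from datetime import datetime, timedelta

def dose_times(first_dose: datetime, times_per_day: int, days: int) -> list[datetime]:
    """Evenly space times_per_day doses across each day, for the given number of days."""
    interval = timedelta(hours=24 / times_per_day)
    return [first_dose + i * interval for i in range(times_per_day * days)]
```

Each resulting datetime becomes one scheduled notification.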

Tech stack:

  • react-native 0.75.2
  • firebase 20.4.0
  • @react-native-async-storage/async-storage 1.24.0
  • @react-navigation/native 6.1.18
  • nativewind
  • @notifee/react-native

A carefully designed image really is crucial; it noticeably raises the perceived quality of the whole app.

👀 5 tips on how to make your posts exceptionally interesting to read

URL: https://www.producthunt.com/discussions/5-tips-on-how-to-make-your-posts-exceptionally-interesting-to-read

In yesterday’s discussion, Priyanka Saini asked this question. 💡 And these 5 things came to my mind:

  1. Attention-grabbing title

  2. Short text (paragraphs with 2 or 3 lines)

  3. Subtle use of emojis

  4. Incorporating visual materials (images, videos)

  5. Playing with text (rhetorical questions, inventing your own words) What else makes the post interesting?

Using macOS's built-in OCR

After trying several online OCR services, as well as tess4j, the Java wrapper around Tesseract, I found that none of them matched the recognition quality of macOS's built-in OCR.

So I wondered whether the system capability could be used directly. A GitHub search showed someone had the same idea and had already open-sourced it: https://github.com/straussmaximilian/ocrmac. Below is a simple example that recognizes an invoice.

from ocrmac import ocrmac

file = './China-Fapiao-Invoice-System-2.jpeg'

def recognize_invoice_text(image_path):
    return ocrmac.OCR(image_path, language_preference=['zh-Hans']).recognize()

annotations = recognize_invoice_text(file)

for annotation in annotations:
    print(annotation[0])

Recognition result:

浙江增值税电矛普通发票
机器编号:661618766971
称:杭州天然气集团有限公司



訥税人识别号:
地址、电话:
开户行及账号:
货物或应税劳务、服务名称
服务费
規格型号
单位
致量
10

单价
0.94339623
发票代码:033001600211
发录号码:50843024
开柔日期:20170401
校检码:80167 52728 05105 03956
40554238005->88122-959>/636
7+14*4/-393+ +39> * + +>>/6
*8-5345*27+/<>56<0*473-/10<
>77<18<4*4/-393+<+39>-*83<<
税率

9.43
G%
0.57




价税合计(大写)
⑧壹拾凶整
称:杭州爱信诺航天信息有限公司
纳税人识别号:913301065551991560
地址、电话:杭州市西湖区万塘路30号高新东方科技园330571-81029850
开户行及账号:杭州市工行古荡支行 1202005909900032278
放款人:
复核:
¥9.43
(小写)¥10.00
¥0.57


开柔人:爱信诺
筑售方:(幸)
9E3|10836121530
发票专用章

If you have a Mac, run it locally: the results are good, and it's free.

I also tested https://scandocflow.com/, which works reasonably well. If you need a server-side deployment it could be a decent option; there's a free quota of 50 documents per month.

https://www.newocr.com/ is slightly worse, but its free quota is larger.
