DistilQwen2.5-R1 Released: Knowledge Distillation Boosts Deep Thinking in Small Models

Authors: 蔡文睿 (清素), 汪誠愚 (熊兮), 嚴俊冰 (玖燭), 黃俊 (臨在)

Introduction

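With the open-sourcing of large language models built for deep reasoning, such as DeepSeek-R1 and QwQ-32B, "large model + slow thinking" has become the standard recipe for pushing the boundaries of LLM intelligence. However, deploying these models widely on resource-constrained mobile devices and in edge-computing scenarios remains a major challenge. Academia and industry therefore urgently need effective ways to use knowledge distillation to transfer the knowledge of these very large reasoning models into small models, improving computational efficiency and reducing deployment cost. To this end, building on the DistilQwen2.5 series of distilled small models (see the technical articles listed at the end), we are releasing the more capable DistilQwen2.5-R1 series of deep reasoning models.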

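The DistilQwen2.5-R1 series is built on a small amount of chain-of-thought data distilled from DeepSeek-R1 and uses a set of novel distillation strategies to strengthen the deep-thinking ability of small models. Experimental results show that models of several sizes in the DistilQwen2.5-R1 series perform well across the benchmarks (see the figure below). For example, DistilQwen2.5-R1-7B clearly outperforms other distilled models trained on open-source data, including OpenThinker-7B.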

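To make the DistilQwen2.5-R1 models easy for developers and enterprises to use in real applications, all checkpoints have been released on the Hugging Face and ModelScope open-source communities. This article describes the distillation algorithm and performance evaluation of DistilQwen2.5-R1, and provides a usage guide for Alibaba Cloud's Platform for AI (PAI) together with download instructions.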

Knowledge Distillation Techniques in DistilQwen2.5-R1

This section describes the data augmentation and knowledge distillation techniques used to train the DistilQwen2.5-R1 models.

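Because of the large gap in parameter count, the cognitive and reasoning trajectories of large and small models do not always agree. Take math problems as an example: for some problems, a small model, limited by its capacity, tends to use more basic methods, whereas a large model, with its stronger reasoning ability, uses more advanced ones. In the classic chickens-and-rabbits-in-a-cage problem, for instance, a small model tends to enumerate candidates by trial and error, while a large model solves it directly by setting up equations.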

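Because of this gap in cognitive trajectories, a small model sometimes cannot effectively understand a large model's chain of thought (CoT), and distilling that CoT directly into the small model often works poorly. We therefore designed a training framework for small reasoning models that removes the negative effect of this gap. In later training we also exploit the mismatched data to further improve the small model's reasoning ability, which finally yields the DistilQwen2.5-R1 series. The framework has two stages: an "evaluate-improve-verify" mechanism for CoT data, and a preference optimization algorithm based on data with different cognitive trajectories. The overall distillation framework of DistilQwen2.5-R1 is shown in the figure below: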

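Given an original set of large-model CoT data, for example a dataset distilled from DeepSeek-R1, stage one first rates the difficulty of each sample, then refines it according to its difficulty level, and finally verifies the refined result. We run SFT on the improved and verified CoT dataset to give the model its basic reasoning ability. Stage two builds a preference dataset from the stage-one CoT data of different difficulty levels and further improves the small model's reasoning ability on top of the stage-one model.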

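The CoT Data "Evaluate-Improve-Verify" Mechanism

As noted above, the reasoning trajectories of large and small models can differ substantially, so a small model cannot fully understand the large-model CoT dataset to be distilled. Stage one optimizes the dataset with this gap in mind: following the LLM-as-a-Judge paradigm, it evaluates and improves the large model's reasoning process.

Given a question, the large model's reasoning process, and the answer, we use a model to judge whether the reasoning process is easy, medium, or hard. The core criterion is whether the small model can follow the given reasoning process to reach the answer. The difficulty levels are defined as follows: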

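· Medium: the small model can follow the reasoning process to reach the answer.

· Easy: the reasoning process is too terse and omits steps the small model needs. The large model can rely on its strong reasoning ability to bridge the gaps, but the small model cannot follow the process to reach the answer.

· Hard: the reasoning process is too complex or too difficult for the small model to follow to the answer.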

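Based on these ratings, a collection of large-model questions and CoTs is split into easy, medium, and hard subsets. Samples rated medium are kept unchanged. For samples rated easy or hard, we use a model to improve the CoT: easy samples have their reasoning expanded until the small model can follow the expanded process to the answer, and hard samples have their reasoning condensed until the small model can follow the condensed process to the answer.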

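We then verify the improved results. The improved CoT is rated again to check that it is now classified as medium, and we check that the small model can actually solve the problem by following it. If the improved CoT passes verification, the improvement worked, the sample can be understood by the small model, and we keep it. If it fails, the improvement did not work, and we return to the improvement step and repeat until verification passes. The resulting optimized CoT dataset consists of: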

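· Samples initially rated medium.

· Samples initially rated easy that were expanded, re-rated as medium, and passed verification.

· Samples initially rated hard that were condensed, re-rated as medium, and passed verification.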

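At this point every CoT in the dataset has a final rating of medium, meaning the small model can understand all of them and follow them to solve the corresponding reasoning problems. The cognitive trajectory gap described above is thus addressed in the improved dataset, and its potential negative effects are removed. We use the optimized CoT dataset to run supervised fine-tuning (SFT) on the Qwen2.5 base models, which produces the stage-one DistilQwen2.5-R1 models.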
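
The evaluate-improve-verify loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the callables `rate`, `expand`, `condense`, and `student_solves` stand in for the LLM judge, the LLM rewriters, and the small-model check, and the `max_rounds` cap is an assumption added here (the article only says the improvement is repeated until verification passes).

```python
from typing import Callable, Dict, Optional

Sample = Dict[str, str]  # assumed keys: "question", "cot", "answer"


def refine_cot(
    sample: Sample,
    rate: Callable[[Sample], str],          # hypothetical judge: "easy" | "medium" | "hard"
    expand: Callable[[Sample], Sample],     # hypothetical rewriter for "easy" CoTs
    condense: Callable[[Sample], Sample],   # hypothetical rewriter for "hard" CoTs
    student_solves: Callable[[Sample], bool],  # hypothetical small-model check
    max_rounds: int = 3,                    # assumption: give up after a few attempts
) -> Optional[Sample]:
    """Return a sample whose CoT is rated medium and verified, else None."""
    initial = rate(sample)
    if initial == "medium":
        return sample  # medium samples are kept unchanged

    current, level = sample, initial
    for _ in range(max_rounds):
        # Improve in the direction suggested by the latest rating; if the CoT
        # was rated medium but the student still failed, fall back to the
        # direction of the initial rating.
        direction = level if level != "medium" else initial
        current = expand(current) if direction == "easy" else condense(current)
        # Verify: re-rate, then check that the small model can follow the CoT.
        level = rate(current)
        if level == "medium" and student_solves(current):
            return current
    return None


# Toy stand-ins so the sketch runs without any model: difficulty is judged by
# the number of reasoning steps (lines) in the CoT.
def toy_rate(s: Sample) -> str:
    steps = len(s["cot"].split("\n"))
    if steps < 3:
        return "easy"
    if steps > 5:
        return "hard"
    return "medium"


def toy_expand(s: Sample) -> Sample:
    return {**s, "cot": s["cot"] + "\n(extra intermediate step)"}


def toy_condense(s: Sample) -> Sample:
    return {**s, "cot": "\n".join(s["cot"].split("\n")[:-1])}
```

In a real pipeline the toy functions would be replaced by prompts to a judge model and by an actual run of the small model. Samples for which `refine_cot` returns `None` would simply be left out of the SFT set.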

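Preference Optimization Based on Data with Multiple Cognitive Trajectories

In the second stage, we further improve the model using the data of different difficulty levels obtained in the first stage.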

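Specifically, CoT data rated medium in the first stage is both correct and suitable for the small model: the small model can understand it and use it to solve the problem. CoT data rated easy or hard is still correct, just not suitable for the small model. On top of this, we use a model to rewrite a correct reasoning process into an incorrect one. The incorrect reasoning process is illogical and misleading, so the small model cannot follow it to solve the problem at all.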

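We combine the rewritten incorrect CoTs pairwise with the easy, medium, and hard CoTs to form several kinds of preference pairs. Some pairs have a large gap between the two sides and some a small one. For each kind of pair we use a tailored parameter configuration and, starting from the first-stage model, apply the DPO algorithm to further improve the small model's reasoning ability.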
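
To make the pairing concrete, here is a small sketch of how such preference pairs could be assembled and scored. Several things in it are assumptions made for illustration and are not stated in the article: the preference ordering (medium over easy/hard, and any correct CoT over the incorrect one), the set of pair types, and the per-type `beta` values. The loss function is the standard per-pair DPO loss.

```python
import math
from typing import Dict, List

# Hypothetical pair types and DPO beta values. The article only says that each
# kind of pair gets its own parameter configuration, not what the values are.
BETA_BY_PAIR = {
    ("medium", "incorrect"): 0.1,  # large gap between chosen and rejected
    ("easy", "incorrect"): 0.1,
    ("hard", "incorrect"): 0.1,
    ("medium", "easy"): 0.5,       # small gap: both sides are correct
    ("medium", "hard"): 0.5,
}


def build_preference_pairs(prompt: str, cots: Dict[str, str]) -> List[dict]:
    """Build (chosen, rejected) pairs from the CoT variants available for one prompt.

    `cots` maps a variant name ("easy", "medium", "hard", "incorrect") to its text;
    variants that are missing for this prompt are skipped.
    """
    pairs = []
    for (chosen, rejected), beta in BETA_BY_PAIR.items():
        if chosen in cots and rejected in cots:
            pairs.append({
                "prompt": prompt,
                "chosen": cots[chosen],
                "rejected": cots[rejected],
                "pair_type": f"{chosen}>{rejected}",
                "beta": beta,
            })
    return pairs


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp)
    x = -beta * margin
    # Numerically stable softplus(x) = log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

The inputs to `dpo_loss` are summed token log-probabilities of each response under the model being trained and under the frozen first-stage model. In practice a training library would compute these and the loss in batches.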

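Finally, using the cognitive trajectory (CoT) data of different difficulty levels from the first stage together with the first-stage model, we obtain the DistilQwen2.5-R1 series.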

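Evaluation of DistilQwen2.5-R1

In this section we evaluate the DistilQwen2.5-R1 distilled small models from several angles and compare them with current state-of-the-art models.

Overall Capability Evaluation

We tested the DistilQwen2.5-R1 models on several reasoning benchmarks covering three mainstream reasoning domains: mathematics, code, and science.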

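For mathematics we use AIME2024 and MATH-500. AIME2024 is the 2024 test set of the American Invitational Mathematics Examination. It contains 30 difficult math problems and is used to assess an LLM's complex mathematical reasoning and problem solving, with an emphasis on combined use of areas such as algebra and geometry. MATH-500 is a math reasoning benchmark of 500 test problems that measures a model's accuracy across different kinds of math problems, complementing the much smaller AIME2024 set.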

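For code we use LiveCodeBench, a continuously updated benchmark for comprehensively evaluating LLMs in complex coding scenarios. It collects difficult programming tasks from top competition platforms to test code generation, self-repair, code execution, and test output prediction, and is designed to be comprehensive and contamination-free. In this evaluation we use the V2 release of LiveCodeBench, which contains 511 coding problems dated from May 2023 to May 2024.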

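For science we use GPQA-Diamond, a subset of GPQA (Graduate-Level Google-Proof Q&A Benchmark) released jointly by researchers from New York University, Cohere, and Anthropic. It contains 198 questions, is the highest-quality evaluation subset of GPQA, and is used to assess a model's ability to solve expert-level science questions.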

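The figures below compare the DistilQwen2.5-R1 models with the original Qwen2.5 models at four sizes: 3B, 7B, 14B, and 32B. The small-reasoning-model training framework described in this article clearly improves the reasoning ability of existing language models, with consistent and noticeable gains across the benchmarks.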

AIME2024 results:

MATH-500 results:

GPQA Diamond results:

LiveCodeBench V2 results:

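Comparison with Other Models

To compare reasoning models of different sizes released in the same period, the tables below report DistilQwen2.5-R1 against other leading reasoning models at each size on the four benchmarks above. We focus on comparisons with the OpenThinker and DeepSeek-R1-Distill-Qwen series, among others.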

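The 7B comparison is shown below. DistilQwen2.5-R1-7B surpasses Bespoke-Stratos-7B and OpenThinker-7B. Notably, compared with OpenThinker-7B, DistilQwen2.5-R1-7B reaches higher scores on all benchmarks with less training data (105k versus 114k samples). DeepSeek-R1-Distill-Qwen-7B was trained on 800k closed-source samples, whereas DistilQwen2.5-R1-7B was trained on open-source data (a filtered and rewritten subset of the OpenThoughts dataset), which puts it in a leading position among models trained on open-source data.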

Model	Training data size	AIME2024	MATH-500	GPQA Diamond	LiveCodeBench V2
DeepSeek-R1-Distill-Qwen-7B (reported)	800k	55.5	92.8	49.1	-
Bespoke-Stratos-7B (reported)	17k	20.0	82.0	37.8	36.1
OpenThinker-7B (reported)	114k	31.3	83.0	42.4	39.9
DistilQwen2.5-R1-7B	105k	43.33	88.4	42.93	46.38

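The 32B comparison is shown below. Likewise, DistilQwen2.5-R1-32B surpasses Sky-T1-32B-Preview on every benchmark with a reported score, and surpasses OpenThinker-32B on three of the four benchmarks (all except LiveCodeBench V2).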

Model	Training data size	AIME2024	MATH-500	GPQA Diamond	LiveCodeBench V2
DeepSeek-R1-Distill-Qwen-32B (reported)	800k	72.6	94.3	62.1	-
Sky-T1-32B-Preview (reported)	17k	43.3	86.4	56.8	-
OpenThinker-32B (reported)	114k	66.0	90.6	61.6	68.9
DistilQwen2.5-R1-32B	105k	70.0	93.8	62.12	65.95

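Multi-Sample Inference Evaluation

We also tested the DistilQwen2.5-R1 models on the four benchmarks above with multiple samples per question: the model generates k answers to the same question, and we report Pass@k. The following are the Pass@k results (k = 2, 4, 8, 16, 32, 64) of DistilQwen2.5-R1-7B and DistilQwen2.5-R1-32B on the four benchmarks.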

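As k increases, the accuracy of both models rises substantially on all benchmarks. Notably, DistilQwen2.5-R1-7B gains sharply on MATH-500 and GPQA-Diamond as k grows and steadily approaches the level of DistilQwen2.5-R1-32B. This suggests that our training framework has considerable potential for small models: with repeated sampling, a 7B model can approach the capability of a 32B model, which can reduce the compute needed for inference.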
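
For readers who want to reproduce a Pass@k number, the sketch below shows one common way to compute it. The article does not say which estimator was used. This is the widely used unbiased estimator, which draws n >= k samples per question, counts the c correct ones, and estimates the probability that at least one of k randomly chosen samples is correct. The function names are illustrative.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k for one question: 1 - C(n - c, k) / C(n, k).

    n: number of samples generated, c: number of correct samples, k: budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average Pass@k over a benchmark, given the correct count per question."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

When exactly k samples are drawn (n = k), the estimator reduces to "1 if any sample is correct, else 0", which matches the plain description given in the text.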

Model Output Examples

For the same math problem, we compared the outputs of the 7B and 32B DistilQwen2.5-R1 models with those of other models of the same size. Judging from the outputs, the DistilQwen2.5-R1 models are at the leading level among reasoning models of the same size.

Example 1: Comparison of 7B models

Question	Peter has $70 and wishes to spend an equal amount each day for one week. From Sunday through Wednesday, he spent his money on wooden action figures which cost $5 each. For the rest of the week, he will buy plastic action figures which cost $2 each. How many total action figures will he have by the end of the week?
Answer	There are 7 days in a week so if he wishes to spend an equal amount of $70 each day that would be $70/7 = $<<70/7=10>>10\nFor four days (Sunday through Wednesday) he will spend 4*$10 = $<<4*10=40>>40\nHe will spend $40 on wooden action figures of $5 each which gives $40/$5 = <<40/5=8>>8 wooden action figures\nFor the remaining 3 days he will spend 3*$10 = $<<3*10=30>>30\nHe will spend $30 on plastic action figures of $2 each which gives $30/$2 = <<30/2=15>>15 plastic action figures\nAltogether, he will have 8+15 = <<8+15=23>>23 action figures.\n#### 23
DistilQwen2.5-R1-7B	<|begin_of_thought|> Okay, let's try to figure out how many action figures Peter can buy. First, I need to understand the problem step by step. So, Peter has $70 and wants to spend an equal amount each day for one week. That means seven days total. From Sunday through Wednesday, he spends money on wooden action figures that cost $5 each. Then, for the rest of the week (which would be Thursday and Friday, I guess, since the week ends on Saturday or Sunday?), he buys plastic ones that cost $2 each. The question is asking how many total action figures he'll have by the end of the week. Wait, the problem says "from Sunday through Wednesday," so that's four days. Then "for the rest of the week," which would be Thursday and Friday, making two days. So total days: 4 + 2 = 6 days? Wait, a week is seven days. If he's spending from Sunday through Wednesday (4 days), then the rest would be Thursday and Friday (2 days), but that's only 6 days. Wait, maybe the "rest of the week" includes Saturday as well? Because Sunday to Wednesday is four days, then Thursday to Saturday would be three days left. Hmm, maybe I need to clarify that. Wait, the problem says "for the rest of the week," which could be interpreted differently. Let me check again. The problem states: "From Sunday through Wednesday, he spent his money on wooden action figures..." So that's four days. Then "For the rest of the week, he will buy plastic action figures which cost $2 each." So "rest of the week" would be the remaining days of the week after Sunday through Wednesday. Since Sunday is the first day, the week would be Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday. So from Sunday to Wednesday is four days. The rest would be Thursday, Friday, Saturday. That's three days. So total days: 4 + 3 = 7 days, which is a week. Okay, so that makes sense. So he spends four days on wooden figures ($5 each) and three days on plastic ones ($2 each). 
The total amount he spends is $70, and he wants to spend an equal amount each day. So first, we need to figure out how much he spends each day, then calculate how many action figures he can buy each day, and sum them up. Wait, but the problem says he has $70 and wishes to spend an equal amount each day for one week. So the total amount he spends is $70, divided equally over seven days. So each day he spends $70 / 7 = $10 per day. So $10 per day total for the week. But then, from Sunday through Wednesday (four days), he spends on wooden figures ($5 each), and the remaining three days (Thursday, Friday, Saturday) he spends on plastic figures ($2 each). So the total amount he spends on wooden figures is 4 days * $10/day = $40. Then the remaining amount is $70 - $40 = $30, which he spends over three days at $2 each. So $30 / 3 days = $10 per day, which matches the total daily spending of $10. So that checks out. Now, how many wooden action figures does he buy? Each costs $5, and he spends $10 on them each day for four days. Wait, but if each day he spends $10 on wooden figures, then each day he can buy $10 / $5 = 2 action figures per day. So over four days, that's 2 * 4 = 8 wooden action figures. Then, for the plastic figures, he spends $10 per day for three days. Each plastic figure is $2, so per day he can buy $10 / $2 = 5 action figures per day. Over three days, that's 5 * 3 = 15 plastic action figures. Total action figures: 8 + 15 = 23. Wait, but let me verify this again. Let's check the total money spent. Wooden: 8 figures * $5 = $40. Plastic: 15 figures * $2 = $30. Total: $40 + $30 = $70. Which matches the total he has. So that's correct. Alternatively, another way to look at it: since he spends $10 each day, total $70 over 7 days. So each day he spends $10. For the first four days, he uses $10 each day on $5 action figures. So each day, $10 / $5 = 2 per day. 4 days: 8. Then the remaining three days, $10 each day on $2 action figures: $10 / $2 = 5 per day. 
3 days: 15. Total 23. Yes, that seems to be the answer. <|end_of_thought|> <|begin_of_solution|> Peter has $70 to spend over 7 days. He spends $10 each day ($70 ÷ 7 = $10/day). Wooden action figures (Sunday–Wednesday, 4 days): Each costs $5. Daily purchase: $10 ÷ $5 = 2 figures/day. Total: 2 × 4 = 8 figures. Plastic action figures (Thursday–Saturday, 3 days): Each costs $2. Daily purchase: $10 ÷ $2 = 5 figures/day. Total: 5 × 3 = 15 figures. Total action figures: 8 (wooden) + 15 (plastic) = 23. \boxed{23} <|end_of_solution|>
DeepSeek-R1-Distill-Qwen-7B	<think> Okay, let's see. Peter has $70 and wants to spend an equal amount each day for one week. So first, I need to figure out how much he spends each day. A week has seven days, right? From Sunday through Wednesday, he buys wooden action figures that cost $5 each. Then, for the rest of the week, which would be Thursday and Friday, he buys plastic ones that cost $2 each. The question is asking how many total action figures he'll have by the end of the week. Wait, hold on. Wait, a week is seven days. Sunday through Wednesday is four days, and then the rest is Thursday and Friday, that's two days. So total of six days? Wait, Sunday, Monday, Tuesday, Wednesday is four days. Then Thursday and Friday, that's two days. So total of six days he's spending money, leaving Saturday out? The problem says "for one week," so maybe he starts on Sunday and goes through Saturday? But the problem mentions spending from Sunday through Wednesday, and then the rest of the week. Hmm. Wait, the problem says: "From Sunday through Wednesday, he spent his money on wooden action figures which cost $5 each. For the rest of the week, he will buy plastic action figures which cost $2 each." So Sunday to Wednesday is four days, then the rest of the week would be Thursday to Saturday, which is three days. Wait, that's three days. So total of seven days? Wait, no. Sunday to Saturday is a week. If Sunday through Wednesday is four days, then the remaining days are Thursday, Friday, Saturday, which is three days. So total of seven days. So he spends money on four days buying wooden figures and three days buying plastic ones. But the problem says he spends an equal amount each day for one week. So total spending per day is consistent. So first, I need to find out how much he spends each day. Let me denote the daily amount as x dollars. So total spending for the week would be 7x because it's one week. But he has $70. So 7x = $70. Therefore, x = 10. So he spends $10 each day. 
Wait, but that seems high. Let me check. If he spends $10 each day for seven days, that's $70 total. That makes sense. So each day he spends $10. Then, from Sunday through Wednesday, which is four days, he spends 4 days * $10/day = $40 on wooden action figures. Then, for the rest of the week, which is three days (Thursday, Friday, Saturday), he spends 3 days * $10/day = $30 on plastic action figures. So total $40 + $30 = $70, which matches his total money. That checks out. Now, the problem is asking for the total number of action figures he will have by the end of the week. So we need to calculate how many wooden figures he bought during the four days and how many plastic figures he bought during the three days, then sum them up. First, wooden action figures cost $5 each. He spent $40 on them. So the number of wooden figures is $40 divided by $5 per figure. That's 40 / 5 = 8 figures. Then, plastic action figures cost $2 each. He spent $30 on them. So the number of plastic figures is $30 divided by $2 per figure. That's 30 / 2 = 15 figures. Adding them together: 8 wooden + 15 plastic = 23 total action figures. Wait a second, but let me verify again to make sure I didn't make a mistake. So total days are seven. He spends $10 each day. Total $70. From Sunday to Wednesday: four days. 4 days * $10 = $40. Then Thursday, Friday, Saturday: three days * $10 = $30. Correct. Then wooden figures: $40 / $5 = 8. Plastic: $30 / $2 = 15. Total 23. That seems right. But wait, the problem says "for the rest of the week, he will buy plastic action figures which cost $2 each." So does "rest of the week" mean the remaining days after Sunday through Wednesday? Sunday through Wednesday is four days, so the remaining three days would be Thursday, Friday, Saturday. So three days. So three days of $10 each day, totaling $30. Correct. Therefore, the total number of action figures is 8 + 15 = 23. So the answer is 23. </think> Peter spends $10 each day for 7 days, totaling $70. 
From Sunday to Wednesday (4 days), he buys wooden action figures at $5 each: \[ \frac{4 \text{ days} \times \$10}{\$5/\text{figure}} = \frac{\$40}{\$5} = 8 \text{ figures}. \] For the remaining 3 days (Thursday to Saturday), he buys plastic action figures at $2 each: \[ \frac{3 \text{ days} \times \$10}{\$2/\text{figure}} = \frac{\$30}{\$2} = 15 \text{ figures}. \] Total action figures: \[ 8 + 15 = \boxed{23}. \]

Example 2: Comparison of 32B models

Question	A shoe store was having a weekend sale on a brand of popular tennis shoes. On Friday the store sold 14 pairs of tennis shoes. The next day they sold double that number of shoes. On the last day of the sale they sold one-half the amount that they did the day before, but six people returned their pairs because they didn't fit. How many pairs of tennis shoes were sold by the end of the sale?
Answer	On Saturday 14*2=<<14*2=28>>28 shoes were sold\nOn Sunday 28/2=<<28/2=14>>14 shoes were sold\nBefore the returns were made 14+28+14=<<14+28+14=56>>56 pairs of shoes were sold\nAfter the returns were made 56-6=<<56-6=50>>50 pairs of shoe were sold\n#### 50
DistilQwen2.5-R1-32B	<|begin_of_thought|> Okay, let me try to figure out this problem step by step. So, the shoe store had a sale on tennis shoes over the weekend. Let's break down the information given. First, on Friday, they sold 14 pairs of tennis shoes. That's straightforward. Then, the next day, which would be Saturday, they sold double that number. Double of 14 is 28, right? So, 28 pairs on Saturday. Then, on the last day of the sale, which is Sunday, they sold half the amount they did the day before. The day before Sunday is Saturday, so half of 28 is 14. So, 14 pairs on Sunday. But wait, there's a catch here. Six people returned their pairs because the shoes didn't fit. So, we need to subtract those 6 returned pairs from the Sunday sales. That means instead of 14, they effectively sold 14 minus 6, which is 8 pairs on Sunday. Now, to find the total number of tennis shoes sold by the end of the sale, we need to add up the sales from Friday, Saturday, and Sunday, taking into account the returns on Sunday. So, Friday: 14 pairs. Saturday: 28 pairs. Sunday: 14 - 6 = 8 pairs. Adding those together: 14 + 28 + 8. Let me calculate that. 14 plus 28 is 42, and then 42 plus 8 is 50. So, total pairs sold would be 50. Wait, but let me double-check to make sure I didn't make a mistake. Let's verify each step again. Friday: 14 pairs. Correct. Saturday is double, so 14*2=28. Correct. Sunday is half of Saturday's sales, which is 28/2=14. But then subtract the 6 returned pairs. So 14-6=8. Then total is 14+28+8=50. Yeah, that seems right. Hmm, another way to look at it: Total sold without returns would be 14 + 28 +14 = 56. Then subtract the 6 returned pairs, which would be 56 -6=50. Same result. So, that confirms it. I think that's all. So the final answer should be 50 pairs of tennis shoes sold. <|end_of_thought|> <|begin_of_solution|> The shoe store sold 14 pairs on Friday. On Saturday, they sold double that number, which is $14 \times 2 = 28$ pairs. 
On Sunday, they sold half of Saturday's sales, which is $28 \div 2 = 14$ pairs, but then subtract the 6 pairs returned. Therefore, the effective sales on Sunday are $14 - 6 = 8$ pairs. Adding up the sales from all three days: $14$ (Friday) + $28$ (Saturday) + $8$ (Sunday) = $14 + 28 + 8 = 50$. The final answer is $\boxed{50}$. <|end_of_solution|>
DeepSeek-R1-Distill-Qwen-32B	<think> Okay, let's try to figure out how many tennis shoes were sold by the end of the sale. So, the problem is about a shoe store that had a weekend sale. I'll need to break it down day by day to make sure I understand each part correctly. First, on Friday, the store sold 14 pairs of tennis shoes. That seems straightforward. So, Friday sales are 14 pairs. Next, the problem says that on the next day, which would be Saturday, they sold double that number. Hmm, double of Friday's sales. So, double of 14 is 14 times 2. Let me calculate that: 14 * 2 = 28. So, they sold 28 pairs on Saturday. Now, moving on to the last day of the sale, which is Sunday. The problem states that on Sunday, they sold half the amount they did the day before. The day before Sunday is Saturday, so half of Saturday's sales. Saturday was 28 pairs, so half of that would be 28 / 2. Let me do that math: 28 divided by 2 is 14. So, they sold 14 pairs on Sunday before any returns. But wait, the problem mentions that six people returned their pairs because they didn't fit. So, we need to subtract those returns from Sunday's sales. So, Sunday's total after returns would be 14 minus 6. Let me calculate that: 14 - 6 = 8. So, they effectively sold 8 pairs on Sunday after the returns. Now, to find the total number of pairs sold by the end of the sale, I need to add up the sales from each day. That would be Friday's 14, plus Saturday's 28, plus Sunday's 8. Let me add them up step by step. First, 14 (Friday) + 28 (Saturday) = 42. Then, adding Sunday's 8 to that: 42 + 8 = 50. Wait, does that make sense? Let me double-check each step to make sure I didn't make a mistake. Friday was 14, Saturday was 28, Sunday was 14 before returns, which became 8 after subtracting 6. So, 14 + 28 is indeed 42, and 42 + 8 is 50. Yeah, that seems correct. But just to be thorough, maybe I should think about whether the returns affect the total sold or just the net sales. 
The problem says "they sold one-half the amount that they did the day before, but six people returned their pairs." So, I think the six returns are subtracted from Sunday's sales, so the total sold would be the initial sales minus the returns. Therefore, the calculation I did is correct. So, adding all three days together: 14 + 28 + (14 - 6) = 14 + 28 + 8 = 50. Yep, that looks right. </think> The total number of pairs sold by the end of the sale is 50. The final answer is 50.

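Model Download and Usage

Using DistilQwen2.5-R1 on Alibaba Cloud's Platform for AI (PAI)

Using the Hugging Face transformers library as an example, this section briefly shows how to use a DistilQwen2.5-R1 model on PAI-DSW. First make sure that the transformers version in the PAI-DSW image is at least 4.37.0. Otherwise, loading the model fails with: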

KeyError: 'qwen2'
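
The version requirement can be checked before loading the model. The following is a minimal sketch that uses only the Python standard library. The helper names `parse_version`, `meets_minimum`, and `check_transformers` are illustrative and are not part of transformers.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '4.37.0' or '4.38.0.dev0' into a 3-tuple of integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad '4.40' to (4, 40, 0)
    return tuple(parts[:3])


def meets_minimum(installed: str, minimum: str = "4.37.0") -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_transformers() -> str:
    """Describe whether the local transformers install meets the requirement."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed):
        return f"transformers {installed} meets the >= 4.37.0 requirement"
    return f"transformers {installed} is too old; upgrade to >= 4.37.0"


if __name__ == "__main__":
    print(check_transformers())
```

If the check reports an older version, upgrading the package (for example with `pip install -U transformers`) should resolve the `KeyError` shown above.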

Taking DistilQwen2.5-R1-7B as an example, the model can be called with the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "alibaba-pai/DistilQwen2.5-R1-7B"

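# Load the weights and tokenizer (downloaded from the Hugging Face Hub on first use).
# torch_dtype="auto" keeps the checkpoint's stored precision instead of float32.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder: replace with the question you want the model to solve.
prompt = "xxxxx"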
messages=[
    {"role": "system", "content": "Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:"},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

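# Generate the response. 512 new tokens can truncate a long reasoning trace;
# raise max_new_tokens if the output stops before <|end_of_solution|>.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]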

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
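
Since the system prompt asks the model to wrap its reasoning and its final answer in `<|begin_of_thought|>` and `<|begin_of_solution|>` markers, the decoded response can be split into the two parts afterwards. The sketch below is one way to do that with regular expressions. It assumes the markers survive decoding as plain text (if the tokenizer treats them as special tokens, `skip_special_tokens=True` would strip them), and the function names are illustrative.

```python
import re
from typing import Optional, Tuple

THOUGHT_RE = re.compile(
    r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", re.DOTALL)
SOLUTION_RE = re.compile(
    r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def split_response(response: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (thought, solution); a part is None if its markers are missing."""
    thought = THOUGHT_RE.search(response)
    solution = SOLUTION_RE.search(response)
    return (
        thought.group(1).strip() if thought else None,
        solution.group(1).strip() if solution else None,
    )


def final_answer(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the solution, if any."""
    _, solution = split_response(response)
    text = solution if solution is not None else response
    matches = BOXED_RE.findall(text)
    return matches[-1].strip() if matches else None
```

A response that was cut off by `max_new_tokens` before the closing markers yields `None` for the missing parts, which is a convenient signal that the generation budget should be raised.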

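Downloading DistilQwen2.5-R1 from Open-Source Communities

We have open-sourced the distilled models on Hugging Face and ModelScope: DistilQwen2.5-R1-3B, DistilQwen2.5-R1-7B, DistilQwen2.5-R1-14B, and DistilQwen2.5-R1-32B. Taking Hugging Face as an example, the four models can be downloaded with the following code: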

from huggingface_hub import snapshot_download

model_name = "alibaba-pai/DistilQwen2.5-R1-3B"
snapshot_download(repo_id=model_name, cache_dir="./DistilQwen2.5-R1-3B/")

model_name = "alibaba-pai/DistilQwen2.5-R1-7B"
snapshot_download(repo_id=model_name, cache_dir="./DistilQwen2.5-R1-7B/")

model_name = "alibaba-pai/DistilQwen2.5-R1-14B"
snapshot_download(repo_id=model_name, cache_dir="./DistilQwen2.5-R1-14B/")

model_name = "alibaba-pai/DistilQwen2.5-R1-32B"
snapshot_download(repo_id=model_name, cache_dir="./DistilQwen2.5-R1-32B/")

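Summary and Future Work

This article introduced the DistilQwen2.5-R1 series of deep reasoning models, which builds on a small amount of chain-of-thought data from DeepSeek-R1 and uses novel distillation strategies to strengthen the deep-thinking ability of small models. Experiments show that the series performs well on multiple benchmarks. In particular, DistilQwen2.5-R1-7B outperforms other distilled models trained on open-source data, such as OpenThinker-7B, on all four benchmarks. For practical use, the checkpoints have been released on Hugging Face and ModelScope, and a usage guide for Alibaba Cloud's Platform for AI (PAI) is provided. Going forward, as large language models and knowledge distillation techniques continue to advance, we will release DistilQwen models for more domains and in more sizes, to help reduce the cost and improve the efficiency of large language models in real applications.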

References

Related Publications

1. Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang. Building a Family of Data Augmentation Models for Low-cost LLM Fine-tuning on the Cloud. COLING 2025

2. Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang. Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning. EMNLP 2024

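Technical Articles

1. DistilQwen2.5 released: another upgrade for the Tongyi Qianwen distilled small models: http://developer.aliyun.com/article/1653842

2. DistilQwen2: knowledge distillation in practice for the Tongyi Qianwen large models: http://developer.aliyun.com/article/1633882

3. Training, evaluation, compression, and deployment of DistilQwen2 distilled small models: http://help.aliyun.com/zh/pai/user-guide/training-evalsuation-compression-and-deployment-of-distilqwen2

4. Data augmentation and model distillation solution for large language models: http://help.aliyun.com/zh/pai/user-guide/llm-data-enhancement-and-model-distillation-solution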
