Qwen3.6-35B-A3B is being passed around as a major new open model release: 35 billion total parameters, 3 billion active, Apache 2.0, strong coding, multimodal reasoning, and a new preserve thinking option for agents. The catch is that the cleanest independently verifiable evidence does not point to Qwen3.6-35B-A3B. It points to Qwen3.5-35B-A3B.
That sounds like a naming nitpick. It is not. In open model land, the model name is the product. If the release page, Hugging Face listing, and independent coverage do not line up, you are not evaluating a model yet. You are evaluating a claim.
The useful frame here is simple: this is less a launch story than a verification story. The underlying technical pattern, a sparse 35B/3B MoE model aimed at coding and multimodal work, is credible because Qwen already has a closely related verified model family. The specific Qwen3.6-35B-A3B release, however, remains plausible but uncorroborated from the source set we have.
Why Qwen3.6-35B-A3B matters for local AI users

If the claimed release is real, the appeal is obvious. A 35B-total, 3B-active sparse MoE model means the model stores a much larger capability base than a 3B dense model, but only activates a small slice of it per token. In practice, that usually means better quality than small dense models without the full inference cost of a 35B dense model.
That is the local-user dream: run something that behaves closer to a much bigger model on commodity hardware, especially for coding. The Reddit post claims “agentic coding on par with models 10x its active size.” That is unverified marketing language unless and until the underlying evals and checkpoints are independently inspectable.
What is verified is the nearby pattern. Qwen’s official 2025 Qwen3 launch post confirms a family with 2 MoE models and 6 dense models, spanning 0.6B to 235B, trained on 36 trillion tokens across 119 languages. That makes a 35B-class MoE release directionally consistent with the family. The official Hugging Face page for Qwen/Qwen3.5-35B-A3B also confirms a closely related model exists and is already being positioned for long-context, tool-using workflows.
That matters for anyone following Local LLM Coding. The strategic point is not “Alibaba has another benchmark chart.” It is that the open model race is shifting toward cheap active inference plus workflow-specific features, especially for coding agents.
Qwen3.6-35B-A3B’s speed comes from sparse MoE design
A sparse MoE model is not magic. It is a trade: more total parameters, fewer active parameters, routing overhead, and often much better quality-per-FLOP on the right tasks.
For a claimed 35B total / 3B active design, the practical implication is straightforward. You are paying inference costs closer to a 3B-ish active path, while hoping to get the specialization benefits of a much larger network. That is why users care about tokens per second and tool-call reliability more than raw parameter count.
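The cost argument is easy to make concrete. A minimal back-of-envelope sketch, assuming the common rule of thumb of roughly 2 FLOPs per active parameter per generated token (ignoring attention, KV-cache, and router overhead, which all shift the real numbers):

```python
def per_token_flops(active_params: float) -> float:
    """Rough decode-time estimate: ~2 FLOPs per active parameter per token."""
    return 2 * active_params

# Dense 35B: every parameter participates in every token.
dense_35b = per_token_flops(35e9)

# Sparse MoE with 3B active: only the routed experts run per token.
moe_3b_active = per_token_flops(3e9)

print(f"dense 35B  : {dense_35b:.1e} FLOPs/token")
print(f"MoE 3B act : {moe_3b_active:.1e} FLOPs/token")
print(f"theoretical speedup ~{dense_35b / moe_3b_active:.1f}x")
```

That theoretical ratio (about 11-12x fewer FLOPs per token) is exactly why the 35B/3B shape is attractive for local hardware; memory to hold all 35B parameters is still required, and routing and cache effects eat into the gain in practice.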
One Reddit commenter reported 90 tokens per second in a quick llama.cpp test and 75 tps in OpenCode on a 5070 Ti/5060 Ti setup, plus better tool-call behavior than other MoE models they had tried. That is one person’s anecdote, not independent verification. Still, it is the kind of evidence that matters more than leaderboard screenshots, because agentic coding fails first on workflow friction: latency, cache behavior, tool reliability, and looping.
There is also a warning here. Sparse MoE gains are real, but they are fragile in deployment. Prompt caching bugs, quantization quirks, and router behavior can erase the theoretical advantage. We have already seen adjacent evidence of this in third-party local testing: the Gemma 4 vs Qwen3.5 comparison found that Qwen3.5 often produced much longer reasoning traces, sometimes over 100k tokens, while Gemma 4 was more token-efficient and consistent. That does not tell us whether Qwen3.6-35B-A3B is better. It tells us exactly where to look before believing the hype.
What the benchmark claims actually show
The benchmark claims around Qwen3.6-35B-A3B should be read in three buckets.
Verified: Qwen3.5-35B-A3B is real, public, and already appears in research. A March 2026 arXiv paper using 25 SWE-bench Verified instances reports that a GraphRAG workflow with Qwen3.5-35B-A3B improved resolution from 24% to 32% while cutting regressions from 6.08% to 1.82%. That does not prove frontier-level coding ability, but it does show the model is credible enough to use in serious agentic evaluation.
Plausible: The release-linked claims that the new model beats dense Qwen3.5-27B, dramatically surpasses Qwen3.5-35B-A3B, and matches or beats Claude Sonnet 4.5 on several vision-language benchmarks. Those numbers may be real; they are also still provider-supplied in the material we have.
Unverified: The strong summary claim that Qwen3.6-35B-A3B is a newly released model with broadly confirmed independent availability. Search did not turn up recent credible coverage of that exact model name, and the most authoritative public model page found was for Qwen3.5-35B-A3B, not Qwen3.6-35B-A3B.
This is where readers should get tougher. Benchmarks are not useless. They are just easy to overread. If a model looks great on coding charts but nobody can point to reproducible runs, quantized variants, or real workflow testing, then what you have is not yet a model story. It is a launch asset.
A table helps sort the claims:
| Claim | Status | Evidence |
|---|---|---|
| Qwen has a public Qwen3 family with MoE models | Verified | Official Qwen3 blog |
| Qwen3.5-35B-A3B exists publicly | Verified | Official Hugging Face page |
| Qwen3.6-35B-A3B is a new public release | Plausible / uncorroborated | Release-linked page and social post, but weak independent confirmation |
| Strong coding and VLM benchmark wins | Plausible | Provider-supplied charts in linked material |
| Real-world local agentic gains | Unverified | Community anecdotes only |
Thinking preservation changes agentic workflows
The most interesting claim is not the benchmark score. It is preserve_thinking.
The release language, quoted by commenters, describes this as “preserving thinking content from all preceding turns in messages,” recommended for agentic tasks. If that description holds up, the feature matters because coding agents do not fail like chatbots. They fail by losing intermediate reasoning state between tool calls, file edits, retries, and environment changes.
That creates a nasty trade-off. Either the system drops prior reasoning and becomes forgetful, or it keeps rebuilding context and burns latency and tokens. Preserve thinking appears aimed directly at that problem.
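The exact request format is not documented in the material here, but the mechanism can be sketched as a client-side choice about how message history is rebuilt between turns. The `reasoning` field name below is hypothetical, standing in for whatever key the model's chat template actually uses:

```python
def build_messages(history: list[dict], preserve_thinking: bool = False) -> list[dict]:
    """Assemble the next request's message list from turn history.

    With preserve_thinking=False (the usual chatbot default), prior-turn
    reasoning is dropped and the agent becomes forgetful. With True, it is
    carried forward so the model sees why it made earlier decisions.
    """
    messages = []
    for turn in history:
        msg = {"role": turn["role"], "content": turn["content"]}
        if preserve_thinking and "reasoning" in turn:
            msg["reasoning"] = turn["reasoning"]  # hypothetical field name
        messages.append(msg)
    return messages

history = [
    {"role": "user", "content": "Fix the failing test in utils.py"},
    {
        "role": "assistant",
        "content": "Patched utils.py",
        "reasoning": "test_parse fails because split() drops empty fields",
    },
]

print(build_messages(history, preserve_thinking=True))
```

The trade-off in the surrounding text falls directly out of this sketch: preserving reasoning keeps the "why" available across tool calls, but every preserved trace is extra prompt tokens on every subsequent turn, which is where the cache and token-bloat concerns come from.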
This is the same broad design direction behind “native thinking” systems like Gemma 4 Native Thinking: not just better answers, but better reasoning continuity across turns. For agentic coding, continuity is the product. A model that remembers why it chose a refactor, what test failed, and which tool output mattered can behave much more like a competent junior engineer and much less like a goldfish with shell access.
It also comes with risk. If preserved reasoning is verbose, unstable, or poorly cached, then the feature can turn into token bloat. One commenter explicitly tied it to cache misses in iterative development environments. That diagnosis is plausible, not confirmed. But it is exactly the right operational question.
The next thing to watch is not another pretty benchmark. It is whether preserve_thinking improves:
- tool-call success rates
- long task completion without loops
- token efficiency over 20-50 turn sessions
- prompt-cache hit rates in real clients
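The metrics above are straightforward to track in local testing. A minimal sketch of a session-level scorer, with hypothetical per-step fields standing in for whatever your client or serving stack actually logs:

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One agent turn. Field names are illustrative, not a real client's API."""
    tool_ok: bool        # did the tool call parse and execute successfully?
    cached_tokens: int   # prompt tokens served from the prompt cache
    prompt_tokens: int   # total prompt tokens submitted this step


def session_metrics(steps: list[Step]) -> dict[str, float]:
    """Aggregate tool-call success rate and prompt-cache hit rate for a session."""
    tool_success = sum(s.tool_ok for s in steps) / len(steps)
    cache_hit = sum(s.cached_tokens for s in steps) / sum(s.prompt_tokens for s in steps)
    return {"tool_call_success": tool_success, "prompt_cache_hit": cache_hit}


steps = [
    Step(tool_ok=True, cached_tokens=900, prompt_tokens=1000),
    Step(tool_ok=True, cached_tokens=1800, prompt_tokens=2000),
    Step(tool_ok=False, cached_tokens=0, prompt_tokens=2500),  # cache miss + bad call
]
print(session_metrics(steps))
```

If preserve_thinking genuinely helps, numbers like these should move in the right direction over 20-50 turn sessions; if preserved traces are thrashing the prompt cache, the hit rate will show it long before a leaderboard does.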
That is where an open-source coding model wins or loses. The code arena rankings are useful, but only up to the point where the workflow itself becomes the benchmark.

What generalists should watch next
Three things will settle the Qwen3.6-35B-A3B story quickly.
First, canonical model identity. If Qwen3.6-35B-A3B is real, the official Hugging Face and model distribution pages should stabilize around that exact name. Right now, the strongest public evidence still clusters around Qwen3.5-35B-A3B.
Second, independent local runs. Not “feels great” posts, but reproducible tests on coding tasks, multimodal tasks, and long-session agents, ideally with quantized variants. Open models become real when other people can break them.
Third, workflow metrics instead of one-shot benchmarks. The preserve_thinking feature will matter far more than a few leaderboard points if it meaningfully reduces context rebuilds and tool-call failures.
My prediction: within the next two months, either Qwen will standardize the naming and publish a clearer model card for Qwen3.6-35B-A3B, or the market will quietly converge on the view that this was effectively a Qwen3.5-35B-A3B-adjacent release wrapped in confusing branding. In either case, the bigger trend will hold: open coding models are no longer competing just on IQ tests. They are competing on agent loop quality per dollar.
Key Takeaways
- Qwen3.6-35B-A3B is plausible, but not cleanly independently verified from the source set here; the strongest confirmed evidence is for Qwen3.5-35B-A3B.
- A 35B total / 3B active sparse MoE model would matter because it targets better coding quality at much lower inference cost than dense peers.
- The headline benchmark claims are provider-supplied and plausible, not independently confirmed performance facts.
- preserve_thinking is the feature to watch because agentic coding lives or dies on reasoning continuity across turns, not just pass@1 scores.
- The real test is reproducible local workflow performance: latency, cache behavior, tool reliability, and long-session completion.
Further Reading
- Qwen3: Think Deeper, Act Faster, Official Qwen family launch post with model lineup, training scale, and language coverage.
- Qwen/Qwen3.5-35B-A3B, Official model page for the closely related verified checkpoint, including benchmark and context details.
- Qwen3.6-35B-A3B release blog, The linked release page for the exact model name under discussion; check it directly against model cards and downloads.
- Alibaba’s Qwen tech lead steps down after major AI push, Recent reporting on organizational context around Qwen.
- TDAD and Qwen3.5-35B-A3B, Research using Qwen3.5-35B-A3B in an agentic evaluation workflow, with concrete SWE-bench-style results.
