Today, we are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math...



Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.
Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.
As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.
-
LocalLLaMA @sh.itjust.works xodoh74984 @lemmy.world Qwen3-32b: Windows95 Starfield Screensaver Web App With Warp Drive On Click
It's amazing how far open source LLMs have come.
Qwen3-32b recreated the Windows95 Starfield screensaver as a web app with the bonus feature to enable "warp drive" on click. This was generated with reasoning disabled (/no_think) using a 4-bit quant running locally on a 4090.
Here's the result: https://codepen.io/mekelef486/pen/xbbWGpX
Model: Qwen3-32B-Q4_K_M.gguf (Unsloth quant)
Llama.cpp Server Docker Config:
docker run \
  -p 8080:8080 \
  -v /path/to/models:/models \
  --name llama-cpp-qwen3-32b \
  --gpus all \
  ghcr.io/ggerganov/llama.cpp:server-cuda \
  -m /models/qwen3-32b-q4_k_m.gguf \
  --host 0.0.0.0 --port 8080 \
  --n-gpu-layers 65 \
  --ctx-size 13000 \
  --temp 0.7 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0
System Prompt:
You are a helpful expert and aide. Communicate clearly and succinctly. Avoid emojis.
User Prompt:
Create a simple web app that uses javascript to visualize
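For reference, here is roughly how that request might look in code. This is a minimal sketch assuming llama.cpp's OpenAI-compatible /v1/chat/completions endpoint; the user prompt is truncated in the post, so it is elided here too, and the trailing /no_think is Qwen3's soft switch for disabling reasoning per turn.

import requests

SERVER = "http://localhost:8080"  # matches the -p 8080:8080 mapping above

payload = {
    "messages": [
        {"role": "system",
         "content": "You are a helpful expert and aide. "
                    "Communicate clearly and succinctly. Avoid emojis."},
        # The full prompt is truncated in the post; /no_think disables reasoning.
        {"role": "user",
         "content": "Create a simple web app that uses javascript to visualize ... /no_think"},
    ],
    "temperature": 0.7,  # redundant with the server flags above, shown for clarity
}

resp = requests.post(f"{SERVER}/v1/chat/completions", json=payload, timeout=600)
print(resp.json()["choices"][0]["message"]["content"])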
-
LocalLLaMA @sh.itjust.works đȘCRUMBGRABBERđȘ @lemm.ee I'm using Open WebUI, does anybody else have a better interface?
I'm interested in really leveraging the full capabilities of local AI, for code generation and everything else. Let me know what you people are using.
-
LocalLLaMA @sh.itjust.works General_Effort @lemmy.world Qwen3 officially released
qwenlm.github.io Qwen3: Think Deeper, Act Faster
-
LocalLLaMA @sh.itjust.works brucethemoose @lemmy.world Qwen3 "Leaked"
huggingface.co qingy2024/Qwen3-0.6B · Hugging Face: We're on a journey to advance and democratize artificial intelligence through open source and open science.
Qwen3 was apparently posted early, then quickly pulled from HuggingFace and Modelscope. The large ones are MoEs, per screenshots from Reddit:
Including a 235B model with 22B active parameters and a 30B with 3B active.
Context appears to 'only' be 32K unfortunately: https://huggingface.co/qingy2024/Qwen3-0.6B/blob/main/config_4b.json
But it's possible they're still training them to 256K.
Take it all with a grain of salt, configs could change with the official release, but it appears it is happening today.
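If you want to sanity-check the context figure yourself, here's a quick sketch (assuming the leaked repo and file are still up, which is not guaranteed given it was pulled once already):

import json
from huggingface_hub import hf_hub_download

# Download the leaked config and read the advertised context length.
path = hf_hub_download(repo_id="qingy2024/Qwen3-0.6B", filename="config_4b.json")
with open(path) as f:
    cfg = json.load(f)

print(cfg.get("max_position_embeddings"))  # reportedly 32768 at leak time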
-
LocalLLaMA @sh.itjust.works commander @lemmy.world wccftech.com DeepSeek R2 AI Model Rumors Begin to Swirl Online; Reported to Feature 97% Lower Costs Compared to GPT-4 & Fully Trained on Huawei's Ascend Chips: DeepSeek is set to drop another model pretty soon, as details about their next "DeepSeek R2" model have surfaced on the internet.
-
LocalLLaMA @sh.itjust.works brucethemoose @lemmy.world Niche Model of the Day: Nemotron 49B 3bpw exl3
huggingface.co turboderp/Llama-3.3-Nemotron-Super-49B-v1-exl3 at 3.0bpw: We're on a journey to advance and democratize artificial intelligence through open source and open science.
This is one of the "smartest" models you can fit on a 24GB GPU now, with no offloading and very little quantization loss. It feels big and insightful, like a better (albeit dry) Llama 3.3 70B with thinking, and with more STEM world knowledge than QwQ 32B, but it comfortably fits thanks to the new exl3 quantization!
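Back-of-the-envelope, assuming weights dominate: 49B parameters at 3.0 bits per weight is about 49e9 × 3 / 8 ≈ 18.4 GB, which leaves a 24 GB card roughly 5 GB for KV cache and overhead.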
You need to use a backend that supports exl3, like (at the moment) text-generation-webui or (soon) TabbyAPI.
-
LocalLLaMA @sh.itjust.works hok @lemmy.dbzer0.com How do I get started with RAG (ideally with llama.cpp)?
I would like my model to know the code libraries I use and help me write code with them. I use llama.cpp's server and web UI for inference, but I have no clue how to get started with RAG, since it seems it is not natively supported with llama.cpp's server implementation. It almost looks like I would need to code my own agent.
I am not interested in commercial offerings or APIs. If you use RAG, how do you do it?
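Since llama.cpp's server exposes OpenAI-compatible endpoints, one way to start without any framework is to roll the loop yourself. A minimal sketch, assuming the server was launched with --embeddings and a model that can produce embeddings; the chunking and retrieval here are deliberately naive:

import requests
import numpy as np

SERVER = "http://localhost:8080"

def embed(texts):
    # llama.cpp's OpenAI-compatible embeddings endpoint (needs --embeddings).
    r = requests.post(f"{SERVER}/v1/embeddings", json={"input": texts})
    return np.array([d["embedding"] for d in r.json()["data"]])

# 1. Index: split your library docs into chunks and embed them once.
chunks = ["...chunk of library docs...", "...another chunk..."]  # placeholders
index = embed(chunks)
index /= np.linalg.norm(index, axis=1, keepdims=True)

# 2. Retrieve: embed the question and take the most similar chunks.
question = "How do I open a connection with this library?"
q = embed([question])[0]
q /= np.linalg.norm(q)
top = np.argsort(index @ q)[::-1][:2]
context = "\n\n".join(chunks[i] for i in top)

# 3. Generate: stuff the retrieved chunks into the prompt.
r = requests.post(f"{SERVER}/v1/chat/completions", json={"messages": [
    {"role": "system", "content": "Answer using the provided documentation."},
    {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
]})
print(r.json()["choices"][0]["message"]["content"])

In practice you'd want a real splitter and a vector store, but that's the whole idea; everything else is plumbing.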
-
LocalLLaMA @sh.itjust.works absGeekNZ @lemmy.nz Less positive model
I'm currently running Gemma3. It is really good overall, but one thing that is frustrating is the relentless positivity.
Is there a way to make it more critical?
I'm not looking for it to say "that is a shit idea", but less of the "that is a great observation" or "you've made a really insightful point" etc...
If a human was talking like that, I'd be suspicious of their motives. Since it is a machine, I don't think it is trying to manipulate me; I think the programming is set too positive.
It may also be cultural: as a rule, New Zealanders are less emotive in our communication, and the LLM (to me) feels like an overly positive American.
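One low-effort thing to try (not Gemma-specific) is a blunt anti-sycophancy system prompt. The wording below is just an illustration, sent via any OpenAI-compatible local server:

import requests

# Illustrative system prompt; tune the wording to taste.
system = ("Be direct and critical. Do not compliment me or my questions, "
          "and never open with praise. Lead with weaknesses, risks, and "
          "counterarguments, then the substance.")

r = requests.post("http://localhost:8080/v1/chat/completions", json={
    "messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": "Here's my idea: ..."},  # your actual prompt
    ],
})
print(r.json()["choices"][0]["message"]["content"])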
-
LocalLLaMA @sh.itjust.works brucethemoose @lemmy.world Niche Model of the Day: Openbuddy 25.2q, QwQ 32B with Quantization Aware Training
huggingface.co OpenBuddy/openbuddy-qwq-32b-v25.2q-200k · Hugging Face: We're on a journey to advance and democratize artificial intelligence through open source and open science.
Seems there's not a lot of talk about relatively unknown finetunes these days, so I'll start posting more!
Openbuddy's been on my radar, but this one is very interesting: QwQ 32B, post-trained on openbuddy's dataset, apparently with QAT applied (though it's kinda unclear) and context-extended. Observations:
- Quantized with exllamav2, it seems to show lower distortion levels than normal QwQ. It works conspicuously well at 4.0bpw and 3.5bpw.
- Seems good at long context. Have not tested 200K, but it's quite excellent in the 64K range.
- Works fine in English.
- The chat template is funky. It seems to mix up the <think> and <|think|> tags in particular (why don't they just use ChatML?), and needs some wrangling with your own template; one workaround is sketched after this list.
- Seems smart; can't say if it's better or worse than QwQ yet, other than it doesn't seem to "suffer" below 3.75bpw like QwQ does.
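For the template wrangling mentioned above, one option is to skip the bundled template entirely, build a ChatML-style prompt by hand with the think tag prefilled, and send it to your backend's raw completion endpoint. A sketch; the exact tags are an assumption, so check the model's tokenizer_config.json for what it actually expects:

# Hand-rolled ChatML-style prompt with a prefilled think tag (tag choice
# is an assumption; verify against the model's tokenizer_config.json).
def chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n<think>\n"  # prefill to force the thinking block
    )

print(chatml_prompt("You are a helpful assistant.", "Why is the sky blue?"))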
Also, I reposted this from /r/locallama, as I feel the community generally should do going forward.
-
LocalLLaMA @sh.itjust.works ikt @aussie.zone www.tomshardware.com AI-generated videos now possible with gaming GPUs with just 6GB of VRAM: Bringing video diffusion to the masses.
-
LocalLLaMA @sh.itjust.works General_Effort @lemmy.world Can it play Doom? - New VLM Benchmark
www.vgbench.com VideoGameBench [Research Preview]: VideoGameBench is a benchmark for video game VLM agents.
-
LocalLLaMA @sh.itjust.works ikt @aussie.zone Which model are you using?
Just thinking about making this a monthly post: which model are you using? What are the positives and negatives?
-
LocalLLaMA @sh.itjust.works ikt @aussie.zone techcrunch.com Trump administration reportedly considers a US DeepSeek ban | TechCrunch: The Trump administration is considering barring Americans from accessing AI models created by the Chinese AI lab, DeepSeek.
The Trump administration is considering new restrictions on the Chinese AI lab DeepSeek that would limit it from buying Nvidia's AI chips and potentially bar Americans from accessing its AI services, The New York Times reported on Wednesday.
-
LocalLLaMA @sh.itjust.works thickertoofan @lemm.ee Microsoft just released BitNet!
github.com GitHub - microsoft/BitNet: Official inference framework for 1-bit LLMs. Contribute to microsoft/BitNet development by creating an account on GitHub.
Let's go! Lossless CPU inference
-
LocalLLaMA @sh.itjust.works absGeekNZ @lemmy.nz What do you use AI for?
Ok, so the best thing that I have been using my local AI for lately is acting as Socrates and helping me delve into my thinking.
::: spoiler Long text here
User | 2025/04/09 08:39:23
When considering the Socratic method, what questions did Socrates ask people, to start the process of enquiry
Gemma3 (12B) | 2025/04/09 08:39:55
Okay, let's break down the types of questions Socrates used to initiate the Socratic method. It's important to understand that it wasn't a rigid formula, but a flexible approach designed to expose contradictions and ultimately lead to deeper understanding. Here's a breakdown, categorized by their function and with examples. I'll also include some key principles at the end.
1. Initial Clarification & Agreement (Establishing a Starting Point - Often Seemingly Simple)
- "What do you mean by...?" / "What is X?" (Where X is a Virtue, Concept, or Idea - e.g., "What do you mean by justice?", "What is courage?", "What is piety?")
-
LocalLLaMA @sh.itjust.works SmokeyDope @lemmy.world llama4 release discussion thread
General consensus seems to be that llama4 was a flop. The head of Meta's AI research division was let go.
Do you think it was a bad fp32 conversion, or just underwhelming models all around?
2T parameters was a big increase without much gain. If throwing compute and parameters at it isn't working to stay competitive anymore, how do you think the next big performance gains will be made? Better CoT reasoning patterns? Omnimodal? Something entirely new?
-
LocalLLaMA @sh.itjust.works Fitik @fedia.io The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation
ai.meta.com The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation. We're introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context support and our first built using a mixture-of-experts (MoE) architecture.