Today, we are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math...



Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.
Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.
As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.
-
LocalLLaMA @sh.itjust.works xodoh74984 @lemmy.world Qwen3-32b: Windows95 Starfield Screensaver Web App With Warp Drive On Click
It's amazing how far open source LLMs have come.
Qwen3-32b recreated the Windows95 Starfield screensaver as a web app with the bonus feature to enable "warp drive" on click. This was generated with reasoning disabled (/no_think) using a 4-bit quant running locally on a 4090.
Here's the result: https://codepen.io/mekelef486/pen/xbbWGpX
Model: Qwen3-32B-Q4_K_M.gguf (Unsloth quant)
Llama.cpp Server Docker Config:
docker run \
  -p 8080:8080 \
  -v /path/to/models:/models \
  --name llama-cpp-qwen3-32b \
  --gpus all \
  ghcr.io/ggerganov/llama.cpp:server-cuda \
  -m /models/qwen3-32b-q4_k_m.gguf \
  --host 0.0.0.0 --port 8080 \
  --n-gpu-layers 65 \
  --ctx-size 13000 \
  --temp 0.7 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0
System Prompt:
You are a helpful expert and aide. Communicate clearly and succinctly. Avoid emojis.
User Prompt:
Create a simple web app that uses javascript to visualize
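For reference, here is roughly how that request might look in code. This is a minimal sketch assuming llama.cpp's OpenAI-compatible /v1/chat/completions endpoint; the user prompt is truncated in the post, so it is elided here too, and the trailing /no_think is Qwen3's soft switch for disabling reasoning per turn.

import requests

SERVER = "http://localhost:8080"  # matches the -p 8080:8080 mapping above

payload = {
    "messages": [
        {"role": "system",
         "content": "You are a helpful expert and aide. "
                    "Communicate clearly and succinctly. Avoid emojis."},
        # The full prompt is truncated in the post; /no_think disables reasoning.
        {"role": "user",
         "content": "Create a simple web app that uses javascript to visualize ... /no_think"},
    ],
    "temperature": 0.7,  # redundant with the server flags above, shown for clarity
}

resp = requests.post(f"{SERVER}/v1/chat/completions", json=payload, timeout=600)
print(resp.json()["choices"][0]["message"]["content"])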
-
LocalLLaMA @sh.itjust.works đȘCRUMBGRABBERđȘ @lemm.ee I'm using Open WebUI, does anybody else have a better interface?
I'm interested in really leveraging the full capabilities of local AI, for code generation and everything else. Let me know what you people are using.
-
LocalLLaMA @sh.itjust.works General_Effort @lemmy.world Qwen3 officially released
qwenlm.github.io Qwen3: Think Deeper, Act Faster
-
LocalLLaMA @sh.itjust.works brucethemoose @lemmy.world Qwen3 "Leaked"
huggingface.co qingy2024/Qwen3-0.6B · Hugging Face: We're on a journey to advance and democratize artificial intelligence through open source and open science.
Qwen3 was apparently posted early, then quickly pulled from HuggingFace and Modelscope. The large ones are MoEs, per screenshots from Reddit:
Including a 235B model with 22B active parameters and a 30B with 3B active.
Context appears to 'only' be 32K unfortunately: https://huggingface.co/qingy2024/Qwen3-0.6B/blob/main/config_4b.json
But it's possible they're still training them to 256K.
Take it all with a grain of salt, configs could change with the official release, but it appears it is happening today.
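If you want to sanity-check the context figure yourself, here's a quick sketch (assuming the leaked repo and file are still up, which is not guaranteed given it was pulled once already):

import json
from huggingface_hub import hf_hub_download

# Download the leaked config and read the advertised context length.
path = hf_hub_download(repo_id="qingy2024/Qwen3-0.6B", filename="config_4b.json")
with open(path) as f:
    cfg = json.load(f)

print(cfg.get("max_position_embeddings"))  # reportedly 32768 at leak time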
-
LocalLLaMA @sh.itjust.works commander @lemmy.world wccftech.com DeepSeek R2 AI Model Rumors Begin to Swirl Online; Reported to Feature 97% Lower Costs Compared to GPT-4 & Fully Trained on Huawei's Ascend Chips: DeepSeek is set to drop another model pretty soon, as details about their next "DeepSeek R2" model have surfaced on the internet.
-
LocalLLaMA @sh.itjust.works brucethemoose @lemmy.world Niche Model of the Day: Nemotron 49B 3bpw exl3
huggingface.co turboderp/Llama-3.3-Nemotron-Super-49B-v1-exl3 at 3.0bpw: We're on a journey to advance and democratize artificial intelligence through open source and open science.
This is one of the "smartest" models you can fit on a 24GB GPU now, with no offloading and very little quantization loss. It feels big and insightful, like a better (albeit dry) Llama 3.3 70B with thinking, and with more STEM world knowledge than QwQ 32B, but it comfortably fits thanks to the new exl3 quantization!
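Back-of-the-envelope, assuming weights dominate: 49B parameters at 3.0 bits per weight is about 49e9 × 3 / 8 ≈ 18.4 GB, which leaves a 24 GB card roughly 5 GB for KV cache and overhead.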
You need to use a backend that supports exl3, like (at the moment) text-generation-webui or (soon) TabbyAPI.
-
LocalLLaMA @sh.itjust.works hok @lemmy.dbzer0.com How do I get started with RAG (ideally with llama.cpp)?
I would like my model to know the code libraries I use and help me write code with them. I use llama.cpp's server and web UI for inference, but I have no clue how to get started with RAG, since it seems it is not natively supported with llama.cpp's server implementation. It almost looks like I would need to code my own agent.
I am not interested in commercial offerings or APIs. If you use RAG, how do you do it?
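Since llama.cpp's server exposes OpenAI-compatible endpoints, one way to start without any framework is to roll the loop yourself. A minimal sketch, assuming the server was launched with --embeddings and a model that can produce embeddings; the chunking and retrieval here are deliberately naive:

import requests
import numpy as np

SERVER = "http://localhost:8080"

def embed(texts):
    # llama.cpp's OpenAI-compatible embeddings endpoint (needs --embeddings).
    r = requests.post(f"{SERVER}/v1/embeddings", json={"input": texts})
    return np.array([d["embedding"] for d in r.json()["data"]])

# 1. Index: split your library docs into chunks and embed them once.
chunks = ["...chunk of library docs...", "...another chunk..."]  # placeholders
index = embed(chunks)
index /= np.linalg.norm(index, axis=1, keepdims=True)

# 2. Retrieve: embed the question and take the most similar chunks.
question = "How do I open a connection with this library?"
q = embed([question])[0]
q /= np.linalg.norm(q)
top = np.argsort(index @ q)[::-1][:2]
context = "\n\n".join(chunks[i] for i in top)

# 3. Generate: stuff the retrieved chunks into the prompt.
r = requests.post(f"{SERVER}/v1/chat/completions", json={"messages": [
    {"role": "system", "content": "Answer using the provided documentation."},
    {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
]})
print(r.json()["choices"][0]["message"]["content"])

In practice you'd want a real splitter and a vector store, but that's the whole idea; everything else is plumbing.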
-
LocalLLaMA @sh.itjust.works absGeekNZ @lemmy.nz Less positive model
I'm currently running Gemma3. It is really good overall, but one thing that is frustrating is the relentless positivity.
Is there a way to make it more critical?
I'm not looking for it to say "that is a shit idea", but less of the "that is a great observation" or "you've made a really insightful point" etc...
If a human was talking like that, I'd be suspicious of their motives. Since it is a machine, I don't think it is trying to manipulate me; I think the programming is set too positive.
It may also be cultural: as a rule, New Zealanders are less emotive in our communication, and the LLM (to me) feels like an overly positive American.
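One low-effort thing to try (not Gemma-specific) is a blunt anti-sycophancy system prompt. The wording below is just an illustration, sent via any OpenAI-compatible local server:

import requests

# Illustrative system prompt; tune the wording to taste.
system = ("Be direct and critical. Do not compliment me or my questions, "
          "and never open with praise. Lead with weaknesses, risks, and "
          "counterarguments, then the substance.")

r = requests.post("http://localhost:8080/v1/chat/completions", json={
    "messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": "Here's my idea: ..."},  # your actual prompt
    ],
})
print(r.json()["choices"][0]["message"]["content"])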
-
LocalLLaMA @sh.itjust.works brucethemoose @lemmy.world Niche Model of the Day: Openbuddy 25.2q, QwQ 32B with Quantization Aware Training
huggingface.co OpenBuddy/openbuddy-qwq-32b-v25.2q-200k · Hugging Face: We're on a journey to advance and democratize artificial intelligence through open source and open science.
Seems there's not a lot of talk about relatively unknown finetunes these days, so I'll start posting more!
Openbuddy's been on my radar, but this one is very interesting: QwQ 32B, post-trained on openbuddy's dataset, apparently with QAT applied (though it's kinda unclear) and context-extended. Observations:
- Quantized with exllamav2, it seems to show lower distortion levels than normal QwQ. It works conspicuously well at 4.0bpw and 3.5bpw.
- Seems good at long context. Have not tested 200K, but it's quite excellent in the 64K range.
- Works fine in English.
- The chat template is funky. It seems to mix up the <think> and <|think|> tags in particular (why don't they just use ChatML?), and needs some wrangling with your own template; one workaround is sketched after this list.
- Seems smart; can't say if it's better or worse than QwQ yet, other than it doesn't seem to "suffer" below 3.75bpw like QwQ does.
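For the template wrangling mentioned above, one option is to skip the bundled template entirely, build a ChatML-style prompt by hand with the think tag prefilled, and send it to your backend's raw completion endpoint. A sketch; the exact tags are an assumption, so check the model's tokenizer_config.json for what it actually expects:

# Hand-rolled ChatML-style prompt with a prefilled think tag (tag choice
# is an assumption; verify against the model's tokenizer_config.json).
def chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n<think>\n"  # prefill to force the thinking block
    )

print(chatml_prompt("You are a helpful assistant.", "Why is the sky blue?"))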
Also, I reposted this from /r/locallama, as I feel the community generally should do going forward.
-
LocalLLaMA @sh.itjust.works ikt @aussie.zone www.tomshardware.com AI-generated videos now possible with gaming GPUs with just 6GB of VRAM: Bringing video diffusion to the masses.
-
LocalLLaMA @sh.itjust.works General_Effort @lemmy.world Can it play Doom? - New VLM Benchmark
www.vgbench.com VideoGameBench [Research Preview]: VideoGameBench is a benchmark for video game VLM agents.
-
LocalLLaMA @sh.itjust.works ikt @aussie.zone Which model are you using?
Just thinking about making this a monthly post: which model are you using? What are the positives and negatives?
-
LocalLLaMA @sh.itjust.works ikt @aussie.zone techcrunch.com Trump administration reportedly considers a US DeepSeek ban | TechCrunch: The Trump administration is considering barring Americans from accessing AI models created by the Chinese AI lab, DeepSeek.
The Trump administration is considering new restrictions on the Chinese AI lab DeepSeek that would limit it from buying Nvidia's AI chips and potentially bar Americans from accessing its AI services, The New York Times reported on Wednesday.
-
LocalLLaMA @sh.itjust.works thickertoofan @lemm.ee Microsoft just released BitNet!
github.com GitHub - microsoft/BitNet: Official inference framework for 1-bit LLMs. Contribute to microsoft/BitNet development by creating an account on GitHub.
Let's go! Lossless CPU inference
-
LocalLLaMA @sh.itjust.works absGeekNZ @lemmy.nz What do you use AI for?
Ok, so the best thing that I have been using my local AI for lately is acting as Socrates and helping me delve into my thinking.
::: spoiler Long text here
User | 2025/04/09 08:39:23
When considering the Socratic method, what questions did Socrates ask people, to start the process of enquiry
Gemma3 (12B) | 2025/04/09 08:39:55
Okay, let's break down the types of questions Socrates used to initiate the Socratic method. It's important to understand that it wasn't a rigid formula, but a flexible approach designed to expose contradictions and ultimately lead to deeper understanding. Here's a breakdown, categorized by their function and with examples. I'll also include some key principles at the end.
1. Initial Clarification & Agreement (Establishing a Starting Point - Often Seemingly Simple)
- "What do you mean by...?" / "What is X?" (Where X is a Virtue, Concept, or Idea - e.g., "What do you mean by justice?", "What is courage?", "What is piety?")
-
LocalLLaMA @sh.itjust.works SmokeyDope @lemmy.world llama4 release discussion thread
General consensus seems to be that llama4 was a flop. The head of Meta's AI research division was let go.
Do you think it was a bad fp32 conversion, or just underwhelming models all around?
2T parameters was a big increase without much gain. If throwing compute and parameters at it isn't working to stay competitive anymore, how do you think the next big performance gains will be made? Better CoT reasoning patterns? Omnimodal? Something entirely new?
-
LocalLLaMA @sh.itjust.works Fitik @fedia.io The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation
ai.meta.com The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation. We're introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context support and our first built using a mixture-of-experts (MoE) architecture.