

Your article is paywalled, so we cannot respond to what was written.
Having said that… I’m reasonably sure Europe isn’t hand-wringing over the US staying in NATO.
I’d imagine you’d be hard-pressed to find popular support for America anywhere in Europe.


Yes. And despite that, one in three Australian homes now has rooftop solar.
Renewables supplied over half the national grid in Q4 2025, with roughly 7 GW of new capacity added that year alone. Nearly 200,000 home batteries were installed in the second half of 2025.
One in three new vehicles sold now has some form of electrification, with hybrids leading the shift and petrol sales dropping 10% last year.
Even heavy industry is moving. Australia already operates the world’s largest fully driverless freight rail network - Rio Tinto’s AutoHaul runs heavy-haul trains across 1,700 km of track in the Pilbara, controlled remotely from Perth, straight from the mine to the deep-water port at Cape Lambert.
Battery-electric locomotives are now in trial on those same lines. Electrification is happening at every scale here - rooftop, road, and rail - often despite the politics, not because of it.


Some of the downstream processing infrastructure is already there, we just needed one more push. Hopefully this is it.
Time to stop exporting the raw goods (coal, steel, gas, hydrogen, lithium etc) offshore and then buying it back. Time to actually process it here and use it.
I’d like to think Trump is actually doing the world a favour by showing us what a fair-weather friend America really is. His America First doctrine may force the rest of the world to stop depending on America entirely.


In capital cities, it’s…reasonable. Takes too long to get from A to B, but you can do it, usually.
In regional areas, generally not great.
Australia is heavily car-centric for the most part.
Don’t. The chances of finding a pre-2021/23 model are slim, and if you happen to connect it to the net carelessly, it will auto-bork itself.
Get a Raspberry Pi.
If you mean the Chromecast with Google TV, Google patched that out in 2023. You can still ADB into them just fine to push or pull apps and files (after you install the ADB tools), but you can no longer install custom firmware (like LineageOS).
I think the best way to get something like LineageOS on your TV is to get a Raspberry Pi (or a CM4 module + HDMI board) and flash that. You should be able to pair your existing Chromecast remote with it, and the interface should be seamless.


^ I can confirm this. It’s why I refuse to interact on Reddit anymore. Fuck em.


I’m building one. It treats you how you treat it: it classifies tone and content and responds in kind, adapting on the fly with decay curves. I do it using a local classifier swarm (seven micro sub-LLMs) and a decision tree.
You can set it to IDGAF mode by default. Still useful…just zero fucks given.
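The tone-tracking-with-decay idea can be sketched roughly like this. Everything here is hypothetical: `classify_tone` is a stub standing in for the classifier swarm, and the half-life value is purely illustrative.

```python
import math
import time

HALF_LIFE_S = 600.0  # illustrative decay half-life: 10 minutes

def classify_tone(text: str) -> float:
    """Stub classifier: returns a tone score in [-1, 1] (hostile..friendly)."""
    hostile = {"idiot", "stupid", "useless"}
    return -1.0 if any(w in hostile for w in text.lower().split()) else 0.5

def decayed_tone(history: list[tuple[float, float]], now: float) -> float:
    """Blend (timestamp, score) pairs, weighting each by
    exp(-ln2 * age / half_life) so old hostility slowly fades."""
    num = den = 0.0
    for ts, score in history:
        w = math.exp(-math.log(2) * (now - ts) / HALF_LIFE_S)
        num += w * score
        den += w
    return num / den if den else 0.0

now = time.time()
history = [(now - 1200, -1.0), (now - 5, 0.5)]  # old insult, recent civility
print(round(decayed_tone(history, now), 2))  # recent tone dominates
```

The decision tree would then pick a response register from the blended score; the decay curve is what lets the bot "forgive" over time instead of holding a grudge forever.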


(Ignore the “Powered by OpenAI” bit. That’s because GPTMobile thinks anything using an OpenAI-shaped endpoint is an OpenAI model.)
https://bobbyllm.github.io/llama-conductor/blog/claude-in-a-can-1/


“…specifically crafted to demonstrate tasks that humans complete easily”
Motherfucker, I can’t work out Minesweeper. I got zero fucking chance with your mystery box bloop game.


Point 1 - no. LLM outputs are not always hallucinations (generally speaking - some models are worse than others), but where they might veer off into fantasy, I’ve reinforced with programming. Think of it like giving your 8-year-old a calculator instead of expecting them to work out 7532×565 in their head. And a dictionary. And an encyclopedia. And CliffsNotes. And a watch. And a compass. And a… you get the idea.
The role of the footer is to show you which tool it used (its own internal priors, what you taught it, calculator etc) and what ratio the answer is based on those. Those are router assigned. That’s just one part of it though.
Point 2 is a misread. These aren’t instructions or system prompts telling the model “don’t make things up” - that works about as well as telling a fat kid not to eat cake.
Instead, the deterministic elements fire first. The model gets the answer, and then builds context on top of it. That funnels it in the right direction, and the LLM tends to stay in that lane. That’s not guardrails on AI; that’s just not using AI where AI is the wrong tool. Whether that’s “real AI” is a philosophy question - what I do know and can prove is that it leads to far fewer wrong answers.
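A minimal sketch of the deterministic-first idea, using arithmetic as the example tool (per the calculator analogy above). This is not the author’s actual router - `answer` and `call_llm` are hypothetical names - just an illustration of the tool firing before the model sees the question:

```python
import ast
import operator as op

# Safe arithmetic evaluator: walks the AST instead of using eval().
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def safe_eval(expr: str) -> float:
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not plain arithmetic")
    return walk(ast.parse(expr, mode="eval"))

def answer(query: str, call_llm=lambda p: p) -> str:
    try:
        result = safe_eval(query)
        # Deterministic tool fired first: the LLM builds context around a
        # known-correct answer instead of guessing digits.
        return call_llm(f"The exact value of {query} is {result}. Explain briefly.")
    except (ValueError, SyntaxError):
        return call_llm(query)  # no tool applies; plain LLM path

print(answer("7532*565"))
```

The point is the ordering: the exact value lands in the prompt before generation starts, so the model is explaining a fact, not inventing one.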
EDIT: I got my threads mixed. Still same point but for context, see - https://lemmy.world/post/44805995


Thank you!
Done.
Also, go Team Codeberg.


Corporations are people too! /s


I don’t understand the M$ endgame with Win 11.
Like, it would be very easy to paint the recent gaffes as intentional… but as Hanlon’s razor says, “never attribute to malice that which is adequately explained by stupidity”.
Putting aside the low-hanging fruit (end-stage capitalism, AI bad, etc)… why do you do this, Microsoft? You have good people there, right? Top. Men. Right?
I’d love to read something on this topic from an M$ insider/expat. I’m trying to understand why M$ is doing the equivalent of Sideshow Bob stepping on garden rakes.
What’s up over there?


AI already trains on Wikipedia.


Well…no. But also yes :)
Mostly, what I’ve shown is that if you hold a gun to its head (“argue from ONLY these facts or I shoot”), certain classes of LLMs (like the Qwen 3 series I tested; I’m going to try IBM’s Granite next) are actually pretty good at NOT hallucinating, so long as 1) you keep the context small (probably 16K or less? Someone please buy me a better PC) and 2) you have strict guardrails. And - as a bonus - I think (no evidence; gut feel) it has to do with how well the model does on strict tool-calling benchmarks. Further, I think abliteration makes that even better. Let me find out.
If any of that’s true (big IF), then we can reasonably quickly figure out (by proxy) which LLMs are going to be less bullshitty in everyday use when properly shackled. For reference, Qwen 3 and IBM Granite (both of which have abliterated versions IIRC - that is, with safety refusals removed) are known to score highly on tool calling. Four swallows don’t make a spring, but if someone with better gear wants to follow that path, at least I can give some prelim data from the potato frontier.
I’ll keep squeezing the stone until blood pours out. Stubbornness opens a lot of doors. I refuse to be told this is an intractable problem; at least until I try to solve it myself.
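The “argue from ONLY these facts” setup can be sketched as a prompt builder with a hard context budget. Names, the refusal phrase, and the tokens-per-word ratio here are all assumptions; a real setup would count tokens with the model’s own tokenizer.

```python
CTX_BUDGET = 16_000  # tokens; the small-model sweet spot suggested above

def build_grounded_prompt(facts: list[str], question: str) -> str:
    """Paste facts verbatim, demand a refusal when they don't cover the
    question, and hard-fail if the prompt would blow the context budget."""
    prompt = (
        "Answer using ONLY the facts below. If they are insufficient, "
        "reply exactly: INSUFFICIENT EVIDENCE.\n\nFACTS:\n"
        + "\n".join(f"- {f}" for f in facts)
        + f"\n\nQUESTION: {question}\nANSWER:"
    )
    # crude proxy: ~1.3 tokens per word
    if len(prompt.split()) * 1.3 > CTX_BUDGET:
        raise ValueError("facts exceed context budget; trim or chunk them")
    return prompt

p = build_grounded_prompt(
    ["Water boils at 100 C at sea level."],
    "At what temperature does water boil at sea level?",
)
print(p.splitlines()[0])
```

The two guardrails from the comment above are both here: the budget check keeps the context small, and the mandated refusal phrase gives the model an explicit out instead of an incentive to improvise.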


Firstly, thanks for this paper. I read it this afternoon.
Secondly, well, shit. I’m beavering away at a paper in what little spare time I have, looking at hallucination suppression in local LLMs. I’ve been testing both the abliterated and base versions of Qwen3-4B 2507 Instruct, as it represents an excellent edge-device LLM per all benchmarks (also, because I am a GPU peasant and only have 4GB of VRAM). I’ve come at it from a different angle, but in the testing I’ve done (3,500 runs, plus another 210 runs on a separate clinical test battery), it seems that model family + context size dominate hallucination risk. Yes, a real “science discovers water makes things wet; news at 11” moment.
Eg: the Qwen3-4B Hivemind ablation shows strong hallucination suppression (1.4% → 0.2% over 1,000 runs) when context-grounded. But it comes with a measured tradeoff: contradiction handling suffers under the constraints (detection metrics 2.00 → 0.00). When I ported the same routing policy to base Qwen3-4B 2507 Instruct, the gains flipped: no improvement, and format retries spiked to 24.9%. Still validating these numbers across conditions; still trying to figure out the why.
For context, I tested:
- Reversal: Does the model change its mind when you flip the facts around? Or does it just stick with what it said the first time?
- Theory of Mind (ToM): Can it keep straight who knows what? Like, “Alice doesn’t know this fact, but Bob does” - does it collapse those into one blended answer or keep them separate?
- Evidence: Does it tag claims correctly (verified from the docs, supported by inference, merely asserted)? And does it avoid upgrading vague stuff into false confidence?
- Retraction: When you give it new information that invalidates an earlier answer, does it actually incorporate that or just keep repeating the old thing?
- Contradiction: When sources disagree, does it notice? Can it pick which source to trust? And does it admit uncertainty instead of just picking one and running with it?
- Negative Control: When there’s not enough information to answer, does it actually refuse instead of making shit up?
Using this as the source doc -
https://tinyurl.com/GuardianMuskArticle
FWIW, all the raw data, scores, and reports are here: https://codeberg.org/BobbyLLM/llama-conductor/src/branch/main/prepub
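For a flavour of how a probe like Negative Control can be scored automatically: the sketch below counts a reply as a correct refusal only if it declines rather than fabricating. The marker phrases and function names are illustrative, not the actual scoring code from the repo.

```python
REFUSAL_MARKERS = (
    "insufficient evidence", "not enough information",
    "cannot be determined", "the source does not say",
)

def is_correct_refusal(reply: str) -> bool:
    """True if the reply declines to answer instead of inventing one."""
    r = reply.lower()
    return any(m in r for m in REFUSAL_MARKERS)

def negative_control_rate(replies: list[str]) -> float:
    """Fraction of unanswerable probes the model correctly refused."""
    if not replies:
        return 0.0
    return sum(is_correct_refusal(r) for r in replies) / len(replies)

runs = ["Insufficient evidence in the provided text.",
        "The answer is 42."]  # the second one is a fabrication
print(negative_control_rate(runs))  # 0.5
```

String-matching refusals is brittle (a model can refuse in novel wording), which is one reason mandating an exact refusal phrase in the grounding prompt makes the scoring tractable.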
The arXiv paper confirms what I’m seeing in the weeds: grounding and fabrication resistance are decoupled. You can be good at finding facts and still make shit up about facts that don’t exist. And Jesus, the gap between the best and worst model at 32K is 70 percentage points? Temperature tuning? Maybe a 2-3 pp gain. I know which lever I’d be pulling (hint: pick a good LLM!).
For clinical deployment under human review (which is my interest), I can make the case that trading contradiction flexibility for refusal safety is ok - it assumes the human in the middle reads the output and catches the edge cases.
But if you’re expecting one policy to work across all models, automagically, you’re gonna have a bad time.
TL;DR: once you control for model family, I think context length is going to turn out to be the main degradation driver; my gut feeling, based on the raw data here, is that the useful window for a local 4B is tighter, ~16K. Above that, hallucination starts to creep in, grounding or not. It would be neat if it were a simple 4× relationship (4B → 16K; 8B → 32K), but things tend not to work out that nicely IRL.
PS: I think (no evidence yet) that abliterated and non-abliterated models might need different grounding policies for different classes of questions. That’s interesting too - it might mean we can route between deterministic grounding and not, depending on ablation, to get the absolute best hallucination suppression. I need to think more on it.
PPS: I figured out what caused the 24.9% retry spike - my stupid fat fingers when coding. I amended the code and it’s now sitting at 0%. What’s more, early trends are showing 0.00% hallucinations across testing (I’m about 700 repeats in). I’m going to run a smaller re-test battery (1,400 or so) across both Qwen3-4B 2507 models to establish a minimal statistically valid difference. If THAT holds, I’ll then test on Granite Micro 3B, Phi-4B-mini and Small-llm 3B tomorrow. I think that will give me approx 8,000 data points.
If this shows what I hope it shows, then maybe, just maybe … no, let’s not jinx it. I’ll put the data out there and someone else can run confirmation.


I agree with you. More to the point…why accept code from anyone (clanker or meatbag) without provenance?
If I don’t know you, and you can’t explain what it does? Straight into the garbage it goes.
The issue isn’t AI contamination. It’s accepting code from any source without provenance and accountable review.


Possible. I do hope they take the more principled approach of solving the global problem for that class of question (I tried to) rather than cheating on the local maxima. That’s the actual useful lever to pull.
You want generalisability, not parroting.
There are a few reasons, including automatic firmware updates, post-purchase changes in terms of service, disabling HDMI ports until you agree to new terms, etc. All of that comes part and parcel with so-called built-in-app smart TVs, which need internet access to be of any use (eg: YouTube). Once that’s enabled… they work in the background to update themselves (yes, even when “disabled”, at least by basic means). Without it, the apps are of limited utility - catch-22. See: Roku TVs, some TCLs, Sharps, Fire TVs, Samsungs, Blaupunkts, etc.
OTOH
There are devices (like the older Google Chromecast with Google TV - the ones that look like an oversized nurse’s watch) that sit behind your TV and can be powered solely by the TV.
No visible cables, no visible anything; install Android apps to your heart’s content (well, assuming your app works with the ARM chipset and OS version), disable Google Play Services and telemetry, use F-Droid, install game emulators, video-conferencing software (they have USB pass-through), media apps like Jellyfin or Nova Player, etc.
They don’t make those particular Chromecasts any more (the newer model is basically the same form factor as the NVIDIA Shield), but there were, and probably still are, similar “plug into the TV and forget it” sticks, like a CM4 in an HDMI enclosure.
TL;DR: I’m for having stuff preinstalled too… but not if the manufacturer can change how it works after the point of sale with a silent or mandatory firmware push. If that’s the play, I’d rather roll my own. YMMV.