What I found interesting this time

by: Artur Dziedziczak

February 9, 2025

“MODERN-DAY ORACLES or BULLSHIT MACHINES?,” n.d. https://thebullshitmachines.com/table-of-contents/index.html

“The Evolving Landscape of LLM Evaluation,” n.d. https://www.ruder.io/the-evolving-landscape-of-llm-evaluation/

An interesting article in which the author summarizes approaches to, and challenges of, #ML benchmarks.
Interesting points:
Current evaluation datasets are contaminated across the LLM landscape: their test questions end up in training data. This means that evaluation results are not reliable anymore. No one controls the input data of private models, so their makers can literally get better results by feeding test data into training.
A lot of models overfit to the GSM8K grade-school math benchmark. Imagine it like this: LLMs are not able to do math, and that is by design. But! They perform well on math tests because the questions from those tests were available for their training.
I really liked the solution presented for this: encrypting evaluation datasets and keeping them secure. Still, I think we need to think more broadly, because the chance that private companies leave unencrypted evaluation datasets out of their LLM training data is, at least to me, quite small. Billions of dollars are at stake, baby! We need to do better.
I think the best way to evaluate LLMs is to check whether they can build models of a problem within themselves. It’s actually quite a simple AGI test. If a model can reason its way to learning how to add, multiply, and subtract internally, or learn how to get through an extremely complicated maze while remembering its previous steps in an internal model of the map, I would say we have really reached AGI.
I don’t see such tests being performed, though, and passing them would really change my mind about the whole LLM landscape. So far, people hype the functionality of token prediction without understanding that it is exactly that and nothing more.
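The contamination problem described above can be made concrete with a minimal sketch of one common detection heuristic: word n-gram overlap between a benchmark question and training text. This heuristic is not described in the article or in this post; the tokenization, the function names `ngrams` and `contamination_ratio`, and the default window of 8 words are assumptions chosen for illustration. Note that it only works when you can inspect the training data, which is exactly what private models do not allow.

```python
import re


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lower-case the text, keep alphanumeric word tokens, and collect word n-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_ratio(test_item: str, training_text: str, n: int = 8) -> float:
    """Fraction of the test item's n-grams that also occur in the training text.

    1.0 means every n-word window of the question was seen verbatim;
    0.0 means no overlap (or the item is shorter than n words).
    """
    item_grams = ngrams(test_item, n)
    if not item_grams:
        return 0.0
    return len(item_grams & ngrams(training_text, n)) / len(item_grams)


if __name__ == "__main__":
    # Made-up question and corpora for illustration, not real GSM8K data.
    question = "Tom has 3 apples and buys 5 more. How many apples does Tom have now?"
    leaked = "forum post: " + question + " Answer: 8"
    clean = "A recipe for apple pie with cinnamon and butter."
    print(contamination_ratio(question, leaked))  # 1.0
    print(contamination_ratio(question, clean))   # 0.0
```

A high ratio only shows that the question text appeared somewhere in the training corpus; on its own it does not prove that the model memorized the answer.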

“How to Configure Multiple Tor Relays on the Same Interface with Different IPs,” n.d. https://osservatorionessuno.org/blog/2025/02/how-to-configure-multiple-tor-relays-on-the-same-interface-with-different-ips/

An interesting blog post on how to run multiple Tor relays, each with its own IP address, on a single network interface.
It’s really good to know that you can do this on Linux, but what really got me interested is that whenever you spin up a Tor node, the Tor Network Team may contact you to check whether your setup was done correctly.
How cool is that? The Tor Project has a community of people who make sure that newcomers set up their servers correctly. I hadn’t heard about this before. Amazing!
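The exact setup from that post is not reproduced here, but its core idea, binding each relay to its own address, can be sketched as a small Python script that renders one torrc per relay. The nicknames, the 192.0.2.x documentation addresses, the shared port 9001, and the `/var/lib/tor-instances/` data directory layout are assumptions. `Nickname`, `Address`, `ORPort`, `OutboundBindAddress`, `DataDirectory`, `SocksPort`, and `ExitRelay` are standard torrc options, but check the Tor manual and the original post before relying on this. The extra IP addresses still have to be added to the interface on the Linux side, which this sketch does not do.

```python
# Hypothetical relays: nicknames and RFC 5737 documentation addresses.
RELAYS = [
    ("exampleRelay1", "192.0.2.10"),
    ("exampleRelay2", "192.0.2.11"),
]


def render_torrc(nickname: str, ip: str, or_port: int = 9001) -> str:
    """Render a minimal non-exit relay torrc that listens and connects out from one IP.

    Every relay can reuse the same ORPort because each one binds a different address.
    The DataDirectory layout is an assumption; each instance just needs its own.
    """
    options = [
        ("Nickname", nickname),
        ("Address", ip),                # address advertised to the network
        ("ORPort", f"{ip}:{or_port}"),  # listen only on this relay's IP
        ("OutboundBindAddress", ip),    # outgoing connections also leave from this IP
        ("DataDirectory", f"/var/lib/tor-instances/{nickname}"),
        ("SocksPort", "0"),             # relay only, no local SOCKS proxy
        ("ExitRelay", "0"),             # non-exit relay
    ]
    return "".join(f"{key} {value}\n" for key, value in options)


if __name__ == "__main__":
    for nickname, ip in RELAYS:
        print(f"# torrc for {nickname}")
        print(render_torrc(nickname, ip))
```

Each rendered file would then be used by a separate tor instance. Because the relays share an operator, they should also be declared as one family (the `MyFamily` option), which needs the relays' fingerprints and is left out of this sketch.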