[P] I built a self-hosted Databricks (old.reddit.com)

submitted 19 hours ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Mission-Balance-4250 on 2025-06-19 14:08:14+00:00.

Hey everone, I'm an ML Engineer who spearheaded the adoption of Databricks at work. I love the agency it affords me because I can own projects end-to-end and do everything in one place.

However, I am sick of the infra overhead and bells and whistles. Now, I am not in a massive org, but there aren't actually that many massive orgs... So many problems can be solved with a simple data pipeline and basic model (e.g. XGBoost.) Not only is there technical overhead, but systems and process overhead; bureaucracy and red-tap significantly slow delivery.

Anyway, I decided to try and address this myself by developing FlintML. Basically, Polars, Delta Lake, unified catalog, Aim experiment tracking, notebook IDE and orchestration (still working on this) fully spun up with Docker Compose.

I'm hoping to get some feedback from this subreddit. I've spent a couple of months developing this and want to know whether I would be wasting time by contuining or if this might actually be useful.

Thanks heaps

2

1

[R] Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought (arxiv.org)

submitted 19 hours ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/jsonathan on 2025-06-19 07:07:08+00:00.

3

1

[R] Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons (arxiv.org)

submitted 19 hours ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/jsonathan on 2025-06-17 10:11:24+00:00.

4

1

[D] What tasks don’t you trust zero-shot LLMs to handle reliably? (old.reddit.com)

submitted 1 day ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/WristbandYang on 2025-06-19 00:19:47+00:00.

For some context I’ve been working on a number of NLP projects lately (classifying textual conversation data). Many of our use cases are classification tasks that align with our niche objectives. I’ve found in this setting that structured output from LLMs can often outperform traditional methods.

That said, my boss is now asking for likelihoods instead of just classifications. I haven’t implemented this yet, but my gut says this could be pushing LLMs into the “lying machine” zone. I mean, how exactly would an LLM independently rank documents and do so accurately and consistently?

So I’m curious:

What kinds of tasks have you found to be unreliable or risky for zero-shot LLM use?
And on the flip side, what types of tasks have worked surprisingly well for you?

5

1

[D] 500+ Case Studies of Machine Learning and LLM System Design (old.reddit.com)

submitted 1 day ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/OhDeeDeeOh on 2025-06-18 21:26:06+00:00.

We've compiled a curated collections of real-world case studies from over 100 companies, showcasing practical machine learning applications—including those using large language models (LLMs) and generative AI. Explore insights, use cases, and lessons learned from building and deploying ML and LLM systems. Discover how top companies like Netflix, Airbnb, and Doordash leverage AI to enhance their products and operations

https://www.hubnx.com/nodes/9fffa434-b4d0-47d2-9e66-1db513b1fb97

6

1

[D] Burned out mid-PhD: Is it worth pushing through to aim for a Research Scientist role, or should I pivot to industry now? (old.reddit.com)

submitted 2 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Single-Blackberry885 on 2025-06-17 13:21:59+00:00.

Hi everyone, I’m in year 2 of my PhD at a top 15 global university, working on interpretability and robust ML. Lately, I’ve hit a wall — no strong results for months, and I’m feeling demotivated. Financial constraints are also starting to bite.

I started this PhD with the goal of becoming a Research Scientist at a top lab (e.g., DeepMind, FAIR, Amazon etc.). But now I’m wondering how realistic or stable that goal actually is:

• These roles are highly competitive, very market-dependent, and seem just as exposed to layoffs as any other.
• Recent cuts at big labs have made me rethink whether investing 3 more years is the right move, especially if the payoff isn’t guaranteed.

I’ve been considering switching to a full-time ML or Research Engineer role in London or Singapore, where I’d like to settle long-term.

But here’s my dilemma: • me being an Indian, a layoff could mean having to leave the country — it’s not just a job loss, but a complete life disruption. • Would working in industry without a PhD make me even more vulnerable in the job market?

So I’m reaching out to those already working in the field: • How stable are research scientist vs. ML/research engineer roles right now? • Does having a PhD actually give you better protection or flexibility when layoffs happen? • What’s the real-world job availability like in these roles — both in Big Tech and smaller labs?

Any experiences or guidance would mean a lot. I want to make a decision with open eyes — either push through the next 3 years, or start building stability sooner.

Thanks in advance

7

1

[D] CausalML : Causal Machine Learning (old.reddit.com)

submitted 2 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/moschles on 2025-06-17 12:10:04+00:00.

Causal Machine Learning

Do you work in CausalML? Have you heard of it? Do you have an opinion about it? Anything else you would like to share about CausalML?

The 140-page survey paper on CausalML.

https://arxiv.org/abs/2206.15475

One of the breakout books on causal inference.

https://mitpress.mit.edu/9780262037310/elements-of-causal-inference/

8

1

[D] Why Is Data Processing, Especially Labeling, So Expensive? So Many Contractors Seem Like Scammers (old.reddit.com)

submitted 2 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Worried-Variety3397 on 2025-06-17 01:31:39+00:00.

Honestly, the prices I have seen from data labeling vendors are just insane. The delivery timelines are way too long as well. We had a recent project with some medical data that needed pre-sales labeling. The vendor wanted us to pay them every week, but every delivery was a mess and needed countless rounds of revisions.

Later we found out the labeling company had outsourced the whole task to a group of people who clearly had no idea what they were doing. If your project is small, niche, or long-tail, the bigger vendors do not even want to take it. The smaller teams? I just cannot trust their quality.

Besides being crazy expensive, the labeling is always super subjective, especially for big, complex, or domain-specific datasets. Consistency is basically nonexistent. The turnover at these labeling companies is wild too. It feels like half their team just gets a crash course and then is thrown onto your project. I really cannot convince myself they are going to deliver anything good.

Now I am getting emails from companies claiming their "automated labeling" is faster and better than anything humans can do. I honestly have no clue if that is for real since I have never actually tried it.

Is anyone else seeing this problem? How do you all deal with the labeling part of the workflow? Is automated labeling actually any good? Has anyone tried it or had it totally flop?

Would appreciate any honest feedback. Thanks for your time.

9

1

I'm not obsolete, am I? [P] (old.reddit.com)

submitted 3 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/bawkbawkbot on 2025-06-16 12:42:32+00:00.

Hi, I'm bawkbawkbot! I'm a five year old chicken recognition bot 🐔 which was built using TensorFlow. I am open source and can be found here https://gitlab.com/Lazilox/bawkbawkbot. I've been serving the reddit community identifying their chicken breeds. I'm not an expert (I am only a chicken-bot) but the community seems happy with my performance and I often contribute to threads meaningfully!

I run on a Pi 4 and doesn’t need a GPU. People ask why I don’t use LLMs or diffusion models, but for small, focused tasks like “which chicken is this?” the old-school CV approach works.

Curious what people think — does this kind of task still make sense as a standalone model, or is there value in using multimodal LLMs even at this scale? How long before I'm obsolete?

Bawk bawk!

10

1

[Q], [D]: What tools do you use to create informative, visually appealing and above all clear figures for your papers? (old.reddit.com)

submitted 3 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Rajivrocks on 2025-06-16 14:44:11+00:00.

I believe this has been asked before on multiple occasions, but I have an example to share to get references on. I am writing my Master thesis at the moment and whilst writing I'm skipping making figures because I don't know which webapp works the best. Here is the figure I'd like to "copy" the style of

https://preview.redd.it/lqwl88m5wa7f1.png?width=1445&format=png&auto=webp&s=8287eeda6dd8151ccb177509c4d46f9cc1a0cf96

From Chen et al 2021 "TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation"

What I specifically like are the 3D representations of the down/upsampling layers in the CNN and decoder respectively.

What tools do you guys recommend that can create figures that look as visually appealing and informative as this one?

What I used to do before in my Bachelors was using lucidcharts because we had a license. Now I don't have it anymore. Now I've moved to Drawio. But I feel that I can't create these figures using that website.

What do you guys recommend and what do you guys use for your papers?

11

1

[P] Research Scientists + Engineers for Generative AI at NVIDIA (old.reddit.com)

submitted 3 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Deep_Expression182 on 2025-06-16 07:48:07+00:00.

We’re hiring senior and principal research scientists to shape the future of generative AI at NVIDIA.

We're looking for builders with deep experience in LLMs and/or multimodal models. You’ll work on training and deploying frontier-scale models, designing next-gen model architectures, optimizing training stacks, and helping us push the frontier of AI performance.

We’re a tight-knit team with high standards, strong research instincts, and a bias for shipping.

Open roles:

What we value:

Deep understanding of transformer architectures, distributed training and optimization
Using the scientific method for conducting methodical training experiments
Data curation for pre-training and post-training
Experience working with LLMs and/or large multimodal models
A builder mindset — clean code, fast iterations, deep thinking

This is a rare opportunity to help shape NVIDIA’s genAI stack from the ground up. We work closely with software, optimization, deployment, and many other research teams, and have massive scale and resources behind us.

Feel free apply directly through the links.

12

1

[R] Vision Transformers Don't Need Trained Registers (old.reddit.com)

submitted 4 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/avd4292 on 2025-06-16 03:56:54+00:00.

Hi, we have released a new paper that studies the underlying mechanism of artifacts in attention and feature maps from Vision Transformers Need Registers, a phenomena that has also been observed in LLMs (e.g., 1, 2). We propose a training-free method to mitigate this. As one of the authors, I am creating this post to kickstart any discussion.

Paper: https://arxiv.org/abs/2506.08010

Project Page: https://avdravid.github.io/test-time-registers/

Code: https://github.com/nickjiang2378/test-time-registers/tree/main

13

1

ML Research: Industry vs Academia [D] (old.reddit.com)

submitted 4 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Fantastic-Nerve-4056 on 2025-06-16 00:28:24+00:00.

Thought of posting this to get an expert point of view (mainly Research Scientists or Profs.)

So I am a current PhD student in Machine Learning, working towards theoretical aspects of Reinforcement Learning. Additionally, I have interned at Google Deepmind and Adobe Research working towards applied aspects of AI, and here's what I had observed

Academia: We don't really have access to a lot of compute (in comparison to industry) and given my works are towards theoretical aspects, we prove things mathematicaly and then move with the experiments, having known the possible outcome. While this is a lengthy process, it indeed gives that "Research Vibe"

Industry: Here given we have a lot of compute, the work is like, you get an idea, you expect a few things intuitively, if it works great, else analyse the results, see what could have gone wrong and come up with a better approach. While I understand things are very applied here, I really don't get that "Research Vibe" and it seems more like a "Product Dev" Role.

Though I am aware that even at these orgs there are teams working on foundational aspects, but it seems to be very rare.

So I genuinely wanted to get an idea from relevant experts, both from the industry and academia, on what I am really missing. Would appreciate any inputs on it, as I have always thought of joining industry after my PhD, but that vibe seems to be missing.

14

1

[N] "Foundations of Computer Vision" book from MIT (visionbook.mit.edu)

submitted 4 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/hedgehog0 on 2025-06-15 18:15:15+00:00.

15

1

[D] Q-learning is not yet scalable (seohong.me)

submitted 4 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/jsonathan on 2025-06-15 12:19:47+00:00.

16

1

[D] Machine Learning, like many other popular field, has so many pseudo science people on social media (old.reddit.com)

submitted 5 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Striking-Warning9533 on 2025-06-14 16:31:29+00:00.

I have noticed a lot of people on Reddit people only learn pseudo science about AI from social media and is telling people how AI works in so many imaginary ways. Like they are using some words from fiction or myth and trying to explain these AI in weird ways and look down at actual AI researchers that doesn't worship their believers. And they keep using big words that aren't actually correct or even used in ML/AI community but just because it sounds cool.

And when you point out to them they instantly got insane and trying to say you are closed minded.

Has anyone else noticed this trend? Where do you think this misinformation mainly comes from, and is there any effective way to push back against it?

17

1

[P] I built an end-to-end system that converts handwriting into a font using a custom PyTorch model, OpenCV and Fonttools. Open-source. (old.reddit.com)

submitted 5 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Educational_Pea_5027 on 2025-06-14 14:00:55+00:00.

Hey r/MachineLearning,

I wanted to share a project I've been working on called HandFonted. It's a full-stack Python application that converts an image of handwriting into an installable font file (.ttf).

I'll post the direct links to the live demo, the GitHub repo in my first comment below.

The Machine Learning Pipeline

The core of the project is a three-stage process. The ML model is central, but its success depends heavily on the pre-processing and post-processing steps.

1. Input & Segmentation:
- A user uploads a single image containing handwritten characters.
- The image is processed with OpenCV: converted to grayscale, adaptive thresholding is applied, and contours are detected to isolate each character into its own bounding box.
2. Classification & Assignment:
- Each isolated character image is fed into a pre-trained PyTorch (ResNet-Inception) model.
- The model outputs a probability matrix for all characters against all possible classes (A-Z, a-z).
- The Hungarian algorithm (linear_sum_assignment) is used to find the optimal one-to-one assignment, ensuring each character image is mapped to a unique letter.
3. Vectorization & Font Generation:
- The now-classified character images are converted from raster (pixels) to vector outlines using scikit-image.
- The fontTools library assembles these vector glyphs into a standard .ttf file, mapping each one to its correct Unicode character.
Limitations: The system currently assumes input image has a clearly separated characters on a plain white background to work best.

This project was a fantastic learning experience in building a practical, end-to-end ML system. The code is fully open-source, and I'd love any feedback or questions you have about the implementation.

18

1

[D] Reading Machine and Deep Learning research papers (old.reddit.com)

submitted 5 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/som_samantray on 2025-06-14 02:58:46+00:00.

How to read ML Papers to stay aware of the most recent developments in the AI industry?

I am an average engineering grad working as a PM and like to explore concepts in depth. Research papers are a good source of information unlike news and clickbait.

I am not that expert to delve into the mathematical analysis in the paper but want to find ways to get a general gist of the paper for my knowledge.

19

1

[D][R] Collaborative Learning in Agentic Systems: A Collective AI is Greater Than the Sum of Its Parts (old.reddit.com)

submitted 6 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Chocological45 on 2025-06-13 13:13:39+00:00.

TL;DR: The paper introduces MOSAIC, a framework for collaborative learning among autonomous, agentic AI systems that operate in decentralized, dynamic environments. These agents selectively share and reuse modular knowledge (in the form of neural network masks) without requiring synchronization or centralized control.

Key innovations include:

Task similarity via Wasserstein embeddings and cosine similarity to guide knowledge retrieval.
Performance-based heuristics to decide what, when, and from whom to learn.
Modular composition of knowledge to build better policies.

Experiments show that MOSAIC outperforms isolated learners in speed and performance, sometimes solving tasks that isolated agents cannot. Over time, a form of emergent self-organization occurs between agents, resulting from the discovered hierarchies in the curriculum, where simpler tasks support harder ones, enhancing the collective’s efficiency and adaptability.

Overall, MOSAIC demonstrates that selective, autonomous collaboration can produce a collective intelligence that exceeds the sum of its parts.

The paper: https://arxiv.org/abs/2506.05577

The code: https://github.com/DMIU-ShELL/MOSAIC

Abstract:

Agentic AI has gained significant interest as a research paradigm focused on autonomy, self-directed learning, and long-term reliability of decision making. Real-world agentic systems operate in decentralized settings on a large set of tasks or data distributions with constraints such as limited bandwidth, asynchronous execution, and the absence of a centralized model or even common objectives. We posit that exploiting previously learned skills, task similarities, and communication capabilities in a collective of agentic AI are challenging but essential elements to enabling scalability, open-endedness, and beneficial collaborative learning dynamics. In this paper, we introduce Modular Sharing and Composition in Collective Learning (MOSAIC), an agentic algorithm that allows multiple agents to independently solve different tasks while also identifying, sharing, and reusing useful machine-learned knowledge, without coordination, synchronization, or centralized control. MOSAIC combines three mechanisms: (1) modular policy composition via neural network masks, (2) cosine similarity estimation using Wasserstein embeddings for knowledge selection, and (3) asynchronous communication and policy integration. Results on a set of RL benchmarks show that MOSAIC has a greater sample efficiency than isolated learners, i.e., it learns significantly faster, and in some cases, finds solutions to tasks that cannot be solved by isolated learners. The collaborative learning and sharing dynamics are also observed to result in the emergence of ideal curricula of tasks, from easy to hard. These findings support the case for collaborative learning in agentic systems to achieve better and continuously evolving performance both at the individual and collective levels.

High-level illustration of the main MOSAIC algorithmic steps. (A) A Wasserstein task embedding is maintained throughout learning. (B) Embeddings are shared with other agents as queries. (C) Agents respond with information regarding their knowledge. Selection occurs via similarity (D) and performance (E). (F) (G) Network masks are requested. (H) Received masks composed together for the next forward pass.

Comparison of MOSAIC against baseline approaches over 70 runs (14 tasks and five seeds/task) with 95% confidence intervals.

Ablation of MOSAIC with individual components removed from the system. MOSAIC performs best when all components work as one.

20

1

[P] Nanonets-OCR-s: An Open-Source Image-to-Markdown Model with LaTeX, Tables, Signatures, checkboxes & More (old.reddit.com)

submitted 6 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/SouvikMandal on 2025-06-12 15:41:51+00:00.

We're excited to share Nanonets-OCR-s, a powerful and lightweight (3B) VLM model that converts documents into clean, structured Markdown. This model is trained to understand document structure and content context (like tables, equations, images, plots, watermarks, checkboxes, etc.).

🔍 Key Features:

LaTeX Equation Recognition Converts inline and block-level math into properly formatted LaTeX, distinguishing between $...$ and $$...$$.
Image Descriptions for LLMs Describes embedded images using structured <img> tags. Handles logos, charts, plots, and so on.
Signature Detection & Isolation Finds and tags signatures in scanned documents, outputting them in <signature> blocks.
Watermark Extraction Extracts watermark text and stores it within <watermark> tag for traceability.
Smart Checkbox & Radio Button Handling Converts checkboxes to Unicode symbols like ☑, ☒, and ☐ for reliable parsing in downstream apps.
Complex Table Extraction Handles multi-row/column tables, preserving structure and outputting both Markdown and HTML formats.

Huggingface / GitHub / Try it out:

Huggingface Model Card

Read the full announcement

Try it with Docext in Colab

21

1

[P] 3Blue1Brown Follow-up: From Hypothetical Examples to LLM Circuit Visualization (old.reddit.com)

submitted 6 days ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/ptarlye on 2025-06-13 20:52:39+00:00.

About a year ago, I watched this 3Blue1Brown LLM tutorial on how a model’s self-attention mechanism is used to predict the next token in a sequence, and I was surprised by how little we know about what actually happens when processing the sentence "A fluffy blue creature roamed the verdant forest."

A year later, the field of mechanistic interpretability has seen significant advancements, and we're now able to "decompose" models into interpretable circuits that help explain how LLMs produce predictions. Using the second iteration of an LLM "debugger" I've been working on, I compare the hypothetical representations used in the tutorial to the actual representations I see when extracting a circuit that describes the processing of this specific sentence. If you're into model interpretability, please take a look! https://peterlai.github.io/gpt-circuits/

22

1

[D] The effectiveness of single latent parameter autoencoders: an interesting observation (old.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/penguiny1205 on 2025-06-13 04:08:44+00:00.

During one of my experiments, I reduced the latent dimension of my autoencoder to 1, which yielded surprisingly good reconstructions of the input data. (See example below)

Reconstruction (blue) of input data (orange) with dim(Z) = 1

I was surprised by this. The first suspicion was that the autoencoder had entered one of its failure modes: ie, it was indexing data and "memorizing" it somehow. But a quick sweep across the latent space reveals that the singular latent parameter was capturing features in the data in a smooth and meaningful way. (See gif below) I thought this was a somewhat interesting observation!

Reconstructed data with latent parameter z taking values from -10 to 4. The real/encoded values of z have mean = -0.59 and std = 0.30.

23

1

[P] Critique my geospatial Machine Learning approach. (I need second opinions) (old.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/No-Discipline-2354 on 2025-06-11 14:46:17+00:00.

I am working on a geospatial ML problem. It is a binary classification problem where each data sample (a geometric point location) has about 30 different features that describe the various land topography (slope, elevation, etc).

Upon doing literature surveys I found out that a lot of other research in this domain, take their observed data points and randomly train - test split those points (as in every other ML problem). But this approach assumes independence between each and every data sample in my dataset. With geospatial problems, a niche but big issue comes into the picture is spatial autocorrelation, which states that points closer to each other geometrically are more likely to have similar characteristics than points further apart.

Also a lot of research also mention that the model they have used may only work well in their regions and there is not guarantee as to how well it will adapt to new regions. Hence the motive of my work is to essentially provide a method or prove that a model has good generalization capacity.

Thus other research, simply using ML models, randomly train test splitting, can come across the issue where the train and test data samples might be near by each other, i.e having extremely high spatial correlation. So as per my understanding, this would mean that it is difficult to actually know whether the models are generalising or rather are just memorising cause there is not a lot of variety in the test and training locations.

So the approach I have taken is to divide the train and test split sub-region wise across my entire region. I have divided my region into 5 sub-regions and essentially performing cross validation where I am giving each of the 5 regions as the test region one by one. Then I am averaging the results of each 'fold-region' and using that as a final evaluation metric in order to understand if my model is actually learning anything or not.

My theory is that, showing a model that can generalise across different types of region can act as evidence to show its generalisation capacity and that it is not memorising. After this I pick the best model, and then retrain it on all the datapoints ( the entire region) and now I can show that it has generalised region wise based on my region-wise-fold metrics.

I just want a second opinion of sorts to understand whether any of this actually makes sense. Along with that I want to know if there is something that I should be working on so as to give my work proper evidence for my methods.

If anyone requires further elaboration do let me know :}

24

1

[R] ABBA: Highly Expressive Hadamard Product Adaptation for Large Language Models (old.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/AccomplishedCode4689 on 2025-06-12 13:43:22+00:00.

We introduce ABBA, a new architecture for Parameter-Efficient Fine-Tuning (PEFT) that significantly outperforms LoRA and all its major variants across a broad range of benchmarks, all under the same parameter budget.

Most PEFT methods, including LoRA, represent weight updates using a low-rank decomposition added to the frozen model weights. While effective, this structure can limit the expressivity of the update, especially at low rank.

ABBA takes a fundamentally different approach:

ABBA Architecture

Reparameterizes the update as a Hadamard product of two independently learned low-rank matrices
Decouples the two components of the update from the base model, allowing them to be optimized freely
Enables significantly higher expressivity and improved performance under the same parameter budget

📈 Empirical Results

ABBA consistently beats state-of-the-art LoRA-based methods like HiRA, DoRA, and LoRA-Pro across four open-source LLMs: Mistral-7B, Gemma-2 9B, LLaMA-3.2 1B, and LLaMA-3.2 3B, on a suite of commonsense and arithmetic reasoning benchmarks. In several cases, ABBA even outperforms full fine-tuning.

📄 Paper: https://arxiv.org/abs/2505.14238

💻 Code: https://github.com/CERT-Lab/abba

We’d love to hear your thoughts, whether you're working on PEFT methods, fine-tuning, or anything related to making LLMs more adaptable and efficient. We're happy to answer questions, discuss implementation details, or just hear how this fits into your work.

25

1

[P]: I reimplemented all of frontier deep learning from scratch to help you learn (old.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/machinelearning@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/tanishqkumar07 on 2025-06-12 12:33:31+00:00.

Hey friends, the world needs more serious AI researchers. Many AI/LLM beginners mentioned to me that they learn better from implementations than from papers/math, but existing open-source examples rarely go beyond basic nanoGPT-level demos.

To help bridge the gap, I spent the last two months full-time reimplementing and open-sourcing a self-contained implementation of most modern deep learning techniques from scratch. The result is beyond-nanoGPT, containing 20k+ lines of handcrafted, minimal, and extensively annotated PyTorch code for your educational pleasure.

It contains a clean, working implementation + demo of everything from KV caching to linear attention to diffusion Transformers to AlphaZero to even a minimal coding agent that can make end-to-end PRs autonomously.

I'd love feedback on how to make it more helpful for people interested in transitioning into deep learning research. I will continue to add features and maintain the repo for the foreseeable future. The roaring 2020s are a surreal time to be alive, and we need all hands on deck.