1

5

dcdaML - devanagari character detection dataset training framework (github.com)

submitted 2 months ago by thickertoofan@lemm.ee to c/machinelearning@lemmy.ml

5 comments fedilink

cross-posted from: https://lemm.ee/post/61282397

Open sourcing this project I made in just a weekend, planning to continue this in my free time, with synthetic data gen and some more modifications, anyone is welcome to chip in, I'm not an expert in ML. The inference is live here using tensorflow.js. The model is just 1.92 Megabytes!

2

-6

Breaking GPT-5 News! (lemmy.world)

submitted 2 months ago by fubarx@lemmy.world to c/machinelearning@lemmy.ml

2 comments fedilink

cross-posted from: https://lemmy.world/post/27657674

3

I want to open source a dataset but I'm not sure what license to use (lemmy.world)

submitted 2 months ago* (last edited 2 months ago) by 4Robato@lemmy.world to c/machinelearning@lemmy.ml

6 comments fedilink

Hello!

I did a map generator(it's pixel art and the largest are 300x200 pixels) some time ago and decided to generate 3 types of map sizes and 1500 maps for each size to train a model to practice and I thought to do that dataset open source.

Is that really something that people want/appreciate or not really? I'm a bit lost on how to proceed and what license to use. Does it make sense to use an MIT License? Or which one do you recommend?

thanks!

4

8

MLOps tips I gathered recently (www.readyforagents.com)

submitted 3 months ago by oba@lemmy.world to c/machinelearning@lemmy.ml

0 comments fedilink

Hi all,

I've been experimenting with building and deploying ML and LLM projects for a while now, and honestly, it’s been a journey.

Training the models always felt more straightforward, but deploying them smoothly into production turned out to be a whole new beast.

I had a really good conversation with Dean Pleban (CEO @ DAGsHub), who shared some great practical insights based on his own experience helping teams go from experiments to real-world production.

Sharing here what he shared with me, and what I experienced myself -

Data matters way more than I thought. Initially, I focused a lot on model architectures and less on the quality of my data pipelines. Production performance heavily depends on robust data handling—things like proper data versioning, monitoring, and governance can save you a lot of headaches. This becomes way more important when your toy-project becomes a collaborative project with others.

LLMs need their own rules. Working with large language models introduced challenges I wasn't fully prepared for—like hallucinations, biases, and the resource demands. Dean suggested frameworks like RAES (Robustness, Alignment, Efficiency, Safety) to help tackle these issues, and it’s something I’m actively trying out now. He also mentioned "LLM as a judge" which seems to be a concept that is getting a lot of attention recently.

Some practical tips Dean shared with me:

Save chain of thought output (the output text in reasoning models) - you never know when you might need it. This sometimes require using the verbos parameter.

Log experiments thoroughly (parameters, hyper-parameters, models used, data-versioning...).

Start with a Jupyter notebook, but move to production-grade tooling (all tools mentioned in the guide bellow 👇🏻)

To help myself (and hopefully others) visualize and internalize these lessons, I created an interactive guide that breaks down how successful ML/LLM projects are structured. If you're curious, you can explore it here:

https://www.readyforagents.com/resources/llm-projects-structure

I'd genuinely appreciate hearing about your experiences too—what’s your favorite MLOps tools? I think that up until today dataset versioning and especially versioning LLM experiments (data, model, prompt, parameters..) is still not really fully solved.

5

8

A community statement supporting the Open Source Definition (OSD) (osd.fyi)

submitted 7 months ago* (last edited 7 months ago) by Shamar@feddit.it to c/machinelearning@lemmy.ml

1 comments fedilink

Declaration

We, the undersigned members of the Open Source community, assert that Open Source is defined solely by the Open Source Definition (OSD) version 1.9.

Any amendments or new definitions shall only be recognized if declared by clear community consensus through a transparent process to be determined.

6

2

Transformer Explainer (poloclub.github.io)

submitted 10 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

7

8

Alibaba claims no. 1 spot in AI math models with Qwen2-Math (venturebeat.com)

submitted 10 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

8

5

How to convert a positionally encoded predicted embedding from a decoder to its matching token? (infosec.pub)

submitted 10 months ago by yboutros@infosec.pub to c/machinelearning@lemmy.ml

2 comments fedilink

When training a transformer on positionally encoded embeddings, should the tgt output embeddings also be positionally encoded? If so, wouldn't the predicted/decoded embeddings also be positionally encoded?

9

New Open-Source AI Image Generator Beats Midjourney, SD3 and Auraflow (decrypt.co)

submitted 10 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

1 comments fedilink

10

24

AI models collapse when trained on recursively generated data (www.nature.com)

submitted 11 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

3 comments fedilink

11

1

RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing (lmsys.org)

submitted 11 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

12

6

Alibaba's Qwen LLM model leading open source rankings (huggingface.co)

submitted 11 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

13

7

By using the same techniques Google used to solve Go (MTCS and backprop), Llama8B gets 96.7% on math benchmark GSM8K. That’s better than GPT-4, Claude and Gemini, with 200x fewer parameters! (arxiv.org)

submitted 1 year ago* (last edited 1 year ago) by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

2 comments fedilink

14

6

Mixture of Agents (MoA) leverages several open-source LLM agents to achieve a score of 65.1% on AlpacaEval 2.0 (www.together.ai)

submitted 1 year ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

15

4

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate (huggingface.co)

submitted 1 year ago by ylai@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

16

8

Torrent tracker for open models (aitracker.art)

submitted 1 year ago by keepthepace@slrpnk.net to c/machinelearning@lemmy.ml

0 comments fedilink

Someone (Dreamertist on reddit) got tired of depending on Huggingface for downloading models and proposes a torrent tracker to share more efficiently these huge blobs.

It just started, only a few models uploaded yet, but I think it is worth that we all put our local stash online there. Making a new torrent is super easy (one missing step though: when "re-downloading" the model you need to save it in the directory where it already exists. This way it will "resume" at 100% completion and switch to seeding mode)

17

6

Can gpt generate a gpt model? (sh.itjust.works)

submitted 1 year ago by wargreymon@sh.itjust.works to c/machinelearning@lemmy.ml

5 comments fedilink

Imagine AI giving offsprings...

18

2

Sakuga-42M Dataset: Scaling Up Cartoon Research (arxiv.org)

submitted 1 year ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

19

5

How AI 'Understands' Images (CLIP) (www.youtube.com)

submitted 1 year ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

20

9

Where do these stains come from and how can I fix them? (sopuli.xyz)

submitted 1 year ago* (last edited 1 year ago) by smokinliver@sopuli.xyz to c/machinelearning@lemmy.ml

2 comments fedilink

Hey guys,

I have been experimenting with self-supervised visual learning a bit. Until now I have only ever used U-Nets and related architectures.

No matter what specific task, images or other parameters I changed I always encountered these stains on my output-images (here marked with green), although sometimes more, sometimes less.

Now I wondered if anybody could tell me where they came from and how I could prevent them?

In the attached picture the input (left) and target (right) are the same, so that I can be sure these stains do not come from a badly designed learning task, yet they still appear (output is the middle image).

Thanks in advance and all the best :D

Edit: added line breaks

21

0

HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models (hidiffusion.github.io)

submitted 1 year ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

22

3

Dynamic Typography: Bringing Text to Life via Video Diffusion Prior (animate-your-word.github.io)

submitted 1 year ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

23

5

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance (arxiv.org)

submitted 1 year ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

24

0

What are your thoughts on Microsoft Copilot? (lemmy.blahaj.zone)

submitted 1 year ago by Kit@lemmy.blahaj.zone to c/machinelearning@lemmy.ml

8 comments fedilink

Copilot sounds amazing on paper. The free (to 365 subs) version on the web is just Chat GPT4, so that's familiar enough. The integration with 365 applications is really what grabs me. Stuff like tossing it 10 spreadsheets and asking it to analyze and compare the data, having a virtual assistant to remind me of upcoming actionables, and summarizing a meeting when I zone out - it all sounds really handy.

I met with Microsoft last week and they're down for giving me a 90 day trial if I want to take it for a spin. Any thoughts or suggestions? I ideally want to determine if this will improve productivity for my end users enough to be worth the insane cost of $30/user/mo.

25

15

Looking for a specific OpenAI employee personal blog (lemmy.zip)

submitted 1 year ago by TheHobbyist@lemmy.zip to c/machinelearning@lemmy.ml

3 comments fedilink

Hi all,

I think around 1 or 2 years ago, I stumbled upon a personal blog of an asian woman (I think) working at OpenAI. She had numerous extensive fascinating blog posts on a black themed blog, going into the technical details of embeddings of language models and such.

I can no longer find that blog and have no other information to go by. Would anyone possibly know which blog I'm referring to? It would be very much appreciated.

Machine Learning

Declaration