Artificial Intelligence


Welcome to the AI Community!

Let's explore AI passionately, foster innovation, and learn together. Follow these guidelines for a vibrant and respectful community.

You can access the AI Wiki at the following link: AI Wiki

Let's create a thriving AI community together!


Jan-nano is a model fine-tuned with DAPO on Qwen3-4B. Jan-nano comes with some unique capabilities:

  • It can perform deep research (with the right prompting)
  • It picks up relevant information effectively from search results
  • It uses tools efficiently

The model was evaluated using SimpleQA - a relatively straightforward benchmark to test whether the model can find and extract the right answers.

Jan-nano outperforms DeepSeek-671B on this metric by using an agentic, tool-based approach. A 4B model obviously has its limitations, but it's interesting to see how far these things can be pushed. Jan-nano can serve as your self-hosted Perplexity alternative on a budget.

You can find the model at: https://huggingface.co/Menlo/Jan-nano

And a GGUF build is available at: https://huggingface.co/Menlo/Jan-nano-gguf
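
A minimal sketch of running the model locally with Hugging Face transformers (the repo id is from the post; the prompt and generation settings are illustrative, and the agentic search behavior additionally needs a tool-calling harness, which isn't shown here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Menlo/Jan-nano"  # repo id from the post
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative prompt; plain generation only, no search tools attached.
messages = [{"role": "user", "content": "Summarize what DAPO fine-tuning is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```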


Original question by @[email protected]

Title, or at least the inverse should be encouraged. This has been talked about before, but with how bad things are getting, and how realistic AI-generated videos are getting, anything feels better than nothing. AI-generated watermarks or metadata can be removed, but that's not the point; the point is deterrence. All big tech would comply immediately (at least on the surface, for consumer-facing products), and we would probably see a massive decrease in malicious use. People will bypass it, remove watermarks, and fix metadata, but the situation should still be quite a bit better. I don't see many downsides.


It's a little disconcerting that the company that's trying to make an "ethical" GenAI does the best at deception and backstabbing.

Opus was lured in by the hope of a non-violent resolution. It was quickly betrayed and eliminated by o3, which went on to win.


I asked teachers to tell me how AI has changed how they teach.

The response from teachers and university professors was overwhelming. In my entire career, I’ve rarely gotten so many email responses to a single article, and I have never gotten so many thoughtful and comprehensive responses.

One thing is clear: teachers are not OK.

They describe trying to grade “hybrid essays half written by students and half written by robots,” trying to teach Spanish to kids who don't know what the words they're learning mean in English, and students who use AI in the middle of a conversation. They describe spending hours grading papers that took their students seconds to generate: “I've been thinking more and more about how much time I am almost certainly spending grading and writing feedback for papers that were not even written by the student,” one teacher told me. “That sure feels like bullshit.”


The project implements sparse matrix multiplication and fuses the up/down projections in the MLP layers, driven by low-rank predictors of weight activations. The work is based on Deja Vu and Apple's LLM in a Flash.

This approach avoids loading, and computing activations for, feed-forward weights whose outputs would be zeroed out anyway.

The approach is lossless, since the skipped weights do not contribute to the current token's prediction. It does, however, require the predictors to accurately cluster the weights.
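
As a rough illustration of the predictor idea, here's a minimal PyTorch sketch of Deja Vu-style contextual sparsity in a single MLP layer. All names, shapes, and the top-k selection rule are illustrative assumptions, not the project's actual code:

```python
import torch
import torch.nn as nn

class SparseMLP(nn.Module):
    """Toy MLP that only computes the neurons a low-rank predictor flags as active."""

    def __init__(self, d_model: int, d_ff: int, rank: int = 64):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)
        # Cheap low-rank predictor guessing which feed-forward neurons will fire.
        self.pred_a = nn.Linear(d_model, rank, bias=False)
        self.pred_b = nn.Linear(rank, d_ff, bias=False)

    def forward(self, x: torch.Tensor, keep: float = 0.2) -> torch.Tensor:
        # x: (d_model,) -- one token, to keep the sketch simple.
        scores = self.pred_b(self.pred_a(x))      # (d_ff,) predicted activation strength
        k = max(1, int(keep * scores.numel()))
        idx = scores.topk(k).indices              # the "awake" neurons
        # Touch only the predicted-active rows of the up projection
        # and the matching columns of the down projection.
        h = torch.relu(self.up.weight[idx] @ x)   # (k,)
        return self.down.weight[:, idx] @ h       # (d_model,)

mlp = SparseMLP(d_model=512, d_ff=2048)
print(mlp(torch.randn(512)).shape)  # torch.Size([512])
```

With an untrained predictor this just picks arbitrary neurons; the lossless claim holds only insofar as the trained predictor reliably covers every neuron that would actually produce a non-zero output.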

The result? We are seeing 5x faster MLP layer performance in transformers with 50% less memory consumption, by skipping the dormant neurons in every token prediction. In Llama 3.2, feed-forward layers account for about 30% of total weights and forward-pass computation, which translates into a 1.6-1.8x increase in end-to-end throughput:

Sparse Llama 3.2 3B vs. Llama 3.2 3B (Hugging Face implementation):

- Time to First Token (TTFT):  1.51× faster (1.209s → 0.803s)
- Output Generation Speed:     1.79× faster (0.7 → 1.2 tokens/sec)  
- Total Throughput:            1.78× faster (0.7 → 1.3 tokens/sec)
- Memory Usage:                26.4% reduction (6.125GB → 4.15GB)

fuck yeah, here it goes off! lets goo! :D
