overview for bahmanm

Lemmy server mass update of comment reply (child) count with PostgreSQL ltree structure in c/[email protected]

[–] [email protected] 1 points 2 years ago (3 children)

potential to reuse

I have a feeling that it's going to make a noticeable difference; it's way cheaper than a JOIN ... GROUP BY query.

order they are updated in so that the deepest child gets count updated first

Given the declarative nature of SQL, I'm afraid that's not possible - at least to my knowledge.

But worry not! That's why there are stored procedures in almost every RDBMS; to add an imperative flare to the engine.

In purely technical terms, Implementing what you're thinking about is rather straight-forward in a stored procedure using a CURSOR. This could be possibly the quickest win (plus the idea of COUNT(*) if applicable.)

Now, I'd like to suggest a possibly longer route which I think may be more scalable. The idea is based around the fact that comments themselves are utterly more important than the number of child comments.

The first priority should be to ensure INSERT/UPDATE/SELECT are super quick on comment and post.
The second priority should be to ensure child_count is eventually correctly updated when (1) happens.
The last priority should be to try to make (2) as fast as we can while making sure (3) doesn't interfere w/ (1) and (2) performance.

Before rambling on, I'd like to ask if you think the priorities make sense? If they do, I can elaborate on the implementation.

Lemmy server mass update of comment reply (child) count with PostgreSQL ltree structure in c/[email protected]

[–] [email protected] 2 points 2 years ago (5 children)

First off, IIRC, COUNT(*) used to be slightly faster (~10-15%) than COUNT(some_column) in PG. There's a chance that recent versions of PG have fixed this inconsistency but still worth benchmarking.

Now to the query:

To my mind, we've already got comment_aggregate which is supposed to store the result of the query shared above, right? Why do we need to run that SELECT again instead of simply:

-- pseudo-code
SELECT 
  ca.id, ca.child_count, ca.path
FROM 
  comment_aggregate ca
WHERE 
  ca.post_id = :post_id

I think I'm confusing matters here b/c I don't know lemmy's DB structure. Is there a link to an ERD/SQL/... you could share so I could take a look and leave more educated replies?

Top Cybersecurity Services In Dubai | Cybersecurity Dubai in c/[email protected]

[–] [email protected] 1 points 2 years ago

😕

Top Cybersecurity Services In Dubai | Cybersecurity Dubai in c/[email protected]

[–] [email protected] 6 points 2 years ago (4 children)

I don't think you're supposed to post business ads in this community (and most others); hence the downvotes.

Perhaps a line in the community description could help w/ this in the future?

Lemmy server mass update of comment reply (child) count with PostgreSQL ltree structure in c/[email protected]

[–] [email protected] 1 points 2 years ago (7 children)

DISCLAIMER: I've never looked at lemmy's code base. 🤦‍♂️

I think no matter any possible optimisation to the query (if any), the current design may not be going to scale very well given it traverses all the comment X comment space every time a comment is added.

To my mind, it works well when there are many shallow comments (ie little nesting/threading) which might not be the best strategy for the content lemmy serves.

Can you share the structures of comment_aggregates and comment? I feel there's a good opportunity for denormalisation there which may mean better performance.

That said, here's one concrete idea that crossed my mind and could be worth benchmarking:

Create an AFTER UPDATE trigger on comment_aggregates which updates a comment's immediate parent(s) child_count (basically increment it by 1.)
Re-write the posted query to only update the immediate parents of the comment being added.

That results in the trigger being run exactly m times where m is the number of comments of the subtree where the new comment was just added to.

Does that make sense?

How Python dictionaries work in c/[email protected]

[–] [email protected] 1 points 2 years ago (1 children)

Oh, I see. It kind of makes sense. Who'd need Jython when GraalVM's out there nowadays!? Well, unless it's a legacy app 😬

Bash not sourcing .profile automatically in Debian 11 in c/[email protected]

[–] [email protected] 2 points 2 years ago (1 children)

You've probably already checked it out but make sure you've got only one of .profile, .bash_profile, .bashprofile. bash will only execute one of them in case there's more than one (not sure which one off the top of my head.)

Anyone have a good working hibernate set-up in Linux? in c/[email protected]

[–] [email protected] 7 points 2 years ago* (last edited 2 years ago)

Suspend-then-hibernate

That's something over which I used to be very jealous of Windows laptops 😕 But that was years ago...now my aging 3.2kg ThinkPad is just a "stationery" workstation!

What are good books about the philosophy of programming? in c/[email protected]

[–] [email protected] 2 points 2 years ago

Not a direct answer to your question but here's what I've learned and am learning:

It all boils down to "finding the right balance between the costs of implementation, the value the implementation offers given the circumstances and constraint." Essentially, the foundational guideline of engineering across all engineering principles.

Usually every decision brings about about a series of advantages/improvement but it's important to remember that "one must lose in order to gain."[1] That is, every improvement (value) comes at a price (cost). Unlike other principles of engineering (which are closer to bare maths), software engineering more closely resembles something intuition-based like art. That is what makes it difficult to see the values and costs and measure them. It takes lots of practice and introspective and extrospective (!) effort; doing things and potentially making mistakes and learning from them is as important as observing others do things and make mistakes.

In other words, it boils down to honing your intuition to "do the right thing, at the right time, the right way."

PS: Please note that I used the word "right" and not "correct."

[1] Dialectically speaking, every material good contains w/i itself its own seeds of destruction 😆

How Python dictionaries work in c/[email protected]

[–] [email protected] 2 points 2 years ago

I genuinely wonder why the downvotes?

What is a good keyboard app for Android? in c/[email protected]

[–] [email protected] 3 points 2 years ago (2 children)

I'm already liking it - I've switced to it as my main keyboard. I'm actually typing this on my phone (miserably slowly 😂 - it definitely needs practice getting used to.) I think it's a good practice for the memory muscles & the brain! Thanks.

How Python dictionaries work in c/[email protected]

[–] [email protected] 4 points 2 years ago (6 children)

Good refresher on the topic and ineteresting gory details of CPython impl.

On another note, is Jython still a thing?