First off, IIRC, COUNT(*)
used to be slightly faster (~10-15%) than COUNT(some_column)
in PG. There's a chance that recent versions of PG have fixed this inconsistency but still worth benchmarking.
Now to the query:
To my mind, we've already got comment_aggregate
which is supposed to store the result of the query shared above, right? Why do we need to run that SELECT
again instead of simply:
-- pseudo-code
SELECT
ca.id, ca.child_count, ca.path
FROM
comment_aggregate ca
WHERE
ca.post_id = :post_id
I think I'm confusing matters here b/c I don't know lemmy's DB structure. Is there a link to an ERD/SQL/... you could share so I could take a look and leave more educated replies?
I have a feeling that it's going to make a noticeable difference; it's way cheaper than a
JOIN ... GROUP BY
query.Given the declarative nature of SQL, I'm afraid that's not possible - at least to my knowledge.
But worry not! That's why there are stored procedures in almost every RDBMS; to add an imperative flare to the engine.
In purely technical terms, Implementing what you're thinking about is rather straight-forward in a stored procedure using a
CURSOR
. This could be possibly the quickest win (plus the idea ofCOUNT(*)
if applicable.)Now, I'd like to suggest a possibly longer route which I think may be more scalable. The idea is based around the fact that comments themselves are utterly more important than the number of child comments.
INSERT/UPDATE/SELECT
are super quick oncomment
andpost
.child_count
is eventually correctly updated when (1) happens.Before rambling on, I'd like to ask if you think the priorities make sense? If they do, I can elaborate on the implementation.