Optimize read from m-to-n relationship

I have a classic m-to-n relationship between app_user, user_post and user_comment. You can take a look at the SQLFiddle example.
------------ ---------------- -------------
| app_user |----<| user_comment |>----| user_post |
------------ ---------------- -------------
Task
Given a created_at timestamp for user posts, I want to fetch all user_comment records of every user who has commented on at least one post created before that date.
Solution
The following is what such a query could look like, but I was wondering whether it can be optimized for speed. As you can see, there are four SELECT statements involved, three of them as nested sub-queries.
SELECT *
FROM user_post
JOIN (
    SELECT *
    FROM user_comment
    WHERE user_comment.app_user_id IN (
        SELECT user_comment.app_user_id
        FROM user_comment
        WHERE user_comment.user_post_id IN (
            SELECT user_post.id
            FROM user_post
            WHERE user_post.created_at < '2022-01-02 00:00:00'
        )
    )
) AS fpr ON fpr.user_post_id = user_post.id;
Extra points if somebody can show me how to implement an optimized version using SQLAlchemy.
I can tell you this: ChatGPT does not seem to be able to understand this assignment.
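One possible rewrite, not taken from the thread, replaces the chain of nested IN lists with a single correlated EXISTS semi-join: for each comment, check whether its author has at least one comment on a post created before the cutoff. The sketch below uses Python's built-in sqlite3 as a stand-in database; the minimal schema, the sample rows and the extra indexes are assumptions for illustration. It only shows that both queries return the same rows. Whether the rewrite is actually faster on your database should be confirmed with EXPLAIN (ANALYZE) against real data.

```python
import sqlite3

# Hypothetical minimal schema mirroring the question; SQLite stands in for the real DB.
SCHEMA = """
CREATE TABLE app_user (id INTEGER PRIMARY KEY);
CREATE TABLE user_post (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL);
CREATE TABLE user_comment (
    id INTEGER PRIMARY KEY,
    app_user_id INTEGER NOT NULL REFERENCES app_user(id),
    user_post_id INTEGER NOT NULL REFERENCES user_post(id)
);
-- Assumed indexes that support the lookups in the semi-join.
CREATE INDEX ix_comment_user ON user_comment (app_user_id);
CREATE INDEX ix_comment_post ON user_comment (user_post_id);
CREATE INDEX ix_post_created ON user_post (created_at);
"""

# The question's query, reduced to returning comment ids so results can be compared.
ORIGINAL = """
SELECT fpr.id
FROM user_post
JOIN (
    SELECT * FROM user_comment
    WHERE user_comment.app_user_id IN (
        SELECT user_comment.app_user_id FROM user_comment
        WHERE user_comment.user_post_id IN (
            SELECT user_post.id FROM user_post
            WHERE user_post.created_at < :cutoff))
) AS fpr ON fpr.user_post_id = user_post.id
ORDER BY fpr.id
"""

# Rewrite: one correlated EXISTS instead of nested IN lists.
REWRITTEN = """
SELECT c.id
FROM user_comment AS c
JOIN user_post AS p ON p.id = c.user_post_id
WHERE EXISTS (
    SELECT 1
    FROM user_comment AS c2
    JOIN user_post AS p2 ON p2.id = c2.user_post_id
    WHERE c2.app_user_id = c.app_user_id
      AND p2.created_at < :cutoff
)
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO app_user (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO user_post (id, created_at) VALUES (?, ?)", [
    (1, "2021-12-30 10:00:00"),   # before the cutoff
    (2, "2022-01-05 09:00:00"),
    (3, "2022-01-10 12:00:00"),
])
conn.executemany(
    "INSERT INTO user_comment (id, app_user_id, user_post_id) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 3), (3, 2, 2), (4, 3, 1), (5, 3, 2)],
)

params = {"cutoff": "2022-01-02 00:00:00"}
original_ids = [row[0] for row in conn.execute(ORIGINAL, params)]
rewritten_ids = [row[0] for row in conn.execute(REWRITTEN, params)]
print(original_ids, rewritten_ids)  # [1, 2, 4, 5] [1, 2, 4, 5]
```

Users 1 and 3 commented on post 1, the only post before the cutoff, so all of their comments (1, 2, 4, 5) are returned. The same correlated-EXISTS shape should be expressible with SQLAlchemy's `exists()` construct, but that translation is not shown or tested here.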

Related

How to get an id value for tables in Postgres

How can I get a unique, identifying value for each table?
For example, if there are tables like 't_aa', 't_bb', 't_cc', I want a result like the one below.
id | table_name
-------------------
1 | 't_aa'
2 | 't_bb'
3 | 't_cc'
What exactly I want is a specific, unique number for each table name.
I have tried
SELECT * FROM information_schema.tables;
-- or
SELECT * FROM pg_catalog.pg_tables;
but neither gives me such an identifying number.
I hope there is a way to get results like the above with a few lines of query,
but if I really have to create a new table for this, that would be acceptable as an alternative.
Please help me, thank you.
Edit:
I need numbers because I will use them as advisory lock keys.

This is it:
SELECT table_name,
       ROW_NUMBER() OVER (ORDER BY table_name) AS id
FROM information_schema.tables;
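ROW_NUMBER() numbers tables by their position in the sorted list, so an id only stays the same as long as the set of tables does not change. That matters if the ids are used as advisory lock keys. The sketch below demonstrates the effect with Python's sqlite3, using SQLite's sqlite_master catalog as a stand-in for information_schema.tables (an assumption for this demo; window functions need SQLite 3.25 or newer). In PostgreSQL itself, a table's OID (for example `'t_aa'::regclass::oid`) is a per-table number that stays fixed for as long as the table exists, which may be a more stable choice.

```python
import sqlite3

# ROW_NUMBER over table names; sqlite_master stands in for information_schema.tables.
NUMBER_TABLES = """
SELECT name AS table_name,
       ROW_NUMBER() OVER (ORDER BY name) AS id
FROM sqlite_master
WHERE type = 'table'
ORDER BY id
"""

def numbered_tables(conn):
    """Return {table_name: positional id} for all user tables."""
    return {name: id_ for name, id_ in conn.execute(NUMBER_TABLES)}

conn = sqlite3.connect(":memory:")
for name in ("t_aa", "t_bb", "t_cc"):
    conn.execute(f"CREATE TABLE {name} (x INTEGER)")
before = numbered_tables(conn)  # {'t_aa': 1, 't_bb': 2, 't_cc': 3}

# Adding a table that sorts in the middle shifts every later id.
conn.execute("CREATE TABLE t_ab (x INTEGER)")
after = numbered_tables(conn)   # {'t_aa': 1, 't_ab': 2, 't_bb': 3, 't_cc': 4}
print(before, after)
```

Here t_bb changes from 2 to 3, so two sessions using different snapshots of the numbering could lock different keys for the same table.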

SQL SELECT WHERE IN another SELECT with GROUP_CONCAT

Good Day,
I have three tables: ticket, ticket_batch (multiple ticket rows to one batch) and ticket_staff (multiple staff rows to one ticket). Ultimately, I want to UPDATE the ticket_batch table with the COUNT of all staff working on tickets in each ticket batch.
The tables with the applicable columns look as follows:
ticket:
| ticket_number | recon_number |
ticket_batch:
| recon_number |
ticket_staff:
| ticket_number |
So I first wrote the following SQL query to check that I get the right COUNT:
SELECT COUNT(*)
FROM ticket_staff
WHERE ticket_staff.ticket_number IN (SELECT GROUP_CONCAT(ticket.ticket_number) FROM ticket WHERE ticket.recon_number = 1);
This query just keeps running, but when I execute the parts separately:
SELECT GROUP_CONCAT(ticket.ticket_number)
FROM ticket
WHERE ticket.recon_number = 1;
I get 5 ticket numbers within a split second, and if I paste that string into the outer query:
SELECT COUNT(*)
FROM ticket_staff
WHERE ticket_staff.ticket_number IN (1451,1453,1968,4457,4458);
It returns the correct COUNT.
So, ultimately, can I not use a query with GROUP_CONCAT inside another SELECT ... WHERE IN? And how should I structure my query?
Thanks for reading :)

I prefer an INNER JOIN, as follows:
SELECT COUNT(*)
FROM ticket_staff ts
INNER JOIN ticket t
ON ts.ticket_number = t.ticket_number
WHERE t.recon_number = 1;
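The difference between the GROUP_CONCAT form and a join can be checked with a small sketch in Python's sqlite3. Only the five ticket numbers come from the question. The other rows and the staff_id column are hypothetical. GROUP_CONCAT returns one comma-separated string, so IN compares each ticket_number against that single value instead of against five numbers, while an INNER JOIN counts the matching staff rows directly. The exact result of the string comparison depends on each database's type-conversion rules, so the sketch only checks that it does not agree with the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket (ticket_number INTEGER PRIMARY KEY, recon_number INTEGER NOT NULL);
-- staff_id is a hypothetical column; the question only shows ticket_number.
CREATE TABLE ticket_staff (ticket_number INTEGER NOT NULL, staff_id INTEGER NOT NULL);
""")
# Five tickets in batch 1 (numbers from the question) plus one ticket in batch 2.
conn.executemany("INSERT INTO ticket VALUES (?, ?)",
                 [(1451, 1), (1453, 1), (1968, 1), (4457, 1), (4458, 1), (2000, 2)])
# Made-up staff assignments: 7 rows on batch-1 tickets, 2 rows on batch 2.
conn.executemany("INSERT INTO ticket_staff VALUES (?, ?)",
                 [(1451, 10), (1451, 11), (1453, 10), (1968, 12),
                  (4458, 10), (4458, 11), (4458, 13), (2000, 10), (2000, 14)])

# Broken: the subquery yields ONE row holding a string like '1451,1453,...'.
group_concat_count = conn.execute("""
    SELECT COUNT(*) FROM ticket_staff
    WHERE ticket_staff.ticket_number IN (
        SELECT GROUP_CONCAT(ticket.ticket_number)
        FROM ticket WHERE ticket.recon_number = 1)
""").fetchone()[0]

# Join-based count: one row per staff assignment on a batch-1 ticket.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM ticket_staff ts
    INNER JOIN ticket t ON ts.ticket_number = t.ticket_number
    WHERE t.recon_number = 1
""").fetchone()[0]

print(group_concat_count, join_count)
```

Because ticket_number is the primary key of ticket, each staff row joins to at most one ticket, so COUNT(*) does not double-count.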

GROUP_CONCAT() doesn't look right. I suspect you are confusing a list of values for IN with a string. They are not the same thing.
In general, I would recommend EXISTS over IN anyway:
SELECT COUNT(*)
FROM ticket_staff ts
WHERE EXISTS (SELECT 1
FROM ticket t
WHERE ts.ticket_number = t.ticket_number AND
t.recon_number = 1
);
For this query, you want an index on ticket(ticket_number, recon_number). However, I am guessing that ticket(ticket_number) is the primary key, which is enough of an index by itself.
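The question's end goal was to store the per-batch staff count in ticket_batch. A minimal sketch of that reuses the EXISTS test as a correlated subquery inside UPDATE. It is shown with Python's sqlite3 and made-up rows. The staff_count and staff_id columns are hypothetical, since the question does not name them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- staff_count and staff_id are hypothetical columns for this sketch.
CREATE TABLE ticket_batch (recon_number INTEGER PRIMARY KEY, staff_count INTEGER);
CREATE TABLE ticket (ticket_number INTEGER PRIMARY KEY, recon_number INTEGER NOT NULL);
CREATE TABLE ticket_staff (ticket_number INTEGER NOT NULL, staff_id INTEGER NOT NULL);
INSERT INTO ticket_batch (recon_number) VALUES (1), (2), (3);
INSERT INTO ticket VALUES (1451, 1), (1453, 1), (1968, 1), (4457, 1), (4458, 1), (2000, 2);
INSERT INTO ticket_staff VALUES
    (1451, 10), (1451, 11), (1453, 10), (1968, 12),
    (4458, 10), (4458, 11), (4458, 13), (2000, 10), (2000, 14);
""")

# For every batch, count the staff rows whose ticket belongs to that batch.
conn.execute("""
    UPDATE ticket_batch
    SET staff_count = (
        SELECT COUNT(*)
        FROM ticket_staff ts
        WHERE EXISTS (SELECT 1
                      FROM ticket t
                      WHERE t.ticket_number = ts.ticket_number
                        AND t.recon_number = ticket_batch.recon_number)
    )
""")
counts = conn.execute(
    "SELECT recon_number, staff_count FROM ticket_batch ORDER BY recon_number"
).fetchall()
print(counts)  # [(1, 7), (2, 2), (3, 0)]
```

This counts staff assignments, so a person working on three tickets in a batch is counted three times. If each person should count once per batch, COUNT(DISTINCT ts.staff_id) would be the variant to try.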

How to join 2 tables without common fields?

There are 2 tables:
Table 1: first_names
id | first_name
1 | Joey
7 | Ross
17| Chandler
Table 2: last_names
id | last_name
2 | Tribbiani
7 | Geller
25| Bing
Desired result:
id | full_name
1 | Joey Tribbiani
2 | Ross Geller
3 | Chandler Bing
Task:
Write the solution using only the simplest SQL syntax. Using stored procedures, declaring variables, and the ROW_NUMBER() and RANK() functions are forbidden.
I have a solution using the ROW_NUMBER() function, but no idea how to solve this task using only the simplest SQL syntax.
P.S. I'm only a trainee, and this is my first question on Stack Overflow.

A simple join will suffice here:
select * from first_names fn
join last_names ln on fn.id = ln.id - 1
But your question is very unclear, because the join here is based on knowledge of the Friends series rather than on concrete logic...

You must create an id to join the tables.
This can be each row's ordinal position in its table, based on the ids:
select
f.counter id, concat(f.first_name, ' ', l.last_name) full_name
from (
select t.*, (select count(*) from first_names where id < t.id) + 1 counter
from first_names t
) f inner join (
select t.*, (select count(*) from last_names where id < t.id) + 1 counter
from last_names t
) l
on l.counter = f.counter
See the demo.
Results:
id | full_name
-------------------
1 | Joey Tribbiani
2 | Ross Geller
3 | Chandler Bing
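The correlated-COUNT numbering can be checked with Python's sqlite3 against the question's sample data. SQLite only added concat() in version 3.44, so this sketch uses the standard || operator for the concatenation. Otherwise the query mirrors the counter-based answer. Each correlated subquery counts the rows with a smaller id, so without an index on id the work grows roughly quadratically with table size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE first_names (id INTEGER, first_name TEXT);
CREATE TABLE last_names  (id INTEGER, last_name  TEXT);
INSERT INTO first_names VALUES (1, 'Joey'), (7, 'Ross'), (17, 'Chandler');
INSERT INTO last_names  VALUES (2, 'Tribbiani'), (7, 'Geller'), (25, 'Bing');
""")

# counter = 1 + number of rows with a smaller id, i.e. the row's position by id.
QUERY = """
SELECT f.counter AS id, f.first_name || ' ' || l.last_name AS full_name
FROM (
    SELECT t.*, (SELECT COUNT(*) FROM first_names WHERE id < t.id) + 1 AS counter
    FROM first_names t
) f
INNER JOIN (
    SELECT t.*, (SELECT COUNT(*) FROM last_names WHERE id < t.id) + 1 AS counter
    FROM last_names t
) l ON l.counter = f.counter
ORDER BY f.counter
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 'Joey Tribbiani'), (2, 'Ross Geller'), (3, 'Chandler Bing')]
```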

Honestly, this is a stupid solution. It is vastly less efficient than ROW_NUMBER, and I wouldn't be surprised if LEAD is "not allowed" too, just as ROW_NUMBER isn't. The fact that you were told to "use the simplest SQL" means that the SQL you should use is a subquery/CTE and ROW_NUMBER; that is as simple as this can really get. Anything else adds a layer of unneeded complexity and will likely just make the query suffer from performance degradation. This one, for example, means you need to scan both tables twice, whereas with ROW_NUMBER it would be once.
CREATE TABLE FirstNames (id int, FirstName varchar(10));
CREATE TABLE LastNames (id int, LastName varchar(10));
INSERT INTO FirstNames
VALUES(1,'Joey'),
(7,'Ross'),
(17,'Chandler');
INSERT INTO LastNames
VALUES (2,'Tribbiani'),
(7,'Geller'),
(25,'Bing');
GO
WITH CTE AS(
SELECT FN.id,
FN.FirstName,
LN.LastName
FROM FirstNames FN
LEFT JOIN LastNames LN ON FN.id = LN.id
UNION ALL
SELECT LN.id,
FN.FirstName,
LN.LastName
FROM LastNames LN
LEFT JOIN FirstNames FN ON LN.id = FN.id
WHERE FN.id IS NULL),
FullNames AS(
SELECT C.id,
C.FirstName,
ISNULL(C.LastName, LEAD(C.LastName) OVER (ORDER BY id)) AS LastName
FROM CTE C)
SELECT *
FROM FullNames FN
WHERE FN.FirstName IS NOT NULL
ORDER BY FN.id;
GO
DROP TABLE FirstNames;
DROP TABLE LastNames;
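For readers without SQL Server at hand, the LEAD-based query can be tried in Python's sqlite3 (window functions need SQLite 3.25 or newer). In this version, ISNULL is swapped for the standard COALESCE and the GO batch separators are dropped. Like the T-SQL version, it relies on a pattern in the sample data: when all ids are sorted, a first name without a same-id last name is immediately followed by its matching last name. It also keeps the original ids (1, 7, 17) rather than renumbering them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FirstNames (id INTEGER, FirstName TEXT);
CREATE TABLE LastNames  (id INTEGER, LastName  TEXT);
INSERT INTO FirstNames VALUES (1, 'Joey'), (7, 'Ross'), (17, 'Chandler');
INSERT INTO LastNames  VALUES (2, 'Tribbiani'), (7, 'Geller'), (25, 'Bing');
""")

QUERY = """
WITH CTE AS (
    -- every first name, with a same-id last name when one exists
    SELECT FN.id, FN.FirstName, LN.LastName
    FROM FirstNames FN
    LEFT JOIN LastNames LN ON FN.id = LN.id
    UNION ALL
    -- last names that have no same-id first name
    SELECT LN.id, FN.FirstName, LN.LastName
    FROM LastNames LN
    LEFT JOIN FirstNames FN ON LN.id = FN.id
    WHERE FN.id IS NULL
),
FullNames AS (
    -- borrow the next row's last name when this row has none
    SELECT C.id, C.FirstName,
           COALESCE(C.LastName, LEAD(C.LastName) OVER (ORDER BY C.id)) AS LastName
    FROM CTE C
)
SELECT id, FirstName, LastName
FROM FullNames
WHERE FirstName IS NOT NULL
ORDER BY id
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 'Joey', 'Tribbiani'), (7, 'Ross', 'Geller'), (17, 'Chandler', 'Bing')]
```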
To answer the "Task" given:
"Task: Write the solution using only the simplest SQL syntax. Using store procedures, declaring variables, ROW_NUMBER(), RANK() functions are forbidden."
My answer would be the following:
"Why is this a requirement? SQL Server has supported ROW_NUMBER for 14 years, since SQL Server 2005. If you can't use ROW_NUMBER, this implies you're using SQL Server 2000. This is actually a big security problem for the company, as 2000 has been out of support for close to a decade. Legislation like GDPR requires a company to keep the technology it uses secure, so it is very unlikely that this requirement is being met.
If this is the case, the solution is not to find a way around using ROW_NUMBER but to bring the company up to date. The latest version of SQL Server that you can upgrade to from SQL Server 2000 is 2008, which also runs out of support on July 9 of this year. We'll need to get an instance up and running, move the existing features onto this new server, and get QA testing done as soon as possible. This needs to be the highest-priority item. After that, we need to repeat the cycle to a newer version of SQL Server. The latest is 2017, which does support migration from 2008.
Once we've done that, we can then actually make use of ROW_NUMBER in the query, providing the simplest solution and also bringing the company back into a secure environment."
Sometimes requirements need to be challenged. From experience, management can make some "stupid" requirements because they don't understand the technology. When you're in an IT role, you will sometimes need to question those requirements and explain why a requirement isn't actually a good idea. Then, instead, you can help management find the correct solution to the problem. At the end of the day, what they are trying to fix could be an XY problem, and part of your troubleshooting will be to find out what X really is.

How to add aggregate value to SELECT?

I'm selecting data from multiple tables, and I also need to get the maximum "timestamp" across those tables. I need it to build custom cache control.
tbl_name tbl_surname
id | name id | surname
--------- ------------
0 | John 0 | Doe
1 | Jane 1 | Tully
... ...
I have following query:
SELECT name, surname FROM tbl_name, tbl_surname WHERE tbl_name.id = tbl_surname.id
and I need to add following info to result set:
SELECT MAX(ora_rowscn) FROM (SELECT ora_rowscn FROM tbl_name
UNION ALL
SELECT ora_rowscn FROM tbl_surname);
I tried to use UNION, but I get an error (something about mixing group and non-single-group data). I understand why I cannot use UNION here.
I don't want to split this into two calls, because my cache management needs the timestamp of the exact snapshot I read from the DB, and the DB could change between the SELECT and the call for MAX.
Here is result I want:
John | Doe | 123456
Jane | Tully | 123456
where 123456 is the approximate time of the last change (insert, update, delete) to tables tbl_name and tbl_surname.
I have read-only access to the DB, so I cannot create triggers, stored procedures, extra tables, etc.
Thanks for any suggestions.
EDIT: The value *ora_rowscn* is assigned per block of rows. So in one table this value can differ per row. I need the maximal value from both (all) tables involved in query.

Try:
SELECT name,
surname,
max(greatest(tbl_name.ora_rowscn, tbl_surname.ora_rowscn)) over () as max_rowscn
FROM tbl_name, tbl_surname
WHERE tbl_name.id = tbl_surname.id
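SQLite has no ORA_ROWSCN, but the shape of this answer (a per-row greatest() wrapped in an analytic MAX() OVER ()) can be sketched with Python's sqlite3, using a hypothetical scn column as a stand-in. In SQLite, max() with two arguments is a scalar function playing the role of Oracle's greatest(), while max() with one argument plus OVER () is the window aggregate. Window functions need SQLite 3.25 or newer. Because the window covers the whole result set, every row carries the same overall maximum, computed in the same statement as the rows themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- scn is a hypothetical stand-in for Oracle's ORA_ROWSCN pseudo-column.
CREATE TABLE tbl_name    (id INTEGER PRIMARY KEY, name TEXT, scn INTEGER);
CREATE TABLE tbl_surname (id INTEGER PRIMARY KEY, surname TEXT, scn INTEGER);
INSERT INTO tbl_name    VALUES (0, 'John', 100), (1, 'Jane', 120);
INSERT INTO tbl_surname VALUES (0, 'Doe', 110), (1, 'Tully', 90);
""")

rows = conn.execute("""
    SELECT n.name,
           s.surname,
           -- inner max(a, b): scalar, per row; outer MAX(...) OVER (): across all rows
           MAX(max(n.scn, s.scn)) OVER () AS max_scn
    FROM tbl_name n
    JOIN tbl_surname s ON n.id = s.id
    ORDER BY n.id
""").fetchall()
print(rows)  # [('John', 'Doe', 120), ('Jane', 'Tully', 120)]
```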

There's no need to aggregate here - just include both ora_rowscn values in your query and take the max:
SELECT
n.name,
n.ora_rowscn as n_ora_rowscn,
s.surname,
s.ora_rowscn as s_ora_rowscn,
greatest(n.ora_rowscn, s.ora_rowscn) as last_ora_rowscn
FROM tbl_name n
join tbl_surname s on n.id = s.id
BTW, I've replaced your old-style joins with ANSI-style joins, which are more readable, IMHO.
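A quick sketch with Python's sqlite3 shows what this per-row version returns. As before, a hypothetical scn column stands in for ORA_ROWSCN, and SQLite's two-argument max() stands in for greatest(). Each row gets the larger of its own two values, so the result can differ from row to row. If a single value for the whole snapshot is needed, as the question's EDIT asks, it still has to be aggregated across rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- scn is a hypothetical stand-in for Oracle's ORA_ROWSCN pseudo-column.
CREATE TABLE tbl_name    (id INTEGER PRIMARY KEY, name TEXT, scn INTEGER);
CREATE TABLE tbl_surname (id INTEGER PRIMARY KEY, surname TEXT, scn INTEGER);
INSERT INTO tbl_name    VALUES (0, 'John', 100), (1, 'Jane', 120);
INSERT INTO tbl_surname VALUES (0, 'Doe', 110), (1, 'Tully', 90);
""")

rows = conn.execute("""
    SELECT n.name,
           n.scn AS n_scn,
           s.surname,
           s.scn AS s_scn,
           max(n.scn, s.scn) AS last_scn  -- scalar two-argument max, like greatest()
    FROM tbl_name n
    JOIN tbl_surname s ON n.id = s.id
    ORDER BY n.id
""").fetchall()
print(rows)  # [('John', 100, 'Doe', 110, 110), ('Jane', 120, 'Tully', 90, 120)]
```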

Best practice for setup and querying versioned records in T-SQL

I'm trying to optimize my SQL queries, and I keep coming back to this one issue, so I was hoping to get some insight into how best to optimize it.
For brevity, let's say I have a simple employee table:
tbl_employees
Id HiredDateTime
------------------
1 ...
2 ...
That has versioned information for each employee in another table:
tbl_employees_versioned
Id Version Name HourlyWage
-------------------------------
1 1 Bob 10
1 2 Bob 20
1 3 Bob 30
2 1 Dan 10
2 2 Dan 20
And this is how the latest version records are retrieved in a View:
Select tbl_employees.Id, tbl_employees_versioned.Name, tbl_employees_versioned.HourlyWage, employees_LatestVersion.Version
From tbl_employees
Inner Join tbl_employees_versioned
ON tbl_employees.Id = tbl_employees_versioned.Id
CROSS APPLY
(SELECT v.Id, Max(v.Version) AS Version
FROM tbl_employees_versioned AS v
WHERE v.Id = tbl_employees_versioned.Id
GROUP BY v.Id) AS employees_LatestVersion
WHERE tbl_employees_versioned.Version = employees_LatestVersion.Version
To get a response like this:
Id Version Name HourlyWage
-------------------------------
1 3 Bob 30
2 2 Dan 20
When this query runs over more than 500 employee records, each with a few versions, it starts choking and takes a few seconds to run.
There are a couple of strikes right off the bat, but I'm not sure how to overcome them.
Obviously the Cross Apply adds some performance loss. Is there a best practice when dealing with versioned information like this? Is there a better way to get just a record with the highest version?
The versioned table doesn't have a clustered index because neither Id nor Version is unique on its own. Together they would be, but it doesn't work like that. Instead, there is one non-clustered index on Id and another on Version. Is there a better way to index this table to get a performance gain? Would an indexed view really help here?

I think the best way to structure the data is using start dates and end dates. So, the data structure for your original table would look like:
create table tbl_EmployeesHistory (
EmployeeHistoryId int,
EffDate date not null,
EndDate date,
-- Fields that describe the employee during this time, e.g. from the question:
Name varchar(50),
HourlyWage decimal(10, 2)
)
Then, you can see the current version using a view:
create view vw_Employees as
select *
from tbl_EmployeesHistory
where EndDate is NULL
In some cases, where future end dates are allowed, the where clause would be:
where coalesce(EndDate, getdate()) >= getdate()
Alternatively, in this case, you can default EndDate to some date far, far in the future, such as '01-01-9999'. You would add this as the default in the create table statement, make the column not null, and then you can always use the statement:
where getdate() between EffDate and EndDate
As Martin points out in his comment, the coalesce() might impede the use of an index (it does in SQL Server), whereas this does not have that problem.
This is called a slowly changing dimension. Ralph Kimball discusses this concept in some length in his books on data warehousing.
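A minimal sketch of the far-future-EndDate variant uses Python's sqlite3 with ISO-8601 text dates, which compare correctly as strings, and a fixed as-of date in place of getdate(). The EmployeeId column is a hypothetical addition that ties history rows to an employee. The index is one plausible choice rather than a tested recommendation. The wages reuse the numbers from the question with made-up date ranges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_EmployeesHistory (
    EmployeeHistoryId INTEGER PRIMARY KEY,
    EmployeeId INTEGER NOT NULL,                 -- hypothetical: which employee this row describes
    EffDate TEXT NOT NULL,
    EndDate TEXT NOT NULL DEFAULT '9999-12-31',  -- far-future sentinel means "still current"
    Name TEXT,
    HourlyWage NUMERIC
);
CREATE INDEX ix_hist_emp_dates ON tbl_EmployeesHistory (EmployeeId, EffDate, EndDate);
""")
conn.executemany(
    "INSERT INTO tbl_EmployeesHistory (EmployeeId, EffDate, EndDate, Name, HourlyWage) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-01", "2020-12-31", "Bob", 10),
        (1, "2021-01-01", "2021-12-31", "Bob", 20),
        (1, "2022-01-01", "9999-12-31", "Bob", 30),
        (2, "2020-06-01", "2021-05-31", "Dan", 10),
        (2, "2021-06-01", "9999-12-31", "Dan", 20),
    ],
)

def as_of(day):
    """Rows valid on the given ISO date: day BETWEEN EffDate AND EndDate."""
    return conn.execute(
        "SELECT EmployeeId, Name, HourlyWage FROM tbl_EmployeesHistory "
        "WHERE ? BETWEEN EffDate AND EndDate ORDER BY EmployeeId",
        (day,),
    ).fetchall()

current = as_of("2024-03-15")  # [(1, 'Bob', 30), (2, 'Dan', 20)]
past = as_of("2021-03-01")     # [(1, 'Bob', 20), (2, 'Dan', 10)]
print(current, past)
```

The same predicate answers both "what is current" and "what was true on a given date", which is the main appeal of storing explicit date ranges.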

Here's one way you can get a view of the most recent version for each employee:
Select Id, Name, HourlyWage, Version
FROM (
Select E.Id, V.Name, V.HourlyWage, V.Version,
row_number() OVER (PARTITION BY V.ID ORDER BY V.Version DESC) as nRow
From tbl_employees E
Inner Join tbl_employees_versioned V ON E.Id = V.Id
) A
WHERE A.nRow = 1
I suspect that this will perform better than your previous solution. A single composite index on (Id, Version) in tbl_employees_versioned would most likely also help.
Also, note that you only need to join on tbl_employees if you're selecting fields that are not in tbl_employees_versioned.
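The ROW_NUMBER() query can be checked against the question's sample rows with Python's sqlite3 (window functions need SQLite 3.25 or newer). The composite primary key on (Id, Version) is an assumption matching the suggested index, and the HiredDateTime values are placeholders. A NOT EXISTS variant is included for comparison, since it returns the same rows without a window function. Which one is faster depends on the engine and indexes, so both are worth checking against the real execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_employees (Id INTEGER PRIMARY KEY, HiredDateTime TEXT);
-- Assumed composite key on (Id, Version): one row per employee version.
CREATE TABLE tbl_employees_versioned (
    Id INTEGER NOT NULL REFERENCES tbl_employees(Id),
    Version INTEGER NOT NULL,
    Name TEXT,
    HourlyWage INTEGER,
    PRIMARY KEY (Id, Version)
);
INSERT INTO tbl_employees VALUES (1, '2020-01-01'), (2, '2020-06-01');
INSERT INTO tbl_employees_versioned VALUES
    (1, 1, 'Bob', 10), (1, 2, 'Bob', 20), (1, 3, 'Bob', 30),
    (2, 1, 'Dan', 10), (2, 2, 'Dan', 20);
""")

# Number each employee's versions from highest to lowest and keep number 1.
ROW_NUMBER_QUERY = """
SELECT Id, Name, HourlyWage, Version
FROM (
    SELECT E.Id, V.Name, V.HourlyWage, V.Version,
           ROW_NUMBER() OVER (PARTITION BY V.Id ORDER BY V.Version DESC) AS nRow
    FROM tbl_employees E
    INNER JOIN tbl_employees_versioned V ON E.Id = V.Id
) A
WHERE A.nRow = 1
ORDER BY Id
"""

# Alternative without window functions: keep a version only if no higher one exists.
NOT_EXISTS_QUERY = """
SELECT V.Id, V.Name, V.HourlyWage, V.Version
FROM tbl_employees_versioned V
WHERE NOT EXISTS (SELECT 1 FROM tbl_employees_versioned V2
                  WHERE V2.Id = V.Id AND V2.Version > V.Version)
ORDER BY V.Id
"""

latest = conn.execute(ROW_NUMBER_QUERY).fetchall()
latest_alt = conn.execute(NOT_EXISTS_QUERY).fetchall()
print(latest)  # [(1, 'Bob', 30, 3), (2, 'Dan', 20, 2)]
```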
