Convert aliases and distinct subquery from SQLite to SQLAlchemy

I am trying to convert a SQLite statement into Python SQLAlchemy to be used with FastAPI. I am not sure how to convert a query this complex, which uses the aliases s and p for the single prices table.
Here is the SQLite query:
SELECT s.security_id, p.price, MAX(p.price_datetime) price_datetime
FROM (SELECT DISTINCT security_id FROM prices) s
LEFT JOIN prices p ON p.security_id = s.security_id AND p.price_datetime <= '2022-08-10 19:00:00.000000'
GROUP BY s.security_id;
Here is my attempt so far:
# starting attempt so far
select(models.Price.security_id, models.Price.price, func.max(models.Price.price_datetime), models.Price.price_datetime)

My first question is why the query is so complicated. Selecting the distinct security_id values, joining them back to the same table and then grouping by security_id makes no sense to me.
I have come up with this much simpler version, which works the same in my tests.
SELECT security_id, price, MAX(price_datetime) price_datetime
FROM prices
WHERE price_datetime <= '2022-02-01'
GROUP BY security_id;
This is then fairly easy to translate to SQLAlchemy:
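from sqlalchemy import func, select

# Price is the ORM model mapped to the prices table
stmt = (
    select(
        Price.security_id,
        Price.price,
        # label() names a column expression; alias() is meant for FROM-clause objects
        func.max(Price.price_datetime).label("price_datetime"),
    )
    .filter(Price.price_datetime <= "2022-02-01")
    .group_by(Price.security_id)
)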
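To check the claim that both queries behave the same, the sketch below runs them against an in-memory SQLite database with made-up rows (the table layout and values are assumptions, not the OP's data). It relies on SQLite's documented special case that, when MAX() is the only aggregate, bare columns such as price are taken from the row holding the maximum; databases such as PostgreSQL reject this query shape. It also shows the one case where the results differ: a security whose only price is after the cutoff survives the LEFT JOIN (with NULLs) but is dropped by the WHERE filter.

```python
import sqlite3

# Hypothetical sample data: security 3 only has a price after the cutoff.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (security_id INTEGER, price REAL, price_datetime TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (1, 10.0, "2022-01-10"),
        (1, 11.0, "2022-01-20"),
        (2, 20.0, "2022-01-15"),
        (3, 30.0, "2022-03-01"),
    ],
)

CUTOFF = "2022-02-01"

# The OP's shape: distinct ids, LEFT JOIN back with the cutoff in the ON clause.
original = conn.execute(
    """
    SELECT s.security_id, p.price, MAX(p.price_datetime) AS price_datetime
    FROM (SELECT DISTINCT security_id FROM prices) s
    LEFT JOIN prices p
      ON p.security_id = s.security_id AND p.price_datetime <= ?
    GROUP BY s.security_id
    ORDER BY s.security_id
    """,
    (CUTOFF,),
).fetchall()

# The simplified shape: filter first, then group.
simplified = conn.execute(
    """
    SELECT security_id, price, MAX(price_datetime) AS price_datetime
    FROM prices
    WHERE price_datetime <= ?
    GROUP BY security_id
    ORDER BY security_id
    """,
    (CUTOFF,),
).fetchall()

print(original)
print(simplified)
```

Both return price 11.0 for security 1, the price from its latest row before the cutoff; only the original keeps security 3, as a row of NULLs.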
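After OP's comment, here is a version that starts from the security table, so every security is listed even when it has no price on or before the cutoff: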
SELECT s.id, p.price, MAX(p.price_datetime) AS price_datetime
FROM security AS s
LEFT JOIN prices as p
ON s.id = p.security_id AND p.price_datetime <= '2021-02-01'
GROUP BY s.id;
which should translate to
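from sqlalchemy import and_, func, select

# Security and Price are the ORM models for the security and prices tables
stmt = (
    select(
        Security.id,
        Price.price,
        func.max(Price.price_datetime).label("price_datetime"),
    )
    .select_from(Security)  # make Security the left side of the outer join
    .join(
        Price,
        # the date filter stays in the ON clause so that the outer join
        # keeps securities without a matching price
        and_(
            Security.id == Price.security_id,
            Price.price_datetime <= "2021-02-01",
        ),
        isouter=True,
    )
    .group_by(Security.id)
)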
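The date condition sits inside the join condition (the and_(...) passed to .join(), i.e. the ON clause) rather than in a WHERE clause because a WHERE filter on the outer-joined table throws away the NULL rows the LEFT JOIN produced. A small sqlite3 sketch with hypothetical security and prices rows shows the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE security (id INTEGER PRIMARY KEY);
CREATE TABLE prices (security_id INTEGER, price REAL, price_datetime TEXT);
INSERT INTO security VALUES (1), (2), (3);
-- security 2 only has a price after the cutoff, security 3 has no price at all
INSERT INTO prices VALUES
  (1, 10.0, '2021-01-10'),
  (1, 11.0, '2021-01-20'),
  (2, 20.0, '2021-03-01');
""")

CUTOFF = "2021-02-01"

# Cutoff in ON: every security survives, unmatched ones with NULLs.
cutoff_in_on = conn.execute("""
    SELECT s.id, p.price, MAX(p.price_datetime) AS price_datetime
    FROM security AS s
    LEFT JOIN prices AS p
      ON s.id = p.security_id AND p.price_datetime <= ?
    GROUP BY s.id
    ORDER BY s.id
""", (CUTOFF,)).fetchall()

# Cutoff in WHERE: NULL <= ? is not true, so the outer join degrades to an inner join.
cutoff_in_where = conn.execute("""
    SELECT s.id, p.price, MAX(p.price_datetime) AS price_datetime
    FROM security AS s
    LEFT JOIN prices AS p ON s.id = p.security_id
    WHERE p.price_datetime <= ?
    GROUP BY s.id
    ORDER BY s.id
""", (CUTOFF,)).fetchall()

print(cutoff_in_on)
print(cutoff_in_where)
```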

Related

Converting Nested SQL to ORM in Django

I have a query like this:
SELECT
*,
(
SELECT
COALESCE(json_agg(product_attribute), '[]')
FROM
(
SELECT
*
FROM
optimus_productattribute as product_attribute
WHERE
product.id = product_attribute.product_id
)
AS product_attribute
)
AS product_atttribute
FROM
optimus_product as product inner join optimus_store as store on product.store_id = store.id
and I want to convert it to the Django ORM.
I have tried JSONBAgg, but it says it can only be used with a single column:
Product.objects.filter(store_id=787).annotate(attributes=Coalesce(JSONBAgg(ProductAttribute.objects.filter(product=OuterRef('pk')).values_list('id', 'uuid')),[]))

ORDER in CTE lost after GROUP BY

I have the following SQL
WITH tally AS (
SELECT results.answer,
results.poll_id,
count(1) AS votes
FROM (
SELECT pr.poll_id,
unnest(pr.response) AS answer
FROM poll_responses pr
LEFT JOIN polls p ON pr.poll_id = p.id
LEFT JOIN poll_collections pc ON pc.id = p.poll_collection_id
WHERE pc.id = ${pollCollectionId}
) AS results
GROUP BY results.answer, results.poll_id
),
all_choices AS (SELECT unnest(pls.choices) AS choice,
pls.id AS poll_id
FROM poll_collections pcol
INNER JOIN polls pls
ON pcol.id = pls.poll_collection_id
WHERE pcol.id = ${pollCollectionId}),
unvoted_tally AS (SELECT ac.choice AS answer,
ac.poll_id,
0 AS total
FROM all_choices ac
LEFT JOIN tally t ON t.answer = ac.choice
WHERE t.answer IS NULL),
final_tally AS (SELECT *
FROM tally
UNION
ALL
SELECT *
FROM unvoted_tally),
sorted_tally AS (
SELECT ft.*
FROM final_tally ft
ORDER BY array_position(array(SELECT choice FROM all_choices), ft.answer)
)
SELECT json_agg(poll_results.polls) AS polls
FROM (
SELECT json_array_elements(json_agg(results)) -> 'poll' AS polls
FROM (
SELECT json_build_object(
'id', st.poll_id,
'question', pls.question,
'choice-type', pls.choice_type,
'results',
json_agg(json_build_object('choice', st.answer, 'votes', st.votes)),
'chosen', pr.response
) AS poll
FROM sorted_tally st
LEFT JOIN polls pls
ON
pls.id = st.poll_id
LEFT JOIN poll_responses pr
ON
pr.poll_id = st.poll_id AND
pr.email = ${email}
GROUP BY st.poll_id, pls.choice_type, pr.response, pls.question
) AS results)
AS poll_results;
I have a poll_responses table that stores the users' responses to a poll. I want the results ordered exactly as the choices are stored in the polls table, as an array, e.g. {Yes, No, Maybe}.
I applied ORDER BY array_position(array(SELECT choice FROM all_choices), ft.answer) in the sorted_tally CTE.
However, in the final SELECT, after applying GROUP BY, the order is lost.
Is there a way to preserve the order of the choices?
Also, are there any optimizations applicable?
Much appreciated!
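Aggregate functions such as json_agg accept an ORDER BY clause inside their argument list (json_build_object does not, since it is not an aggregate). First, have the last CTE select the needed ordering expression as a new column, then use it in the outermost query: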
CTE
...
sorted_tally AS (
SELECT ft.votes
, ft.poll_id
, ft.answer
, array_position(array(SELECT choice FROM all_choices),
ft.answer) AS choice_order
FROM final_tally ft
)
Outermost Query
...
json_build_object(
'id', st.poll_id,
'question', pls.question,
'choice-type', pls.choice_type,
'results', json_agg(json_build_object('choice', st.answer,
'votes', st.votes)
ORDER BY st.choice_order),
'chosen', pr.response
) AS poll
ORDER BY in a CTE doesn't really matter. It may appear to work, but PostgreSQL is free to reorder the rows unless you specify ORDER BY where the order is actually consumed: in the outermost query or inside the aggregate.
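SQLite has no arrays, so the sketch below stands in for array_position() with a hypothetical choices table that stores each choice together with an explicit position. Under that assumption it shows the principle from the answer: the order comes from an explicit column used in the outermost ORDER BY. It also shows, as one possible simplification of the unvoted_tally / UNION ALL steps, that a LEFT JOIN from all choices with COALESCE(..., 0) yields the zero-vote choices directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each poll's choices are rows with a position column,
# standing in for the array stored in the polls table.
conn.executescript("""
CREATE TABLE choices (poll_id INTEGER, choice TEXT, position INTEGER);
CREATE TABLE responses (poll_id INTEGER, answer TEXT);
INSERT INTO choices VALUES (1, 'Yes', 1), (1, 'No', 2), (1, 'Maybe', 3);
INSERT INTO responses VALUES (1, 'Maybe'), (1, 'Yes'), (1, 'Maybe');
""")

rows = conn.execute("""
WITH tally AS (
    SELECT poll_id, answer, COUNT(*) AS votes
    FROM responses
    GROUP BY poll_id, answer
)
SELECT c.poll_id, c.choice, COALESCE(t.votes, 0) AS votes
FROM choices c
LEFT JOIN tally t ON t.poll_id = c.poll_id AND t.answer = c.choice
-- the ordering lives in the outermost query, so it is guaranteed
ORDER BY c.poll_id, c.position
""").fetchall()

print(rows)
```

Note that the join matches on poll_id as well as the answer text, so identical choice labels in different polls are not mixed up.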

When subquery behind SELECT can not be removed?

Correlated subqueries are considered a bad habit. I believe that any SQL command with a subquery between SELECT and FROM (let's call it a SELECT subquery) can be rewritten into SQL without one. For example, a query like this
select *,
(
select sum(t2.sales)
from your_table t2
where t2.dates
between t1.dates - interval '3' day and
t1.dates and
t2.id = t1.id
) running_sales
from your_table t1
can be rewritten into the following one
select dd.id, dd.dates, dd.sales, sum(d.sales) running_sales
from your_table dd
join your_table d on d.dates
between (dd.dates - interval '3' day) and
dd.dates and
dd.id = d.id
group by dd.id, dd.dates, dd.sales
Problems may occur when there is more than one SELECT subquery. However, even in those cases it is possible to rewrite them as subqueries in the FROM clause and then LEFT JOIN them. For example, this query
select *,
(
select sum(sales)
from dat dd
where dd.dates
between (d.dates - interval '3' day) and d.dates and
dd.id = d.id
) running_sales,
(
select sum(sales)
from dat dd
where dd.id = d.id
) total_sales
from dat d
can be rewritten into the following one
select d.*,
t_running.running_sales,
t_total.total_sales
from dat d
left join (
select dd.id, dd.dates, sum(d.sales) running_sales
from dat dd
join dat d on d.dates
between (dd.dates - interval '3' day) and
dd.dates and
dd.id = d.id
group by dd.id, dd.dates
) t_running on d.id = t_running.id and d.dates = t_running.dates
left join (
select d.id, sum(d.sales) total_sales
from dat d
group by d.id
) t_total on t_total.id = d.id
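The equivalence of the correlated and the joined forms can be checked on a toy table. This sketch uses Python's sqlite3 with hypothetical rows; because SQLite has no interval '3' day syntax, date(x, '-3 days') is substituted, and (id, dates) is assumed to be unique so that the join on t_running matches exactly one row per original row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dat (id INTEGER, dates TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO dat VALUES (?, ?, ?)",
    [
        (1, "2022-01-01", 10),
        (1, "2022-01-03", 20),
        (1, "2022-01-06", 30),
        (2, "2022-01-02", 5),
        (2, "2022-01-04", 7),
    ],
)

# Two SELECT subqueries, evaluated per outer row.
correlated = conn.execute("""
    SELECT d.id, d.dates, d.sales,
           (SELECT SUM(dd.sales) FROM dat dd
             WHERE dd.dates BETWEEN date(d.dates, '-3 days') AND d.dates
               AND dd.id = d.id) AS running_sales,
           (SELECT SUM(dd.sales) FROM dat dd WHERE dd.id = d.id) AS total_sales
    FROM dat d
    ORDER BY d.id, d.dates
""").fetchall()

# Same result with the subqueries moved into FROM and LEFT JOINed.
joined = conn.execute("""
    SELECT d.id, d.dates, d.sales, t_running.running_sales, t_total.total_sales
    FROM dat d
    LEFT JOIN (
        SELECT dd.id, dd.dates, SUM(d.sales) AS running_sales
        FROM dat dd
        JOIN dat d ON d.dates BETWEEN date(dd.dates, '-3 days') AND dd.dates
                  AND dd.id = d.id
        GROUP BY dd.id, dd.dates
    ) t_running ON d.id = t_running.id AND d.dates = t_running.dates
    LEFT JOIN (
        SELECT id, SUM(sales) AS total_sales FROM dat GROUP BY id
    ) t_total ON t_total.id = d.id
    ORDER BY d.id, d.dates
""").fetchall()

print(correlated == joined)
```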
Could you please provide an example where it is not possible to get rid of the SELECT subquery? Please also add a working example link (e.g. dbfiddle or sqlfiddle) to make the discussion easier, thanks!
If the question is for a multiple-choice test (or something like that) :), it is not possible to get rid of the subquery in an EXISTS clause.
Another similar case is IN (subquery) when the subquery works at a different level of aggregation, used to avoid a Cartesian product.
(A side comment, by the way: correlated subqueries are not always a bad habit; it depends on the optimizer, the structure, etc.
WITH (a CTE) is a way of naming subqueries, and it's very practical for complex queries.)

PostgreSQL - how to query "result IN ALL OF"?

I am new to PostgreSQL and I have a problem with the following query:
WITH relevant_einsatz AS (
SELECT einsatz.fahrzeug,einsatz.mannschaft
FROM einsatz
INNER JOIN bergefahrzeug ON einsatz.fahrzeug = bergefahrzeug.id
),
relevant_mannschaften AS (
SELECT DISTINCT relevant_einsatz.mannschaft
FROM relevant_einsatz
WHERE relevant_einsatz.fahrzeug IN (SELECT id FROM bergefahrzeug)
)
SELECT mannschaft.id,mannschaft.rufname,person.id,person.nachname
FROM mannschaft,person,relevant_mannschaften WHERE mannschaft.leiter = person.id AND relevant_mannschaften.mannschaft=mannschaft.id;
This query basically works, but in relevant_mannschaften I am currently selecting every mannschaft that has been on a relevant_einsatz with at least one bergefahrzeug.
Instead, I want relevant_mannschaften to contain each mannschaft that has been on a relevant_einsatz with EVERY bergefahrzeug.
Does anybody know how to formulate this change?
The information you provide is rather rudimentary. But tuning into my mentalist skills, going out on a limb, I would guess this untangled version of the query does the job much faster:
SELECT m.id, m.rufname, p.id, p.nachname
FROM person p
JOIN mannschaft m ON m.leiter = p.id
JOIN (
SELECT e.mannschaft
FROM einsatz e
JOIN bergefahrzeug b ON b.id = e.fahrzeug -- may be redundant
GROUP BY e.mannschaft
HAVING count(DISTINCT e.fahrzeug)
= (SELECT count(*) FROM bergefahrzeug)
) e ON e.mannschaft = m.id
Explanation:
In the subquery e, I count how many DISTINCT recovery vehicles (bergefahrzeug) a team (mannschaft) has used across all its deployments (einsatz): count(DISTINCT e.fahrzeug)
If that number matches the count in table bergefahrzeug, (SELECT count(*) FROM bergefahrzeug), the team qualifies according to your description.
The rest of the query just fetches details from matching rows in mannschaft and person.
You don't need this line at all if no vehicles other than bergefahrzeuge are in play:
JOIN bergefahrzeug b ON b.id = e.fahrzeug
Basically, this is a special application of relational division. A lot more on the topic under this related question:
How to filter SQL results in a has-many-through relation
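A quick way to see this relational-division query at work, and why the "may be redundant" join matters, is a toy sqlite3 run. The schema is a simplified, hypothetical version of einsatz and bergefahrzeug; vehicle 99 stands for some vehicle that is not a bergefahrzeug.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bergefahrzeug (id INTEGER PRIMARY KEY);
CREATE TABLE einsatz (mannschaft INTEGER, fahrzeug INTEGER);
INSERT INTO bergefahrzeug VALUES (1), (2), (3);
-- team 10 used every bergefahrzeug (vehicle 3 twice), team 20 only two of them,
-- team 30 used 1 and 2 plus vehicle 99, which is not a bergefahrzeug
INSERT INTO einsatz VALUES
  (10, 1), (10, 2), (10, 3), (10, 3),
  (20, 1), (20, 2),
  (30, 1), (30, 2), (30, 99);
""")

DIVISION = """
    SELECT e.mannschaft
    FROM einsatz e
    {join}
    GROUP BY e.mannschaft
    HAVING COUNT(DISTINCT e.fahrzeug) = (SELECT COUNT(*) FROM bergefahrzeug)
    ORDER BY e.mannschaft
"""

# With the join, only bergefahrzeuge are counted.
qualifying = [r[0] for r in conn.execute(
    DIVISION.format(join="JOIN bergefahrzeug b ON b.id = e.fahrzeug"))]

# Without it, vehicle 99 makes team 30 look complete.
without_join = [r[0] for r in conn.execute(DIVISION.format(join=""))]

print(qualifying, without_join)
```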
I don't know how to explain it, but here is how I solved this problem, in case somebody has the same question one day.
WITH dfz AS (
SELECT DISTINCT fahrzeug,mannschaft FROM einsatz WHERE einsatz.fahrzeug IN (SELECT id FROM bergefahrzeug)
), abc AS (
SELECT DISTINCT mannschaft FROM dfz
), einsatzmannschaften AS (
SELECT abc.mannschaft FROM abc WHERE (SELECT count(dfz.fahrzeug) FROM dfz WHERE dfz.mannschaft = abc.mannschaft) = (SELECT count(bergefahrzeug.id) FROM bergefahrzeug) -- count, not sum: different sets of ids can add up to the same sum
)
SELECT mannschaft.id,mannschaft.rufname,person.id,person.nachname
FROM mannschaft,person,einsatzmannschaften WHERE mannschaft.leiter = person.id AND einsatzmannschaften.mannschaft=mannschaft.id;

Pagination help in SQL

The inner SELECT below returns a huge number of rows (1,000,000+), and the outer SELECTs (alpha BETWEEN #startRec# AND #endRec#) are used for pagination, to display the data with 25 rows on each page.
The issue is that this pagination is very slow and slows down the entire display of the data. Could you please help me do this pagination in a better way? Code for the pagination would be best.
I am sorry to put it this way, but I am very new to pagination concepts and need your help.
/*********ORIGINAL QUERY ****/
SELECT
*
FROM
(
SELECT
beta.*, rownum as alpha
FROM
(
SELECT
p.lastname, p.firstname, porg.DEPARTMENT,
porg.org_relationship,
porg.enterprise_name,
(
SELECT
count(*)
FROM
test_person p, test_contact c1, test_org_person porg
WHERE
p.p_id = c1.ref_id(+)
AND p.p_id = porg.o_p_id
$where_clause$
) AS results
FROM
test_person p, test_contact c1, test_org_person porg
WHERE
p.p_id = c1.ref_id(+)
AND p.p_id = porg.o_p_id
$where_clause$
ORDER BY
upper(p.lastname), upper(p.firstname)
) beta
)
WHERE
alpha BETWEEN #startRec# AND #endRec#
My attempted implementation is below:
(1) The innermost query is the first query, which fetches the data.
(2) Then we do a total COUNT on that data.
The main issue now is that the query runs forever, and finally I have to cancel it forcibly.
I feel something is missing in the query below that makes it hang.
I have also learned that doing the COUNT outside is the best approach for performance. Could you please correct the query below so that it returns the COUNT and the DATA using pagination, rownum, etc., mainly with the aliases below?
select * from
( select x.* ,rownum rnum
from ( SELECT
count(*) as results /****2nd QUERY is OUTSIDE to get total count**/
/**** Question: how do I access the data selected inside the 1st query below? ****/
from ( /****1st query to SELECT data***/
SELECT
p.lastname, p.firstname, porg.DEPARTMENT,
porg.org_relationship,
porg.enterprise_name
FROM
t_person p, t_contact c1, t_o_person porg
WHERE rownum <10
and
p.person_id = c1.ref_id(+)
AND p.person_id = porg.o_person_id
ORDER BY
upper(p.lastname), upper(p.firstname)
) y ------------------>alias defined Y from data of the 1st query
)x ------------------>alias defined X
where rownum <= 20 )
where rnum >= 1
To do pagination quickly, you need to limit the query results returned, e.g. in MySQL you can use LIMIT and SQL_CALC_FOUND_ROWS.
You'd have to check your DB; however, it would be easier to split those two into separate queries if you don't have those helper functions.
Maybe I've missed something, but have you looked into using the LIMIT and OFFSET clauses?
http://www.sql.org/sql-database/postgresql/manual/queries-limit.html
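A minimal sketch of LIMIT/OFFSET paging with the total count in a separate query, using SQLite through Python's sqlite3. The person table, its rows and the fetch_page helper are hypothetical; the page size of 25 is taken from the question. A tie-breaker on the primary key is added to the ORDER BY so that rows with equal names cannot shift between pages.

```python
import sqlite3

PAGE_SIZE = 25

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (p_id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT)")
conn.executemany(
    "INSERT INTO person (lastname, firstname) VALUES (?, ?)",
    [(f"name{i:03d}", "x") for i in range(60)],
)

def fetch_page(page):
    """Return one page (1-based) of rows, ordered case-insensitively by name."""
    return conn.execute(
        """
        SELECT p_id, lastname, firstname
        FROM person
        ORDER BY upper(lastname), upper(firstname), p_id
        LIMIT ? OFFSET ?
        """,
        (PAGE_SIZE, (page - 1) * PAGE_SIZE),
    ).fetchall()

# Total count as its own query: no ORDER BY and no row data needed.
total = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
page_count = -(-total // PAGE_SIZE)  # ceiling division

last_page = fetch_page(page_count)
print(total, page_count, len(last_page))
```

Large OFFSET values still make the database walk past the skipped rows, so deep pages get slower; an index matching the ORDER BY keeps the early pages cheap.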
I usually do this as two separate queries, e.g.:
-- get page of data
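SELECT *
FROM
(
SELECT beta.*, rownum AS rnum
FROM
(
SELECT
p.lastname, p.firstname, porg.DEPARTMENT,
porg.org_relationship,
porg.enterprise_name
FROM
test_person p, test_contact c1, test_org_person porg
WHERE
p.p_id = c1.ref_id(+)
AND p.p_id = porg.o_p_id
$where_clause$
ORDER BY
upper(p.lastname), upper(p.firstname)
) beta
-- ROWNUM is assigned as rows are returned, so "rownum BETWEEN 26 AND 50"
-- would return nothing; cap it here and filter on the alias outside
WHERE rownum <= #endRec#
)
WHERE
rnum >= #startRec#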
--get total count
SELECT count(*) as Count
FROM
test_person p, test_contact c1, test_org_person porg
WHERE
p.p_id = c1.ref_id(+)
AND p.p_id = porg.o_p_id
$where_clause$
You could also return the total count in the first row of data in your results, like this:
SELECT null, null, null, null, null, count(*) as Count
FROM
test_person p, test_contact c1, test_org_person porg
WHERE
p.p_id = c1.ref_id(+)
AND p.p_id = porg.o_p_id
$where_clause$
UNION ALL
SELECT lastname, firstname, DEPARTMENT, org_relationship, enterprise_name, total_count
FROM
(
SELECT beta.*, rownum AS rnum
FROM
(
SELECT
p.lastname, p.firstname, porg.DEPARTMENT,
porg.org_relationship,
porg.enterprise_name, null AS total_count
FROM
test_person p, test_contact c1, test_org_person porg
WHERE
p.p_id = c1.ref_id(+)
AND p.p_id = porg.o_p_id
$where_clause$
ORDER BY
upper(p.lastname), upper(p.firstname)
) beta
WHERE rownum <= #endRec#
)
WHERE
rnum >= #startRec#
What database are you using? If it is Oracle, the ideas suggested by others will not work: Oracle does not support the LIMIT syntax (12c and later offer OFFSET ... ROWS FETCH NEXT ... ROWS ONLY instead).
For Oracle you wrap your query in this syntax:
SELECT *
FROM (SELECT a.*,
ROWNUM rnum
FROM ( [your query] ) a
WHERE ROWNUM <= [endRow] )
WHERE rnum >= [startRow]
These are specifically intended for ASP, but can be adapted without much trouble:
http://databases.aspfaq.com/database/how-do-i-page-through-a-recordset.html
Personally, I implemented the "#Temp table" stored procedure method when I recently needed a paging solution.
My suggestions:
Create an index on test_person by lastname + firstname (in this order)
If possible, remove the upper functions (some DBs allow creating function-based indexes)
Remove the external SELECT and do the pagination in the client (not in DB)
I suspect that the inner subquery must be resolved first, and that's costly without proper indexes. Ordering by computed columns usually doesn't use indexes, temporary tables are created, etc.
Cheers
In Oracle there are a couple of options:
Using ROWNUM in an inner query with a wrapping to get the pagination (as you've tried)
Using analytic functions.
Both approaches have been described well by Tom Kyte:
http://www.oracle.com/technology/oramag/oracle/07-jan/o17asktom.html
Hope this helps.
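The analytic-function approach numbers the sorted rows with ROW_NUMBER() in an inner query and filters on that number outside. This has the same shape as the ROWNUM wrapper but avoids the ROWNUM assignment pitfall. The sketch runs on SQLite 3.25+ (which supports window functions) through Python's sqlite3 rather than on Oracle; the person table, its rows and the page_by_row_number helper are hypothetical, and in Oracle the two ? placeholders would be bind variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (p_id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT)")
conn.executemany(
    "INSERT INTO person (lastname, firstname) VALUES (?, ?)",
    [("Smith", "Ann"), ("adams", "Bob"), ("Brown", "Cy"), ("clark", "Di"), ("Davis", "Ed")],
)

def page_by_row_number(start_row, end_row):
    """Return rows start_row..end_row (1-based, inclusive) in case-insensitive name order."""
    return conn.execute(
        """
        SELECT lastname, firstname, rn
        FROM (
            SELECT lastname, firstname,
                   ROW_NUMBER() OVER (
                       ORDER BY upper(lastname), upper(firstname), p_id
                   ) AS rn
            FROM person
        )
        WHERE rn BETWEEN ? AND ?
        ORDER BY rn
        """,
        (start_row, end_row),
    ).fetchall()

second_page = page_by_row_number(3, 4)
print(second_page)
```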