Is it possible to do complex SQL queries using Django? - sql

I have the following script, which computes an index for each day after a specific date:
with test_reqs as (
select id_test, date_request, sum(n_requests) as n_req from cdr_test_stats
where
id_test in (2,4) and -- List of Ids included in index calc
date_request >= 20170823 -- Start date (end date -> Last in DB -> Today)
group by id_test, date_request
),
date_reqs as (
select date_request, sum(n_req) as n_req
from test_reqs
group by date_request
),
test_reqs_ratio as (
select H.id_test, H.date_request,
case when D.n_req = 0 then null else H.n_req/D.n_req end as ratio_req
from test_reqs H
inner join date_reqs D
on H.date_request = D.date_request
),
test_reqs_index as (
select HR.*, least(nullif(HA.n_dates_hbalert, 0), 10) as index_hb
from test_reqs_ratio HR
left join cdr_test_alerts_stats HA
on HR.id_test = HA.id_test and HR.date_request = HA.date_request
)
select date_request, 10-sum(ratio_req*index_hb) as index_hb
from test_reqs_index
group by date_request
Result:
---------------------------
| date_request | index_hb |
---------------------------
| 20170904 | 7.5508 |
| 20170905 | 7.6870 |
| 20170825 | 7.4335 |
| 20170901 | 7.7116 |
| 20170824 | 1.6568 |
| 20170823 | 0.0000 |
| 20170903 | 5.1850 |
| 20170830 | 0.0000 |
| 20170828 | 0.0000 |
---------------------------
The problem is that I want to get the same result in Django without executing the raw query through a cursor.
Many thanks for any suggestion.

Without going deep into the specifics of your query, I'd say the Django ORM is expressive enough to handle most problems, but it would generally require you to redesign the query from the ground up. You would have to use subqueries and joins instead of the CTEs, and you might end up with a solution that does some of the work in Python instead of in the database.
Taking this into account, the answer is: it depends. Your requirements, such as performance and data size, play a role.
Another solution worth considering is declaring your SQL query as a view and, at least in the case of Postgres, using something like django-pgviews to query it with the Django ORM almost as if it were a model.
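To illustrate the view idea, here is a minimal sketch using Python's built-in sqlite3 rather than Postgres or Django. The sample rows and the view name test_reqs_ratio are made up for the demo, and the view wraps only the first CTEs of the question's query. Once the view exists it can be filtered like an ordinary table, which is what an ORM model pointed at the view (for example a Django model with managed = False in its Meta) would rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cdr_test_stats (id_test INTEGER, date_request INTEGER, n_requests INTEGER);
    -- Made-up sample rows, only for this demo.
    INSERT INTO cdr_test_stats VALUES
        (2, 20170823, 30), (4, 20170823, 10),
        (2, 20170824, 5),  (4, 20170824, 15);

    -- Hypothetical view holding the per-test share of each day's requests.
    CREATE VIEW test_reqs_ratio AS
    WITH test_reqs AS (
        SELECT id_test, date_request, SUM(n_requests) AS n_req
        FROM cdr_test_stats
        GROUP BY id_test, date_request
    ),
    date_reqs AS (
        SELECT date_request, SUM(n_req) AS n_req
        FROM test_reqs
        GROUP BY date_request
    )
    SELECT H.id_test, H.date_request,
           CASE WHEN D.n_req = 0 THEN NULL
                ELSE 1.0 * H.n_req / D.n_req END AS ratio_req
    FROM test_reqs H
    JOIN date_reqs D ON H.date_request = D.date_request;
""")

# The view is queried like a plain table; the start date is a bound parameter.
rows = conn.execute(
    "SELECT id_test, date_request, ratio_req FROM test_reqs_ratio "
    "WHERE date_request >= ? ORDER BY date_request, id_test",
    (20170824,),
).fetchall()
print(rows)
```

The filters that were hard-coded in the original script (the id list and the start date) move out of the view and into the query against it.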

Related

How to query sum total of transitively linked child transactions from database?

I got this one assignment which has a lot of weird stuff to do. I need to create an API for storing transaction details and do some operations. One such operation involves retrieving a sum of all transactions that are transitively linked by their parent_id to $transaction_id.
If A is the parent of B and C, and C is the parent of D and E, then
sum(A) = A + B + C + D + E
note: not just immediate child transactions.
I have this sample data in the SQL database as given below.
MariaDB [test_db]> SELECT * FROM transactions;
+------+-------+----------+---------+
| t_id | t_pid | t_amount | t_type |
+------+-------+----------+---------+
| 1 | NULL | 10000.00 | default |
| 2 | NULL | 25000.00 | cars |
| 3 | 1 | 30000.00 | bikes |
| 4 | NULL | 10000.00 | bikes |
| 5 | 3 | 15000.00 | bikes |
+------+-------+----------+---------+
5 rows in set (0.000 sec)
MariaDB [test_db]>
where t_id is a unique transaction_id and t_pid is a parent_id which is either null or an existing t_id.
so, when I say sum(t_amount) where t_id=1, I want the result to be
sum(1+3+5) -> sum(10000 + 30000 + 15000) = 55000.
I know I can achieve this in a programmatic way with some recursion which will do repeated query operations and add the sum. But, that will give me poor performance if the data is very large say, millions of records.
I want to know if there is any possibility of achieving this with a complex query. And if yes, then how to do it?
I have very little knowledge and experience with databases. I tried with what I know and I couldn't do it. I tried searching for any similar queries available here and I didn't find any.
With what I have researched, I guess I can achieve this with stored procedures and using the HAVING clause. Let me know if I am right there and help me do this.
So, any sort of help will be appreciated.
Thanks in advance.
You need a recursive CTE:
with recursive cte as (
select t_id as ultimate_id, t_id, t_amount
from transactions t
where t_id = 1
union all
select cte.ultimate_id, t.t_id, t.t_amount
from cte join
transactions t
on t.t_pid = cte.t_id
)
select ultimate_id, sum(t_amount)
from cte
group by ultimate_id;
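As a quick check of the recursive approach, here is a small sketch that loads the sample rows from the question into an in-memory SQLite database (SQLite also supports WITH RECURSIVE) and sums a whole subtree. The helper name subtree_sum is made up for this example; on MariaDB the same CTE should work from version 10.2.2 on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        t_id INTEGER PRIMARY KEY,
        t_pid INTEGER,
        t_amount REAL,
        t_type TEXT
    );
    INSERT INTO transactions VALUES
        (1, NULL, 10000.00, 'default'),
        (2, NULL, 25000.00, 'cars'),
        (3, 1,    30000.00, 'bikes'),
        (4, NULL, 10000.00, 'bikes'),
        (5, 3,    15000.00, 'bikes');
""")

def subtree_sum(conn, root_id):
    """Sum t_amount over root_id and all of its transitive children."""
    row = conn.execute(
        """
        WITH RECURSIVE cte AS (
            -- anchor: the starting transaction
            SELECT t_id, t_amount FROM transactions WHERE t_id = ?
            UNION ALL
            -- recursive step: children of rows found so far
            SELECT t.t_id, t.t_amount
            FROM cte JOIN transactions t ON t.t_pid = cte.t_id
        )
        SELECT SUM(t_amount) FROM cte
        """,
        (root_id,),
    ).fetchone()
    return row[0]  # None when root_id does not exist

print(subtree_sum(conn, 1))
```

The whole traversal runs inside the database in one statement, so there is no per-level round trip from application code.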

SQL structure for multiple queries of the same table (using window function, case, join)

I have a complex production SQL question. It's actually PrestoDB Hadoop, but conforms to common SQL.
I've got to get a bunch of metrics from a table, a little like this (sorry if the tables are mangled):
+--------+--------------+------------------+
| device | install_date | customer_account |
+--------+--------------+------------------+
| dev 1 | 1-Jun | 123 |
| dev 1 | 4-Jun | 456 |
| dev 1 | 10-Jun | 789 |
| dev 2 | 20-Jun | 50 |
| dev 2 | 25-Jun | 60 |
+--------+--------------+------------------+
I need something like this:
+--------+------------------+-------------------------+
| device | max_install_date | previous_account_number |
+--------+------------------+-------------------------+
| dev 1 | 10-Jun | 456 |
| dev 2 | 25-Jun | 50 |
+--------+------------------+-------------------------+
I can do two separate queries to get max install date and previous account number, like this:
select device, max(install_date) as max_install_date
from (select [a whole bunch of stuff], dense_rank() over(partition by device order by [something_else]) rnk
from some_table a
)
But how do you combine them into one query to get one line for each device? I have rank, with statements, case statements, and one join. They all work individually but I'm banging my head to understand how to combine them all.
I need to understand how to structure big queries.
ps. any good books you recommend on advanced SQL for data analysis? I see a bunch on Amazon but nothing that tells me how to construct big queries like this. I'm not a DBA. I'm a data guy.
Thanks.
You can use a correlated subquery approach:
select t.*
from some_table t
where install_date = (select max(install_date) from some_table t1 where t1.device = t.device);
This assumes install_date is stored in a format that sorts chronologically (a real date type, not text such as '1-Jun').
I think you want:
select t.*
from (select t.*, max(install_date) over (partition by device) as max_install_date,
lag(customer_account) over (partition by device order by install_date) as prev_customer_account
from t
) t
where install_date = max_install_date;
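Here is a sketch of the window-function version run against the sample data, using Python's sqlite3 (window functions need SQLite 3.25 or newer). The table name installs and the year in the dates are assumptions made for the demo; the dates are stored as ISO strings so that they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE installs (device TEXT, install_date TEXT, customer_account INTEGER);
    -- Sample rows from the question; the year 2023 is made up.
    INSERT INTO installs VALUES
        ('dev 1', '2023-06-01', 123),
        ('dev 1', '2023-06-04', 456),
        ('dev 1', '2023-06-10', 789),
        ('dev 2', '2023-06-20', 50),
        ('dev 2', '2023-06-25', 60);
""")

rows = conn.execute("""
    SELECT device, max_install_date, prev_customer_account
    FROM (
        SELECT device,
               install_date,
               -- latest install per device, repeated on every row
               MAX(install_date) OVER (PARTITION BY device) AS max_install_date,
               -- account on the previous install of the same device
               LAG(customer_account) OVER (
                   PARTITION BY device ORDER BY install_date
               ) AS prev_customer_account
        FROM installs
    ) AS t
    WHERE install_date = max_install_date  -- keep only the latest row per device
    ORDER BY device
""").fetchall()
print(rows)
```

Both metrics are computed in the inner query on every row, and the outer filter then keeps one row per device, which is the general pattern for combining several per-group metrics in one statement.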

SQL script runs VERY slowly with small change

I am relatively new to SQL. I have a script that used to run very quickly (<0.5 seconds) but runs very slowly (>120 seconds) if I add one change - and I can't see why this change makes such a difference. Any help would be hugely appreciated!
This is the script, and it runs quickly if I do NOT include "tt2.bulk_cnt" in line 26:
with bulksum1 as
(
select t1.membercode,
t1.schemecode,
t1.transdate
from mina_raw2 t1
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t1.membercode,
t1.schemecode,
t1.transdate
),
bulksum2 as
(
select t1.schemecode,
t1.transdate,
count(*) as bulk_cnt
from bulksum1 t1
group by t1.schemecode,
t1.transdate
having count(*) >= 10
),
results as
(
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join bulksum2 tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
)
select * from results
EDIT: I apologise for not putting enough detail in here previously - although I can use basic SQL code, I am a complete novice when it comes to databases.
Database: Oracle (I'm not sure which version, sorry)
Execution plans:
QUICK query:
Plan hash value: 1712123489
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | VIEW | |
| 3 | FILTER | |
| 4 | HASH GROUP BY | |
| 5 | VIEW | VM_NWVW_0 |
| 6 | HASH GROUP BY | |
| 7 | TABLE ACCESS FULL| MINA_RAW2 |
| 8 | TABLE ACCESS FULL | MINA_RAW2 |
---------------------------------------------
SLOW query:
Plan hash value: 1298175315
--------------------------------------------
| Id | Operation | Name |
--------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | FILTER | |
| 2 | HASH GROUP BY | |
| 3 | HASH JOIN | |
| 4 | VIEW | VM_NWVW_0 |
| 5 | HASH GROUP BY | |
| 6 | TABLE ACCESS FULL| MINA_RAW2 |
| 7 | TABLE ACCESS FULL | MINA_RAW2 |
--------------------------------------------
A few observations, and then some things to do:
1) More information is needed. In particular, how many rows are there in the MINA_RAW2 table, what indexes exist on this table, and when was the last time it was analyzed? To determine the answers to these questions, run:
SELECT COUNT(*) FROM MINA_RAW2;
SELECT TABLE_NAME, LAST_ANALYZED, NUM_ROWS
FROM USER_TABLES
WHERE TABLE_NAME = 'MINA_RAW2';
From looking at the plan output it looks like the database is doing two FULL SCANs on MINA_RAW2 - it would be nice if this could be reduced to no more than one, and hopefully none. It's always tough to tell without very detailed information about the data in the table, but at first blush it appears that an index on TRANSACTIONTYPE might be helpful. If such an index doesn't exist you might want to consider adding it.
2) Assuming that the statistics are out-of-date (as in, old, nonexistent, or a significant amount of data (> 10%) has been added, deleted, or updated since the last analysis) run the following:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(owner => 'YOUR-SCHEMA-NAME',
table_name => 'MINA_RAW2');
END;
substituting the correct schema name for "YOUR-SCHEMA-NAME" above. Remember to capitalize the schema name! If you don't know if you should or shouldn't gather statistics, err on the side of caution and do it. It shouldn't take much time.
3) Re-try your existing query after updating the table statistics. I think there's a fair chance that having up-to-date statistics in the database will solve your issues. If not:
4) This query is doing a GROUP BY on the results of a GROUP BY. This doesn't appear to be necessary, as the initial GROUP BY doesn't compute any aggregates. Instead, it appears to be there to get the unique combinations of MEMBERCODE, SCHEMECODE, and TRANSDATE so that the count of members by scheme and date can be determined. I think the whole query can be simplified to:
WITH cteWORKING_TRANS AS (SELECT *
FROM MINA_RAW2
WHERE TRANSACTIONTYPE IN ('RSP','SP','UNTV',
'ASTR','CN','TVIN',
'UCON','TRAS')),
cteBULKSUM AS (SELECT a.SCHEMECODE,
a.TRANSDATE,
COUNT(*) AS BULK_CNT
FROM (SELECT DISTINCT MEMBERCODE,
SCHEMECODE,
TRANSDATE
FROM cteWORKING_TRANS) a
GROUP BY a.SCHEMECODE,
a.TRANSDATE
HAVING COUNT(*) >= 10)
SELECT t.*, b.BULK_CNT
FROM cteWORKING_TRANS t
INNER JOIN cteBULKSUM b
ON b.SCHEMECODE = t.SCHEMECODE AND
b.TRANSDATE = t.TRANSDATE
I managed to remove an unnecessary subquery by using count(DISTINCT ...). That syntax is standard SQL and Oracle supports it, but I have mostly used it in PostgreSQL, so check that it gives the desired result here.
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join (select t2.schemecode,
t2.transdate,
count(DISTINCT membercode) as bulk_cnt
from mina_raw2 t2
where t2.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t2.schemecode,
t2.transdate
having count(DISTINCT membercode) >= 10) tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
When you use WITH queries where a plain subquery would do, you can end up kneecapping the query optimizer.
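To check that count(DISTINCT membercode) really gives the same counts as the two-step GROUP BY, here is a small comparison in Python's sqlite3 rather than Oracle. The rows are made up, the type list is shortened, and the threshold is lowered from 10 to 2 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mina_raw2 (
        membercode TEXT, schemecode TEXT, transdate TEXT, transactiontype TEXT
    );
    -- Made-up rows: m1 appears twice on the same scheme/date, m4 has an excluded type.
    INSERT INTO mina_raw2 VALUES
        ('m1', 'S1', '2020-01-01', 'RSP'),
        ('m1', 'S1', '2020-01-01', 'SP'),
        ('m2', 'S1', '2020-01-01', 'RSP'),
        ('m3', 'S2', '2020-01-01', 'RSP'),
        ('m4', 'S1', '2020-01-01', 'XX');
""")

# Two-step version, as in the question: dedupe members first, then count.
two_step = conn.execute("""
    WITH bulksum1 AS (
        SELECT membercode, schemecode, transdate
        FROM mina_raw2
        WHERE transactiontype IN ('RSP', 'SP')
        GROUP BY membercode, schemecode, transdate
    ),
    bulksum2 AS (
        SELECT schemecode, transdate, COUNT(*) AS bulk_cnt
        FROM bulksum1
        GROUP BY schemecode, transdate
        HAVING COUNT(*) >= 2
    )
    SELECT * FROM bulksum2 ORDER BY schemecode, transdate
""").fetchall()

# One-step version: count distinct members directly.
one_step = conn.execute("""
    SELECT schemecode, transdate, COUNT(DISTINCT membercode) AS bulk_cnt
    FROM mina_raw2
    WHERE transactiontype IN ('RSP', 'SP')
    GROUP BY schemecode, transdate
    HAVING COUNT(DISTINCT membercode) >= 2
    ORDER BY schemecode, transdate
""").fetchall()

print(two_step, one_step)
```

One difference to keep in mind: COUNT(DISTINCT membercode) ignores rows where membercode is NULL, while the two-step version counts a NULL membercode as one more member, so the two only agree when membercode is never NULL.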

SQL Query: Search with list of tuples

I have the following table (simplified version) in SQL Server.
Table Events
-----------------------------------------------------------
| Room | User | Entered | Exited |
-----------------------------------------------------------
| A | Jim | 2014-10-10T09:00:00 | 2014-10-10T09:10:00 |
| B | Jim | 2014-10-10T09:11:00 | 2014-10-10T09:22:30 |
| A | Jill | 2014-10-10T09:00:00 | NULL |
| C | Jack | 2014-10-10T09:45:00 | 2014-10-10T10:00:00 |
| A | Jack | 2014-10-10T10:01:00 | NULL |
.
.
.
I need to create a query that returns a person's whereabouts at given timestamps.
For example: Where was (Jim at 2014-10-09T09:05:00), (Jim at 2014-10-10T09:01:00), (Jill at 2014-10-10T09:10:00), ...
The result set must contain the given User and Timestamp as well as the found room (if any).
------------------------------------------
| User | Timestamp | WasInRoom |
------------------------------------------
| Jim | 2014-10-09T09:05:00 | NULL |
| Jim | 2014-10-10T09:01:00 | A |
| Jill | 2014-10-10T09:10:00 | A |
The number of User-Timestamp tuples can be > 10 000.
The current implementation retrieves all records from Events table and does the search in Java code. I am hoping that I could push this logic to SQL. But how?
I am using MyBatis framework to create SQL queries so the tuples can be inlined to the query.
The basic query is:
select e.*
from events e
where (e.user = 'Jim' and '2014-10-09T09:05:00' >= e.entered and ('2014-10-09T09:05:00' <= e.exited or e.exited is NULL)) or
(e.user = 'Jill' and '2014-10-10T09:10:00' >= e.entered and ('2014-10-10T09:10:00' <= e.exited or e.exited is NULL)) or
. . .;
SQL Server can handle ridiculously large queries, so you can continue in this vein. However, if you have the name/time values in a table already (or it is the result of a query), then use a join:
select ut.*, e.*
from usertimes ut left join
events e
on e.user = ut.user and
ut.thetime >= e.entered and (ut.thetime <= e.exited or e.exited is null);
Note the use of a left join here. It ensures that all the original rows are in the result set, even when there are no matches.
Answers from Jonas and Gordon got me on track, I think.
Here is a query that seems to do the job (User is a reserved word in SQL Server, so it is quoted like "Timestamp"):
CREATE TABLE #SEARCH_PARAMETERS("User" VARCHAR(16), "Timestamp" DATETIME)
INSERT INTO #SEARCH_PARAMETERS("User", "Timestamp")
VALUES
('Jim', '2014-10-09T09:05:00'),
('Jim', '2014-10-10T09:01:00'),
('Jill', '2014-10-10T09:10:00')
SELECT #SEARCH_PARAMETERS.*, Events.Room FROM #SEARCH_PARAMETERS
LEFT JOIN Events
ON #SEARCH_PARAMETERS."User" = Events."User" AND
#SEARCH_PARAMETERS."Timestamp" > Events.Entered AND
(Events.Exited IS NULL OR Events.Exited > #SEARCH_PARAMETERS."Timestamp")
DROP TABLE #SEARCH_PARAMETERS
By declaring a table valued parameter type for the (user, timestamp) tuples, it should be simple to write a table valued user defined function which returns the desired result by joining the parameter table and the Events table. See http://msdn.microsoft.com/en-us/library/bb510489.aspx
Since you are using MyBatis it may be easier to just generate a table variable for the tuples inline in the query and join with that.
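The parameter-table join can be tried out end to end with the sketch below, which uses Python's sqlite3 instead of SQL Server and MyBatis. The table search_params and the column names user_name and ts are made up for the demo, and ISO-8601 strings are compared as text, which orders them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (room TEXT, user_name TEXT, entered TEXT, exited TEXT);
    INSERT INTO events VALUES
        ('A', 'Jim',  '2014-10-10T09:00:00', '2014-10-10T09:10:00'),
        ('B', 'Jim',  '2014-10-10T09:11:00', '2014-10-10T09:22:30'),
        ('A', 'Jill', '2014-10-10T09:00:00', NULL),
        ('C', 'Jack', '2014-10-10T09:45:00', '2014-10-10T10:00:00'),
        ('A', 'Jack', '2014-10-10T10:01:00', NULL);
    -- Hypothetical temp table holding the (user, timestamp) tuples to look up.
    CREATE TEMP TABLE search_params (user_name TEXT, ts TEXT);
""")

lookups = [
    ("Jim", "2014-10-09T09:05:00"),
    ("Jim", "2014-10-10T09:01:00"),
    ("Jill", "2014-10-10T09:10:00"),
]
# Bulk-load the tuples instead of inlining them into one huge WHERE clause.
conn.executemany("INSERT INTO search_params VALUES (?, ?)", lookups)

rows = conn.execute("""
    SELECT p.user_name, p.ts, e.room
    FROM search_params p
    LEFT JOIN events e
      ON e.user_name = p.user_name
     AND p.ts >= e.entered
     AND (e.exited IS NULL OR p.ts <= e.exited)
    ORDER BY p.user_name, p.ts
""").fetchall()
print(rows)
```

Because of the LEFT JOIN, a lookup with no matching event still comes back, with NULL (None in Python) as the room.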

SQL Query converting to Rails Active Record Query Interface

I have been using SQL queries in my Rails code, and they need to be converted to Active Record queries. I haven't used Active Record before, so I tried going through http://guides.rubyonrails.org/active_record_querying.html to get the proper syntax for fetching the data this way. I am able to convert the simple queries into this format, but there are other, more complex queries like
SELECT b.owner,
Sum(a.idle_total),
Sum(a.idle_monthly_usage)
FROM market_place_idle_hosts_summaries a,
(SELECT DISTINCT owner,
hostclass,
week_number
FROM market_place_idle_hosts_details
WHERE week_number = '#{week_num}'
AND Year(updated_at) = '#{year_num}') b
WHERE a.hostclass = b.hostclass
AND a.week_number = b.week_number
AND Year(updated_at) = '#{year_num}'
GROUP BY b.owner
ORDER BY Sum(a.idle_monthly_usage) DESC
which I need in Active Record format, but because of the complexity I am stuck as to how to proceed with the conversion.
The output of the query is something like this
+----------+-------------------+---------------------------+
| owner | sum(a.idle_total) | sum(a.idle_monthly_usage) |
+----------+-------------------+---------------------------+
| abc | 485 | 90387.13690185547 |
| xyz | 815 | 66242.01857376099 |
| qwe | 122 | 11730.609939575195 |
| asd | 80 | 9543.170425415039 |
| zxc | 87 | 8027.090087890625 |
| dfg | 67 | 7303.070011138916 |
| wqer | 76 | 5234.969814300537 |
Since your query is a bit complex, instead of converting it to Active Record you can use the find_by_sql method.
You can also use ActiveRecord::Base.connection directly to fetch the records, like this:
ActiveRecord::Base.connection.execute("your query")
You can build the subquery separately with ActiveRecord and convert it to SQL using to_sql.
Then use joins to join your table a with b, which is the subquery. Note also the use of the Active Record clauses select, where, group and order, which are basically what you need to build this complex SQL query in ActiveRecord.
Something similar to the following should work:
subquery = SubModel.select("DISTINCT ... ").where(" ... ").to_sql
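# from(...) aliases the summaries table as "a" so that the a.* references resolve
Model.select("b.owner, ... ")
.from("market_place_idle_hosts_summaries a")
.joins("JOIN (#{subquery}) b ON a.hostclass = b.hostclass AND a.week_number = b.week_number")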
.where(" ... ")
.group("b.owner")
.order("Sum(a.idle_monthly_usage) DESC")