SQL optimization for large data_sets

I have a nasty SQL performance issue. I need to execute a statement like:
SELECT *
FROM (SELECT /*+ FIRST_ROWS(26) */
             a.*, ROWNUM rnum
      FROM (SELECT *
            FROM t1
            WHERE t1_col1 = 'val1'
              AND g_dom IN ('1', '2', '3')
              AND g_context IN ('3', '4', '5', '6')
              AND i_col = 1
              AND f_col IN ('1', '2', '3', '4')
              AND e_g IN (SELECT e_g
                          FROM t2
                          WHERE t2_col1 = 'val1'
                            AND g_context IN ('3', '4', '5', '6')
                            AND val LIKE 'some val%')
            ORDER BY order_id DESC) a)
WHERE rnum > 0;
Basically, we have table t1 (our data table) and t2 (our support values). There are 1 million rows in t1 and 10 million in t2. The g_context column narrows the data sets, but the val filter still matches around 500k rows. We need 25 rows ordered by order_id.
Is there any way to tell the inner statement
SELECT e_g FROM t2 WHERE t2_col1 = 'val1' AND g_context IN ('3', '4', '5', '6') AND val LIKE 'some val%'
to fetch only the 25 records that match our outer statement's criteria?

Why not move rownum and the hint into the inner query like so:
SELECT /*+ FIRST_ROWS(26) */ t1.*, row_number() over (order by order_id desc) rn
FROM t1
WHERE t1_col1 = 'val1'
AND g_dom in ('1', '2', '3')
AND g_context IN ('3', '4', '5', '6')
AND i_col = 1
AND f_col in ('1', '2', '3', '4')
AND e_g IN (SELECT e_g
FROM t2
WHERE t2_col1 = 'val1'
AND g_context IN ('3', '4', '5', '6')
AND val like 'some val%')
ORDER BY order_id DESC
To me, this extra subselect with the hint and rownum does not seem to make any sense.
And the where-clause should be "WHERE rnum < 26", shouldn't it?
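To illustrate the "WHERE rnum < 26" point, here is a minimal sketch that uses SQLite (through Python's sqlite3 module) as a stand-in for Oracle. The table contents are made up, only the order_id, e_g and val columns are modelled, and the optimizer hint is omitted because SQLite has no equivalent. It only shows that the row number has to be filtered in an outer query to keep the first 25 rows; it says nothing about Oracle's execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (order_id INTEGER, e_g TEXT);
    CREATE TABLE t2 (e_g TEXT, val TEXT);
""")

# Hypothetical data: 100 orders spread over 10 groups (g0..g9).
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(i, f"g{i % 10}") for i in range(1, 101)],
)
# Only the even-numbered groups carry a value matching 'some val%'.
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?)",
    [(f"g{i}", "some val x" if i % 2 == 0 else "other") for i in range(10)],
)

# Number the matching rows by order_id descending, then keep rows 1..25.
top_rows = conn.execute("""
    SELECT order_id, e_g, rn
    FROM (SELECT t1.*,
                 ROW_NUMBER() OVER (ORDER BY order_id DESC) AS rn
          FROM t1
          WHERE e_g IN (SELECT e_g FROM t2 WHERE val LIKE 'some val%'))
    WHERE rn < 26
    ORDER BY rn
""").fetchall()

print(len(top_rows), top_rows[0], top_rows[-1])
```

The window function needs SQLite 3.25 or newer, which recent Python builds normally bundle.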

Related

Recursive in Sql Server

This is my test table:
with test as
(select '1' as id, '2' as id_st, '324234234' as id_dd
union all
select
'1', '2', '43252345'
union all
select
'1', '2', '43252345'
union all
select
'1', '2', '43252344'
union all
select
'1', '2', '43252344323'
union all
select
'1', '2', '2345235'
union all
select
'2', null, '324234234'
union all
select
'2', null, '3432423'
union all
select
'2', null, '43252345'
union all
select
'2', null, '3432423234523'
union all
select
'2', null, '6543456'
union all
select
'2', null, '6543456754'
union all
select
'2', null, '43252344'
)
select *
into tmp_20102022v
from test;
I then wrote this code to read recursively from the table tmp_20102022v:
WITH CTE
AS
(
SELECT distinct
M1.id id_t,
M1.id,
m1.id_dd
FROM tmp_20102022v M1
LEFT JOIN tmp_20102022v M2
ON M1.id = M2.id_st
WHERE M2.id IS NULL
UNION ALL
SELECT
C.id,
M.id_st,
m.id_dd
FROM CTE C
JOIN tmp_20102022v M
ON C.id = M.id
)
SELECT c.id_t,c.id_dd FROM CTe c
group by c.id_t,c.id_dd
This code produces the following result:
How do I need to change my code to get a result like the one below?
The result for id = 1 should include the id_dd values of both id = 1 and id = 2,
and the result for id = 2 should include only the id_dd values of id = 2:
The recursive tree can be large.

How to convert Mysql query to Hive

I have this table:
CREATE TABLE ip_logs (
`ip_address` VARCHAR(11),
`start_date` VARCHAR(11),
`end_date` VARCHAR(11),
`loc_id` INTEGER
);
INSERT INTO ip_logs
(`ip_address`,`start_date`,`end_date`, `loc_id`)
VALUES
('120.0.53.21','2020-01-03','2020-01-09', '5'),
('198.5.273.2','2020-01-10','2020-01-14', '4'),
('198.5.273.2','2020-01-10','2020-01-14', '4'),
('198.5.273.2','2020-01-10','2020-01-14', '4'),
('100.36.33.1','2020-02-01','2020-02-02', '4'),
('100.36.33.1','2020-02-01','2020-02-02', '4'),
('100.36.33.1','2020-02-01','2020-02-02', '4'),
('198.0.47.33','2020-02-22','2020-02-24', '2'),
('122.8.0.11', '2020-02-25','2020-02-30','4'),
('198.0.47.33','2020-03-10','2020-03-17', '2'),
('198.0.47.33','2020-03-10','2020-03-17', '2'),
('122.8.0.11', '2020-03-18','2020-03-23','4'),
('198.5.273.2','2020-03-04','2020-03-09', '3'),
('106.25.12.2','2020-03-24','2020-03-30', '1');
I use this query to select the most frequent ip address:
select (
select ip_address
from ip_logs t2
where t2.loc_id = t1.loc_id
group by ip_address
order by count(*) desc
limit 1)
from ip_logs t1
group by loc_id
This works in MySQL 8.0, but it does not work in Hive. I get this error:
cannot recognize input near 'select' 'ip_address' 'from' in expression specification
Expected output is:
loc_id | ip_address
5 120.0.53.21
4 198.5.273.2
2 198.0.47.33
3 198.5.273.2
1 106.25.12.2

You can try using row_number() window function
select * from
(
select ip_address,loc_id,count(*) as frequency,
row_number() over(partition by loc_id order by count(*) desc) as rn
from ip_logs group by ip_address,loc_id
)A where rn=1
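As a quick check of the row_number() approach, here is a small sketch that runs the same idea against SQLite through Python's sqlite3 module; it is not a Hive session. Only the ip_address and loc_id columns from the question are loaded. Two things are assumptions added for this example: the count is computed in a nested subquery before ranking, and ip_address DESC is used as a tie-breaker so that loc_id 4 (where two addresses both appear three times) gives a deterministic answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ip_logs (ip_address TEXT, loc_id INTEGER)")

# (ip_address, loc_id) pairs taken from the question's INSERT statement.
logs = (
    [("120.0.53.21", 5)]
    + [("198.5.273.2", 4)] * 3
    + [("100.36.33.1", 4)] * 3
    + [("198.0.47.33", 2)] * 3
    + [("122.8.0.11", 4)] * 2
    + [("198.5.273.2", 3), ("106.25.12.2", 1)]
)
conn.executemany("INSERT INTO ip_logs VALUES (?, ?)", logs)

# Count per (loc_id, ip_address), rank within each loc_id, keep rank 1.
rows = conn.execute("""
    SELECT loc_id, ip_address
    FROM (SELECT loc_id, ip_address,
                 ROW_NUMBER() OVER (PARTITION BY loc_id
                                    ORDER BY frequency DESC,
                                             ip_address DESC) AS rn
          FROM (SELECT loc_id, ip_address, COUNT(*) AS frequency
                FROM ip_logs
                GROUP BY loc_id, ip_address) AS counts) AS ranked
    WHERE rn = 1
""").fetchall()

most_frequent = dict(rows)
print(most_frequent)
```

Without a tie-breaker, which of the tied addresses gets rn = 1 is up to the engine.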

Error in Conditional UNION

I have this SQL:
SELECT MONTH(data) AS MES, cor,
       CASE MONTH(data)
            WHEN 1 THEN 'Janeiro' WHEN 2 THEN 'Fevereiro' WHEN 3 THEN 'Março'
            WHEN 4 THEN 'Abril' WHEN 5 THEN 'Maio' WHEN 6 THEN 'Junho'
            WHEN 7 THEN 'Julho' WHEN 8 THEN 'Agosto' WHEN 9 THEN 'Setembro'
            WHEN 10 THEN 'Outubro' WHEN 11 THEN 'Novembro' WHEN 12 THEN 'Dezembro'
       END AS MESCOR,
       COUNT(*) AS Expr1,
       CASE cor WHEN 'AM' THEN '2' WHEN 'VD' THEN '1' WHEN 'VM' THEN '3' END AS Expr2
FROM TBINICIATIVAS_PREVISTAS
WHERE (login = 'xxxxxxx')
GROUP BY MONTH(data), cor
UNION
SELECT '1', 'VD', 'Janeiro',0,'1'
UNION
SELECT '1', 'AM', 'Janeiro',0,'2'
UNION
SELECT '1', 'VM', 'Janeiro',0,'3'
ORDER BY expr2, mes
But I need the UNION to be conditional.
Something like:
if (select count(*) .... where cond1...) = 0
UNION
SELECT '1', 'VD', 'Janeiro',0,'1'
if (select count(*) ....where cond2...) = 0
UNION
SELECT '1', 'AM', 'Janeiro',0,'2'
if (select count(*) ....where cond3...) = 0
UNION
SELECT '1', 'VM', 'Janeiro',0,'3'
I tried, but I always got a syntax error.
Is it possible to do that?

Move your conditions to a WHERE clause inside each UNION's SELECT (you can't put anything other than a SELECT between UNION).
SELECT '1', 'VD', 'Janeiro',0,'0'
UNION
SELECT '1', 'VD', 'Janeiro',0,'1'
WHERE (select count(*) ....where cond1...) = 0
UNION
SELECT '1', 'AM', 'Janeiro',0,'2'
WHERE (select count(*) ....where cond2...) = 0
UNION
SELECT '1', 'VM', 'Janeiro',0,'3'
WHERE (select count(*) ....where cond3...) = 0
Also, I note from your conditions that you are checking whether a row exists by comparing a count to 0. Replacing this with an actual NOT EXISTS will usually be faster, because the SQL engine doesn't have to count all the rows before comparing against 0; it can stop as soon as a row is found.
SELECT '1', 'VD', 'Janeiro',0,'0'
UNION
SELECT '1', 'VD', 'Janeiro',0,'1'
WHERE NOT EXISTS (select 1 ....where cond1...)
UNION
SELECT '1', 'AM', 'Janeiro',0,'2'
WHERE NOT EXISTS (select 1 ....where cond2...)
UNION
SELECT '1', 'VM', 'Janeiro',0,'3'
WHERE NOT EXISTS (select 1 ....where cond3...)
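The NOT EXISTS pattern above can be tried out with a small sketch on SQLite through Python's sqlite3 module. The table name iniciativas, its two columns, and the sample rows are hypothetical stand-ins for the elided conditions in the question: each default row is added only when no real row with that colour exists for month 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iniciativas (mes INTEGER, cor TEXT)")
# Hypothetical data: only the 'VD' colour has real rows in month 1.
conn.executemany("INSERT INTO iniciativas VALUES (?, ?)", [(1, "VD"), (1, "VD")])

rows = conn.execute("""
    SELECT mes, cor, COUNT(*) AS total
    FROM iniciativas
    GROUP BY mes, cor
    UNION
    SELECT 1, 'VD', 0
    WHERE NOT EXISTS (SELECT 1 FROM iniciativas WHERE mes = 1 AND cor = 'VD')
    UNION
    SELECT 1, 'AM', 0
    WHERE NOT EXISTS (SELECT 1 FROM iniciativas WHERE mes = 1 AND cor = 'AM')
    UNION
    SELECT 1, 'VM', 0
    WHERE NOT EXISTS (SELECT 1 FROM iniciativas WHERE mes = 1 AND cor = 'VM')
    ORDER BY cor
""").fetchall()

print(rows)
```

Here the 'VD' placeholder is suppressed because real 'VD' rows exist, while the 'AM' and 'VM' placeholders are kept with a count of 0.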

SQL - How to retrieve the correct number

I have written some pseudocode to help others understand what I am trying to achieve. I am a beginner in SQL; I know the basics, but I am not experienced enough with the possibilities.
I tried using SELECT CASE but it doesn't achieve what I need.
Here is my sample data
CREATE TABLE Records
([ColA] INTEGER, [ColB] INTEGER, [ColTotal] INTEGER)
;
INSERT INTO Records
([ColA], [ColB] )
VALUES
('3', '4'),
('4', '2'),
('1', '2'),
('3', '5'),
('3', '1'),
('2', '2')
;
Here is my pseudocode (I found out after accepting an answer that the logic in my pseudocode was incorrect too; this has been fixed):
SELECT COL A, COL B, COL TOTAL
IF COL A >= COLB THEN
ADD COLB value to COL TOTAL
ELSE
USE Value from COLA and add to total
END IF
Here is my SQL
SELECT SUM(ColTotal)
FROM t
WHERE ColA >= ColB

Is this what you want?
select sum(case when a < b then a else b end) as total
from t;
Many databases support the least() function, which makes this simpler:
select sum(least(a, b)) as total
from t;
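A minimal sketch of the same calculation on the question's sample data, run on SQLite through Python's sqlite3 module. The column names ColA and ColB come from the question; SQLite has no least(), so the second query uses its two-argument scalar min() instead, which is an SQLite-specific substitution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Records (ColA INTEGER, ColB INTEGER)")
conn.executemany(
    "INSERT INTO Records (ColA, ColB) VALUES (?, ?)",
    [(3, 4), (4, 2), (1, 2), (3, 5), (3, 1), (2, 2)],
)

# Portable form: take the smaller of the two columns per row, then sum.
total_case = conn.execute(
    "SELECT SUM(CASE WHEN ColA < ColB THEN ColA ELSE ColB END) FROM Records"
).fetchone()[0]

# SQLite-specific form: min() with two arguments acts like least().
total_min = conn.execute(
    "SELECT SUM(MIN(ColA, ColB)) FROM Records"
).fetchone()[0]

print(total_case, total_min)
```

The per-row minimums are 3, 2, 1, 3, 1 and 2, so both queries return 12.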

SQL - ALL, Including all values

I have two tables:
create table xyz
(campaign_id varchar(10)
,account_number varchar)
Insert into xyz
values ( 'A', '1'), ('A', '5'), ('A', '7'), ('A', '9'), ('A', '10'),
( 'B', '2'), ('B', '3'),
( 'C', '1'), ('C', '2'), ('C', '3'), ('C', '5'), ('C', '13'), ('C', '15'),
('D', '2'), ('D', '9'), ('D', '10')
create table abc
(account_number varchar)
insert into abc
values ('1'), ('2'), ('3'), ('5')
Now, I want to write a query that finds the campaign_id that includes all four account_number values 1, 2, 3 and 5.
The answer is C.
[My aim is to find the campaign code that includes account_number 1, 2, 3 and 5. This condition is only satisfied by campaign code C.]
I tried using IN and ALL, but they don't work. Could you please help?

I think what you are after is an inner join. I'm not sure from your question which way around you want your data. However, this should give you a good clue how to proceed and what keywords to look for in the documentation to go further.
SELECT a.*
FROM xyz a
INNER JOIN abc b ON b.account_number = a.account_number;
EDIT:
It seems I misunderstood the original question, sorry. To get what you want, you can just do:
SELECT campaign_id
FROM xyz
WHERE account_number IN ('1', '2', '3', '5')
GROUP BY campaign_id
HAVING COUNT(DISTINCT account_number) = 4;
This is called relational division if you want to investigate further.
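Here is a small sketch of that relational division on the question's data, run on SQLite through Python's sqlite3 module. Unlike the query above, it reads the required accounts and their count from the abc table instead of hard-coding the list and the number 4; that variation is an assumption about what is wanted when abc changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xyz (campaign_id TEXT, account_number TEXT);
    CREATE TABLE abc (account_number TEXT);
""")
conn.executemany("INSERT INTO xyz VALUES (?, ?)", [
    ("A", "1"), ("A", "5"), ("A", "7"), ("A", "9"), ("A", "10"),
    ("B", "2"), ("B", "3"),
    ("C", "1"), ("C", "2"), ("C", "3"), ("C", "5"), ("C", "13"), ("C", "15"),
    ("D", "2"), ("D", "9"), ("D", "10"),
])
conn.executemany("INSERT INTO abc VALUES (?)", [("1",), ("2",), ("3",), ("5",)])

# Keep campaigns whose distinct matching accounts cover every row of abc.
campaigns = [row[0] for row in conn.execute("""
    SELECT campaign_id
    FROM xyz
    WHERE account_number IN (SELECT account_number FROM abc)
    GROUP BY campaign_id
    HAVING COUNT(DISTINCT account_number) = (SELECT COUNT(*) FROM abc)
""")]

print(campaigns)
```

Campaigns A and B each match two of the required accounts and D matches one, so only C is returned.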

SELECT campaign_id
FROM (
SELECT campaign_id, COUNT(*) AS c, total_accounts
FROM xyz
JOIN abc ON xyz.account_number = abc.account_number
CROSS JOIN (SELECT COUNT(*) AS total_accounts
FROM abc) AS x
GROUP BY campaign_id
HAVING c = total_accounts) AS subq

select xyz.campaign_id
from xyz
join abc
on xyz.account_number = abc.account_number
group by xyz.campaign_id
having count(xyz.campaign_id) =
(select count(account_number) from abc);
Caution: t-sql implementation
