SQL Query Comparison Processing Efficiency, Any Better Solution?


I'm working with a large data set of about 134 million rows, and I would like to run a SELECT query whose result is inserted into a table.
This is my table's SQL script (SQL Fiddle).
Id | Emitter | EmitterIBAN | Receiver | ReceiverIBAN | Adresss | Value
------------------------------------------------------------------------
1 | Ernst | HR53 8827 2118 4692 8207 5 | Kimbra | CH20 1042 6T0N MDTG JT47 U | 3256 Arrowood Point 0002 | 121.72
2 | Keene | SK81 1004 7484 7505 6308 9259 | Torrance | RO23 ZWTR OJKK VAU9 T5P4 2GDY | 35197 Green Ridge Way | 82.52
3 | Ernst | HR53 8827 2118 4692 8207 5 | Kimbra | CH20 1042 6T0N MDTG JT47 U | 3256 Arrowood Point 0048 | 51.81
4 | Korie | ME43 9833 9830 7367 4239 60 | Roy | IL69 9686 1536 8102 2219 165 | 5 Swallow Alley | 88.01
5 | Ernst | HR53 8827 2118 4692 8207 5 | Kimbra | CH20 1042 6T0N MDTG JT47 U | 3256 Arrowood Point 0001 | 133.99
6 | Charmine | BG92 TOXX 8380 785I JKRQ JS | Sarette | MU67 RYRU 9293 5875 6859 7111 075X HR | 8 Sage Place | 36.30
7 | Ernst | HR53 8827 2118 4692 8207 5 | Kimbra | CH20 1042 6T0N MDTG JT47 U | 3256 Arrowood Point 0004 | 186.99
I select my data with this query:
Select count(1) as NumberOperation,
MAX(Emitter) as EmitterName,
EmitterIban,
MAX(Receiver) as ReceiverName,
ReceiverIban,
MAX(ReceiverAddress) as ReceiverAddress,
SUM([Value]) as SumValues
FROM TableEsperadoceTransaction
Group By EmitterIban,
ReceiverIban
I get the following result:
NumberOperation | Emitter | EmitterIBAN | Receiver | ReceiverIBAN | Adresss | SumValue
-------------------------------------------------------------------------------------
4 | Ernst | HR53 8827 2118 4692 8207 5 | Kimbra | CH20 1042 6T0N MDTG JT47 U | 3256 Arrowood Point 0002 | 494.51
1 | Keene | SK81 1004 7484 7505 6308 9259 | Torrance | RO23 ZWTR OJKK VAU9 T5P4 2GDY | 35197 Green Ridge Way | 82.52
1 | Korie | ME43 9833 9830 7367 4239 60 | Roy | IL69 9686 1536 8102 2219 165 | 5 Swallow Alley | 88.01
1 | Charmine | BG92 TOXX 8380 785I JKRQ JS | Sarette | MU67 RYRU 9293 5875 6859 7111 075X HR | 8 Sage Place | 36.30
I also have this alternative:
SELECT DISTINCT *
FROM (SELECT Count(1) AS NumberOperation,
emitteriban AS _EmitterIban,
receiveriban AS _ReceiverIban,
Sum([value]) AS SumValues
FROM tableesperadocetransaction
GROUP BY emitteriban,
receiveriban) tmp_T
LEFT JOIN tableesperadocetransaction
ON tableesperadocetransaction.emitteriban = tmp_T._emitteriban
AND tableesperadocetransaction.receiveriban =
tmp_T._receiveriban
Which of these two solutions is better, and is there a query more efficient than either of them?
Thanks

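The second query is slower because:
It joins the aggregated result back to the whole table with a LEFT JOIN, so every source row is read a second time
It needs a subquery (derived table) in addition to that join
It applies SELECT DISTINCT to the joined rows, which adds a sort or hash step
It uses * instead of explicit column names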
The first one is the most natural way of doing this.
There is plenty of material on how to improve query performance and what to avoid; see, for example, MSDN on improving queries.
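A minimal sketch, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; the table name `tx` and the reduced column list are hypothetical), shows one more difference between the two queries: because `*` includes `Id`, `DISTINCT` cannot collapse the joined rows, so the second query returns one row per transaction instead of one per IBAN pair.

```python
import sqlite3

# Hypothetical stand-in for TableEsperadoceTransaction, in SQLite (not SQL Server).
# Only the columns the two queries need are kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tx (Id INTEGER PRIMARY KEY, EmitterIban TEXT, "
    "ReceiverIban TEXT, Value REAL)"
)
ernst, kimbra = "HR53 8827 2118 4692 8207 5", "CH20 1042 6T0N MDTG JT47 U"
conn.executemany(
    "INSERT INTO tx VALUES (?, ?, ?, ?)",
    [
        (1, ernst, kimbra, 121.72),
        (2, "SK81 1004 7484 7505 6308 9259", "RO23 ZWTR OJKK VAU9 T5P4 2GDY", 82.52),
        (3, ernst, kimbra, 51.81),
        (4, "ME43 9833 9830 7367 4239 60", "IL69 9686 1536 8102 2219 165", 88.01),
        (5, ernst, kimbra, 133.99),
        (6, "BG92 TOXX 8380 785I JKRQ JS", "MU67 RYRU 9293 5875 6859 7111 075X HR", 36.30),
        (7, ernst, kimbra, 186.99),
    ],
)

# First query: one row per (EmitterIban, ReceiverIban) pair.
grouped = conn.execute(
    "SELECT COUNT(*), EmitterIban, ReceiverIban, SUM(Value) "
    "FROM tx GROUP BY EmitterIban, ReceiverIban"
).fetchall()

# Second query: the aggregate is joined back to every source row, and
# DISTINCT * cannot merge them because each row carries its own Id.
joined = conn.execute(
    "SELECT DISTINCT * FROM ("
    "  SELECT COUNT(*) AS NumberOperation, EmitterIban AS _EmitterIban,"
    "         ReceiverIban AS _ReceiverIban, SUM(Value) AS SumValues"
    "  FROM tx GROUP BY EmitterIban, ReceiverIban) tmp_T "
    "LEFT JOIN tx ON tx.EmitterIban = tmp_T._EmitterIban "
    "AND tx.ReceiverIban = tmp_T._ReceiverIban"
).fetchall()

print(len(grouped), len(joined))
```

With the seven sample rows, the first query returns 4 rows and the second returns 7.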

The 1st query should be far more efficient.
If you really want to speed things up, you'll want to make sure you have a covering index with EmitterIban, ReceiverIban as the key.
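To check whether an index actually covers the GROUP BY, you can compare query plans before and after creating it. The sketch below does this in SQLite through Python's sqlite3 module (an assumption: SQL Server's plan output looks different, and the index name `ix_iban` and table `tx` are hypothetical). SQLite has no INCLUDE clause, so Value is appended to the index key; in SQL Server the non-key columns could go in an INCLUDE (...) list instead.

```python
import sqlite3

# Sketch in SQLite (not SQL Server): compare the plan of the GROUP BY query
# before and after adding an index keyed on (EmitterIban, ReceiverIban).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tx (Id INTEGER PRIMARY KEY, EmitterIban TEXT, "
    "ReceiverIban TEXT, Value REAL)"
)

query = (
    "SELECT EmitterIban, ReceiverIban, COUNT(*), SUM(Value) "
    "FROM tx GROUP BY EmitterIban, ReceiverIban"
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(query)
# Value is part of the key only so that the index covers SUM(Value).
conn.execute("CREATE INDEX ix_iban ON tx (EmitterIban, ReceiverIban, Value)")
after = plan(query)

print(before)  # full table scan, grouping done separately
print(after)   # scan of the covering index, already in GROUP BY order
```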

You can also try this: take MIN(Id) for each group, then use it in an INNER JOIN to fetch the remaining columns from that row.
SELECT
tmp.NumberOperation
,tb.Emitter
,tmp.EmitterIban
,tb.Receiver
,tmp.ReceiverIban
,tb.Adresss
,tmp.SumValues
FROM (SELECT Count(1) AS NumberOperation,
emitteriban AS EmitterIban,
receiveriban AS ReceiverIban,
Sum([value]) AS SumValues,
MIN(Id) AS Id
FROM tableesperadocetransaction
GROUP BY emitteriban,
receiveriban) tmp
INNER JOIN tableesperadocetransaction tb
ON tb.Id = tmp.Id
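Here is a minimal sketch of the MIN(Id) approach, run in SQLite through Python's sqlite3 on part of the sample data (an assumption: SQLite instead of SQL Server, and a hypothetical table name `tx`).

```python
import sqlite3

# Sketch in SQLite of the MIN(Id) approach; `tx` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tx (Id INTEGER PRIMARY KEY, Emitter TEXT, EmitterIban TEXT, "
    "Receiver TEXT, ReceiverIban TEXT, Adresss TEXT, Value REAL)"
)
e, r = "HR53 8827 2118 4692 8207 5", "CH20 1042 6T0N MDTG JT47 U"
conn.executemany(
    "INSERT INTO tx VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Ernst", e, "Kimbra", r, "3256 Arrowood Point 0002", 121.72),
        (3, "Ernst", e, "Kimbra", r, "3256 Arrowood Point 0048", 51.81),
        (5, "Ernst", e, "Kimbra", r, "3256 Arrowood Point 0001", 133.99),
        (7, "Ernst", e, "Kimbra", r, "3256 Arrowood Point 0004", 186.99),
        (4, "Korie", "ME43 9833 9830 7367 4239 60", "Roy",
         "IL69 9686 1536 8102 2219 165", "5 Swallow Alley", 88.01),
    ],
)

rows = conn.execute(
    """
    SELECT tmp.NumberOperation, tb.Emitter, tmp.EmitterIban,
           tb.Receiver, tmp.ReceiverIban, tb.Adresss, tmp.SumValues
    FROM (SELECT COUNT(*) AS NumberOperation,
                 EmitterIban, ReceiverIban,
                 SUM(Value) AS SumValues,
                 MIN(Id) AS Id              -- one representative row per group
          FROM tx
          GROUP BY EmitterIban, ReceiverIban) tmp
    INNER JOIN tx tb ON tb.Id = tmp.Id      -- fetch its descriptive columns
    ORDER BY tmp.NumberOperation DESC
    """
).fetchall()

for row in rows:
    print(row)
```

The design point is that all descriptive columns come from one row, the one with the lowest Id, so the Ernst group shows the address "3256 Arrowood Point 0002". In the first query each MAX(...) is evaluated independently, and MAX over these address strings would return "3256 Arrowood Point 0048".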

Related

How can I select only the rows that match in several columns but differ in others?

I have two tables from different databases, and I need to create a report that shows the discrepancies in the data:
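Table A:
DATE | FLIGHT | AC | DEST | ATD | TDN
-------------------------------------
14.01.2022 | 150 | AIRB | JFK | 02:45 | 1:35
15.01.2022 | 152 | BOEING | MIA | 02:45 | 1:38
15.01.2022 | 145 | AIRB | SEA | 01:25 | 01:05
Table B:
DATE | FLIGHT | AC | DEST | ATD | TDN
-------------------------------------
14.01.2022 | 150 | AIRB | JFK | 02:45 | 1:35
15.01.2022 | 152 | BOEING | MIA | 02:39 | 1:38
15.01.2022 | 145 | AIRB | SEA | 01:28 | 01:15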
The result should contain only the rows that differ in the last two columns:
DATE | FLIGHT | AC | DEST | ATD_B | TDN_B | ATD_A | TDN_A
--------------------------------------------------------
15.01.2022 | 152 | BOEING | MIA | 02:39 | 1:38 | 02:45 | 01:38
15.01.2022 | 145 | AIRB | SEA | 01:28 | 01:15 | 01:25 | 01:05
This shows where the discrepancies are.
I have tried:
select * from table_a
minus
select * from table_b
but it does not seem to be the right approach.

For your sample data, you can JOIN both tables on the four identical columns and then require that the combination of the other two columns differ:
SELECT
a.a_date, a.flight, a.ac, a.dest,
b.atd AS atd_b, b.tdn AS tdn_b,
a.atd AS atd_a, a.tdn AS tdn_a
FROM
a JOIN b
ON
a.a_date = b.b_date
AND a.flight = b.flight
AND a.ac = b.ac
AND a.dest = b.dest
WHERE NOT (a.atd = b.atd AND a.tdn = b.tdn);
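The join can be tried on the question's sample rows with a minimal sketch in SQLite via Python's sqlite3 (an assumption; the answer's own fiddle may use a different engine). Only flights 152 and 145 come back, matching the expected result.

```python
import sqlite3

# Sketch in SQLite of the join above, loaded with the question's sample rows.
# Column names (a_date/b_date, flight, ac, dest, atd, tdn) follow the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (a_date TEXT, flight INT, ac TEXT, dest TEXT, atd TEXT, tdn TEXT)")
conn.execute("CREATE TABLE b (b_date TEXT, flight INT, ac TEXT, dest TEXT, atd TEXT, tdn TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?, ?, ?, ?)", [
    ("14.01.2022", 150, "AIRB", "JFK", "02:45", "1:35"),
    ("15.01.2022", 152, "BOEING", "MIA", "02:45", "1:38"),
    ("15.01.2022", 145, "AIRB", "SEA", "01:25", "01:05"),
])
conn.executemany("INSERT INTO b VALUES (?, ?, ?, ?, ?, ?)", [
    ("14.01.2022", 150, "AIRB", "JFK", "02:45", "1:35"),
    ("15.01.2022", 152, "BOEING", "MIA", "02:39", "1:38"),
    ("15.01.2022", 145, "AIRB", "SEA", "01:28", "01:15"),
])

rows = conn.execute("""
    SELECT a.a_date, a.flight, a.ac, a.dest,
           b.atd AS atd_b, b.tdn AS tdn_b,
           a.atd AS atd_a, a.tdn AS tdn_a
    FROM a JOIN b
      ON a.a_date = b.b_date AND a.flight = b.flight
     AND a.ac = b.ac AND a.dest = b.dest
    WHERE NOT (a.atd = b.atd AND a.tdn = b.tdn)   -- keep only mismatches
    ORDER BY a.flight DESC
""").fetchall()

for row in rows:
    print(row)
```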
Side note: the WHERE condition could equally be moved into the JOIN clause, and the WHERE clause removed.
However, this only works as long as the data has the same shape as in your example.
Suppose both table A and table B contain two differing entries for flight 145. You would probably expect two rows for that flight, but in practice you will get four, because each of A's two rows matches both of B's rows on the four join columns (2 × 2 = 4).
You can replicate this behaviour here:
db<>fiddle
That is why people are asking which rows should be paired together. If you want only two rows in that case, you need to tell us how to pair them. If you really do want to see all four rows, this query works.
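The fan-out is easy to reproduce. In the sketch below (SQLite via Python's sqlite3; the second row for flight 145 in each table is invented for illustration, and the date column is simply named `d`), each table holds two differing rows for flight 145, and the join returns four rows.

```python
import sqlite3

# Hypothetical data: both tables get two differing rows for flight 145.
conn = sqlite3.connect(":memory:")
for name in ("a", "b"):
    conn.execute(
        f"CREATE TABLE {name} (d TEXT, flight INT, ac TEXT, dest TEXT, atd TEXT, tdn TEXT)"
    )
conn.executemany("INSERT INTO a VALUES (?, ?, ?, ?, ?, ?)", [
    ("15.01.2022", 145, "AIRB", "SEA", "01:25", "01:05"),
    ("15.01.2022", 145, "AIRB", "SEA", "01:30", "01:10"),  # invented row
])
conn.executemany("INSERT INTO b VALUES (?, ?, ?, ?, ?, ?)", [
    ("15.01.2022", 145, "AIRB", "SEA", "01:28", "01:15"),
    ("15.01.2022", 145, "AIRB", "SEA", "01:33", "01:20"),  # invented row
])

# Each of A's two rows matches both of B's rows on the four key columns,
# and every pair differs in atd/tdn, so the join yields 2 x 2 = 4 rows.
rows = conn.execute("""
    SELECT a.flight, b.atd AS atd_b, a.atd AS atd_a
    FROM a JOIN b
      ON a.d = b.d AND a.flight = b.flight AND a.ac = b.ac AND a.dest = b.dest
    WHERE NOT (a.atd = b.atd AND a.tdn = b.tdn)
""").fetchall()

print(len(rows))
```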

Just join the tables on the key columns and add the condition from your WHERE clause (mentioned in the comments):
WITH
A (FLIGHT_DATE, FLIGHT, AC, DEST, ATD, TDN) AS
(
Select '14-JAN-22', 150, 'AIRB', 'JFK', '02:45', '1:35' From Dual Union All
Select '15-JAN-22', 152, 'BOEING', 'MIA', '02:45', '1:38' From Dual Union All
Select '15-JAN-22', 145, 'AIRB', 'SEA', '02:45', '1:05' From Dual
),
B (FLIGHT_DATE, FLIGHT, AC, DEST, ATD, TDN) AS
(
Select '14-JAN-22', 150, 'AIRB', 'JFK', '02:45', '1:35' From Dual Union All
Select '15-JAN-22', 152, 'BOEING', 'MIA', '02:39', '1:38' From Dual Union All
Select '15-JAN-22', 145, 'AIRB', 'SEA', '01:28', '1:05' From Dual
)
Select
A.FLIGHT_DATE, A.FLIGHT, A.AC, A.DEST, B.ATD "ATD_B", B.TDN "TDN_B", A.ATD "ATD_A", A.TDN "TDN_A"
From
A
Inner Join
B ON(
(A.FLIGHT_DATE = B.FLIGHT_DATE And A.FLIGHT = B.FLIGHT And A.AC = B.AC And A.DEST = B.DEST)
AND
(A.ATD <> B.ATD OR A.TDN <> B.TDN)
)
/* R e s u l t :
FLIGHT_DATE FLIGHT AC DEST ATD_B TDN_B ATD_A TDN_A
----------- ---------- ------ ---- ----- ----- ----- -----
15-JAN-22 152 BOEING MIA 02:39 1:38 02:45 1:38
15-JAN-22 145 AIRB SEA 01:28 1:05 02:45 1:05
*/
Regards...

How to select a value based on multiple criteria

I'm trying to select some values from proprietary data; I have renamed the variables so that they refer to house prices.
I want the total offers for houses that sold at either the bid or the ask price, with fewer than 15 offers and offers * sale price below 5,000,000.
I then want the total number of offers for each neighborhood on each day, but instead I get the sum of offers across all neighborhoods (n1 + n2 + n3 + n4 + n5) over all dates, plus the total offers in the whole dataset over all dates.
My current query is this:
SELECT DISTINCT(neighborhood),
DATE(date_of_sale),
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`
WHERE ((offers * accepted_sale_price < 5000000)
AND (offers < 15)
AND (house_bid = sale_price OR
house_ask = sale_price))) as bid_ask_off,
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`) as
total_offers,
FROM `big_query.a_table_name.houseprices`
GROUP BY neighborhood, DATE(date_of_sale) LIMIT 100
I am expecting a result like the following, with the date repeated throughout as d1, d2, d3, etc.:
but I am instead receiving:
I'm aware that there are some inherent problems with what I'm trying to select / group, but I'm not sure what to google or what tutorials to look at in order to perform this operation.
It's querying quite a bit of data, and I want to keep costs down, as I've already racked up a smallish bill on queries.
Any help or advice would be greatly appreciated, and I hope I've provided enough information.
Here is a sample dataframe.
neighborhood date_of_sale offers accepted_sale_price house_bid house_ask
bronx 4/1/2022 3 323 320 323
manhattan 4/1/2022 4 244 230 244
manhattan 4/1/2022 8 856 856 900
queens 4/1/2022 15 110 110 135
brooklyn 4/2/2022 12 115 100 115
manhattan 4/2/2022 9 255 255 275
bronx 4/2/2022 6 330 300 330
queens 4/2/2022 10 405 395 405
brooklyn 4/2/2022 4 254 254 265
staten_island 4/3/2022 2 442 430 442
staten_island 4/3/2022 13 195 195 225
bronx 4/3/2022 4 650 650 690
manhattan 4/3/2022 2 286 266 286
manhattan 4/3/2022 6 356 356 400
staten_island 4/4/2022 4 361 361 401
staten_island 4/4/2022 5 348 348 399
bronx 4/4/2022 8 397 340 397
manhattan 4/4/2022 9 333 333 394
manhattan 4/4/2022 11 392 325 392

I think that this is what you need.
As we group by neighbourhood, we do not need DISTINCT.
We take SUM(offers) for total_offers directly from the table, and the bid/ask sum from a subquery that we join on neighbourhood so that it is grouped the same way.
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = sale_price OR
house_ask = sale_price)
GROUP BY neighborhood) s
ON h.neighborhood = s.neighborhood
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;
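Alternatively, the following query changes the original more, but it may be closer to what you need: the subquery is also grouped by date, so bid_ask_off is computed per neighbourhood per day.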
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
date_of_sale dos,
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = sale_price OR
house_ask = sale_price)
GROUP BY
neighborhood,
date_of_sale) s
ON h.neighborhood = s.neighborhood
AND h.date_of_sale = s.dos
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;
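One alternative that is not part of the answer above is conditional aggregation: both sums are computed in a single pass over the table, grouped by neighborhood and day, with a CASE expression deciding which offers count toward bid_ask_off. The sketch below runs it in SQLite through Python's sqlite3 on the first nine sample rows (dates rewritten in ISO format). It assumes `accepted_sale_price` is the price that should equal house_bid or house_ask, since the sample has no `sale_price` column. CASE WHEN is also valid in BigQuery, where the table would be the backticked one from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE houseprices (neighborhood TEXT, date_of_sale TEXT, offers INT, "
    "accepted_sale_price INT, house_bid INT, house_ask INT)"
)
conn.executemany("INSERT INTO houseprices VALUES (?, ?, ?, ?, ?, ?)", [
    ("bronx", "2022-04-01", 3, 323, 320, 323),
    ("manhattan", "2022-04-01", 4, 244, 230, 244),
    ("manhattan", "2022-04-01", 8, 856, 856, 900),
    ("queens", "2022-04-01", 15, 110, 110, 135),
    ("brooklyn", "2022-04-02", 12, 115, 100, 115),
    ("manhattan", "2022-04-02", 9, 255, 255, 275),
    ("bronx", "2022-04-02", 6, 330, 300, 330),
    ("queens", "2022-04-02", 10, 405, 395, 405),
    ("brooklyn", "2022-04-02", 4, 254, 254, 265),
])

rows = conn.execute("""
    SELECT neighborhood,
           DATE(date_of_sale) AS sale_date,
           -- offers count toward bid_ask_off only when the filter matches
           SUM(CASE WHEN offers * accepted_sale_price < 5000000
                     AND offers < 15
                     AND (house_bid = accepted_sale_price
                          OR house_ask = accepted_sale_price)
                    THEN offers ELSE 0 END) AS bid_ask_off,
           SUM(offers) AS total_offers
    FROM houseprices
    GROUP BY neighborhood, DATE(date_of_sale)
    ORDER BY sale_date, neighborhood
""").fetchall()

for row in rows:
    print(row)
```

For example, queens on 2022-04-01 gets bid_ask_off = 0 but total_offers = 15, because its 15 offers fail the offers < 15 filter.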

Running Sum between dates on group by clause

I have the following query which shows the first 3 columns:
select
'Position Date' = todaypositiondate,
'Realized LtD SEK' = round(sum(realizedccy * spotsek), 0),
'Delta Realized SEK' = round(sum(realizedccy * spotsek) -
(SELECT sum(realizedccy*spotsek)
FROM t1
WHERE todaypositiondate = a.todaypositiondate - 1
GROUP BY todaypositiondate), 0)
FROM
t1 AS a
GROUP BY
todaypositiondate
ORDER BY
todaypositiondate DESC
Table:
Date | Realized | Delta | 5 day avg delta
-------------------------------------------------------------------
2016-09-08 | 696 981 323 | 90 526 | 336 611
2016-09-07 | 696 890 797 | 833 731 | 335 232
2016-09-06 | 696 057 066 | 85 576 | 84 467
2016-09-05 | 695 971 490 | 86 390 | 83 086
2016-09-04 | 695 885 100 | 81 434 | 80 849
2016-09-03 | 695 803 666 | 81 434 | 78 806
2016-09-02 | 695 722 231 | 79 679 | 74 500
2016-09-01 | 695 642 553 | 75 305 |
2016-08-31 | 695 567 248 | 68 515 |
How do I compute the 5-day average of the realized delta?
Based on the delta query, I tried the following, but it did not work:
select
todaypositiondate,
'30d avg delta' = (select sum(realizedccy * spotsek)
from T1
where todaypositiondate between a.todaypositiondate and a.todaypositiondate -5
group by todaypositiondate)
from
T1 as a
group by
todaypositiondate
order by
todaypositiondate desc

Do not use single quotes for column names. Only use single quotes for string and date literals.
I would write this as:
with t as (
select todaypositiondate as PositionDate,
round(sum(realizedccy * spotsek), 0) as RealizedSEK
from t1 a
group by todaypositiondate
)
select a.*,
(a.RealizedSEK - a_prev.RealizedSEK) as diff_1,
(a.RealizedSEK - a_prev5.RealizedSEK)/5 as avg_diff_5
from t a outer apply
(select top 1 p.*
from t p
where p.PositionDate = a.PositionDate - 1
) a_prev outer apply
(select top 1 p.*
from t p
where p.PositionDate = a.PositionDate - 5
) a_prev5;
Note that the 5-day average difference is the most recent value minus the value from 5 days earlier, divided by 5, because the five daily differences cancel out pairwise.
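The note can be checked with a short derivation. Writing $x_t$ for the RealizedSEK value on position date $t$ and $\Delta_t = x_t - x_{t-1}$ for the one-day difference (diff_1), the five most recent differences telescope:

```latex
\frac{1}{5}\sum_{k=0}^{4}\Delta_{t-k}
  = \frac{1}{5}\sum_{k=0}^{4}\left(x_{t-k} - x_{t-k-1}\right)
  = \frac{x_t - x_{t-5}}{5}
```

So only the value five position dates back is needed, which is what the a_prev5 lookup (PositionDate - 5) fetches. For example, with the Realized column from the question's table, the average for 2016-09-07 would be (696 890 797 - 695 722 231) / 5 = 1 168 566 / 5 = 233 713.2.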

I already use that kind of formula when I calculate the delta between two dates.
It's like this:
Select todaypositiondate,
'D_RealizedSEK' = round(sum(realizedccy*spotsek) -
(SELECT sum(realizedccy*spotsek)
FROM T1
WHERE todaypositiondate = a.todaypositiondate - 1
GROUP BY todaypositiondate),0)
FROM T1 AS a
group by todaypositiondate
Instead of adding five formulas and just replacing -1 with -2, -3, ..., I would like a way to select the sums of realizedccy for the previous five days, add them together and divide by 5.
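Window functions, available in SQL Server 2012 and later, offer another way to avoid writing five separate lookups; this is a different technique from the OUTER APPLY answer above. LAG(...) OVER (ORDER BY ...) gives the previous day's total, and AVG(...) OVER (... ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) averages the last five deltas. The sketch runs in SQLite (3.25 or later, via Python's sqlite3) on the Realized column from the question's table; the table name `daily` and its columns are hypothetical.

```python
import sqlite3

# Daily realized totals taken from the question's table (already aggregated
# per position date; in the real query this would be the GROUP BY result).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (pos_date TEXT PRIMARY KEY, realized INT)")
conn.executemany("INSERT INTO daily VALUES (?, ?)", [
    ("2016-08-31", 695567248), ("2016-09-01", 695642553),
    ("2016-09-02", 695722231), ("2016-09-03", 695803666),
    ("2016-09-04", 695885100), ("2016-09-05", 695971490),
    ("2016-09-06", 696057066), ("2016-09-07", 696890797),
    ("2016-09-08", 696981323),
])

rows = conn.execute("""
    WITH d AS (
        SELECT pos_date,
               realized - LAG(realized) OVER (ORDER BY pos_date) AS delta
        FROM daily
    )
    SELECT pos_date,
           delta,
           -- average only once five non-NULL deltas are in the window
           CASE WHEN COUNT(delta) OVER (ORDER BY pos_date
                     ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) = 5
                THEN AVG(delta) OVER (ORDER BY pos_date
                     ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
           END AS avg_delta_5
    FROM d
    ORDER BY pos_date DESC
""").fetchall()

for row in rows:
    print(row)
```

The frame counts rows, not calendar days, so it assumes exactly one row per position date with no gaps; the earliest dates get NULL until five deltas are available.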

How to create a query on an existing table and build a table(view) with aggregated data and a restriction?

I have an MS SQL database that stores data coming from equipment mounted in vehicles (1 to 3 devices per vehicle).
At the moment, the database has a table named DeviceStatus: a large table that stores all the information the devices send when they connect to the TCP server. Records are added (INSERT) or updated (UPDATE) here.
The table looks like this:
Sample data (columns: DeviceSerial, VehicleNo, DevicePosition, DeviceSwVersion, DriverConsoleVersion):
1040 305 3 8.00 0
1044 305 2 8.00 0
1063 305 1 8.01 1.34
1071 312 2 8.00 0
1075 312 1 8.00 1.33
1078 312 3 8.00 0
1099 414 3 8.00 0
1106 414 2 8.01 0
1113 102 1 8.01 1.34
1126 102 3 8.00 0
Remark: the driver console always belongs to the device installed in position 1 (it is an extension of that device; obviously there is only one console per vehicle), so this acts as a restriction for getting the correct information into the desired table (view) described below.
What I need is a SQL statement that creates a table (view), a so-called "Software Versions Table", showing the software version of every device installed in the vehicles (all devices that have connected and communicated with the server), something like the table below:
Remark: Device #1 for vehicle 414 is missing because it has not communicated (yet, I guess).

With the information we have so far, I think you need a query with a PIVOT:
SELECT P.VehicleNo, V.DriverConsoleVersion, P.[1] AS [Device1SwVersion], P.[2] AS [Device2SwVersion], P.[3] AS [Device3SwVersion]
FROM (
SELECT VehicleNo, [1], [2], [3]
FROM (
SELECT VehicleNo, DevicePosition, DeviceSwVersion
FROM @DeviceInfo
) as d
PIVOT (
MAX(DeviceSwVersion)
FOR DevicePosition IN ([1], [2], [3])
) PIV
) P
LEFT JOIN @DeviceInfo V
ON V.VehicleNo = P.VehicleNo AND V.DevicePosition = 1;
You can create a view with such a query.
The first subquery pivots the data into four columns per vehicle: the vehicle number and the software versions of devices 1 to 3.
The result is then LEFT JOINed with the device table to pick up the console version associated with device 1.
Output:
VehicleNo DriverConsoleVersion Device1SwVersion Device2SwVersion Device3SwVersion
102 1.34 8.01 NULL 8.00
305 1.34 8.01 8.00 8.00
312 1.33 8.00 8.00 8.00
414 NULL NULL 8.01 8.00
Your data:
Declare @DeviceInfo TABLE([DeviceSerial] int, [VehicleNo] int, [DevicePosition] int, [DeviceSwVersion] varchar(10), [DriverConsoleVersion] varchar(10));
INSERT INTO @DeviceInfo([DeviceSerial], [VehicleNo], [DevicePosition], [DeviceSwVersion], [DriverConsoleVersion])
VALUES
(1040, 305, 3, '8.00', '0'),
(1044, 305, 2, '8.00', '0'),
(1063, 305, 1, '8.01', '1.34'),
(1071, 312, 2, '8.00', '0'),
(1075, 312, 1, '8.00', '1.33'),
(1078, 312, 3, '8.00', '0'),
(1099, 414, 3, '8.00', '0'),
(1106, 414, 2, '8.01', '0'),
(1113, 102, 1, '8.01', '1.34'),
(1126, 102, 3, '8.00', '0')
;

I like the PIVOT answer, but here is another way:
select VehicleNo,
max(DriverConsoleVersion) DriverConsoleVersion,
max(case when DevicePosition = 1 then DeviceSwVersion end) Device1SwVersion,
max(case when DevicePosition = 2 then DeviceSwVersion end) Device2SwVersion,
max(case when DevicePosition = 3 then DeviceSwVersion end) Device3SwVersion
from @DeviceInfo
group by VehicleNo
order by VehicleNo
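A quick way to check the conditional-aggregation version is to run it against the sample data. The sketch below uses SQLite through Python's sqlite3 (an assumption; the data sits in a regular table named DeviceInfo rather than a T-SQL table variable). One deliberate change, marked in the code: the console version is taken only from the position-1 device, following the question's restriction, so vehicle 414 gets NULL (as in the PIVOT output) instead of MAX('0', '0') = '0'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DeviceInfo (DeviceSerial INT, VehicleNo INT, DevicePosition INT, "
    "DeviceSwVersion TEXT, DriverConsoleVersion TEXT)"
)
conn.executemany("INSERT INTO DeviceInfo VALUES (?, ?, ?, ?, ?)", [
    (1040, 305, 3, "8.00", "0"), (1044, 305, 2, "8.00", "0"),
    (1063, 305, 1, "8.01", "1.34"), (1071, 312, 2, "8.00", "0"),
    (1075, 312, 1, "8.00", "1.33"), (1078, 312, 3, "8.00", "0"),
    (1099, 414, 3, "8.00", "0"), (1106, 414, 2, "8.01", "0"),
    (1113, 102, 1, "8.01", "1.34"), (1126, 102, 3, "8.00", "0"),
])

rows = conn.execute("""
    SELECT VehicleNo,
           -- changed: console version taken only from the position-1 device
           MAX(CASE WHEN DevicePosition = 1 THEN DriverConsoleVersion END) AS DriverConsoleVersion,
           MAX(CASE WHEN DevicePosition = 1 THEN DeviceSwVersion END) AS Device1SwVersion,
           MAX(CASE WHEN DevicePosition = 2 THEN DeviceSwVersion END) AS Device2SwVersion,
           MAX(CASE WHEN DevicePosition = 3 THEN DeviceSwVersion END) AS Device3SwVersion
    FROM DeviceInfo
    GROUP BY VehicleNo
    ORDER BY VehicleNo
""").fetchall()

for row in rows:
    print(row)
```

In SQL Server the same SELECT, without the ORDER BY, could be wrapped in CREATE VIEW against the real table.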
You can also cast or format the values. For example:
select ...,
isnull(cast(cast(
max(case when DevicePosition = 1 then DeviceSwVersion end)
as decimal(8,2)) / 100 as varchar(20)), '') Device1SwVersion,

Calculating Formula with CTE tree

Data
I have the following partial data:
id parent multiplier const
-- ------ ---------- -----
1 NULL 1.10 1.00
2 1 1.20 2.00
3 1 1.30 3.00
4 1 2.40 4.00
5 2 2.50 5.00
6 2 2.60 6.00
7 2 2.70 17.00
8 3 2.80 18.00
9 3 3.90 19.00
10 3 3.10 7.00
11 8 3.20 8.00
12 8 3.30 9.00
13 8 3.40 10.00
14 9 4.50 11.00
15 10 4.60 21.00
16 10 4.70 22.00
which can be displayed as the following tree:
1
+-- 2
| +-- 5
| +-- 6
| +-- 7
|
+-- 3
| +-- 8
| | +-- 11
| | +-- 12
| | +-- 13
| |
| +-- 9
| | +-- 14
| |
| +-- 10
| +-- 15
| +-- 16
|
+-- 4
SQL to create the table structure and data
DECLARE @table TABLE (Id int, Parent int, multiplier decimal(6,3), Const decimal(6,3));
INSERT INTO @table
SELECT 1, NULL, 1.1, 1.00 UNION
SELECT 2, 1, 1.2, 2.00 UNION
SELECT 3, 1, 1.3, 3.00 UNION
SELECT 4, 1, 2.4, 4.00 UNION
SELECT 5, 2, 2.5, 5.00 UNION
SELECT 6, 2, 2.6, 6.00 UNION
SELECT 7, 2, 2.7, 17.00 UNION
SELECT 8, 3, 2.8, 18.00 UNION
SELECT 9, 3, 3.9, 19.00 UNION
SELECT 10, 3, 3.1, 7.00 UNION
SELECT 11, 8, 3.2, 8.00 UNION
SELECT 12, 8, 3.3, 9.00 UNION
SELECT 13, 8, 3.4, 10.00 UNION
SELECT 14, 9, 4.5, 11.00 UNION
SELECT 15, 10, 4.6, 21.00 UNION
SELECT 16, 10, 4.7, 22.00;
Problem
I need to evaluate the recursive formula a*x + b (a = multiplier, b = Const) from any node up to the root of the tree. In other words, I compute the formula for the child node, pass the result up to its parent as x, and continue until I reach the root.
For example, starting with x = 1250.00 at node 14, the calculation is:
1.10 * (1.30 *( 3.90 * (4.50 * 1250.00 + 11.00) + 19.00) + 3.00) + 1.00 = 31463.442
Currently I do this with a recursive CTE and C#, but I am not satisfied with its speed and optimization.
Question
Can I do this calculation on SQL server and just return the value? If it is possible, what is the tree depth that I can navigate with CTE?

Can I do this calculation on SQL server and just return the value?
Yes. Start the recursion at the leaf node, do the calculation as you go, and take the max value in the main query.
declare @x decimal(19, 3) = 1250.00;
with C as
(
select T.Id,
T.Parent,
cast(T.multiplier * @x + T.Const as decimal(19, 3)) as x
from @table as T
where T.Id = 14
union all
select T.Id,
T.Parent,
cast(T.multiplier * C.x + T.Const as decimal(19, 3))
from C
inner join @table as T
on C.Parent = T.Id
)
select max(C.x) as Value
from C
option (maxrecursion 0);
If it is possible, what is the tree depth that I can navigate with CTE?
The default limit is 100 recursion levels, but you can change it with the MAXRECURSION query hint; OPTION (MAXRECURSION 0) removes the limit entirely.
I am not satisfied with its speed and optimization.
To fix that, you have to show what you actually do. The sample you have provided gets a good plan if you have a clustered primary key on Id.
It does a seek to find the anchor and seeks for each iteration.
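A sketch of the same leaf-to-root calculation in SQLite via Python's sqlite3 (an assumption: SQLite spells the recursive CTE WITH RECURSIVE and uses floating-point REAL instead of decimal; the table name `tree` and the function `evaluate` are hypothetical). It uses Id 16 for the last row, as the tree diagram shows, and instead of MAX it selects the row whose Parent IS NULL, i.e. the root. MAX gives the same answer here only because every multiplier is above 1 and every constant is positive, so for a positive starting x the value grows at each step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (Id INTEGER PRIMARY KEY, Parent INT, multiplier REAL, Const REAL)")
conn.executemany("INSERT INTO tree VALUES (?, ?, ?, ?)", [
    (1, None, 1.1, 1.00), (2, 1, 1.2, 2.00), (3, 1, 1.3, 3.00),
    (4, 1, 2.4, 4.00), (5, 2, 2.5, 5.00), (6, 2, 2.6, 6.00),
    (7, 2, 2.7, 17.00), (8, 3, 2.8, 18.00), (9, 3, 3.9, 19.00),
    (10, 3, 3.1, 7.00), (11, 8, 3.2, 8.00), (12, 8, 3.3, 9.00),
    (13, 8, 3.4, 10.00), (14, 9, 4.5, 11.00), (15, 10, 4.6, 21.00),
    (16, 10, 4.7, 22.00),
])


def evaluate(leaf_id, x):
    """Apply multiplier * x + Const from leaf_id up to the root."""
    row = conn.execute("""
        WITH RECURSIVE c(Id, Parent, x) AS (
            SELECT Id, Parent, multiplier * :x + Const
            FROM tree WHERE Id = :leaf
            UNION ALL
            SELECT t.Id, t.Parent, t.multiplier * c.x + t.Const
            FROM c JOIN tree AS t ON c.Parent = t.Id   -- step to the parent
        )
        SELECT ROUND(x, 3) FROM c WHERE Parent IS NULL   -- value at the root
    """, {"x": x, "leaf": leaf_id}).fetchone()
    return row[0]


print(evaluate(14, 1250.00))
```

Here evaluate(14, 1250.00) reproduces the hand calculation in the question, 31463.442.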
