I have a large number of values on one side (a) and need to sum them up to a single value on the other side (z). There is no logical grouping that leads to the total value (z).
On side (a) there are 10000+ items that need to be summed to a single value on the (z) side. Not all of the values on side (a) are needed to sum up to (z).
(a)    (z)
123    2
321    19
234    100
122
1
23
1
19
77
Expected output:
(a) 1, 1 = (z) 2
(a) 19 = (z) 19
(a) 23, 77 = (z) 100
Sum(a) to equal a value in (z)
My current code groups on date but now that will not work as I do not have a predefined date range.
Current code:
Select * From
(
Select sum(amount) as amount, date
From (a)
Group by date
) a
Inner join
(
Select amount,date
From (z)
) b on a.date = b.date
Where a.Amount - b.Amount = 0
This sounds like a self-join:
select z.z, a1.amount, a2.amount
from z z left join
(a a1 left join
a a2
on a1.amount < a2.amount
)
on z.amount = a1.amount + coalesce(a2.amount, 0);
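To make the pairing idea easier to check outside the database, here is a minimal Python sketch that uses the sample values from the question. The function name match_sums and the max_items limit are assumptions made for illustration, not part of the original answer. It tries single values and pairs only, just like the self-join, and it picks values by position, so two equal amounts (1 and 1) can still pair up.

```python
from itertools import combinations


def match_sums(a_values, z_values, max_items=2):
    """For each target in z_values, list the combinations of up to
    max_items entries of a_values whose sum equals the target."""
    matches = {}
    for target in z_values:
        found = []
        for size in range(1, max_items + 1):
            # combinations() picks entries by position, so duplicates can pair up
            for combo in combinations(a_values, size):
                if sum(combo) == target:
                    found.append(combo)
        matches[target] = found
    return matches


# Sample data taken from the question
a_values = [123, 321, 234, 122, 1, 23, 1, 19, 77]
z_values = [2, 19, 100]

result = match_sums(a_values, z_values)
for target, combos in result.items():
    print(target, combos)
```

Raising max_items widens the search to larger groups, but the number of combinations grows very quickly, so with 10000+ values this brute-force approach is only practical for small group sizes.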
I have a table in AWS Athena with the following structure.
Order_id Item_Count Path_List order_date
1 1 A, B, C, A, W 2022-08-23
2 3 C, A, D, Z 2022-08-21
Path_List is of type array of strings.
The first row indicates that order_id 1 had 1 item which was purchased after the item passed path A->B->C->A->W.
Similarly, 2nd row indicates that order_id 2 had 3 items which were purchased after the items passed path C->A->D->Z
Now I need to write a SQL query which gives me the list of all the paths and their total contribution in the orders for a given time-range.
So to print distinct path items, I have written the below query.
select path_item from table_name cross join unnest(path_list) as t(path_item)
where date(order_date) <= current_date and date(order_date) >= current_date - interval '6' day group by 1
So I get all the individual path stages for all the path_list. The output is as follows:
A
B
C
W
D
Z
Now I want to find how much stage A or C or D contributed to the overall products purchased.
To find contribution of A, it should be
(Products which followed A)/(Total path items)
= 5/17
Total path items = 1*5 + 3*4 = 17
Items having A in the path = 1*2 + 3*1 = 5
Similarly, for B and W, it will be
1/17
For C, 4/17
for D, 3/17
For Z, 3/17
Can you suggest the SQL to print the below output
A 5/17
B 1/17
C 4/17
W 1/17
D 3/17
Z 3/17
You can use group by to process the unnested data (notice that I use the succinct style, which allows skipping cross join for unnest) and use either a window function or a subquery to count the total number of elements for the divisor. With a window function:
-- sample data
with dataset (Order_id, Item_Count, Path_List, order_date) as (
values (1, 1, array['A', 'B', 'C', 'A', 'W'], '2022-08-23'),
(2, 3, array['C', 'A', 'D', 'Z' ], '2022-08-21')
)
-- query
select edge,
cast(edge_cnt as varchar)
|| '/'
|| cast(sum(edge_cnt) over (range between unbounded preceding and UNBOUNDED FOLLOWING) as varchar)
from (select sum(Item_Count) edge_cnt, edge
from dataset,
unnest(Path_List) as t(edge)
group by edge)
order by edge;
With subquery:
-- query
select edge,
cast(edge_cnt as varchar)
|| '/'
|| cast((select sum(Item_Count * cardinality(Path_List)) from dataset) as varchar)
from (select sum(Item_Count) edge_cnt, edge
from dataset,
unnest(Path_List) as t(edge)
group by edge)
order by edge;
Output:
edge  _col1
A     5/17
B     1/17
C     4/17
D     3/17
W     1/17
Z     3/17
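The arithmetic behind these fractions can be cross-checked with a small Python sketch over the same two sample orders. The function name stage_contributions is made up for this illustration; each occurrence of a stage in a path adds the order's Item_Count to that stage and to the total.

```python
from collections import Counter

# (order_id, item_count, path_list) rows copied from the sample data
orders = [
    (1, 1, ["A", "B", "C", "A", "W"]),
    (2, 3, ["C", "A", "D", "Z"]),
]


def stage_contributions(rows):
    """Return {stage: 'count/total'} where every path occurrence
    is weighted by the item count of its order."""
    counts = Counter()
    total = 0
    for _order_id, item_count, path in rows:
        for stage in path:
            counts[stage] += item_count
            total += item_count
    return {stage: f"{n}/{total}" for stage, n in sorted(counts.items())}


contrib = stage_contributions(orders)
for stage, share in contrib.items():
    print(stage, share)
```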
I have a table, say Table1:
And I am trying to extract data with the following conditions:
All entries in column A which are 2 or 5
All entries in column B which are 100
All data which have the contract ID 15 in column C
All dates in column D which are not later than 31.01.2016, for example
Finally, the row(s) which has (have) the maximum value in column G
If I use the following code (except finding the maximum date in column G), it works fine:
Select * from Table1
where
A in (2 , 5)
and B = 100
and C = '15'
and D <= TO_DATE ('31.01.16', 'DD.MM.YY HH24:MI:SS')
and gives me the following result:
Now, I want to find all those rows, which have the maximum date value in column G. If I use the following to find the row in this case corresponding to maximum date in G, the query runs and I get an empty table with just the column names:
Select * from Table1 t1
where
A in (2 , 5)
and B = 100
and C = '15'
and D <= TO_DATE ('31.01.16', 'DD.MM.YY HH24:MI:SS')
and G = (select MAX(G) from Table1 where G = t1.G)
The desired output is:
What am I doing wrong?
You can use ORDER BY and FETCH:
select *
from Table1
where A in (2 ,5) and
B = 100 and
C = '15' and
D <= date '2016-01-31'
order by g desc
fetch first 1 row only;
Note that I also simplified the syntax for the date constant.
If you want all rows in the event of ties, then use:
fetch first 1 row with ties;
If you just want one row, you can order by and limit:
Select *
from Table1
where
A in (2 , 5)
and B = 100
and C = 15
and D <= date '2016-01-31'
order by g desc
fetch first 1 row only
If you want to allow top ties, then you can use fetch first 1 row with ties instead.
Notes
I used a literal date rather than to_date(): this is simpler to write and more efficient (note that your original format specification was wrong, as the string has no time portion)
it looks like column C is numeric, so I removed the single quotes around the literal value in the condition (you can change it back if the column is of a string datatype)
If you need to find all the rows which have the maximum date value in column G, then you can use the window function dense_rank(). Rows with the same values for the rank criteria receive the same rank value:
--get all rows with num=1
Select * from
(
Select t.*, dense_rank() over (order by G desc) num
from Table1 t
where
A in (2 , 5)
and B = 100
and C = '15'
and D <= TO_DATE ('31.01.16', 'DD.MM.YY HH24:MI:SS')
) X
Where num=1
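Another way to keep every tied row, swapped in here as a sketch rather than taken from the answers above, is to compare G with the maximum G computed over the same filter. The example below runs on SQLite through Python only so that it is easy to try; the rows are invented, and the dates are stored as ISO text, which is an assumption about the data and not the Oracle DATE type from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Table1 (A integer, B integer, C text, D text, G text)")
conn.executemany(
    "insert into Table1 values (?, ?, ?, ?, ?)",
    [
        (2, 100, "15", "2016-01-10", "2016-03-01"),
        (5, 100, "15", "2016-01-20", "2016-05-01"),
        (2, 100, "15", "2016-01-25", "2016-05-01"),  # ties with the row above on G
        (5, 100, "15", "2016-02-15", "2016-09-01"),  # excluded: D is after the cutoff
        (3, 100, "15", "2016-01-05", "2016-08-01"),  # excluded: A is not 2 or 5
    ],
)

# The subquery repeats the outer filter so that max(G) is taken
# over the filtered rows only, not over the whole table.
query = """
select A, D, G
from Table1
where A in (2, 5) and B = 100 and C = '15' and D <= '2016-01-31'
  and G = (select max(G)
           from Table1
           where A in (2, 5) and B = 100 and C = '15' and D <= '2016-01-31')
order by D
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The important detail is that the subquery carries the same conditions as the outer query; without them the maximum could come from a row that the outer filter rejects, and the result would be empty.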
I attached a capture of two tables:
- the left table is the result of another "Select" query
- the right table is the result I want from the left table
The right table can be created following these conditions:
When the same Unit has all positive or all negative energy values, the result remains the same.
When the same Unit has both positive and negative energy values, then:
Make a sum of all Energy for that Unit (-50+15+20 = -15), then take the maximum of the absolute value of the Energy, e.g. max(abs(energy)) = 50, and take the price for that value.
I use Oracle SQL.
I really appreciate the help in this matter!
http://sqlfiddle.com/#!4/eb85a/12
This returns the desired result:
the signs CTE finds out whether there are positive/negative values, as well as the maximum ABS energy value
then, there's a union of two selects: one that returns "original" rows (if the count of distinct signs is 1), and one that returns "calculated" values, as you described
with
signs as
  (select unit,
          count(distinct sign(energy)) cnt,
          max(abs(energy)) max_abs_ene
   from tab
   group by unit
  )
select t.unit, t.price, t.energy
from tab t join signs s on t.unit = s.unit
where s.cnt = 1
union all
select t.unit, t2.price, sum(t.energy)
from tab t join signs s on t.unit = s.unit
join tab t2 on t2.unit = s.unit and abs(t2.energy) = s.max_abs_ene
where s.cnt = 2
group by t.unit, t2.price
order by unit;

UNIT       PRICE     ENERGY
---------- ---------- ----------
A                  20        -50
A                  50        -80
B                  13        -15
Though, what do you expect if there was yet another "B" unit row with energy = +50? Then two rows would have the same MAX(ABS(ENERGY)) value.
A union all might be the simplest solution:
with tt as (
select t.*,
max(energy) over (partition by unit) as max_energy,
min(energy) over (partition by unit) as min_energy
from t
)
select unit, price, energy
from tt
where (max_energy > 0 and min_energy > 0) or
(max_energy < 0 and min_energy < 0)
union all
select unit,
max(price) keep (dense_rank first order by abs(energy) desc),
sum(energy)
from tt
where max_energy > 0 and min_energy < 0
group by unit;
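The rule itself can be sketched in a few lines of Python, which may help when checking the SQL against sample rows. The function name consolidate and the prices 10 and 12 for the positive B rows are assumptions, since the question's screenshot is not available; the remaining values follow the example and the output shown above.

```python
from collections import defaultdict


def consolidate(rows):
    """rows: (unit, price, energy) tuples. Units whose energies all share one
    sign are kept as they are; mixed-sign units collapse to a single row with
    the summed energy and the price of the largest absolute energy."""
    by_unit = defaultdict(list)
    for unit, price, energy in rows:
        by_unit[unit].append((price, energy))

    result = []
    for unit in sorted(by_unit):
        entries = by_unit[unit]
        has_pos = any(e > 0 for _, e in entries)
        has_neg = any(e < 0 for _, e in entries)
        if has_pos and has_neg:
            # max() keeps the first entry when two rows share the same abs(energy)
            price, _ = max(entries, key=lambda pe: abs(pe[1]))
            result.append((unit, price, sum(e for _, e in entries)))
        else:
            result.extend((unit, p, e) for p, e in entries)
    return result


rows = [
    ("A", 20, -50),
    ("A", 50, -80),
    ("B", 13, -50),
    ("B", 10, 15),  # hypothetical price
    ("B", 12, 20),  # hypothetical price
]
consolidated = consolidate(rows)
for row in consolidated:
    print(row)
```

Ties on the largest absolute energy are resolved here by taking the first row encountered, which is an arbitrary choice; the SQL versions need an explicit rule for that case as well.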
I need to find, for each number in the Divide column, the closest value from the Quantity column, and put the value found in the Value column for both quantities.
Example:
In the column Divide the value of 5166 would be closest to Quantity column value 5000. To keep from using those two values more than once I need to place the value of 5000 in the value column for both numbers, like the example below. Also, is it possible to do this without a loop?
Quantity Divide Rank Value
15500 5166 5 5000
1250 416 5 0
5000 1666 5 5000
12500 4166 4 0
164250 54750 3 0
5250 1750 3 0
6250 2083 3 0
12250 4083 3 0
1750 583 2 0
17000 5666 2 0
2500 833 2 0
11500 3833 2 0
1250 416 1 0
There are a couple of answers here but they both use ctes/complex subqueries. There is a much simpler/faster way by just doing a couple of self joins and a group-by
https://www.db-fiddle.com/f/rM268EYMWuK7yQT3gwSbGE/0
select
min(min.quantity) as minQuantityOverDivide
, t1.divide
, max(max.quantity) as maxQuantityUnderDivide
, case
when
(abs(t1.divide - coalesce(min(min.quantity),0))
<
abs(t1.divide - coalesce(max(max.quantity),0)))
then max(max.quantity)
else min(min.quantity) end as cloestQuantity
from t1
left join (select quantity from t1) min on min.quantity >= t1.divide
left join (select quantity from t1) max on max.quantity < t1.divide
group by
t1.divide
If I understood the requirements, 5166 is not closest to 5000; it is closest to 5250 (a delta of 166 vs 84).
The corresponding query, without loops, would be (fiddle here: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=be434e67ba73addba119894a98657f17).
(I added a Value_Rank as it is not clear whether you want Rank to be kept or recomputed)
select
Quantity, Divide, Rank, Value,
dense_rank() over(order by Value) as Value_Rank
from
(
select
Quantity, Divide, Rank,
--
case
when abs(Quantity_let_delta) < abs(Quantity_get_delta) then Divide + Quantity_let_delta
else Divide + Quantity_get_delta
end as Value
from
(
select
so.Quantity, so.Divide, so.Rank,
-- There is no LessEqualThan, assume GreaterEqualThan
max(isnull(so_let.Quantity, so_get.Quantity)) - so.Divide as Quantity_let_delta,
-- There is no GreaterEqualThan, assume LessEqualThan
min(isnull(so_get.Quantity, so_let.Quantity)) - so.Divide as Quantity_get_delta
from
SO so
left outer join SO so_let
on so_let.Quantity <= so.Divide
--
left outer join SO so_get
on so_get.Quantity >= so.Divide
group by so.Quantity, so.Divide, so.Rank
) so
) result
Or, if by closest you mean the previous closest (fiddle here: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=b41fb1a3fc11039c7f82926f8816e270).
select
Quantity, Divide, Rank, Value,
dense_rank() over(order by Value) as Value_Rank
from
(
select
so.Quantity, so.Divide, so.Rank,
-- There is no LessEqualThan, assume 0
max(isnull(so_let.Quantity, 0)) as Value
from
SO so
left outer join SO so_let
on so_let.Quantity <= so.Divide
group by so.Quantity, so.Divide, so.Rank
) result
You don't need a loop. Basically, you need to find the lowest difference between each Divide and all the quantities (first cte), then use this distance to find the corresponding record (second cte), and then join with your initial table to get the converted values (final select).
;with cte as (
select t.Divide, min(abs(t2.Quantity-t.Divide)) as ClosestQuantity
from #t1 as t
cross apply #t1 as t2
group by t.Divide
)
,cte2 as (
select distinct
t.Divide, t2.Quantity
from #t1 as t
cross apply #t1 as t2
where abs(t2.Quantity-t.Divide) = (select ClosestQuantity from cte as c where c.Divide = t.Divide)
)
select t.Quantity, cte2.Quantity as Divide, t.Rank, t.Value
from #t1 as t
left outer join cte2 on t.Divide = cte2.Divide
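The "smallest absolute difference" idea is easy to verify by hand with a short Python sketch. The helper name closest_quantity is an assumption, the quantities come from the table in the question, and the tie rule (prefer the smaller quantity when two are equally far away) is an arbitrary choice made for this example.

```python
def closest_quantity(divide, quantities):
    """Return the quantity with the smallest absolute distance to divide.
    On equal distance the smaller quantity wins (arbitrary tie rule)."""
    return min(quantities, key=lambda q: (abs(q - divide), q))


# Quantity column from the question
quantities = [15500, 1250, 5000, 12500, 164250, 5250, 6250,
              12250, 1750, 17000, 2500, 11500, 1250]

closest = {d: closest_quantity(d, quantities) for d in (5166, 416, 1666)}
for divide, quantity in closest.items():
    print(divide, quantity)
```

With these numbers 5166 maps to 5250 rather than 5000, because the distance is 84 versus 166.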
Fair warning: I'm new to using SQL. I do so on an Oracle server either via AQT or with SQL Developer.
As I haven't been able to think or search my way to an answer, I put myself in your able hands...
I'd like to combine data from table A (high quality data) with data from table B (fresh data) such that the entries from B are only included when their date stamps are later than those available from table A.
Both tables include entries from multiple entities, and the latest date stamp varies with those entities.
On the 4th of January, the tables may look something like:
A____________________________ B_____________________________
entity date type value entity date type value
X 1.jan 1 1 X 1.jan 1 2
X 1.jan 0 1 X 1.jan 0 2
X 2.jan 1 1 X 2.jan 1 2
Y 1.jan 1 1 (new entry)X 3.jan 1 1
Y 3.jan 1 1 Y 1.jan 1 2
Y 3.jan 1 2
(new entry)Y 4.jan 1 1
I have made an attempt at some code that I hope clarifies my need:
WITH
AA AS
(SELECT entity, date, SUM(value)
FROM table_A
GROUP BY
entity,
date),
BB AS
(SELECT entity, date, SUM(value)
FROM table_B
WHERE date > ALL (SELECT date FROM AA)
GROUP BY
entity,
date
)
SELECT * FROM (SELECT * FROM AA UNION ALL SELECT * FROM BB)
Now, if the WHERE date > ALL (SELECT date FROM AA) would work separately for each entity, I think I would have what I need.
That is, for each entity I want all entries from A, and only newer entries from B.
As the data in table A often differs from that of B (values are often corrected), I don't think I can use something like: table A UNION ALL (table B MINUS table A)?
Thanks
Essentially you are looking for entries in BB which do not exist in AA. When you use date > ALL (SELECT date FROM AA), the entity in question is not taken into consideration, so you will not get the correct records.
An alternative is to use a JOIN and filter out all entries that match AA.
Something like below.
WITH
AA AS
(SELECT entity, date, SUM(value)
FROM table_A
GROUP BY
entity,
date),
BB AS
(SELECT b.entity, b.date, SUM(b.value)
FROM table_B b
LEFT OUTER JOIN AA
ON AA.entity = b.entity
AND AA.date = b.date
-- keep only the rows of table_B that have no match in AA
WHERE AA.date IS NULL
GROUP BY
b.entity,
b.date
)
SELECT * FROM (SELECT * FROM AA UNION ALL SELECT * FROM BB)
I find your question confusing, because I don't know where the aggregation is coming from.
The basic idea on getting newer rows from table_b uses conditions in the where clause, something like this:
select . . .
from table_a a
union all
select . . .
from table_b b
where b.date > (select max(a.date) from table_a a where a.entity = b.entity);
You can, of course, run this on your CTEs, if those are what you really want to combine.
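As a runnable sketch of this per-entity filter, the example below loads the sample rows from the question into SQLite through Python. The column name dt, the src marker column, and the year 2024 are assumptions made for the demo (the question only gives day and month), and dates are stored as ISO text so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table_a (entity text, dt text, type integer, value integer)")
conn.execute("create table table_b (entity text, dt text, type integer, value integer)")
conn.executemany("insert into table_a values (?, ?, ?, ?)", [
    ("X", "2024-01-01", 1, 1), ("X", "2024-01-01", 0, 1), ("X", "2024-01-02", 1, 1),
    ("Y", "2024-01-01", 1, 1), ("Y", "2024-01-03", 1, 1),
])
conn.executemany("insert into table_b values (?, ?, ?, ?)", [
    ("X", "2024-01-01", 1, 2), ("X", "2024-01-01", 0, 2), ("X", "2024-01-02", 1, 2),
    ("X", "2024-01-03", 1, 1),
    ("Y", "2024-01-01", 1, 2), ("Y", "2024-01-03", 1, 2), ("Y", "2024-01-04", 1, 1),
])

# All rows from A, plus only those rows from B that are newer than
# the latest date A holds for the same entity.
query = """
select 'A' as src, a.entity, a.dt, a.value
from table_a a
union all
select 'B' as src, b.entity, b.dt, b.value
from table_b b
where b.dt > (select max(a.dt) from table_a a where a.entity = b.entity)
order by entity, dt, src
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One thing to keep in mind: an entity that exists only in table_b gets a NULL from the subquery, so its rows are dropped by the comparison; whether that is wanted depends on the data.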
Use UNION instead of UNION ALL; it will remove the duplicate records:
SELECT * FROM (
SELECT *
FROM AA
UNION
SELECT *
FROM BB )