I have a sample table like the one below, and I need to calculate the 1-year high/low over this data.
+----+-----------------------+---------+
| id | date                  | amt     |
+----+-----------------------+---------+
|  1 | 2016-03-01 00:00:00.0 | 25.7262 |
|  1 | 2016-03-02 00:00:00.0 | 26.6861 |
|  1 | 2016-03-03 00:00:00.0 | 27.0688 |
|  1 | 2016-03-04 00:00:00.0 | 28.8077 |
|  1 | 2016-03-07 00:00:00.0 | 29.6904 |
|  1 | 2016-03-08 00:00:00.0 | 26.9298 |
|  1 | 2016-03-09 00:00:00.0 | 27.2492 |
|  1 | 2016-03-10 00:00:00.0 | 26.278  |
+----+-----------------------+---------+
I think I have to do something like the following, but the problem with this code is that a year can also be a leap year.
def days(i: Int): Long = i * 86400L

val aggregate = Window
  .partitionBy("id")
  .orderBy(unix_timestamp($"date"))
  .rangeBetween(-days(365), 0)

df.select(df("id"), df("date"), df("amt"))
  .withColumn("wk52_high", max("amt") over aggregate)
  .withColumn("wk52_low", min("amt") over aggregate)
+----+-----------------------+---------+-----------+----------+
| id | date                  | amt     | wk52_high | wk52_low |
+----+-----------------------+---------+-----------+----------+
|  1 | 2016-03-01 00:00:00.0 | 25.7262 |   25.7262 |  25.7262 |
|  1 | 2016-03-02 00:00:00.0 | 26.6861 |   26.6861 |  25.7262 |
|  1 | 2016-03-03 00:00:00.0 | 27.0688 |   27.0688 |  25.7262 |
|  1 | 2016-03-04 00:00:00.0 | 28.8077 |   28.8077 |  25.7262 |
|  1 | 2016-03-07 00:00:00.0 | 29.6904 |   29.6904 |  25.7262 |
|  1 | 2016-03-08 00:00:00.0 | 26.9298 |   29.6904 |  25.7262 |
|  1 | 2016-03-09 00:00:00.0 | 27.2492 |   29.6904 |  25.7262 |
|  1 | 2016-03-10 00:00:00.0 | 26.278  |   29.6904 |  25.7262 |
+----+-----------------------+---------+-----------+----------+
How can I handle the leap-year case?
What I'd suggest is to use the year function to partition by (rather than the id) and the rank function to calculate the min (order ascending) and max (order descending) over the window specification.
val byYearOrderByAmt = Window.partitionBy(year($"date")).orderBy("amt")
scala> inventory.withColumn("rank", rank() over byYearOrderByAmt).show
+---+-------------------+-------+----+
| id| date| amt|rank|
+---+-------------------+-------+----+
| 1|2016-03-01 00:00:00|25.7262| 1|
| 2|2016-03-02 00:00:00|26.6861| 2|
+---+-------------------+-------+----+
scala> inventory.withColumn("rank", rank() over byYearOrderByAmt).where($"rank" === 1).show
+---+-------------------+-------+----+
| id| date| amt|rank|
+---+-------------------+-------+----+
| 1|2016-03-01 00:00:00|25.7262| 1|
+---+-------------------+-------+----+
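One way to sidestep the fixed 365-day assumption entirely is to bound the trailing window with calendar arithmetic rather than a fixed number of seconds. A minimal sketch of the idea in plain Python, with illustrative rows (not Spark, and not the asker's exact pipeline):

```python
from datetime import datetime

def one_year_before(d):
    # Move back one calendar year; Feb 29 falls back to Feb 28
    # when the previous year is not a leap year.
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)

# Illustrative sample rows: (date, amt)
rows = [
    (datetime(2016, 3, 1), 25.7262),
    (datetime(2016, 3, 2), 26.6861),
    (datetime(2016, 3, 7), 29.6904),
]

# Trailing one-year high/low per row, bounded by calendar arithmetic
# instead of a fixed -365 * 86400 seconds.
high_low = {}
for date, amt in rows:
    window = [a for d, a in rows if one_year_before(date) <= d <= date]
    high_low[date] = (max(window), min(window))
```

The same calendar-based bound could be expressed in Spark SQL with an interval arithmetic condition instead of `rangeBetween` over raw epoch seconds.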
I am currently studying SQL and I am still a newbie. I have a task where I need to split some rows with various entries, like dates and user IDs. I really need help.
+-------+------------------------------+---------------------------+
| TYPE  | DATES                        | USER_ID                   |
+-------+------------------------------+---------------------------+
| WORK  | ["2022-06-02", "2022-06-03"] | {74042,88357,83902,88348} |
| LEAVE | ["2022-05-16", "2022-05-26"] | {83902,74042,88357,88348} |
+-------+------------------------------+---------------------------+
The end result should look like this; the user IDs should be aligned with their respective dates.
+-------+------------+---------+
| TYPE  | DATES      | USER_ID |
+-------+------------+---------+
| LEAVE | 05/16/2022 | 74042   |
| LEAVE | 05/16/2022 | 88357   |
| LEAVE | 05/16/2022 | 88348   |
| LEAVE | 05/16/2022 | 83902   |
| LEAVE | 05/26/2022 | 74042   |
| LEAVE | 05/26/2022 | 88357   |
| LEAVE | 05/26/2022 | 88348   |
| LEAVE | 05/26/2022 | 83902   |
| WORK  | 06/02/2022 | 74042   |
| WORK  | 06/02/2022 | 88357   |
| WORK  | 06/02/2022 | 88348   |
| WORK  | 06/02/2022 | 83902   |
| WORK  | 06/03/2022 | 74042   |
| WORK  | 06/03/2022 | 88357   |
| WORK  | 06/03/2022 | 88348   |
| WORK  | 06/03/2022 | 83902   |
+-------+------------+---------+
Create table:
CREATE TABLE work_leave (
TYPE varchar,
DATES date,
USER_ID integer
);
INSERT INTO work_leave
VALUES ('LEAVE', '05/16/2022', 74042),
('LEAVE', '05/16/2022', 88357),
('LEAVE', '05/16/2022', 88348),
('LEAVE', '05/16/2022', 83902),
('LEAVE', '05/26/2022', 74042),
('LEAVE', '05/26/2022', 88357),
('LEAVE', '05/26/2022', 88348),
('LEAVE', '05/26/2022', 83902),
('WORK', '06/2/2022', 74042),
('WORK', '06/2/2022', 88357),
('WORK', '06/2/2022', 88348),
('WORK', '06/2/2022', 83902),
('WORK', '06/3/2022', 74042),
('WORK', '06/3/2022', 88357),
('WORK', '06/3/2022', 88348),
('WORK', '06/3/2022', 83902);
WITH date_ends AS (
SELECT
type,
ARRAY[min(dates),
max(dates)] AS dates
FROM
work_leave
GROUP BY
type
),
users AS (
SELECT
type,
array_agg(DISTINCT (user_id)
ORDER BY user_id) AS user_ids
FROM
work_leave
GROUP BY
type
)
SELECT
de.type,
de.dates,
u.user_ids
FROM
date_ends AS de
JOIN
users as u
ON de.type = u.type;
type | dates | user_ids
-------+-------------------------+---------------------------
LEAVE | {05/16/2022,05/26/2022} | {74042,83902,88348,88357}
WORK | {06/02/2022,06/03/2022} | {74042,83902,88348,88357}
I adjusted the data slightly for simplicity. Here's one idea:
WITH rows (type, dates, user_id) AS (
VALUES ('WORK', array['2022-06-02', '2022-06-03'], array[74042,88357,83902,88348])
, ('LEAVE', array['2022-05-16', '2022-05-26'], array[83902,74042,88357,88348])
)
SELECT r1.type, x.*
FROM rows AS r1
CROSS JOIN LATERAL (
SELECT r2.dates, r3.user_id
FROM unnest(r1.dates) AS r2(dates)
, unnest(r1.user_id) AS r3(user_id)
) AS x
;
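The CROSS JOIN LATERAL over the two unnest calls is simply a per-row Cartesian product of the dates array and the user_id array. The same expansion can be sketched in plain Python (variable names here are illustrative):

```python
from itertools import product

# (type, dates, user_ids), mirroring the VALUES rows above
rows = [
    ("WORK", ["2022-06-02", "2022-06-03"], [74042, 88357, 83902, 88348]),
    ("LEAVE", ["2022-05-16", "2022-05-26"], [83902, 74042, 88357, 88348]),
]

# Each row expands to len(dates) * len(user_ids) output rows,
# just like unnest(dates) cross-joined with unnest(user_id).
expanded = [
    (typ, d, uid)
    for typ, dates, user_ids in rows
    for d, uid in product(dates, user_ids)
]
```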
The result:
+-------+------------+---------+
| type  | dates      | user_id |
+-------+------------+---------+
| WORK  | 2022-06-02 | 74042   |
| WORK  | 2022-06-02 | 88357   |
| WORK  | 2022-06-02 | 83902   |
| WORK  | 2022-06-02 | 88348   |
| WORK  | 2022-06-03 | 74042   |
| WORK  | 2022-06-03 | 88357   |
| WORK  | 2022-06-03 | 83902   |
| WORK  | 2022-06-03 | 88348   |
| LEAVE | 2022-05-16 | 83902   |
| LEAVE | 2022-05-16 | 74042   |
| LEAVE | 2022-05-16 | 88357   |
| LEAVE | 2022-05-16 | 88348   |
| LEAVE | 2022-05-26 | 83902   |
| LEAVE | 2022-05-26 | 74042   |
| LEAVE | 2022-05-26 | 88357   |
| LEAVE | 2022-05-26 | 88348   |
+-------+------------+---------+
I have this table (mock data):
+----+----------+----------+
| ID | Name     | Location |
+----+----------+----------+
| 1  | Main     | /        |
| 2  | Photos   | /1/3     |
| 3  | Media    | /1       |
| 4  | Charts   | /        |
| 5  | Expenses | /4       |
+----+----------+----------+
The column Location is a string with ids that refer to that very table.
I'm looking for a query to convert the ids into names, something like this:
+----+----------+----------+-------------+
| ID | Name     | Location | FullName    |
+----+----------+----------+-------------+
| 1  | Main     | /        | /           |
| 2  | Photos   | /1/3     | /Main/Media |
| 3  | Media    | /1       | /Main       |
| 4  | Charts   | /        | /           |
| 5  | Expenses | /4       | /Charts     |
+----+----------+----------+-------------+
This is some mock data; in my real table I have more complex locations.
I'm not the owner of the table, so I can't modify the schema; I can only read it.
Does anyone have an idea?
Thank you very much.
I've been exploring this function: regexp_split_to_table
WITH flat_data AS (
SELECT DISTINCT
col.id col_id,
col.name col_name,
col.location col_full_loc,
regexp_split_to_table(col.location, '/') as loc_item
FROM collection col),
clean_data AS (
SELECT
col_id,
col_name,
col_full_loc,
CASE WHEN loc_item = '' THEN null ELSE loc_item::integer END loc_item,
ROW_NUMBER() over (partition by col_id, loc_item)
FROM flat_data
) select * from clean_data
So I've managed to have something like this :
| ID | Name | Location | AfterFunction |
| -- | -- | -- | -- |
| 1 | Main | / | |
| 2 | Photos | /1/3 | |
| 2 | Photos | /1/3 | 3 |
| 2 | Photos | /1/3 | |
| 2 | Photos | /1/3 | 1 |
| 3 | Media | /1 | |
| 3 | Media | /1 | 1 |
| 4 | Charts | / | |
| 5 | Expenses | /4 | |
| 5 | Expenses | /4 | 4 |
But at some point I lose the order of the sublocation items.
Outline of the solution:
- Ignore the first slash in the location to simplify the split and mapping (add it back at the end).
- Use regexp_split_to_table along with WITH ORDINALITY to preserve the order.
- Outer-join the location parts to the original table (casting the id to text, since it is an int).
- Use string_agg to combine the location names into one string, ordered by the ordinality column, and add the fixed slash prefix.
Query
with t2 as (
select * from t,
regexp_split_to_table(substr(t.location,2), '/') WITH ORDINALITY x(part, rn)
),
t3 as (
select t2.*, t.name part_name from t2
left outer join t on t2.part = t.id::text)
select
t3.id, t3.name, t3.location,
'/'||coalesce(string_agg(t3.part_name,'/' order by t3.rn),'') loc_name
from t3
group by 1,2,3
order by 1
which gives the result:
id|name |location|loc_name |
--+--------+--------+-----------+
1|Main |/ |/ |
2|Photos |/1/3 |/Main/Media|
3|Media |/1 |/Main |
4|Charts |/ |/ |
5|Expenses|/4 |/Charts |
Below are the results of the subqueries, to illustrate the steps.
-- T2
id|name |location|part|rn|
--+--------+--------+----+--+
1|Main |/ | | 1|
2|Photos |/1/3 |1 | 1|
2|Photos |/1/3 |3 | 2|
3|Media |/1 |1 | 1|
4|Charts |/ | | 1|
5|Expenses|/4 |4 | 1|
-- T3
id|name |location|part|rn|part_name|
--+--------+--------+----+--+---------+
1|Main |/ | | 1|Main |
2|Photos |/1/3 |1 | 1|Photos |
2|Photos |/1/3 |3 | 2|Photos |
3|Media |/1 |1 | 1|Media |
4|Charts |/ | | 1|Charts |
5|Expenses|/4 |4 | 1|Expenses |
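The id-to-name resolution that string_agg performs can be mirrored in plain Python to see the mapping at a glance; a minimal sketch using the mock rows from the question:

```python
# id -> name and id -> location, taken from the question's mock data
names = {1: "Main", 2: "Photos", 3: "Media", 4: "Charts", 5: "Expenses"}
locations = {1: "/", 2: "/1/3", 3: "/1", 4: "/", 5: "/4"}

def full_name(location):
    # Drop the leading slash, split on "/", map each id to its name in
    # order, then re-attach the slash prefix (an empty path stays "/").
    parts = [p for p in location[1:].split("/") if p]
    return "/" + "/".join(names[int(p)] for p in parts)

resolved = {i: full_name(loc) for i, loc in locations.items()}
```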
I'm hoping that I'm overthinking this, but I need to sum a column where I have no unique link to join on, and when I do the join it doubles up rows.
This is my current SQL. It works until I add the join on vwBatchInData, which then doubles up every record. What is the best way to achieve this?
select b.fldBatchID as 'ID',
       sum(bIn.fldBatchDetailsWeight) as 'Batch In',
       sum(t.fldTransactionNetWeight) as 'Batch Out',
       format((sum(t.fldTransactionNetWeight) / sum(bIn.fldBatchDetailsWeight)), 'P2') as 'Yield'
from [TRANSACTION] t
right join vwBatchInData bIn on bIn.fldBatchID = t.fldBatchID
inner join Batch b on b.fldBatchID = t.fldBatchID
where CAST(b.fldBatchDate as date) = '2020-03-04'
group by b.fldBatchID
vwBatchInData Table
+------------+---------------+-----------------------+
| fldBatchID | fldKillNumber | fldBatchDetailsWeight |
+------------+---------------+-----------------------+
| 2862 | 601598 | 164.40 |
| 2862 | 601599 | 190.80 |
| 2862 | 601596 | 195.00 |
| 2862 | 601597 | 200.20 |
| 2862 | 601594 | 176.60 |
+------------+---------------+-----------------------+
Transaction Table
+------------+------------------+-------------------------+
| fldBatchID | fldTransactionID | fldTransactionNetWeight |
+------------+------------------+-------------------------+
| 2862 | 10242352 | 16.26 |
| 2862 | 10242353 | 22.82 |
| 2862 | 10242362 | 18.52 |
| 2862 | 10242363 | 21.44 |
| 2862 | 10242364 | 20.32 |
+------------+------------------+-------------------------+
Batch Table
+------------+-------------------------+
| fldBatchID | fldBatchDate |
+------------+-------------------------+
| 2862 | 2020-03-04 00:00:00.000 |
+------------+-------------------------+
Desired output with the above snippets:
+------+----------+-----------+---------+
| ID | Batch In | Batch Out | Yield |
+------+----------+-----------+---------+
| 2862 | 927.00 | 90.36 | 10.76 % |
+------+----------+-----------+---------+
I think you just want to aggregate before joining:
select b.fldBatchID as ID,
       bIn.fldBatchDetailsWeight as batch_in,
       t.fldTransactionNetWeight as batch_out,
       format(t.fldTransactionNetWeight / bIn.fldBatchDetailsWeight, 'P2') as Yield
from batch b left join
     (select fldBatchID, sum(fldBatchDetailsWeight) as fldBatchDetailsWeight
      from vwBatchInData
      group by fldBatchID
     ) bIn
     on bIn.fldBatchID = b.fldBatchID left join
     (select fldBatchID, sum(fldTransactionNetWeight) as fldTransactionNetWeight
      from [TRANSACTION]
      group by fldBatchID
     ) t
     on t.fldBatchID = b.fldBatchID
where CAST(b.fldBatchDate as date) = '2020-03-04';
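The aggregate-before-join idea can be sketched outside SQL as well; here in Python, summing each side down to one value per batch before pairing them up, using the snippet rows from the question (the totals below come from those rows as given):

```python
from collections import defaultdict

# (fldBatchID, weight) rows from the question's snippets
batch_in = [(2862, 164.40), (2862, 190.80), (2862, 195.00),
            (2862, 200.20), (2862, 176.60)]
batch_out = [(2862, 16.26), (2862, 22.82), (2862, 18.52),
             (2862, 21.44), (2862, 20.32)]

def sum_by_batch(rows):
    totals = defaultdict(float)
    for batch_id, weight in rows:
        totals[batch_id] += weight
    return totals

# Aggregate each side first, then "join" on the batch id,
# so the per-row fan-out (the doubled records) never happens.
in_totals = sum_by_batch(batch_in)
out_totals = sum_by_batch(batch_out)
yields = {b: out_totals[b] / in_totals[b]
          for b in in_totals if b in out_totals}
```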
I have a table containing geological resource information.
| Property | Zone | Area | Category | Tonnage | Au_gt | Au_oz |
|----------|------|-------------|-----------|---------|-------|-------|
| Ket | Eel | Open Pit | Measured | 43400 | 5.52 | 7700 |
| Ket | Eel | Open Pit | Inferred | 51400 | 5.88 | 9700 |
| Ket | Eel | Open Pit | Indicated | 357300 | 6.41 | 73600 |
| Ket | Eel | Underground | Measured | 3300 | 7.16 | 800 |
| Ket | Eel | Underground | Inferred | 14700 | 6.16 | 2900 |
| Ket | Eel | Underground | Indicated | 168100 | 8.85 | 47800 |
I would like to summarize the data so that it can be read more easily by our clients.
| Property | Zone | Category | Open_Pit_Tonnage | Open_Pit_Au_gt | Open_Pit_Au_oz | Underground_tonnage | Underground_au_gt | Underground_au_oz | Combined_tonnage | Combined_au_gt | Combined_au_oz |
|----------|------|-----------|------------------|----------------|----------------|---------------------|-------------------|-------------------|------------------|----------------|----------------|
| Ket | Eel | Measured | 43,400 | 5.52 | 7,700 | 3,300 | 7.16 | 800 | 46,700 | 5.64 | 8,500 |
| Ket | Eel | Indicated | 357,300 | 6.41 | 73,600 | 168,100 | 8.85 | 47,800 | 525,400 | 7.19 | 121,400 |
| Ket | Eel | Inferred | 51,400 | 5.88 | 9,700 | 14,700 | 6.16 | 2,900 | 66,100 | 5.94 | 12,600 |
I'm fairly new to pivot tables. How could I write a query to translate and summarize the data?
Thanks!
If your Oracle version is 11.1 or higher (which it should be if you are a relatively new user!) then you can use the PIVOT operator, as shown below.
Note that the result of the PIVOT operation can be given an alias (I used p) - this makes it easier to write the SELECT clause.
I assumed the name of your table is geological_data - replace it with your actual table name.
select p.*
, open_pit_tonnage + underground_tonnage as combined_tonnage
, open_pit_au_gt + underground_au_gt as combined_au_gt
, open_pit_au_oz + underground_au_oz as combined_au_oz
from geological_data
pivot (sum(tonnage) as tonnage, sum(au_gt) as au_gt, sum(au_oz) as au_oz
for area in ('Open Pit' as open_pit, 'Underground' as underground)) p
;
Conditional aggregation is a simple method:
select Property, Zone, Category,
max(case when area = 'Open Pit' then tonnage end) as open_pit_tonnage,
max(case when area = 'Open Pit' then Au_gt end) as open_pit_Au_gt,
max(case when area = 'Open Pit' then Au_oz end) as open_pit_Au_ox,
max(case when area = 'Underground' then tonnage end) as Underground_tonnage,
max(case when area = 'Underground' then Au_gt end) as Underground_Au_gt,
max(case when area = 'Underground' then Au_oz end) as Underground_Au_ox
from t
group by Property, Zone, Category
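The same conditional-aggregation reshaping can be sketched in Python with the rows from the question (the output column naming here is illustrative):

```python
# (Property, Zone, Area, Category, Tonnage, Au_gt, Au_oz) rows from the question
rows = [
    ("Ket", "Eel", "Open Pit", "Measured", 43400, 5.52, 7700),
    ("Ket", "Eel", "Open Pit", "Inferred", 51400, 5.88, 9700),
    ("Ket", "Eel", "Open Pit", "Indicated", 357300, 6.41, 73600),
    ("Ket", "Eel", "Underground", "Measured", 3300, 7.16, 800),
    ("Ket", "Eel", "Underground", "Inferred", 14700, 6.16, 2900),
    ("Ket", "Eel", "Underground", "Indicated", 168100, 8.85, 47800),
]

pivoted = {}
for prop, zone, area, cat, tonnage, au_gt, au_oz in rows:
    key = (prop, zone, cat)                  # GROUP BY Property, Zone, Category
    rec = pivoted.setdefault(key, {})
    prefix = area.lower().replace(" ", "_")  # "Open Pit" -> "open_pit"
    rec[f"{prefix}_tonnage"] = tonnage       # CASE WHEN area = ... THEN tonnage END
    rec[f"{prefix}_au_gt"] = au_gt
    rec[f"{prefix}_au_oz"] = au_oz
```

The combined columns would then just add the open-pit and underground values per record, as the Oracle answer does.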
SQL Server's PIVOT operator is used to convert rows to columns.
The goal is to turn the category names from the first column of the output into multiple columns and count the number of products for each category.
This query can be used as a reference for the table above:
SELECT * FROM
(
SELECT
category_name,
product_id,
model_year
FROM
production.products p
INNER JOIN production.categories c
ON c.category_id = p.category_id
) t
PIVOT(
COUNT(product_id)
FOR category_name IN (
[Children Bicycles],
[Comfort Bicycles],
[Cruisers Bicycles],
[Cyclocross Bicycles],
[Electric Bikes],
[Mountain Bikes],
[Road Bikes])
) AS pivot_table;
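Under the hood, this PIVOT is a count per category name, with one output column per listed category. A minimal Python sketch of that counting step (the rows below are hypothetical stand-ins for the production.products / production.categories join):

```python
from collections import Counter

# Hypothetical (category_name, product_id) rows from the join
joined = [
    ("Road Bikes", 1), ("Road Bikes", 2), ("Mountain Bikes", 3),
    ("Electric Bikes", 4), ("Road Bikes", 5),
]

# COUNT(product_id) ... FOR category_name IN (...) is a count per
# category; each counter key would become one pivoted output column.
per_category = Counter(category for category, _ in joined)
```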
I have a flat table structure which I've turned into a column based table. I'm struggling with getting the rowId from my raw data to appear in my column based table. Any help greatly appreciated.
Raw data in table derived from three different tables:
| rowId |columnName |ColumnValue |
| ---------------- |:---------------:| -----------:|
| 1 |itemNo |1 |
| 1 |itemName |Polo Shirt |
| 1 |itemDescription |Green |
| 1 |price1 |4.2 |
| 1 |price2 |5.3 |
| 1 |price3 |7.5 |
| 1 |displayOrder |1 |
| 1 |rowId |[NULL] |
| 2 |itemNo |12 |
| 2 |itemName |Digital Watch|
| 2 |itemDescription |Red Watch |
| 2 |price1 |4.0 |
| 2 |price2 |2.0 |
| 2 |price3 |1.5 |
| 2 |displayOrder |3 |
| 2 |rowId |[NULL] |
SQL using pivot to give me the column structure:
select [displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3],[rowId]
from
(
SELECT [columnName], [columnValue] , row_number() over(partition by c.columnName order by cv.rowId) as rn
FROM tblFlatTable AS t
JOIN tblFlatColumns c
ON t.flatTableId = c.flatTableId
JOIN tblFlatColumnValues cv
ON cv.flatColumnId = c.flatColumnId
WHERE (t.flatTableId = 1) AND (t.isActive = 1)
AND (c.isActive = 1) AND (cv.isActive = 1)
) as S
Pivot
(
MIN([columnValue])
FOR columnName IN ([displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3],[rowId])
) as P
Result:
|displayOrder|itemDescription|itemName |price1|price2|price3|rowId |
| ---------- |:-------------:|:------------:|:----:|:----:|:----:|-----:|
|1 |Green |Polo Shirt |4.2 |5.3 |7.5 |[NULL]|
|3 |Red watch |Digital Watch |4.0 |2.0 |1.5 |[NULL]|
I understand why I'm getting the NULL value for rowId. What I'm stuck on is how to pull the value for rowId from the raw data and add it to my structure; I've looked at many examples and none seem to do this, so I'm not sure it's possible.
It looks obvious now!
I'm now not including rowId as part of my flat structure.
| rowId |columnName |ColumnValue |
| ---------------- |:---------------:| -----------:|
| 1 |itemNo |1 |
| 1 |itemName |Polo Shirt |
| 1 |itemDescription |Green |
| 1 |price1 |4.2 |
| 1 |price2 |5.3 |
| 1 |price3 |7.5 |
| 1 |displayOrder |1 |
| 2 |itemNo |12 |
| 2 |itemName |Digital Watch|
| 2 |itemDescription |Red Watch |
| 2 |price1 |4.0 |
| 2 |price2 |2.0 |
| 2 |price3 |1.5 |
| 2 |displayOrder |3 |
I've updated the SQL; you can see I'm pulling in the rowId from tblFlatColumnValues:
select [rowId],[displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3]
from
(
SELECT cv.rowId, [columnName], [columnValue] , row_number() over(partition by c.columnName order by cv.rowId) as rn
FROM tblFlatTable AS t
JOIN tblFlatColumns c
ON t.flatTableId = c.flatTableId
JOIN tblFlatColumnValues cv
ON cv.flatColumnId = c.flatColumnId
WHERE (t.flatTableId = 1) AND (t.isActive = 1)
AND (c.isActive = 1) AND (cv.isActive = 1)
) as S
Pivot
(
MIN([columnValue])
FOR columnName IN ([displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3])
) as P
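Conceptually, the corrected query groups the (rowId, columnName, columnValue) triples by rowId while keeping rowId itself as an output column rather than as one of the pivoted names. A minimal Python sketch of that reshaping, with abbreviated sample rows:

```python
# (rowId, columnName, columnValue) triples, abbreviated from the question
flat = [
    (1, "itemNo", "1"), (1, "itemName", "Polo Shirt"), (1, "price1", "4.2"),
    (2, "itemNo", "12"), (2, "itemName", "Digital Watch"), (2, "price1", "4.0"),
]

# Group by rowId; the grouping key itself becomes the "rowId" column,
# so it never has to appear in the pivoted column list.
records = {}
for row_id, column, value in flat:
    rec = records.setdefault(row_id, {"rowId": row_id})
    rec[column] = value
```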