PL/SQL GROUP BY question: adding extra columns dependent on row numbers

I'm struggling with a group by. I have a query which pulls two rows of data for some stock that has been counted. The rows it returns are like this.
However, I need this to display on one row like below.
This example only has two counts taking place, but other examples could have up to 4 rows, so I would potentially need a Count 3 and Count 4 column. The count difference needs to be the last count quantity minus the first row's original quantity. There is a dstamp field which can be used to identify when each count happened.
The SQL I'm currently using to pull this data is below:
select bin, sku, original_qty,
       (original_qty + count_qty) as countQty,
       count_difference, quantity, counter
from stock_counts
order by bin, dstamp desc

You are not even returning dstamp in the results, and it is not really clear what all the columns mean. But if you want to pivot, you can use conditional aggregation to readily pivot the quantities by time:
select bin, sku,
       max(case when seqnum = 1 then countQty end) as original_qty,
       max(case when seqnum = 2 then countQty end) as qty1,
       max(case when seqnum = 3 then countQty end) as qty2,
       max(case when seqnum = 4 then countQty end) as qty3
from (select sc.*,
             (original_qty + count_qty) as countQty,  -- countQty is computed in the question's query, not a table column
             row_number() over (partition by sku, bin order by dstamp) as seqnum
      from stock_counts sc
     ) sc
group by sku, bin;
Of course, you need to have enough columns to cover the number of quantities you are concerned about.
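The question also asks for a count difference (the last count quantity minus the first row's original quantity). A minimal sketch of how the same pivot could produce it, assuming countQty is the (original_qty + count_qty) expression from the question's query; the count1/count2/num_counts names are illustrative and more branches can be added as above:
select bin, sku,
       max(case when seqnum = 1 then countQty end) as count1,
       max(case when seqnum = 2 then countQty end) as count2,
       -- last count quantity minus the first row's original quantity
       max(case when seqnum = num_counts then countQty end)
         - max(case when seqnum = 1 then original_qty end) as count_difference
from (select sc.*,
             (original_qty + count_qty) as countQty,
             row_number() over (partition by sku, bin order by dstamp) as seqnum,
             count(*) over (partition by sku, bin) as num_counts
      from stock_counts sc
     ) sc
group by sku, bin;
The count(*) over (partition by sku, bin) window identifies the last count without hard-coding how many counts took place.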

Related

Transposing a column of data in SQL

In SQL my data output is:

Agreement ID | ProductStatus
125          | A
125          | C
125          | N

I want to see this instead as:

Agreement ID | ProductStatus
125          | A, C, N

OR

Agreement ID | ProductStatus1 | ProductStatus2 | ProductStatus3
125          | A              | C              | N
I've tried a few simple pivots, but the values A, C and N CAN be different, random values each time.
Can anyone help?
You can use the group_concat function. Your query will be something like this:
select agreement_id, group_concat(ProductStatus)
from mytable
group by agreement_id
This is for MySQL; for other databases you can search for that database's group_concat equivalent.
Seems like you are new to databases. You can use this reference to learn more:
https://www.mysqltutorial.org/basic-mysql-tutorial.aspx
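For reference, other databases spell this differently; a couple of hedged sketches against the same hypothetical mytable:
-- PostgreSQL and SQL Server (2017+)
select agreement_id, string_agg(ProductStatus, ',') as productstatuses
from mytable
group by agreement_id;

-- Oracle
select agreement_id,
       listagg(ProductStatus, ',') within group (order by ProductStatus) as productstatuses
from mytable
group by agreement_id;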
You can get the three values in different columns using conditional aggregation:
select agreementid,
       max(case when seqnum = 1 then ProductStatus end) as ProductStatus_1,
       max(case when seqnum = 2 then ProductStatus end) as ProductStatus_2,
       max(case when seqnum = 3 then ProductStatus end) as ProductStatus_3
from (select t.*,
             row_number() over (partition by agreementid order by agreementid) as seqnum
      from t
     ) t
group by agreementid;
The SQL standard for creating a list is:
select agreementid,
       listagg(ProductStatus, ',') within group (order by ProductStatus) as productstatuses
from t
group by agreementid;
Many databases have different names for this function.
In both of these cases, the ordering of the columns or of the list elements is indeterminate. SQL tables represent unordered sets (well, technically multisets), so there is no ordering unless a column specifies the ordering.
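If the table does have such a column (say, a hypothetical created_at insertion timestamp), ordering by it makes the result deterministic; a sketch:
select agreementid,
       -- created_at is hypothetical; substitute whatever column defines row order
       listagg(ProductStatus, ',') within group (order by created_at) as productstatuses
from t
group by agreementid;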

Can SQL compare rows in the same table, and dynamically select a value?

Recently, I got a table named Appointments.
The requirement is that I need to select only one row for each customer, by 2 rules:
If the rows have the same time (whether the location is the same or different), put null in tutor and location.
If the rows have different times (whether the location is the same or different), pick the earliest row.
Since I'm an amateur in SQL, I've looked into the self-join method, but it doesn't seem to work in this case.
Expected result
Thanks all, have a great day...
You seem to want the minimum time for each customer, with null values if there are multiple rows and the tutor or location don't match.
You can use window functions:
select customer, starttime,
       (case when min(location) = max(location) then min(location) end) as location,
       (case when min(tutor) = max(tutor) then min(tutor) end) as tutor
from (select t.*,
             rank() over (partition by customer order by starttime) as seqnum
      from t
     ) t
where seqnum = 1
group by customer, starttime
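Since the sample table and expected result were posted as images, here is a hypothetical illustration (PostgreSQL syntax, made-up data) of how the min/max trick behaves: when all the tied earliest rows agree on a value it is kept, and when they disagree the case expression yields null:
-- hypothetical data: customer 1 has two rows tied at the earliest time
-- with the same location but different tutors; customer 2 has a single
-- earliest row, so all its values are kept
with t(customer, starttime, location, tutor) as (
    values (1, time '09:00', 'Room A', 'Kim'),
           (1, time '09:00', 'Room A', 'Lee'),
           (2, time '10:00', 'Room B', 'Kim'),
           (2, time '11:00', 'Room C', 'Lee')
)
select customer, starttime,
       (case when min(location) = max(location) then min(location) end) as location,
       (case when min(tutor) = max(tutor) then min(tutor) end) as tutor
from (select t.*,
             rank() over (partition by customer order by starttime) as seqnum
      from t
     ) t
where seqnum = 1
group by customer, starttime;
-- customer 1 -> 09:00, Room A, null; customer 2 -> 10:00, Room B, Kim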

Adding values with condition in google bigquery

I need to add some values with a condition in Google BigQuery.
NOTICE: I edited the original question since it was not accurate enough.
Thanks to the two participants who have tried to help me.
I tried to apply the solutions you kindly suggested, but I got back the same values as the pct column.
Something like this:
results
Here is the more detailed definition:
TABLE
Columns:
Shop: shop location
brand: brands of cars sold at the shop location
sales: sales of each brand sold at each shop location
rank: rank of each brand per shop location (the bigger the sales, the greater the rank)
total_sales_shop: SUM of all brand sales per shop location
pct: percentage of sales by brand relative to its shop location's total
pct_acc: what I need to calculate; the cumulative sum of pct per shop by rank (it has no relation with brand)
PCT_ACC
My goal is to reach something like PCT_ACC above, and then save the results in another table like this: endtable
You can use the following query to get the required data:
select values, rank,
       sum(case when rank < 2 then values else 0 end) as condition1
from table
group by values, rank
Add or remove columns from the select and group by as required.
To get the cumulative sum you can use the following query:
select shop, brand, sales, rank, total_sales_shop, pct,
       sum(pct) over (partition by shop order by rank) as pct_act
from data
And to get the final table you can use a combination of case expressions and group by, e.g.:
select shop,
       max(case when rank = 1 then pct_act end) as rank_1,
       max(case when rank = 2 then pct_act end) as rank_2,
       max(case when rank = 3 then pct_act end) as rank_3,
       max(case when rank = 4 then pct_act end) as rank_4,
       max(case when rank = 5 then pct_act end) as rank_5
from cumulative_sum
group by shop
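A sketch that combines both steps in one statement, assuming the source table is named data as above and that cumulative_sum is simply the intermediate result of the previous query:
with cumulative_sum as (
    -- running total of pct per shop, ordered by rank
    select shop, rank,
           sum(pct) over (partition by shop order by rank) as pct_act
    from data
)
select shop,
       max(case when rank = 1 then pct_act end) as rank_1,
       max(case when rank = 2 then pct_act end) as rank_2,
       max(case when rank = 3 then pct_act end) as rank_3,
       max(case when rank = 4 then pct_act end) as rank_4,
       max(case when rank = 5 then pct_act end) as rank_5
from cumulative_sum
group by shop;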
If you want only the final sum for the rows matching that condition you can try:
SELECT
  SUM(IF(Rank < 2, Values, 0)) AS condition1
FROM table
If you want to get the rank and the sum only for the rows matching that condition you can try:
SELECT
  Rank,
  SUM(IF(Rank < 2, Values, 0)) AS condition1
FROM table
WHERE Rank < 2
GROUP BY Rank
Finally, if you want to get the rank and the sum considering all the rows you can try:
SELECT
  Rank,
  SUM(IF(Rank < 2, Values, 0)) AS condition1
FROM table
GROUP BY Rank
I hope this helps.

How to do a PostgreSQL group aggregation: 2 fields, using one to select the other

I have a table - Data - of rows, simplified, like so:
Name,Amount,Last,Date
A,16,31,1-Jan-2014
A,27,38,1-Feb-2014
A,12,34,1-Mar-2014
B,8,37,1-Jan-2014
B,3,38,1-Feb-2014
B,17,39,1-Mar-2014
I wish to group them similar to:
select Name,sum(Amount),aggr(Last),max(Date) from Data group by Name
For aggr(Last) I want the value of 'Last' from the row that contains max(Date)
So the result I want would be 2 rows
Name,Amount,Last,Date
A,55,34,1-Mar-2014
B,28,39,1-Mar-2014
i.e. in both cases, the value of Last is the one from the row that contained 1-Mar-2014
The query I'm actually doing is basically the same, but with many more sum() fields and millions of rows, so I'm guessing an aggregate function could avoid multiple extra requests for each group of incoming rows.
Instead, use row_number() and conditional aggregation:
select Name, sum(Amount),
       max(case when seqnum = 1 then Last end) as Last,
       max(Date)
from (select d.*,
             row_number() over (partition by Name order by Date desc) as seqnum
      from Data d
     ) d
group by Name;
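As a Postgres-specific alternative (a different technique, sketched here for comparison rather than taken from the answer above), DISTINCT ON can pick each name's most recent Last and join it back to the aggregated sums:
-- pick each Name's Last from its latest Date, then join to the totals
select s.Name, s.total_amount, l.Last, s.max_date
from (select Name, sum(Amount) as total_amount, max(Date) as max_date
      from Data
      group by Name) s
join (select distinct on (Name) Name, Last
      from Data
      order by Name, Date desc) l
  on l.Name = s.Name;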

Parallelizable OVER EACH BY

I am hitting this obstacle again and again...
JOIN EACH and GROUP EACH BY clauses can't be used on the output of window functions
Is there a best practice or recommendation for how to use window functions (OVER()) with very large data sets that cannot be processed on a single node?
Fragmenting my data and running the same query with different filters can work, but it's very limiting, takes a lot of time (and manual labor), and is costly (running the same query on the same data set 30 times instead of once).
Referring to Jeremy's answer below...
It's better, but still doesn't work properly.
If I take my original query sample:
select title,
       count(case when contributor_id <> LeadContributor then 1 else null end) as different,
       count(case when contributor_id = LeadContributor then 1 else null end) as same,
       count(*) as total
from (
    select title, contributor_id,
           lead(contributor_id) over (partition by title order by timestamp) as LeadContributor
    from [publicdata:samples.wikipedia]
    where regexp_match(title, r'^[A,B]') = true
)
group by title
Now works...
But
select title,
       count(case when contributor_id <> LeadContributor then 1 else null end) as different,
       count(case when contributor_id = LeadContributor then 1 else null end) as same,
       count(*) as total
from (
    select title, contributor_id,
           lead(contributor_id) over (partition by title order by timestamp) as LeadContributor
    from [publicdata:samples.wikipedia]
    where regexp_match(title, r'^[A-Z]') = true
)
group each by title
again gives the Resources Exceeded error...
Window functions can now be executed in distributed fashion according to the PARTITION BY clause given inside OVER. If you supply a PARTITION BY with your window functions, your data will be processed in parallel similar to how JOIN EACH and GROUP EACH BY are processed.
In addition, you can use PARTITION BY on the output of JOIN EACH or GROUP EACH BY without serializing execution. Using the same keys for PARTITION BY as for JOIN EACH or GROUP EACH BY is particularly efficient, because the data will not need to be reshuffled between join/aggregation and window function execution.
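A minimal sketch of that pairing, reusing the [publicdata:samples.wikipedia] table from this question (legacy BigQuery syntax; the pos and revisions names are illustrative):
-- both the window function and the aggregation key on title, so the
-- rows for a given title stay together across both stages
select title, count(*) as revisions, max(pos) as last_pos
from (
    select title,
           row_number() over (partition by title order by timestamp) as pos
    from [publicdata:samples.wikipedia]
)
group each by title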
Update: note Jeremy's comment with good news.
OVER() functions always need to run on the whole dataset as the last step of execution (they even run after LIMIT clauses). Everything needs to fit in the last VM, unless the work is parallelizable with a PARTITION BY clause.
When I hit this type of error, I try to filter out as much data as I can in earlier steps.
For example, this query doesn't run:
SELECT Year, Actor1Name, Actor2Name, c FROM (
  SELECT Actor1Name, Actor2Name, Year, COUNT(*) c,
         RANK() OVER(PARTITION BY Year ORDER BY c DESC) rank
  FROM
    (SELECT Actor1Name, Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name < Actor2Name),
    (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name > Actor2Name)
  WHERE Actor1Name IS NOT NULL
    AND Actor2Name IS NOT NULL
  GROUP EACH BY 1, 2, 3
)
WHERE rank = 1
ORDER BY Year
But I can fix it easily with an earlier filter, in this case adding a "HAVING c > 100":
SELECT Year, Actor1Name, Actor2Name, c FROM (
  SELECT Actor1Name, Actor2Name, Year, COUNT(*) c,
         RANK() OVER(PARTITION BY Year ORDER BY c DESC) rank
  FROM
    (SELECT Actor1Name, Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name < Actor2Name),
    (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name > Actor2Name)
  WHERE Actor1Name IS NOT NULL
    AND Actor2Name IS NOT NULL
  GROUP EACH BY 1, 2, 3
  HAVING c > 100
)
WHERE rank = 1
ORDER BY Year
So what is happening here: Before applying RANK() OVER(), I'm getting rid of many of the combinations that won't matter when I'm looking for the top ones (as I'm filtering out everything with a count less than 100).
To give a more specific answer, it's always better if you can supply a query and sample data to review.