I have data in the following format:
CustomerID  ID    Year  value
1000        1477  2022  True
1000        1477  2021  True
1000        1474  2022  Credit
1000        1474  2021  Debit
1000        1464  2022  Total Amount
1000        1464  2021  Net Amount
I would like to transpose this data for a particular CustomerID so that each ID becomes a column, with one row per year. Below is the expected output:
CustomerID  Year  ID_1477  ID_1474  ID_1464
1000        2022  True     Credit   Total Amount
1000        2021  True     Debit    Net Amount
Below is the query I wrote to get this. I performed a self join and extracted the required elements into separate columns:
SELECT
ele_1477.CustomerID,
ele_1477.Year,
ele_1477.value as ID_1477,
ele_1474.value as ID_1474,
ele_1464.value as ID_1464
FROM
(select * from table where id=1477 ) ele_1477
LEFT OUTER JOIN (select * from table where id=1474 ) ele_1474 ON ele_1477.CustomerID=ele_1474.CustomerID and ele_1477.Year=ele_1474.Year
LEFT OUTER JOIN (select * from table where id=1464 ) ele_1464 ON ele_1477.CustomerID=ele_1464.CustomerID and ele_1477.Year=ele_1464.Year
My question is that I am not sure how efficient this query is. I have another 150 IDs for a CustomerID that need to be transposed. Does that mean I should do a self join 150 times? I am looking for the best possible solution to achieve this.
You can use PIVOT together with dynamic SQL for this.
For your sample data, you can pivot your table as below.
SELECT *
FROM sample_table
PIVOT (ANY_VALUE(Value) ID FOR ID IN ('1477', '1474', '1464'));
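If PIVOT is not available in your engine, the same reshaping can be done in a single pass with conditional aggregation instead of one self join per ID. The sketch below is an illustration using Python's built-in sqlite3 module and the question's sample rows; the table name sample_table follows the answer, and MAX() stands in for ANY_VALUE() (which SQLite lacks), assuming each (CustomerID, Year, ID) combination has only one value.

```python
import sqlite3

# Sample rows from the question: (CustomerID, ID, Year, value)
ROWS = [
    (1000, 1477, 2022, "True"),
    (1000, 1477, 2021, "True"),
    (1000, 1474, 2022, "Credit"),
    (1000, 1474, 2021, "Debit"),
    (1000, 1464, 2022, "Total Amount"),
    (1000, 1464, 2021, "Net Amount"),
]

# One scan of the table: each CASE keeps only the value for one ID, and
# MAX() collapses the (CustomerID, Year) group to that single non-NULL value.
PIVOT_SQL = """
SELECT CustomerID,
       Year,
       MAX(CASE WHEN ID = 1477 THEN value END) AS ID_1477,
       MAX(CASE WHEN ID = 1474 THEN value END) AS ID_1474,
       MAX(CASE WHEN ID = 1464 THEN value END) AS ID_1464
FROM sample_table
GROUP BY CustomerID, Year
ORDER BY CustomerID, Year DESC
"""


def pivot(rows):
    """Load rows into an in-memory table and return the pivoted result."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sample_table "
        "(CustomerID INTEGER, ID INTEGER, Year INTEGER, value TEXT)"
    )
    con.executemany("INSERT INTO sample_table VALUES (?, ?, ?, ?)", rows)
    result = con.execute(PIVOT_SQL).fetchall()
    con.close()
    return result


for row in pivot(ROWS):
    print(row)
```

This prints one row per year: `(1000, 2022, 'True', 'Credit', 'Total Amount')` followed by `(1000, 2021, 'True', 'Debit', 'Net Amount')`.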
I have another 150 set of IDs for a CustomerID that needs to be transposed.
You can also use EXECUTE IMMEDIATE so that you don't have to write out all 150 IDs for your real data.
EXECUTE IMMEDIATE FORMAT("""
SELECT * FROM sample_table PIVOT (ANY_VALUE(Value) ID FOR ID IN ('%s'))
""", (SELECT STRING_AGG(DISTINCT ID, "','") FROM sample_table));
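The EXECUTE IMMEDIATE idea (read the distinct IDs first, then generate the column list) can be sketched outside BigQuery as well. This is an illustrative Python/sqlite3 version that builds a conditional-aggregation query from the data; the helper names build_pivot_sql and run are hypothetical. Because the generated text is executed as SQL, each ID is passed through int() so that only integer literals end up in the statement.

```python
import sqlite3

# Sample rows from the question: (CustomerID, ID, Year, value)
ROWS = [
    (1000, 1477, 2022, "True"),
    (1000, 1477, 2021, "True"),
    (1000, 1474, 2022, "Credit"),
    (1000, 1474, 2021, "Debit"),
    (1000, 1464, 2022, "Total Amount"),
    (1000, 1464, 2021, "Net Amount"),
]


def build_pivot_sql(con):
    """Generate a pivot query with one ID_<n> column per distinct ID."""
    ids = [r[0] for r in con.execute(
        "SELECT DISTINCT ID FROM sample_table ORDER BY ID DESC")]
    # int() ensures only integer literals are spliced into the SQL text.
    columns = ",\n       ".join(
        f"MAX(CASE WHEN ID = {int(i)} THEN value END) AS ID_{int(i)}"
        for i in ids
    )
    return (
        f"SELECT CustomerID, Year,\n       {columns}\n"
        "FROM sample_table\n"
        "GROUP BY CustomerID, Year\n"
        "ORDER BY CustomerID, Year DESC"
    )


def run(rows):
    """Return (column names, result rows) for the dynamically built pivot."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sample_table "
        "(CustomerID INTEGER, ID INTEGER, Year INTEGER, value TEXT)"
    )
    con.executemany("INSERT INTO sample_table VALUES (?, ?, ?, ?)", rows)
    cur = con.execute(build_pivot_sql(con))
    header = [d[0] for d in cur.description]
    data = cur.fetchall()
    con.close()
    return header, data


header, data = run(ROWS)
print(header)
for row in data:
    print(row)
```

Adding a new ID to the table adds a new ID_<n> column without editing the query, which is the same benefit EXECUTE IMMEDIATE gives in BigQuery.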
Query results of above two queries
I am having trouble with a COUNT function. The problem is caused by a LEFT JOIN that I am not sure I am doing correctly.
The variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
DISTINCT
C.customer_name,
C.product_code,
COUNT(S1.product_code) AS s1_sales,
COUNT(S2.product_code) AS s2_sales
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
customer_name, product_code
HAVING
S1_sales > 0
OR S2_sales > 0
The output I expect is something like this:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        4                    8
James          100022        6                    10
But instead, I get:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        290                  60
James          100022        290                  60
It works when I use COUNT(DISTINCT product_code) instead of COUNT(product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I use COUNT(DISTINCT ...) over more than one week of data, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?
As Philipxy indicated, this is a common problem. You are getting a Cartesian result from your joins, which bloats your numbers. To simplify, let's consider a single customer purchasing one item from two stores: the first store has 3 purchases and the second store has 5. The join produces 3 * 5 = 15 rows, so both counts come out as 15. This is because each entry from the first store is joined to every entry for the same customer ID in the second store: the 1st purchase is joined to second-store purchases 1-5, then the 2nd purchase is joined to second-store purchases 1-5 again, and so on. If each store is instead pre-queried to aggregate per customer, each pre-query returns AT MOST one record per customer per store (and per product, as in your desired output).
select
c.customer_name,
AllCustProducts.Product_Code,
coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
from
customer_df c
-- now, we need all possible UNIQUE instances of
-- a given customer and product to prevent duplicates
-- for subsequent queries of sales per customer and store
JOIN
( select distinct customerid, product_code
from store1_df
union
select distinct customerid, product_code
from store2_df ) AllCustProducts
on c.customerid = AllCustProducts.customerid
-- NOW, we can join to a pre-query of sales at store 1
-- by customer id and product code. You may also want to
-- get sum( SalesDollars ) if available, just add respectively
-- to each sub-query below.
LEFT JOIN
( select
s1.customerid,
s1.product_code,
count(*) as SalesEntries
from
store1_df s1
group by
s1.customerid,
s1.product_code ) PQStore1
on AllCustProducts.customerid = PQStore1.customerid
AND AllCustProducts.product_code = PQStore1.product_code
-- now, same pre-aggregation to store 2
LEFT JOIN
( select
s2.customerid,
s2.product_code,
count(*) as SalesEntries
from
store2_df s2
group by
s2.customerid,
s2.product_code ) PQStore2
on AllCustProducts.customerid = PQStore2.customerid
AND AllCustProducts.product_code = PQStore2.product_code
There is no need for a GROUP BY or HAVING in the outer query, since each pre-aggregate returns at most one record per unique combination. As for your need to filter by date range, I would just add the same WHERE clause inside each of the AllCustProducts, PQStore1, and PQStore2 subqueries.
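A small reproduction of both the inflation and the pre-aggregation fix, using Python's sqlite3 module. The data is hypothetical: one customer and one product, with 3 store-1 sales and 5 store-2 sales in early January plus one store-1 sale in February. The customerid and sale_date columns are assumptions (the answer's query already relies on a customerid column), and the date range is applied inside every subquery.

```python
import sqlite3

# Hypothetical schema and data for illustration only.
SETUP = """
CREATE TABLE customer_df (customerid INTEGER, customer_name TEXT);
CREATE TABLE store1_df (customerid INTEGER, product_code INTEGER, sale_date TEXT);
CREATE TABLE store2_df (customerid INTEGER, product_code INTEGER, sale_date TEXT);
INSERT INTO customer_df VALUES (1, 'Luigi');
INSERT INTO store1_df VALUES (1, 120012, '2024-01-01'), (1, 120012, '2024-01-02'),
                             (1, 120012, '2024-01-03'), (1, 120012, '2024-02-01');
INSERT INTO store2_df VALUES (1, 120012, '2024-01-01'), (1, 120012, '2024-01-02'),
                             (1, 120012, '2024-01-03'), (1, 120012, '2024-01-04'),
                             (1, 120012, '2024-01-05');
"""

# Joining both raw tables: every store-1 row pairs with every store-2 row.
NAIVE = """
SELECT c.customer_name, COUNT(s1.product_code), COUNT(s2.product_code)
FROM customer_df c
LEFT JOIN store1_df s1 ON s1.customerid = c.customerid
      AND s1.sale_date BETWEEN :start_date AND :end_date
LEFT JOIN store2_df s2 ON s2.customerid = c.customerid
      AND s2.sale_date BETWEEN :start_date AND :end_date
GROUP BY c.customer_name
"""

# The answer's structure: each store is counted on its own before joining.
PRE_AGGREGATED = """
SELECT c.customer_name, acp.product_code,
       COALESCE(pq1.SalesEntries, 0) AS Store1SalesEntries,
       COALESCE(pq2.SalesEntries, 0) AS Store2SalesEntries
FROM customer_df c
JOIN (SELECT customerid, product_code FROM store1_df
      WHERE sale_date BETWEEN :start_date AND :end_date
      UNION
      SELECT customerid, product_code FROM store2_df
      WHERE sale_date BETWEEN :start_date AND :end_date) acp
  ON c.customerid = acp.customerid
LEFT JOIN (SELECT customerid, product_code, COUNT(*) AS SalesEntries
           FROM store1_df WHERE sale_date BETWEEN :start_date AND :end_date
           GROUP BY customerid, product_code) pq1
  ON acp.customerid = pq1.customerid AND acp.product_code = pq1.product_code
LEFT JOIN (SELECT customerid, product_code, COUNT(*) AS SalesEntries
           FROM store2_df WHERE sale_date BETWEEN :start_date AND :end_date
           GROUP BY customerid, product_code) pq2
  ON acp.customerid = pq2.customerid AND acp.product_code = pq2.product_code
ORDER BY c.customer_name, acp.product_code
"""


def counts(start_date, end_date):
    """Return (naive counts, pre-aggregated counts) for a date range."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    params = {"start_date": start_date, "end_date": end_date}
    naive = con.execute(NAIVE, params).fetchall()
    fixed = con.execute(PRE_AGGREGATED, params).fetchall()
    con.close()
    return naive, fixed


naive, fixed = counts("2024-01-01", "2024-01-07")
print(naive)  # [('Luigi', 15, 15)]: 3 * 5 joined rows counted in both columns
print(fixed)  # [('Luigi', 120012, 3, 5)]
```

Widening the range to include February changes only the store-1 count (to 4), because each store's rows are counted independently of the other store.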
Yes, I know this seems simple:
SELECT DISTINCT(...)
Except it apparently isn't.
Here is my actual query:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS,
IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune,
IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical,
IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther,
IIf([DecReason]=7,1,0) AS YesAlready
FROM
EmployeeInformation
INNER JOIN (CompletedTrainings
LEFT JOIN DeclinationReasons ON CompletedTrainings.DecReason = DeclinationReasons.ReasonID)
ON EmployeeInformation.ID = CompletedTrainings.Employee
GROUP BY
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No"),
IIf([DecReason]=1,1,0),
IIf([DecReason]=2,1,0),
IIf([DecReason]=3,1,0),
IIf([DecReason]=4,1,0),
IIf([DecReason]=5,1,0),
IIf([DecReason]=6,1,0),
IIf([DecReason]=7,1,0)
HAVING
((((EmployeeInformation.Active) Like -1)
AND ((CompletedTrainings.DecShotDate + 365 >= DATE())
OR (CompletedTrainings.DecShotDate IS NULL))));
This joins a few tables (obviously) to get a set of records. The problem is that if someone appears more than once in the table, with a NULL in the date field in one record and a date in another, the query pulls both the NULL and the date, or pulls multiple NULLs. It might also pull multiple dates, but no such records are present at the moment.
I need the NULLs, since they are actual data in this particular case, but if someone has both a date and a NULL, I need to pull only the newest record. I thought I could add MAX(RecordID) from the table, but that didn't change the results of the query either.
That code:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
MAX(CompletedTrainings.RecordID),
CompletedTrainings.DecShotDate
...
It returned the same issue: duplicated EmployeeInformation.ID values with different DecShotDate values.
Currently it returns:
ID  Active  DecShotDate  etc. x a bunch
1   -1      date date    whatever goes
2   -1                   in these
2   -1      date date    columns
These results are used in a report that determines the total number of employees who fit the report's criteria. The NULLs in DecShotDate are needed because they show people who did not refuse a flu vaccine in the current year, while the dates show people who did refuse.
I have come up with one simple solution: I could add a column to the CompletedTrainings table that contains a date or other value, and add that to the HAVING clause. This might be the right solution, since this is a yearly training questionnaire that employees have to fill out, but I am asking for advice before doing this.
Am I right in thinking I need to add a column to filter by so that older data isn't pulled, or should I be able to do this by pulling RecordID, and did I just bork that part of the query?
Edited to add raw table views:
EmployeeInformation Table:
ID  Last  First   empID  Active  Termdate  DoH   Title  PT/FT/PD  PI
1   Doe   Jane    982    -1                date  Sr     PD        X
2   Roe   John    278    0       date      date  Jr     PD        X
3   Moe   Larry   1232   -1                date  Sr     FT        X
4   Zoe   Debbie  1424   -1                date  Sr     PT        X
DeclinationReasons Table:
ReasonID  Reason
1         Allergy
2         Already got it
3         Illness
CompletedTrainings Table:
RecordID  Employee  Training  ...  DecShotdate  DecShotLocation  DecShotReason  DecExp
1         1         4              date         location         2              text
2         1         4
3         2         4
4         3         4              date         location         3              text
5         3         4              date         location         1              text
6         4         4
After some serious soul searching, I decided to use another column and filter by that.
In the end my query looks like this:
SELECT *
FROM (
(
SELECT RecordID, DecShotDate, DecShotLocation, DecReason, DecExplanation, Employee,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS, IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune, IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical, IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther, IIf([DecReason]=7,1,0) AS YesAlready
FROM CompletedTrainings WHERE (CompletedDate > DATE() - 365 ) AND (Training = 69)) AS T1
LEFT JOIN
(
SELECT ID, Active FROM EmployeeInformation) AS T2 ON T1.Employee = T2.ID)
LEFT JOIN
(
SELECT Reason, ReasonID FROM DeclinationReasons) AS T3 ON T1.DecReason = T3.ReasonID;
This may not have been the best solution, but it did exactly what I needed, which is to get the information from the latest entries in the database.
Previously I had tried to use MAX(), DISTINCT(), etc., but always ended up with multiple records being retrieved. In this version, I intentionally SELECT the most recent records first, then join them to the results of the next query, and so on, until I have all the data required for my report.
I am writing this in the hope that someone else finds it useful, or, even better, that someone tells me why it is wrong so I can improve my own skills.
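On the RecordID question: the MAX(RecordID) attempt still returned duplicates because DecShotDate (and the other CompletedTrainings columns) stayed in the GROUP BY, so a dated record and a NULL record for the same employee land in different groups, each with its own MAX. One common alternative is a correlated subquery that keeps only the row whose RecordID is the highest for that employee. The sketch below tests this in SQLite against the CompletedTrainings sample above, assuming "newest" means the highest RecordID; the concrete dates are made up to stand in for the "date" placeholders, and whether the same WHERE pattern behaves identically in Access is an assumption.

```python
import sqlite3

# (RecordID, Employee, Training, DecShotDate); None = NULL, dates are made up.
TRAININGS = [
    (1, 1, 4, "2023-10-02"),
    (2, 1, 4, None),
    (3, 2, 4, None),
    (4, 3, 4, "2023-10-05"),
    (5, 3, 4, "2023-10-20"),
    (6, 4, 4, None),
]

# Grouping by DecShotDate keeps a separate group per distinct date (or NULL).
GROUPED = """
SELECT Employee, MAX(RecordID), DecShotDate
FROM CompletedTrainings
GROUP BY Employee, DecShotDate
"""

# Keep only each employee's newest record for the training.
LATEST = """
SELECT t.Employee, t.RecordID, t.DecShotDate
FROM CompletedTrainings AS t
WHERE t.Training = 4
  AND t.RecordID = (SELECT MAX(t2.RecordID)
                    FROM CompletedTrainings AS t2
                    WHERE t2.Employee = t.Employee
                      AND t2.Training = t.Training)
ORDER BY t.Employee
"""


def run():
    """Return (rows from GROUPED, rows from LATEST)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE CompletedTrainings (RecordID INTEGER PRIMARY KEY, "
        "Employee INTEGER, Training INTEGER, DecShotDate TEXT)"
    )
    con.executemany("INSERT INTO CompletedTrainings VALUES (?, ?, ?, ?)", TRAININGS)
    grouped = con.execute(GROUPED).fetchall()
    latest = con.execute(LATEST).fetchall()
    con.close()
    return grouped, latest


grouped, latest = run()
print(len(grouped), "rows when grouping by DecShotDate")  # 6 rows, employees 1 and 3 twice
print(latest)  # one row per employee
```

The result keeps the NULL rows you need (employees 1, 2 and 4 here) while dropping the older dated record for employee 1 and the older date for employee 3, without adding a new column.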
(this is homework, not going to lie)
I wrote an ANSI SQL query that correctly produces the required 3rd-highest prices (a sample of the table is given further down):
select unique uni, price
from
(
(
select unique uni, price
from
(
select unique uni, price
from table1
group by uni
having price < max(price)
)
group by uni
having price < max(price)
)
group by uni
having price < max(price)
)
Now I need to list the 1st, 2nd and 3rd highest prices in one table, but in a way that could be used for any n.
example:
Col1 Col2
uni1 10
uni1 20
uni2 20
uni2 10
uni3 30
uni3 20
uni1 30
(Sorry for the formatting; I haven't been here for a very long time. I appreciate any assistance. I will supply a link to the uni: I asked the tutor if I could do so and he said yes, but not the whole code, only something like 10%.)
In SAS you can use the proprietary option OUTOBS to restrict how many rows of a result set are output.
Example:
Use OUTOBS=3 to create a top-3 table, then use that table in a subsequent query.
data have;
input x @@; datalines;
10 9 8 7 6 5 4 3 2 1 0
;
proc sql;
reset outobs=3;
create table top3x as
select * from have
order by x desc;
reset outobs=max;
* another query;
quit;
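Note that OUTOBS limits the whole result set, not the number of rows per uni. If the goal is the top n prices within each uni, a standard SQL window function is one option that works for any n. The sketch below uses DENSE_RANK() in SQLite (window functions need SQLite 3.25 or later) on the example rows from the question; table1, uni and price come from the question's query, while price_rank and top_n_per_uni are hypothetical names.

```python
import sqlite3

# Example rows from the question: (uni, price)
ROWS = [
    ("uni1", 10), ("uni1", 20), ("uni2", 20), ("uni2", 10),
    ("uni3", 30), ("uni3", 20), ("uni1", 30),
]


def top_n_per_uni(rows, n):
    """Return (uni, price, rank) for the n highest distinct prices per uni."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (uni TEXT, price INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT uni, price, price_rank
        FROM (
            SELECT uni, price,
                   -- rank prices within each uni, highest first
                   DENSE_RANK() OVER (PARTITION BY uni ORDER BY price DESC)
                       AS price_rank
            FROM table1
        ) AS ranked
        WHERE price_rank <= ?
        ORDER BY uni, price_rank
        """,
        (n,),
    ).fetchall()
    con.close()
    return result


for row in top_n_per_uni(ROWS, 3):
    print(row)
```

Changing `<= ?` to `= ?` returns only the nth-highest price per uni. DENSE_RANK gives tied prices the same rank; ROW_NUMBER() would split ties into separate positions instead.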
My apologies if this is a duplicate, but I could not find an answer to my particular question. I have a table that lists the products on a sales order and their quantities. Some products are components of other products and are marked as such with a flag. I would like to know if there is a way to have a running total for the parent/normal items that resets at each parent/normal item.
Here is an example of the table data and my desired output:
OrderNo Item Qty Regular Line
349443 AFU20451-KIT1 1 Y 1
349443 AFU20451 0 N 2
349443 HAWKE-14252 1 N 3
349443 RGPM-25H4 1 N 4
349443 AV-003-265 1 Y 5
349443 AV-A00090-KIT 1 Y 6
349443 AV-A00091 1 N 7
349443 AV-A00090 1 N 8
349443 AV-00043 1 N 9
349443 AV457/310GR/FP 2 Y 10
desired output:
OrderNo Item Qty
349443 AFU20451-KIT1 3
349443 AV-003-265 1
349443 AV-A00090-KIT 4
349443 AV457/310GR/FP 2
As you can see, I would like the sum to reset every time Regular is Y, and to include only the parent item (I could work around that part, since I can keep the items in the same order, maybe using a row number). I have been trying to use OVER and PARTITION BY to do this, but without success. Let me know if this is even possible or if you need any further information.
with cte as
(
select OrderNo,
-- only return the main item
case when Regular = 'Y' then Item end AS Item,
Qty,
-- assign a unique number to each `YNNN..` component group
-- needed for GROUP BY in next step
sum(case when Regular = 'Y' then 1 else 0 end)
over (partition by OrderNo
order by Line
rows unbounded preceding) as grp
from myTable
)
select OrderNo,
-- find the matching value for the main component
max(Item) as Item,
sum(Qty) as Qty
from cte
group by OrderNo, grp
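To check the grouping idea, here is the answer's query run against the sample order lines in SQLite (3.25 or later, for window functions) through Python's sqlite3 module. myTable is the answer's placeholder table name; the output columns are aliased and an ORDER BY is added so the result order is deterministic.

```python
import sqlite3

# Sample order lines from the question: (OrderNo, Item, Qty, Regular, Line)
LINES = [
    (349443, "AFU20451-KIT1", 1, "Y", 1),
    (349443, "AFU20451", 0, "N", 2),
    (349443, "HAWKE-14252", 1, "N", 3),
    (349443, "RGPM-25H4", 1, "N", 4),
    (349443, "AV-003-265", 1, "Y", 5),
    (349443, "AV-A00090-KIT", 1, "Y", 6),
    (349443, "AV-A00091", 1, "N", 7),
    (349443, "AV-A00090", 1, "N", 8),
    (349443, "AV-00043", 1, "N", 9),
    (349443, "AV457/310GR/FP", 2, "Y", 10),
]

QUERY = """
WITH cte AS (
    SELECT OrderNo,
           CASE WHEN Regular = 'Y' THEN Item END AS Item,
           Qty,
           -- running count of 'Y' rows: each component shares its parent's number
           SUM(CASE WHEN Regular = 'Y' THEN 1 ELSE 0 END)
               OVER (PARTITION BY OrderNo ORDER BY Line
                     ROWS UNBOUNDED PRECEDING) AS grp
    FROM myTable
)
SELECT OrderNo, MAX(Item) AS Item, SUM(Qty) AS Qty
FROM cte
GROUP BY OrderNo, grp
ORDER BY OrderNo, grp
"""


def collapse(lines):
    """Run the grouped query over the given order lines."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE myTable (OrderNo INTEGER, Item TEXT, Qty INTEGER, "
        "Regular TEXT, Line INTEGER)"
    )
    con.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?)", lines)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in collapse(LINES):
    print(row)
```

The four result rows match the desired output in the question: quantities 3, 1, 4 and 2 for the four parent items.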
The current representation goes against Codd's first rule:
Rule 1: The information rule: All information in a relational data
base is represented explicitly at the logical level and in exactly one
way – by values in tables.
But I believe you can still create a FUNCTION/PROCEDURE and iterate over the rows one by one with an IF statement on Y/N. E.g. you create a new table; IF Y, add a new row to the table; IF N, add that row's Qty to the QTY of the latest row.
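The row-by-row approach described above can be sketched outside the database. This hypothetical Python function (collapse_components is not from the answer) walks the lines in (OrderNo, Line) order, opens a new output row on each 'Y' line, and adds each 'N' line's Qty to the most recent parent. It illustrates the loop only; it is not a stored procedure for any particular database.

```python
# Sample order lines from the question: (OrderNo, Item, Qty, Regular, Line)
LINES = [
    (349443, "AFU20451-KIT1", 1, "Y", 1),
    (349443, "AFU20451", 0, "N", 2),
    (349443, "HAWKE-14252", 1, "N", 3),
    (349443, "RGPM-25H4", 1, "N", 4),
    (349443, "AV-003-265", 1, "Y", 5),
    (349443, "AV-A00090-KIT", 1, "Y", 6),
    (349443, "AV-A00091", 1, "N", 7),
    (349443, "AV-A00090", 1, "N", 8),
    (349443, "AV-00043", 1, "N", 9),
    (349443, "AV457/310GR/FP", 2, "Y", 10),
]


def collapse_components(lines):
    """Sum each parent ('Y') line with the component ('N') lines after it.

    Returns a list of (OrderNo, Item, Qty) tuples, one per parent line.
    """
    result = []
    for order_no, item, qty, regular, _line in sorted(lines, key=lambda r: (r[0], r[4])):
        starts_group = regular == "Y" or not result or result[-1][0] != order_no
        if starts_group:
            # 'Y' line (or first line of a new order): open a new output row.
            result.append([order_no, item, qty])
        else:
            # 'N' line: add this component's Qty to the latest parent row.
            result[-1][2] += qty
    return [tuple(row) for row in result]


for row in collapse_components(LINES):
    print(row)
```

Adding the component's own Qty (rather than a fixed 1) matters here: AFU20451 has Qty 0, so counting it as 1 would overstate the first kit.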
I would create two separate tables, manufacturer and part, to hold these values so you don't have to hand-jam each inventory or care about where items fall in the invoice list.
Then all you would need to do is compare the values against the part table to get this data. It's more work up front, but it will pay off to have all of this saved and stored. A future query would look something like:
SELECT OrderTable.OrderNo, OrderTable.Item, SUM(OrderTable.Qty) AS Quantity
FROM OrderTable INNER JOIN Part ON OrderTable.Item = Part.PartName
GROUP BY OrderTable.OrderNo, OrderTable.Item, OrderTable.Regular, Part.ParentID;
Try this:
select orderno, item, sum(qty) over(partition by regular order by regular)
from your_table
group by orderno, item, regular