BigQuery - Best way to transpose rows into multiple columns [duplicate] - sql

This question already has answers here:
How to: For each unique id, for each unique version, grab the best score and organize it into a table (1 answer)
Change direction of table in BigQuery (2 answers)
Closed 4 months ago.
I have data in the format below:
CustomerID  ID    Year  value
1000        1477  2022  True
1000        1477  2021  True
1000        1474  2022  Credit
1000        1474  2021  Debit
1000        1464  2022  Total Amount
1000        1464  2021  Net Amount
For each CustomerID, I would like to transpose this data so that every ID becomes its own column, with one row per year. Below is the expected output:
CustomerID  Year  ID_1477  ID_1474  ID_1464
1000        2022  True     Credit   Total Amount
1000        2021  True     Debit    Net Amount
Below is the query I wrote to get this. I performed a self-join for each ID and extracted the required values into separate columns:
SELECT
ele_1477.CustomerID,
ele_1477.Year,
ele_1477.value as ID_1477,
ele_1474.value as ID_1474,
ele_1464.value as ID_1464
FROM
(select * from table where id=1477 ) ele_1477
LEFT OUTER JOIN (select * from table where id=1474 ) ele_1474 ON ele_1477.CustomerID=ele_1474.CustomerID and ele_1477.Year=ele_1474.Year
LEFT OUTER JOIN (select * from table where id=1464 ) ele_1464 ON ele_1477.CustomerID=ele_1464.CustomerID and ele_1477.Year=ele_1464.Year
But I am not sure how efficient this query is. There are another 150 IDs per CustomerID that need to be transposed. Does that mean I should do a self-join 150 times? I'm looking for the best possible solution.

You can use PIVOT together with dynamic SQL for this.
For your sample data, you can pivot the table like this:
SELECT *
FROM sample_table
PIVOT (ANY_VALUE(Value) ID FOR ID IN ('1477', '1474', '1464'));
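Where PIVOT is not available, the same reshaping can be done with conditional aggregation: a single GROUP BY with one MAX(CASE ...) per ID, which also avoids 150 self-joins. The sketch below runs that query on the question's sample data with Python's built-in sqlite3 module (SQLite has no PIVOT operator, so this is a swapped-in technique, not BigQuery syntax). Storing ID as text is an assumption that matches the quoted literals in the PIVOT above.

```python
import sqlite3

# Sample data from the question; ID is stored as text to match '1477' etc.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_table (CustomerID INTEGER, ID TEXT, Year INTEGER, Value TEXT)"
)
conn.executemany(
    "INSERT INTO sample_table VALUES (?, ?, ?, ?)",
    [
        (1000, "1477", 2022, "True"),
        (1000, "1477", 2021, "True"),
        (1000, "1474", 2022, "Credit"),
        (1000, "1474", 2021, "Debit"),
        (1000, "1464", 2022, "Total Amount"),
        (1000, "1464", 2021, "Net Amount"),
    ],
)

# One GROUP BY pass: each CASE yields Value only for its own ID (NULL
# otherwise), and MAX keeps that single non-NULL value per CustomerID/Year.
rows = conn.execute(
    """
    SELECT CustomerID, Year,
           MAX(CASE WHEN ID = '1477' THEN Value END) AS ID_1477,
           MAX(CASE WHEN ID = '1474' THEN Value END) AS ID_1474,
           MAX(CASE WHEN ID = '1464' THEN Value END) AS ID_1464
    FROM sample_table
    GROUP BY CustomerID, Year
    ORDER BY Year DESC
    """
).fetchall()

for row in rows:
    print(row)
```

The table is scanned once regardless of how many IDs there are; adding an ID only adds one more CASE column.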
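As for "I have another 150 set of IDs for a CustomerID that needs to be transposed": instead of writing all 150 IDs into the IN list by hand, you can build the list from your real data with EXECUTE IMMEDIATE. This assumes ID is a STRING column, as the quoted literals in the first query do.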
EXECUTE IMMEDIATE FORMAT("""
SELECT * FROM sample_table PIVOT (ANY_VALUE(Value) ID FOR ID IN ('%s'))
""", (SELECT STRING_AGG(DISTINCT ID, "','") FROM sample_table));
[Screenshot: query results of the two queries above]
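The EXECUTE IMMEDIATE idea (read the distinct IDs first, then generate one output column per ID) can be sketched outside BigQuery as well. This is a minimal Python/sqlite3 sketch, not BigQuery's API: it generates conditional-aggregation columns instead of a PIVOT clause, and the helpers quote_literal, quote_ident and build_pivot_sql are hypothetical names introduced for the illustration.

```python
import sqlite3


def quote_literal(value):
    """Render a value as an SQL string literal, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def quote_ident(name):
    """Render a name as a double-quoted SQL identifier."""
    return '"' + str(name).replace('"', '""') + '"'


def build_pivot_sql(conn):
    """Build one MAX(CASE ...) column per distinct ID found in sample_table."""
    ids = [row[0] for row in conn.execute("SELECT DISTINCT ID FROM sample_table ORDER BY ID")]
    columns = ",\n       ".join(
        f"MAX(CASE WHEN ID = {quote_literal(i)} THEN Value END) AS {quote_ident('ID_' + str(i))}"
        for i in ids
    )
    return (
        "SELECT CustomerID, Year,\n       " + columns
        + "\nFROM sample_table\nGROUP BY CustomerID, Year\nORDER BY Year DESC"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (CustomerID INTEGER, ID TEXT, Year INTEGER, Value TEXT)")
conn.executemany(
    "INSERT INTO sample_table VALUES (?, ?, ?, ?)",
    [
        (1000, "1477", 2022, "True"), (1000, "1477", 2021, "True"),
        (1000, "1474", 2022, "Credit"), (1000, "1474", 2021, "Debit"),
        (1000, "1464", 2022, "Total Amount"), (1000, "1464", 2021, "Net Amount"),
    ],
)

cursor = conn.execute(build_pivot_sql(conn))
column_names = [d[0] for d in cursor.description]  # ID_1464, ID_1474, ID_1477 ...
rows = cursor.fetchall()
print(column_names)
print(rows)
```

The column list grows automatically as new IDs appear in the data, which is the same benefit the STRING_AGG subquery gives the BigQuery version.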

Related

COUNT with multiple LEFT joins [duplicate]

This question already has answers here:
Two SQL LEFT JOINS produce incorrect result (3 answers)
Closed 12 months ago.
I am having some trouble with a COUNT function. The problem is caused by a LEFT JOIN that I am not sure I am doing correctly.
Variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
DISTINCT
C.customer_name,
C.product_code,
COUNT(S1.product_code) AS s1_sales,
COUNT(S2.product_code) AS s2_sales
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
customer_name, product_code
HAVING
S1_sales > 0
OR S2_sales > 0
The output I expect is something like this:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        4                    8
James          100022        6                    10
But instead, I get:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        290                  60
James          100022        290                  60
It works when I use COUNT(DISTINCT product_code) instead of COUNT(product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I use COUNT(DISTINCT ...) and include more than one week of data, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?
As Philipxy indicated, the cause is a common one: your joins produce a Cartesian product, which inflates your numbers. To simplify, consider a single customer purchasing one item from two stores. The first store has 3 purchases and the second store has 5, so your total count is 3 * 5 = 15. This is because every entry from the first store is joined to every entry for the same customer in the second store: purchase 1 at store 1 is joined to store-2 purchases 1-5, then purchase 2 at store 1 is joined to store-2 purchases 1-5 again, and so on. If you pre-aggregate each store per customer first, each store contributes AT MOST one record per customer (and per product, as in your desired output).
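The 3 * 5 inflation is easy to reproduce. This minimal sketch uses Python's sqlite3 with a hypothetical one-customer, one-product dataset (3 purchase rows at store 1, 5 at store 2) and runs a query with the same shape as the one in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_df (customer_name TEXT, product_code INTEGER);
    CREATE TABLE store1_df (product_code INTEGER);
    CREATE TABLE store2_df (product_code INTEGER);
    INSERT INTO customer_df VALUES ('Luigi', 120012);
    """
)
# Hypothetical data: 3 purchase rows at store 1, 5 at store 2.
conn.executemany("INSERT INTO store1_df VALUES (?)", [(120012,)] * 3)
conn.executemany("INSERT INTO store2_df VALUES (?)", [(120012,)] * 5)

# Same shape as the question's query: both LEFT JOINs hang off the
# customer row, so every store-1 row is paired with every store-2 row.
inflated = conn.execute(
    """
    SELECT C.customer_name, C.product_code,
           COUNT(S1.product_code) AS s1_sales,
           COUNT(S2.product_code) AS s2_sales
    FROM customer_df C
    LEFT JOIN store1_df S1 ON S1.product_code = C.product_code
    LEFT JOIN store2_df S2 ON S2.product_code = C.product_code
    GROUP BY C.customer_name, C.product_code
    """
).fetchall()

# The real per-store row counts, for comparison.
actual = tuple(
    conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("store1_df", "store2_df")
)
print(inflated, actual)
```

Both counts come out as 15 even though the stores have 3 and 5 rows, which is the bloat described above.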
select
c.customer_name,
AllCustProducts.Product_Code,
coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
from
customer_df c
-- now, we need all possible UNIQUE instances of
-- a given customer and product to prevent duplicates
-- for subsequent queries of sales per customer and store
JOIN
( select distinct customerid, product_code
from store1_df
union
select distinct customerid, product_code
from store2_df ) AllCustProducts
on c.customerid = AllCustProducts.customerid
-- NOW, we can join to a pre-query of sales at store 1
-- by customer id and product code. You may also want to
-- get sum( SalesDollars ) if available, just add respectively
-- to each sub-query below.
LEFT JOIN
( select
s1.customerid,
s1.product_code,
count(*) as SalesEntries
from
store1_df s1
group by
s1.customerid,
s1.product_code ) PQStore1
on AllCustProducts.customerid = PQStore1.customerid
AND AllCustProducts.product_code = PQStore1.product_code
-- now, same pre-aggregation to store 2
LEFT JOIN
( select
s2.customerid,
s2.product_code,
count(*) as SalesEntries
from
store2_df s2
group by
s2.customerid,
s2.product_code ) PQStore2
on AllCustProducts.customerid = PQStore2.customerid
AND AllCustProducts.product_code = PQStore2.product_code
No GROUP BY or HAVING is needed in the outer query, because each pre-aggregate returns at most one record per unique customer/product combination. As for filtering by date range, I would add a WHERE clause inside each of the AllCustProducts, PQStore1 and PQStore2 subqueries.
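Here is the pre-aggregation approach with the date filter placed inside each subquery, as a runnable sketch using Python's sqlite3. The sale_date column, the date range and all sample rows are assumptions made up for the illustration (chosen to reproduce the counts in the question's expected output); the answer's query does not show a date column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_df (customerid INTEGER, customer_name TEXT);
    CREATE TABLE store1_df (customerid INTEGER, product_code INTEGER, sale_date TEXT);
    CREATE TABLE store2_df (customerid INTEGER, product_code INTEGER, sale_date TEXT);
    INSERT INTO customer_df VALUES (1, 'Luigi'), (2, 'James');
    """
)
# Hypothetical sales rows; one store-1 row falls outside the date range.
conn.executemany(
    "INSERT INTO store1_df VALUES (?, ?, ?)",
    [(1, 120012, "2023-01-02")] * 4
    + [(1, 120012, "2022-12-01")]
    + [(2, 100022, "2023-01-03")] * 6,
)
conn.executemany(
    "INSERT INTO store2_df VALUES (?, ?, ?)",
    [(1, 120012, "2023-01-04")] * 8 + [(2, 100022, "2023-01-05")] * 10,
)

# Each subquery filters by date and collapses to one row per
# customer/product before joining, so no fan-out can occur.
QUERY = """
SELECT c.customer_name, acp.product_code,
       COALESCE(pq1.sales_entries, 0) AS store1_sales,
       COALESCE(pq2.sales_entries, 0) AS store2_sales
FROM customer_df c
JOIN (SELECT customerid, product_code FROM store1_df
      WHERE sale_date BETWEEN :date_from AND :date_to
      UNION
      SELECT customerid, product_code FROM store2_df
      WHERE sale_date BETWEEN :date_from AND :date_to) acp
  ON c.customerid = acp.customerid
LEFT JOIN (SELECT customerid, product_code, COUNT(*) AS sales_entries
           FROM store1_df WHERE sale_date BETWEEN :date_from AND :date_to
           GROUP BY customerid, product_code) pq1
  ON acp.customerid = pq1.customerid AND acp.product_code = pq1.product_code
LEFT JOIN (SELECT customerid, product_code, COUNT(*) AS sales_entries
           FROM store2_df WHERE sale_date BETWEEN :date_from AND :date_to
           GROUP BY customerid, product_code) pq2
  ON acp.customerid = pq2.customerid AND acp.product_code = pq2.product_code
ORDER BY c.customer_name DESC
"""
rows = conn.execute(QUERY, {"date_from": "2023-01-02", "date_to": "2023-01-08"}).fetchall()
for row in rows:
    print(row)
```

Widening the date range only changes how many rows each subquery counts; it cannot multiply counts across stores, which is why plain COUNT(*) works here without DISTINCT.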

SAS Proc SQL - ranking top nth (3rd) highest for a group of say universities and their price? (HW to be honest)

(This is homework, I'm not going to lie.)
I wrote an ANSI SQL query that correctly produces the required 3rd-highest prices (the sample table is shown further below):
select unique uni, price
from
(
(
select unique uni, price
from
(
select unique uni, price
from table1
group by uni
having price < max(price)
)
group by uni
having price < max(price)
)
group by uni
having price < max(price)
)
Now I need to list the 1st, 2nd and 3rd highest prices in one table, but in a way that generalizes to the nth highest.
Example data:
Col1 Col2
uni1 10
uni1 20
uni2 20
uni2 10
uni3 30
uni3 20
uni1 30
Sorry for the formatting; I haven't been here for a very long time. I appreciate any assistance. I will supply a link to the uni: I asked the tutor whether I could share the code, and he said yes, but not the whole code, only something like 10%.
In SAS you can use the proprietary option OUTOBS to restrict how many rows of a result set are output.
Example: use OUTOBS=3 to create a top-3 table, then use that table in a subsequent query.
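data have;
  input x @@;  /* @@ holds the line so several values are read from it */
  datalines;
10 9 8 7 6 5 4 3 2 1 0
;
proc sql;
  reset outobs=3;    /* keep at most 3 output rows from the next query */
  create table top3x as
    select * from have
    order by x desc;
  reset outobs=max;  /* lift the row limit again */
  * another query;
quit;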
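Note that OUTOBS caps the whole result set, not each group, so it gives the overall top 3 rather than the top 3 per university. A per-group version that generalizes to the nth highest can rank each price with a correlated subquery that counts the distinct higher prices for the same uni; this avoids window functions, which as far as I know PROC SQL does not support. The sketch below runs the idea with Python's sqlite3 on the question's sample data (table1, uni and price follow the question; the function names are hypothetical). Tied prices share a rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (uni TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("uni1", 10), ("uni1", 20), ("uni2", 20), ("uni2", 10),
     ("uni3", 30), ("uni3", 20), ("uni1", 30)],
)

# Rank = 1 + number of distinct higher prices within the same uni.
RANK_EXPR = """(SELECT COUNT(DISTINCT h.price) FROM table1 h
                WHERE h.uni = t.uni AND h.price > t.price) + 1"""


def top_n_per_uni(conn, n):
    """1st..nth highest prices for every uni, with their rank."""
    sql = (f"SELECT t.uni, t.price, {RANK_EXPR} AS price_rank FROM table1 t "
           f"WHERE {RANK_EXPR} <= ? ORDER BY t.uni, price_rank")
    return conn.execute(sql, (n,)).fetchall()


def nth_highest_per_uni(conn, n):
    """Only the nth highest price per uni (unis with fewer prices drop out)."""
    sql = (f"SELECT t.uni, t.price FROM table1 t "
           f"WHERE {RANK_EXPR} = ? ORDER BY t.uni")
    return conn.execute(sql, (n,)).fetchall()


print(top_n_per_uni(conn, 3))
print(nth_highest_per_uni(conn, 2))
```

Changing n is the only thing needed to move from the 3rd highest to any other position, which is the "nth times" requirement.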

Join two tables with additional field in one table [duplicate]

This question already has answers here:
SQL JOIN and different types of JOINs (6 answers)
Closed 3 years ago.
I would like to join together two tables with additional columns.
The first table holds the number of products despatched, by product:
** Table 1 - Despatches **
Month ProductID No_despatched
Jan abc 10
Jan def 15
Jan xyz 12
The second table holds the number of products returned, by product, with an additional column for the return reason:
** Table 2 - Returns **
Month ProductID No_returned Return_reason
Jan abc 2 Too big
Jan abc 3 Too small
Jan xyz 1 Wrong colour
I would like to join the tables to show returns and despatched on the same row with the number of despatched being duplicated if there are multiple return reasons for the same product.
** Desired output **
Month ProductID No_despatched No_returned Return_reason
Jan abc 10 2 Too big
Jan abc 10 3 Too small
Jan xyz 12 1 Wrong colour
Hope this makes sense...
Thanks in advance!
This seems like a basic JOIN:
select r.month, r.productid, d.no_despatched, r.no_returned, r.return_reason
from returns r join
despatches d
on r.month = d.month and r.productid = d.productid;
The results don't seem particularly useful, though: products with no returns are missing, and the despatched amounts are duplicated when a product has more than one return record.
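On the question's sample data, this inner join returns exactly the three rows of the desired output, and product def disappears because it has no returns. A quick check with Python's sqlite3, using the table and column names from this answer (with the no_despatched typo fixed) and the question's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE despatches (month TEXT, productid TEXT, no_despatched INTEGER);
    CREATE TABLE returns (month TEXT, productid TEXT, no_returned INTEGER, return_reason TEXT);
    INSERT INTO despatches VALUES ('Jan', 'abc', 10), ('Jan', 'def', 15), ('Jan', 'xyz', 12);
    INSERT INTO returns VALUES ('Jan', 'abc', 2, 'Too big'), ('Jan', 'abc', 3, 'Too small'),
                               ('Jan', 'xyz', 1, 'Wrong colour');
    """
)

# Inner join: only despatch rows that have at least one matching return survive,
# and no_despatched is repeated once per return reason.
inner_rows = conn.execute(
    """
    SELECT r.month, r.productid, d.no_despatched, r.no_returned, r.return_reason
    FROM returns r
    JOIN despatches d ON r.month = d.month AND r.productid = d.productid
    ORDER BY r.productid, r.no_returned
    """
).fetchall()

for row in inner_rows:
    print(row)
```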
Just use a JOIN:
select a.*, b.No_returned, b.Return_reason
from table1 a join table2 b
  on a.ProductID = b.ProductID
 and a.month = b.month
If you get duplicates, you can use DISTINCT.
Rearranging the phrases in your question into SQL clauses produces the result:
-- with additional columns
SELECT Table1.Month, Table1.ProductID, Table1.No_despatched, Table2.No_returned, Table2.Return_reason
-- join two tables
FROM Table1 LEFT JOIN Table2
  ON Table1.Month = Table2.Month AND Table1.ProductID = Table2.ProductID
We use a LEFT JOIN because, presumably, a product can be dispatched without being returned, but nobody can return a product you didn't send out.
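Running this LEFT JOIN on the same sample data keeps def as a fourth row with NULL return columns, which is the practical difference from an inner join. A minimal Python/sqlite3 check, with the table names Table1/Table2 from this answer and the question's data; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (Month TEXT, ProductID TEXT, No_despatched INTEGER);
    CREATE TABLE Table2 (Month TEXT, ProductID TEXT, No_returned INTEGER, Return_reason TEXT);
    INSERT INTO Table1 VALUES ('Jan', 'abc', 10), ('Jan', 'def', 15), ('Jan', 'xyz', 12);
    INSERT INTO Table2 VALUES ('Jan', 'abc', 2, 'Too big'), ('Jan', 'abc', 3, 'Too small'),
                              ('Jan', 'xyz', 1, 'Wrong colour');
    """
)

# Every despatch row is kept; products without returns get NULLs (None in Python).
left_rows = conn.execute(
    """
    SELECT Table1.Month, Table1.ProductID, Table1.No_despatched,
           Table2.No_returned, Table2.Return_reason
    FROM Table1 LEFT JOIN Table2
      ON Table1.Month = Table2.Month AND Table1.ProductID = Table2.ProductID
    ORDER BY Table1.ProductID, Table2.No_returned
    """
).fetchall()

for row in left_rows:
    print(row)
```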

Create a SQL report associating thousands of records

I'm a newbie in Oracle SQL and I'm trying to do the following:
I have a list of products that have sales only in certain weeks of the year.
I'm trying to create a full report of sales for the year that fills the weeks with no sales with zeroes, while preserving the product ID as the identifier.
To build that report, I'm starting by trying to create a table with every week of the year and the product ID; the problem is that there are thousands of products. I'm trying to get something like:
WEEK_YR PRODUCT_ID
------ ----------
2018011 Product 1
2018012 Product 1
...
2018053 Product 1
...
2018124 Product 1
...
2018011 Product 2
2018012 Product 2
...
2018053 Product 2
...
2018124 Product 2
Then I would join this table with the weekly sales table, putting 0 in the weeks with no sales.
I'd like help creating the first table; I only have a table with the product IDs and another with the list of weeks from last year.
Thanks in advance!
I request your help at creating the first table...
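You don't need to create a persistent table for that; you can use CTEs (Common Table Expressions). Oracle has supported the WITH clause since 9i Release 2, and recursive CTEs like the one below since 11g Release 2. In the example below, the CTE w is the one you are looking for: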
with x (n) as ( -- CTE that generates numbers from 1 to 52; Oracle requires the (n) column list on a recursive WITH
select 1 as n from dual
union all
select n + 1 from x where n < 52
),
w as ( --- cte that generates the week_yr
select 2018000 + n as week_yr from x
)
select -- main query now.
w.week_yr,
p...
from w
left join products p on p.week_yr = w.week_yr
where ...
group by ...
In the query above, the CTE x generates the numbers from 1 to 52, and the CTE w turns them into the week_yr values you need.
Finally, the main query can do what you want.
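To get the zero-filled report, the generated week list is typically cross-joined with the products and then left-joined to the sales, with COALESCE turning missing weeks into 0. The sketch below does this in SQLite through Python's sqlite3; the products and sales tables and their columns (product_id, week_yr, amount) are hypothetical. SQLite spells the recursive CTE WITH RECURSIVE, whereas Oracle has no RECURSIVE keyword but requires the x (n) column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_id TEXT);
    CREATE TABLE sales (product_id TEXT, week_yr INTEGER, amount INTEGER);
    INSERT INTO products VALUES ('Product 1'), ('Product 2');
    INSERT INTO sales VALUES ('Product 1', 2018003, 5), ('Product 2', 2018010, 7);
    """
)

REPORT_SQL = """
WITH RECURSIVE x(n) AS (          -- numbers 1..52
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM x WHERE n < 52
),
w AS (SELECT 2018000 + n AS week_yr FROM x)
SELECT p.product_id, w.week_yr, COALESCE(SUM(s.amount), 0) AS total_sales
FROM w
CROSS JOIN products p             -- every week for every product
LEFT JOIN sales s                 -- keep weeks that have no sales
  ON s.product_id = p.product_id AND s.week_yr = w.week_yr
GROUP BY p.product_id, w.week_yr
ORDER BY p.product_id, w.week_yr
"""
report = conn.execute(REPORT_SQL).fetchall()
print(len(report), report[:3])
```

With thousands of products the cross join produces 52 rows per product, which is exactly the week/product grid the question asks for, computed on the fly instead of stored.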

How to calculate a rank from this data in my database?

I have a table that stores questions. Each question has different answers, and each answer has a different weight. Now I want to calculate the rank, but I don't know how to do this. Please help me.
I use SQL Server.
This table stores the answers and the weight of each answer:
AdminQuesAns
=======================
Id QuesId Ans Value
10 1000 Yes 10
11 1000 somewhat 5
12 1000 No 0
10 1001 Yes 0
12 1001 No 10
And this table stores the customers' answers:
AdminRank
==================================
Id SDId QuesId AnsValue
1 100 1000 10
2 100 1001 0
You can use the query below:
Select SDId ,b.QuesId,
((sum(a.AnsValue) *100)/(Select sum(c.value)
from AdminQuesAns c where c.QuesId = b.QuesId)) as 'Rank'
from AdminRank a join AdminQuesAns b on a.QuesId=b.QuesId and value=AnsValue
group by SDId ,b.QuesId
This is how I'd go about it.
This has an inner query that gets the max value for each question; the outer query then pairs those with the values from the individual answers, sums both across the questions, and expresses the answer total as a percentage of the maximum total.
I'm also grouping by SDId, on the assumption that that is the ID of the person filling out the survey.
SELECT
ar.SDId,
100 * cast(sum(ar.AnsValue) as numeric(5,2)) / sum(mv.maxValue) as Rank
FROM
AdminRank ar
JOIN
(
SELECT
qa.QuesId,
max(qa.Value) as maxValue
FROM
AdminQuesAns qa
GROUP BY
qa.QuesId
) mv on ar.QuesId = mv.QuesId
GROUP BY
ar.SDId
Depending on your data types you may be able to remove the cast part.
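The second query's logic (answer total divided by the maximum achievable total, per SDId) can be checked on the question's data. This sketch uses Python's sqlite3; multiplying by 100.0 plays the role of the numeric cast by forcing non-integer division. Respondent 101 and their answers are hypothetical, added so there is something to rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE AdminQuesAns (Id INTEGER, QuesId INTEGER, Ans TEXT, Value INTEGER);
    CREATE TABLE AdminRank (Id INTEGER, SDId INTEGER, QuesId INTEGER, AnsValue INTEGER);
    INSERT INTO AdminQuesAns VALUES (10, 1000, 'Yes', 10), (11, 1000, 'somewhat', 5),
        (12, 1000, 'No', 0), (10, 1001, 'Yes', 0), (12, 1001, 'No', 10);
    -- SDId 100 is from the question; SDId 101 is hypothetical.
    INSERT INTO AdminRank VALUES (1, 100, 1000, 10), (2, 100, 1001, 0),
        (3, 101, 1000, 5), (4, 101, 1001, 10);
    """
)

scores = conn.execute(
    """
    SELECT ar.SDId,
           100.0 * SUM(ar.AnsValue) / SUM(mv.maxValue) AS score_pct
    FROM AdminRank ar
    JOIN (SELECT QuesId, MAX(Value) AS maxValue     -- best possible answer per question
          FROM AdminQuesAns GROUP BY QuesId) mv
      ON ar.QuesId = mv.QuesId
    GROUP BY ar.SDId
    ORDER BY score_pct DESC                         -- highest percentage ranks first
    """
).fetchall()
print(scores)
```

For SDId 100 the answers sum to 10 out of a possible 20, giving 50%.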