Create dynamic binary columns in sql query - sql

I'm using presto db
I have two tables, one looks like:
table 1:
item count
p1 20
p2 10
p3 5
p4 4
p5 2
and table 2:
person lic
c1 p2
c1 p1
c2 p3
c2 p4
c2 p2
c3 p1
c4 p2
I want to return a table that looks like:
person p1 p2 p3 p4 p5
c1 1 1 0 0 0
c2 0 1 1 1 0
c3 1 0 0 0 0
c4 0 1 0 0 0
c5 0 0 0 0 0
It looks like a pivot would do, but I'm not sure how to account for missing values in a column and get them to be '0' in the final table.

The output schema for a SQL query must be fixed. Thus, if you want a column p1 to appear in the output, it has to be listed explicitly in the query.
I'm not sure how table1 is related to the output, but you can do a pivot like this:
SELECT person
, count_if(lic = 'p1') p1
, count_if(lic = 'p2') p2
...
FROM table2
GROUP BY person
The query needs to list each p column. Depending on your application, you might be able to generate the query programmatically by first running a query to get the unique values of p.
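As a rough sketch of generating the query programmatically: the snippet below uses Python's built-in sqlite3 module as a stand-in for Presto (an assumption, purely so the example runs anywhere), reads the column list from table1, and builds one expression per item. SQLite has no count_if, so MAX(CASE ...) is swapped in, which also keeps the values binary. The helper name build_pivot_query is made up for illustration.

```python
import sqlite3

# SQLite stands in for Presto here; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (item TEXT, "count" INTEGER);
    CREATE TABLE table2 (person TEXT, lic TEXT);
    INSERT INTO table1 VALUES ('p1', 20), ('p2', 10), ('p3', 5), ('p4', 4), ('p5', 2);
    INSERT INTO table2 VALUES ('c1', 'p2'), ('c1', 'p1'), ('c2', 'p3'), ('c2', 'p4'),
                              ('c2', 'p2'), ('c3', 'p1'), ('c4', 'p2');
""")


def build_pivot_query(conn):
    """Hypothetical helper: build the pivot SQL from the items listed in table1."""
    items = [row[0] for row in conn.execute("SELECT DISTINCT item FROM table1 ORDER BY item")]
    # Column names cannot be bound as parameters, so check them before interpolating.
    for item in items:
        if not item.isalnum():
            raise ValueError(f"unexpected item name: {item!r}")
    columns = ",\n       ".join(
        f"MAX(CASE WHEN lic = '{item}' THEN 1 ELSE 0 END) AS {item}" for item in items
    )
    return f"SELECT person,\n       {columns}\nFROM table2\nGROUP BY person\nORDER BY person"


pivot_rows = conn.execute(build_pivot_query(conn)).fetchall()
for row in pivot_rows:
    print(row)
```

Because the column list comes from table1, p5 shows up as a column of zeros even though nobody holds it. A person such as c5 who never appears in table2 still cannot be produced by this query; that would need a separate list of persons to left join against.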

Related

Dynamically selecting the column to select from the row itself in SQL

I have a SQL Server table with some data as follows. The number of P columns is fixed, but there are many of them. There will also be multiple columns named in the same fashion, such as S1, S2, etc.
Id SelectedP P1 P2 P3 P4 P5
1 P2 3 8 4 15 7
2 P1 0 2 6 0 3
3 P3 1 15 2 1 11
4 P4 3 4 6 2 4
I need to write a SQL statement that produces the result below. The column to select from each row depends on the SelectedP value in that same row: SelectedP names the column to read.
Id SelectedP Selected-P-Value
1 P2 8
2 P1 0
3 P3 2
4 P4 2
Thanks in advance.
You just need a CASE expression...
SELECT
id,
SelectedP,
CASE SelectedP
WHEN 'P1' THEN P1
WHEN 'P2' THEN P2
WHEN 'P3' THEN P3
WHEN 'P4' THEN P4
WHEN 'P5' THEN P5
END
AS SelectedPValue
FROM
yourTable
This will return NULL for anything not mentioned in the CASE expression.
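To see the CASE expression and its NULL behaviour in action, here is a small sketch using Python's sqlite3 module in place of SQL Server (an assumption made only so the example is self-contained). The first four rows are the question's data; row 5 with SelectedP = 'P9' is a made-up extra row to show the fallthrough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (Id INTEGER, SelectedP TEXT, "
    "P1 INTEGER, P2 INTEGER, P3 INTEGER, P4 INTEGER, P5 INTEGER)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "P2", 3, 8, 4, 15, 7),
        (2, "P1", 0, 2, 6, 0, 3),
        (3, "P3", 1, 15, 2, 1, 11),
        (4, "P4", 3, 4, 6, 2, 4),
        (5, "P9", 1, 1, 1, 1, 1),  # hypothetical row: 'P9' is not handled by the CASE
    ],
)

case_rows = conn.execute("""
    SELECT Id,
           SelectedP,
           CASE SelectedP
               WHEN 'P1' THEN P1
               WHEN 'P2' THEN P2
               WHEN 'P3' THEN P3
               WHEN 'P4' THEN P4
               WHEN 'P5' THEN P5
           END AS SelectedPValue
    FROM yourTable
    ORDER BY Id
""").fetchall()

for row in case_rows:
    print(row)
```

Rows 1 to 4 give 8, 0, 2 and 2 as in the desired output, and the made-up row 5 comes back with None (NULL) rather than disappearing.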
EDIT:
An option with just a little less typing...
SELECT
id, SelectedP, val
FROM
yourTable AS pvt
UNPIVOT
(
val FOR P IN
(
P1,
P2,
P3,
P4,
P5
)
)
AS unpvt
WHERE
SelectedP = P
NOTE: If the value of SelectedP doesn't exist in the UNPIVOT, then the row will not appear at all (unlike the CASE expression which will return a NULL)
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=b693738aac0b594cf37410ee5cb15cf5
EDIT 2:
I don't know if this will perform much worse than the 2nd option, but this preserves the NULL behaviour.
(The preferred option is still to fix your data-structure.)
SELECT
id, SelectedP, MAX(CASE WHEN SelectedP = P THEN val END) AS val
FROM
yourTable AS pvt
UNPIVOT
(
val FOR P IN
(
P1,
P2,
P3,
P4,
P5
)
)
AS unpvt
GROUP BY
id, SelectedP
Demo : https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=f3f64d2fb6e11fd24d1addbe1e50f020

Can anyone tell me, how inner join works in SQL?

I have two tables, A and B. Each has one column and four rows, as follows:
A B
-------
C1 C2
1 1
1 1
1 0
0 0
If I apply an inner join on these, it returns 8 rows as a result:
Select C1,C2 from A inner join B on A.C1=B.C2;
Result
---------
C1 C2
1 1
1 1
1 1
1 1
1 1
1 1
0 0
0 0
My guess is that the first row of column C1 is checked against every row of column C2; where it matches, a result row is returned, otherwise not. The same is then done for the remaining rows. Correct me if my understanding is wrong, and please help with the question below.
I have two tables, A and B, each with two columns. What will the result be if we apply an inner join? Please explain how it works.
A B
----------------
C1 C2 C3 C4
1 1 1 1
1 1 1 0
Select C1,C2,C3,C4 from A inner join B on A.C1=B.C3;
It's returning 4 rows, please explain how?
This is a common misconception about inner joins. In an inner join, each value in the joining column of one table matches every occurrence of the same value in the joining column of the other table.
In your example, the first 1 in column C1 of table A matches both rows with value 1 in column C2 of table B. The second 1 also matches both of those rows, and so does the third. The 0 then matches the two 0 rows in table B.
That gives 2 (1's) + 2 (1's) + 2 (1's) + 2 (0's) = 8 rows.
The same concept applies to your second example. Since each table has 2 columns there, you have to decide on the join predicate.
If you join on `A.C1 = B.C3`, the result has 4 rows.
If you join on `A.C1 = B.C4`, the result has 2 rows.
If you join on `A.C2 = B.C3`, the result has 4 rows.
If you join on `A.C2 = B.C4`, the result has 2 rows.
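The row counts above can be checked with a quick sketch. This uses Python's sqlite3 module with in-memory tables holding the question's data; the helper join_count is a made-up name, and its predicate argument is trusted hard-coded text, not user input.

```python
import sqlite3


def join_count(conn, predicate):
    """Hypothetical helper: number of rows an inner join of A and B returns."""
    sql = f"SELECT COUNT(*) FROM A INNER JOIN B ON {predicate}"
    return conn.execute(sql).fetchone()[0]


# First example: one column per table.
first = sqlite3.connect(":memory:")
first.executescript("""
    CREATE TABLE A (C1 INTEGER);
    CREATE TABLE B (C2 INTEGER);
    INSERT INTO A VALUES (1), (1), (1), (0);
    INSERT INTO B VALUES (1), (1), (0), (0);
""")
first_count = join_count(first, "A.C1 = B.C2")

# Second example: two columns per table.
second = sqlite3.connect(":memory:")
second.executescript("""
    CREATE TABLE A (C1 INTEGER, C2 INTEGER);
    CREATE TABLE B (C3 INTEGER, C4 INTEGER);
    INSERT INTO A VALUES (1, 1), (1, 1);
    INSERT INTO B VALUES (1, 1), (1, 0);
""")
predicates = ["A.C1 = B.C3", "A.C1 = B.C4", "A.C2 = B.C3", "A.C2 = B.C4"]
second_counts = {p: join_count(second, p) for p in predicates}

print(first_count)
print(second_counts)
```

Each count is just the sum, over every row of A, of the number of B rows that satisfy the predicate.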
In your example, if you use the predicate A.C1 = B.C3 the result is:
c1 c2 c3 c4
--- --- --- --
1 1 1 1
1 1 1 1
1 1 1 0
1 1 1 0
See running example at DB Fiddle.
Now, as a general rule, the inner join will match rows from both tables according to any predicate you specify, not necessarily simple column values.
For example:
A
C1 C2 #
1 1 A1
1 1 A2
B
C3 C4 #
1 1 B1
1 0 B2
0 1 B3
If the predicate is a.c1 * a.c2 = b.c3 + b.c4, as in the query:
select
a.*,
b.*
from a
join b on a.c1 * a.c2 = b.c3 + b.c4
The result is:
c1 c2 c3 c4 matching predicate
--- --- --- -- --------------------------
1 1 1 0 1 * 1 = 1 + 0 (A1 and B2)
1 1 1 0 1 * 1 = 1 + 0 (A2 and B2)
1 1 0 1 1 * 1 = 0 + 1 (A1 and B3)
1 1 0 1 1 * 1 = 0 + 1 (A2 and B3)
Do you see how the rows are matched?
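If it helps, the same matching can be reproduced with a short sketch in Python's sqlite3 module. The # labels from the tables are stored in a column called tag, which is a name invented here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (c1 INTEGER, c2 INTEGER, tag TEXT);
    CREATE TABLE b (c3 INTEGER, c4 INTEGER, tag TEXT);
    INSERT INTO a VALUES (1, 1, 'A1'), (1, 1, 'A2');
    INSERT INTO b VALUES (1, 1, 'B1'), (1, 0, 'B2'), (0, 1, 'B3');
""")

# The predicate is an arbitrary expression, not a plain column comparison.
matched_pairs = conn.execute("""
    SELECT a.tag, b.tag
    FROM a
    JOIN b ON a.c1 * a.c2 = b.c3 + b.c4
    ORDER BY b.tag, a.tag
""").fetchall()

for pair in matched_pairs:
    print(pair)
```

B1 never appears because 1 + 1 = 2 does not equal 1 * 1 for either row of a, while B2 and B3 each pair with both A1 and A2.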

Presto SQL - product combination matrix to find substitutes

I have this input (product list table and its buyers)
product_id customer_id
p1 c1
p2 c1
p2 c2
p3 c1
p3 c2
p4 c2
and need to get this dynamic matrix output in SQL, Presto.
p1 p2 p3 p4
p1 2 1 0
p2 2 1
p3 1
p4
any thoughts are very appreciated

Take the max value of a column in a sql table

I have this query:
SELECT DISTINCT S.PRODOTTO, D.CODPROD, D.IDPROD
FROM D_PROD D, APP_SALES S
WHERE D.CODPROD = S.PRODOTTO
The result is:
PRODOTTO CODPROD IDPROD
P2 P2 2
P1 P1 1
P3 P3 4
P3 P3 3
Now I would like the result to be:
PRODOTTO CODPROD IDPROD
P2 P2 2
P1 P1 1
P3 P3 4
with product P3 taking the max IDPROD it has encountered.
How can I tell the query to take the max value when there is more than one row for a product?
I want the max idprod.
SELECT S.PRODOTTO, D.CODPROD, MAX(D.IDPROD) AS IDPROD
FROM D_PROD D, APP_SALES S
WHERE D.CODPROD = S.PRODOTTO
GROUP BY S.PRODOTTO, D.CODPROD
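A small sketch to try the grouped query, using Python's sqlite3 module. The contents of D_PROD and APP_SALES are guessed from the result shown in the question (the real tables were not posted), so treat the rows as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE D_PROD (CODPROD TEXT, IDPROD INTEGER);
    CREATE TABLE APP_SALES (PRODOTTO TEXT);
    -- Assumed data: P3 has two ids, and some products were sold more than once.
    INSERT INTO D_PROD VALUES ('P1', 1), ('P2', 2), ('P3', 3), ('P3', 4);
    INSERT INTO APP_SALES VALUES ('P1'), ('P2'), ('P2'), ('P3'), ('P3');
""")

max_rows = conn.execute("""
    SELECT S.PRODOTTO, D.CODPROD, MAX(D.IDPROD) AS IDPROD
    FROM D_PROD D, APP_SALES S
    WHERE D.CODPROD = S.PRODOTTO
    GROUP BY S.PRODOTTO, D.CODPROD
    ORDER BY S.PRODOTTO
""").fetchall()

for row in max_rows:
    print(row)
```

GROUP BY already collapses each product to a single row, so no DISTINCT is needed on top of it.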

Row value inconsistency

Scenario -
We have pack items, each defined as a composite of one or more items. A complex pack is one that has more than one component item. Each component item of a complex pack should be linked to an equal number of locations.
For example:
Pack P1 has component C1, C2, and C3. Each item C1,C2 and C3 is ranged to 10 locations 1,2....10, such that C1-1,C1-2,...,C1-10,C2-1,C2-2,...,C2-10,and C3-1,C3-2,...,C3-10 exists. In such case the pack item P1 also gets associated to locations 1 through 10, as P1-1,P1-2,...,P1-10.
The table PACK_BREAKOUT contains the Pack component mapping and the table ITEM_LOCATION contains the items to location association. Both Pack and Component are considered as "items" and would exist in ITEM_LOCATION.
Ideally, for a scenario like the one above, the record set below would be valid:
PACK_NO ITEM NO_OF_LOC
-------- ------ -------------
P1 C1 10
P1 C2 10
P1 C3 10
I have the query below that returns result like above for all such pack items.
select c.pack_no,c.item,count(a.loc )
from item_location a, pack_breakout c
where c.item=a.item
group by c.pack_no,c.item
order by 1,2;
However, there are some discrepant results like pack no. P2 , P4, and P5 below where the components are not associated with equal number of locations.
PACK_NO ITEM NO_OF_LOC
-------- ------ -------------
P1 C1 10
P1 C2 10
P1 C3 10
P2 C1 11
P2 C2 5
P2 C3 9
P2 C4 11
P3 C1 21
P3 C2 21
P3 C3 21
P3 C4 21
P3 C5 21
P4 C1 10
P4 C2 15
P5 C1 10
P5 C2 9
P5 C3 10
P5 C4 10
Note that a pack can have n-number of components (as you can see P1, P2, P3, P4, and P5 have different number of components).
I would like to get only the packs whose component locations are not all consistent. So the desired result set would be-
PACK_NO ITEM NO_OF_LOC
-------- ------ -------------
P2 C1 11
P2 C2 5
P2 C3 9
P2 C4 11
P4 C1 10
P4 C2 15
P5 C1 10
P5 C2 9
P5 C3 10
P5 C4 10
Note that even if one component does not match no. of locations as the other components within the pack, the entire pack must be considered inconsistent (like P5).
You want to use another group by with a having clause:
select pack_no
from (select c.pack_no, c.item, count(a.loc ) as numlocs
from item_location a join
pack_breakout c
on c.item=a.item
group by c.pack_no, c.item
) p
group by pack_no
having MIN(numlocs) <> MAX(numlocs)
This returns the packs.
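Here is a small sketch of that MIN/MAX check, run through Python's sqlite3 module instead of the original database (an assumption, only to make it runnable). The packs, items and locations are invented sample data: P1 is consistent, P4 is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pack_breakout (pack_no TEXT, item TEXT);
    CREATE TABLE item_location (item TEXT, loc INTEGER);
    -- Invented sample data: P1's components have 2 locations each,
    -- P4's components have 2 and 3 locations.
    INSERT INTO pack_breakout VALUES ('P1', 'C1'), ('P1', 'C2'), ('P4', 'C3'), ('P4', 'C4');
    INSERT INTO item_location VALUES
        ('C1', 1), ('C1', 2),
        ('C2', 1), ('C2', 2),
        ('C3', 1), ('C3', 2),
        ('C4', 1), ('C4', 2), ('C4', 3);
""")

inconsistent_packs = [
    row[0]
    for row in conn.execute("""
        SELECT pack_no
        FROM (SELECT c.pack_no, c.item, COUNT(a.loc) AS numlocs
              FROM item_location a
              JOIN pack_breakout c ON c.item = a.item
              GROUP BY c.pack_no, c.item) p
        GROUP BY pack_no
        HAVING MIN(numlocs) <> MAX(numlocs)
        ORDER BY pack_no
    """)
]

print(inconsistent_packs)
```

If every component of a pack has the same location count, MIN and MAX are equal and the pack is filtered out; a single odd component is enough to make them differ.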
If you want the details of the numbers, then use the analytic functions for the calculation:
select pi.*
from (select pi.*, min(numlocs) over (partition by pack_no) as minnumlocs,
max(numlocs) over (partition by pack_no) as maxnumlocs
from (select c.pack_no, c.item, count(a.loc ) as numlocs
from item_location a join
pack_breakout c
on c.item=a.item
group by c.pack_no, c.item
) pi
) pi
where minnumlocs <> maxnumlocs