How to create dynamic dummy variables in SQL

I have a simple table with 2 columns: ID (integer) and Category (string), and each ID can repeat with a few categories, like so:
ID Cat
--- ---
1 A
1 B
2 B
3 A
3 B
3 C
I want to reshape this table so that each unique Category would be a dummy variable (0/1 if ID has it):
ID A B C
--- -- -- --
1 1 1 0
2 0 1 0
3 1 1 1
Now, if the set of unique categories is known (and small), this is easy: one CASE WHEN expression per unique category.
My questions are:
a) What if the set is unknown or really large? How do I create this 'CASE WHEN' effect automatically?
b) More importantly: I'm not necessarily interested in all categories (say, only dummies for 'A' and 'B'), only in the categories I have in a separate table called Cats, a simple one-column table holding my relevant categories (again, unknown how many), like:
Cat
---
A
B
How do I create dummy variables for only the categories in this dynamic table?
Do you think all of this should really be done in other tools e.g. R?
Thanks!
(I'm using Teradata SQL with SQLA, but I think it's a general SQL question)

In R, just use table:
table(dat)
Cat
ID A B C
1 1 1 0
2 0 1 0
3 1 1 1
And if you want the binary table for only a subset of the categories:
table(subset(dat,Cat %in% c('A','B')))
Cat
ID A B
1 1 1
2 0 1
3 1 1
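If you would rather stay in SQL, the "one CASE WHEN per category" statement can be generated from the Cats table instead of typed by hand. Below is a minimal sketch of that idea in Python, with an in-memory SQLite database standing in for Teradata (an assumption; the generated statement may need adapting). The table name dat is borrowed from the R answer above, and build_dummy_query is a hypothetical helper, not part of any library.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption: the
# question targets Teradata, where the generated SQL may need adapting).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dat  (ID INTEGER, Cat TEXT);
    CREATE TABLE Cats (Cat TEXT);
    INSERT INTO dat  VALUES (1,'A'),(1,'B'),(2,'B'),(3,'A'),(3,'B'),(3,'C');
    INSERT INTO Cats VALUES ('A'),('B');
""")

def build_dummy_query(conn):
    """Hypothetical helper: one MAX(CASE WHEN ...) column per row of Cats."""
    cats = [row[0] for row in conn.execute("SELECT Cat FROM Cats ORDER BY Cat")]
    cols = ["ID"]
    for cat in cats:
        literal = cat.replace("'", "''")   # escape quotes for the string literal
        alias = cat.replace('"', '""')     # escape quotes for the column alias
        cols.append(
            f"MAX(CASE WHEN Cat = '{literal}' THEN 1 ELSE 0 END) AS \"{alias}\""
        )
    return "SELECT " + ", ".join(cols) + " FROM dat GROUP BY ID ORDER BY ID"

dummy_sql = build_dummy_query(conn)
dummy_rows = conn.execute(dummy_sql).fetchall()
for row in dummy_rows:
    print(row)
```

Category C gets no column because it is not listed in Cats. Adding a row to Cats changes the generated statement on the next run without touching the code.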

Related

Create a table with unknown columns SQL

I have a table that looks like this:
ID Steps Letters
--- ----- -------
1 1 a
1 2 e
1 3 b
2 1 c
2 2 d
3 1 b
3 2 a
And a query that returns the following output:
a
b
d
My goal is to create a table (or modify the first one) that drops the Letters column and instead has N additional columns, where N is the number of rows returned by the second query above. For each ID, a column is 1 if that letter was the last step for that ID, 0 if the letter appeared in some step but not the last one, and NULL if it never appeared. The result would look like this:
ID a b d
--- ---- ---- ----
1 0 1 NULL
2 NULL NULL 1
3 1 0 NULL
I assume pivoting is the way to approach it, but I don't even know where to begin.
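One possible starting point is conditional aggregation rather than a PIVOT clause. The sketch below is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database, and the table names steps_t and wanted are hypothetical stand-ins for the first table and for the query that returns a, b, d. It emits one column per wanted letter, and a CASE without an ELSE branch supplies the NULL for letters that never appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE steps_t (ID INTEGER, Steps INTEGER, Letters TEXT);  -- hypothetical name
    CREATE TABLE wanted  (Letter TEXT);                              -- hypothetical name
    INSERT INTO steps_t VALUES
        (1,1,'a'),(1,2,'e'),(1,3,'b'),
        (2,1,'c'),(2,2,'d'),
        (3,1,'b'),(3,2,'a');
    INSERT INTO wanted VALUES ('a'),('b'),('d');
""")

letters = [r[0] for r in conn.execute("SELECT Letter FROM wanted ORDER BY Letter")]

cols = ["s.ID"]
for letter in letters:
    lit = letter.replace("'", "''")
    alias = letter.replace('"', '""')
    # Rows with another letter yield NULL (no ELSE on the outer CASE);
    # rows with this letter yield 1 on the last step and 0 otherwise.
    cols.append(
        f"MAX(CASE WHEN s.Letters = '{lit}' "
        f"THEN CASE WHEN s.Steps = l.LastStep THEN 1 ELSE 0 END END) AS \"{alias}\""
    )

pivot_sql = (
    "SELECT " + ", ".join(cols) + " "
    "FROM steps_t s "
    "JOIN (SELECT ID, MAX(Steps) AS LastStep FROM steps_t GROUP BY ID) l "
    "ON l.ID = s.ID "
    "GROUP BY s.ID ORDER BY s.ID"
)
pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

MAX ignores NULLs, so a letter that shows up both in an earlier step and in the last step comes out as 1. NULL values arrive in Python as None.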

Get row counts for different lookup values

A temp table has 700+ records with a PK. 12 columns contain Id values from lookup tables. Each lookup table has 4-8 records in it. How can I get a record count for each Id value in table LookupA that has a relationship via the PK to Id values in every other lookup table? Each lookup value in each lookup table needs to be compared, for a record count, to every value in every other lookup table.
I can write a SQL statement to get specific values for specific columns, but that's a long exercise and will slow down the proc.
Here's a sample of the data.
PK LookupA LookupB LookupC
1 1 1 3
2 1 2 3
3 1 3 2
4 2 4 2
5 4 1 1
6 3 2 1
7 2 3 3
8 4 4 3
9 4 3 2
10 1 1 2
The results need to compare LookupA with LookupB and LookupC to get a row count.
                LookupB        LookupC
Table    Value  1  2  3  4     1  2  3
LookupA  1      2  1  1  0     0  2  2
         2      0  0  1  1     0  1  1
         3      0  1  0  0     1  0  0
         4      1  0  1  1     1  1  1
Then LookupB would be compared to LookupA and LookupC.
And LookupC would be compared to LookupA and LookupB.
With this code you can get the counts for every pair of values across LookupA, LookupB, and LookupC:
-- "table" is a placeholder for the name of your temp table
select 'A-B' as Combination, LookupA, LookupB, count(*) as NumRecords
from table
group by LookupA, LookupB
UNION ALL
select 'A-C' as Combination, LookupA, LookupC, count(*) as NumRecords
from table
group by LookupA, LookupC
UNION ALL
select 'B-C' as Combination, LookupB, LookupC, count(*) as NumRecords
from table
group by LookupB, LookupC
Combination is a constant within each select, so it does not need to appear in the group by (SQL Server does not accept a column alias there anyway), and UNION ALL avoids a needless duplicate check.
After this, if you want to see all the values for LookupA compared with B and C, just look at combinations A-B and A-C.
If I understand correctly, your temp table contains foreign keys to other tables, so why not simply use joins? Something like this:
SELECT COUNT(DISTINCT a.id) as CountA
, COUNT(DISTINCT b.id) as CountB
, etc...
FROM #temp_table t
LEFT OUTER JOIN lookupA a on a.id = t.lookupA
LEFT OUTER JOIN lookupB b on b.id = t.lookupB
...etc
I would suggest reviewing the design if possible. Having so many small tables complicates things. Is it not possible to consolidate them into one lookup table? With an additional "LookupType" field, all the lookups could live in the same place, which would make retrieval much simpler.
I used a slight derivative of the statement below without any UNIONs to get me where I wanted to go.
/*
select 'A-B' as Combination, LookupA, LookupB, count(*) as NumRecords
from table
group by Combination, LookupA, LookupB
*/
I used a variable and a WHILE loop to place the various summaries where they need to be.
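With 12 lookup columns, writing one grouped select per pair by hand gets long, so the statements can be generated instead. The following is a rough sketch of that idea, not the poster's actual WHILE loop: it uses Python with an in-memory SQLite database, a hypothetical table name temp_table, and only the three sample columns shown in the question.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_table "   # hypothetical name for the temp table
    "(PK INTEGER PRIMARY KEY, LookupA INT, LookupB INT, LookupC INT)"
)
conn.executemany(
    "INSERT INTO temp_table VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 3), (2, 1, 2, 3), (3, 1, 3, 2), (4, 2, 4, 2), (5, 4, 1, 1),
     (6, 3, 2, 1), (7, 2, 3, 3), (8, 4, 4, 3), (9, 4, 3, 2), (10, 1, 1, 2)],
)

# Column names come from a fixed list here, never from user input.
lookup_cols = ["LookupA", "LookupB", "LookupC"]

# One grouped SELECT per unordered pair of lookup columns.
parts = [
    f"SELECT '{left}-{right}' AS Combination, "
    f"{left} AS LeftValue, {right} AS RightValue, COUNT(*) AS NumRecords "
    f"FROM temp_table GROUP BY {left}, {right}"
    for left, right in combinations(lookup_cols, 2)
]
pair_sql = " UNION ALL ".join(parts) + " ORDER BY 1, 2, 3"

# Map (combination, left value, right value) to its row count.
pair_counts = {
    (combo, left, right): n
    for combo, left, right, n in conn.execute(pair_sql)
}
for key, n in pair_counts.items():
    print(key, n)
```

Value pairs that never occur together simply have no row, so the zeros in the desired grid would have to be filled in by the caller, for example with pair_counts.get(key, 0).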

SQL Server : how can I get difference between counts of total rows and those with only data

I have a table with data as shown below (the table is built every day with current date, but I left off that field for ease of reading).
This table keeps track of people and the doors they enter on a daily basis.
Table entrance_t:
id entrance entered
------------------------
1 a 0
1 b 0
1 c 0
1 d 0
2 a 1
2 b 0
2 c 0
2 d 0
3 a 0
3 b 1
3 c 1
3 d 1
My goal is to report, per person, the number of entrances not used, but ONLY for people who entered at least once (entered = 1).
So using the above table, I would like the results of query to be...
id count
----------
2 3
3 1
(id=2 did not use 3 of the entrances and id=3 did not use 1)
I tried several queries (some with inner joins on two instances of the same table) and I can get the entrances not used, but it's always for everybody. Like this...
id count
----------
1 4
2 3
3 1
How do I exclude id=1 from the results, since they did not enter at all?
Thank you,
You could use conditional aggregation:
SELECT id, count(CASE WHEN entered = 0 THEN 1 END) AS cnt
FROM entrance_t
GROUP BY id
HAVING count(CASE WHEN entered = 1 THEN 1 END) > 0;
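To check the query against the sample data, here is a small sketch that runs it in Python on an in-memory SQLite database (an assumption, since the question is about SQL Server; the statement itself is standard SQL). An ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entrance_t (id INTEGER, entrance TEXT, entered INTEGER)")
conn.executemany(
    "INSERT INTO entrance_t VALUES (?, ?, ?)",
    [(1, "a", 0), (1, "b", 0), (1, "c", 0), (1, "d", 0),
     (2, "a", 1), (2, "b", 0), (2, "c", 0), (2, "d", 0),
     (3, "a", 0), (3, "b", 1), (3, "c", 1), (3, "d", 1)],
)

# COUNT skips NULLs, and a CASE without ELSE yields NULL, so each
# COUNT(CASE ...) counts only the rows that match its condition.
unused_counts = conn.execute("""
    SELECT id, COUNT(CASE WHEN entered = 0 THEN 1 END) AS cnt
    FROM entrance_t
    GROUP BY id
    HAVING COUNT(CASE WHEN entered = 1 THEN 1 END) > 0
    ORDER BY id
""").fetchall()
print(unused_counts)
```

The HAVING clause is what drops id=1: that group has no row with entered = 1, so its count of entered rows is 0.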

Update table column that is used for ordering according to alphabetical order on a second table

I have two tables (A and B) with a many-to-many (n-n) relationship, so there is a third table (C) that connects them.
Tables A and B both have an Id and a Name.
Table C has IdA, IdB, and an Order column: a user-supplied number used for sorting.
My issue is that I have just added the Order column, so I need to migrate table C and give every row an order number based on the name in B.
So if table A has:
Id Name
1 A
2 B
3 C
And Table B has:
Id Name
1 J
2 L
3 M
And table C has:
IdA IdB Order
1 2 0
1 1 0
1 3 0
2 1 0
2 3 0
I need a query that updates table C to be more like:
IdA IdB Order
1 2 2
1 1 1
1 3 3
2 1 1
2 3 2
I have a query that basically does what I want, but it leaves me with "gaps".
For the data above I get:
IdA IdB Order
1 2 2
1 1 1
1 3 3
2 1 1
2 3 3
I think this should work for what you need:
With ToUpdate As
(
    Select C.*,
           Row_Number() Over (Partition By C.IdA Order By B.Name) As NewOrder
    From C
    Join B On B.Id = C.IdB
)
Update C
Set "Order" = U.NewOrder
From ToUpdate U
Where U.IdA = C.IdA
And U.IdB = C.IdB
(In full disclosure, I'm not terribly familiar with postgres, but I think this should be valid).
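For anyone who wants to try the numbering on the sample data, here is a hedged sketch in Python with an in-memory SQLite database (not postgres, so this is an assumption-laden stand-in). It computes the same ROW_NUMBER per IdA, then writes the values back with plain parameterised UPDATEs instead of UPDATE ... FROM. It needs a SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE C (IdA INTEGER, IdB INTEGER, "Order" INTEGER);
    INSERT INTO B VALUES (1,'J'),(2,'L'),(3,'M');
    INSERT INTO C VALUES (1,2,0),(1,1,0),(1,3,0),(2,1,0),(2,3,0);
""")

# Number the rows of C within each IdA, ordered by the name in B.
# PARTITION BY restarts the count per IdA, which avoids the gaps.
ranked = conn.execute("""
    SELECT ROW_NUMBER() OVER (PARTITION BY C.IdA ORDER BY B.Name) AS NewOrder,
           C.IdA,
           C.IdB
    FROM C
    JOIN B ON B.Id = C.IdB
""").fetchall()

# Write the computed numbers back, one UPDATE per (IdA, IdB) pair.
conn.executemany('UPDATE C SET "Order" = ? WHERE IdA = ? AND IdB = ?', ranked)
conn.commit()

updated_rows = conn.execute(
    'SELECT IdA, IdB, "Order" FROM C ORDER BY IdA, "Order"'
).fetchall()
for row in updated_rows:
    print(row)
```

The last row for IdA = 2 ends up with Order 2 rather than 3, which is the gap-free numbering asked for.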

How to get an index of different category returned by "order by" sql in oracle?

We can easily get a SQL result like the following:
SQL>select Name, Value from table order by Name;
Name Value
------------
A 1
A 2
B 1
C 5
C 6
C 7
However, is there a way to link the name to a number so that an index of different names can be formed? Suppose we don't know how many different names are in the table and don't know what they are.
Name Value idx
-----------------
A 1 0
A 2 0
B 1 1
C 5 2
C 6 2
C 7 2
This can easily be done using a window function:
select Name,
Value,
dense_rank() over (order by name) - 1 as idx
from table
order by Name;
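As a quick check of the dense_rank approach on the sample rows, here is a sketch in Python with an in-memory SQLite database standing in for Oracle (an assumption). The table is called t here, a hypothetical name, because table is a reserved word, and Value is added to the ORDER BY only to make the output order deterministic. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Name TEXT, Value INTEGER)")  # t is a hypothetical name
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", 1), ("A", 2), ("B", 1), ("C", 5), ("C", 6), ("C", 7)],
)

# DENSE_RANK gives equal names the same rank with no gaps between
# ranks; subtracting 1 makes the index start at 0.
indexed_rows = conn.execute("""
    SELECT Name,
           Value,
           DENSE_RANK() OVER (ORDER BY Name) - 1 AS idx
    FROM t
    ORDER BY Name, Value
""").fetchall()
for row in indexed_rows:
    print(row)
```

Plain RANK would leave gaps after ties (0, 0, 2, 3, 3, 3 for this data), which is why DENSE_RANK is the one to use for a consecutive index.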