Select distinct combinations of values - sql

I have a table with X values and Y values, both INT. What I want to do is group on the X value with the condition that it contains a distinct combination of Y values. I also want to see the total number of each combination.
I tried using SUM ( POWER (2, Y)), but that generates numbers that are too big as Y can get up to about 300 in some cases.
+--------------+--------------+
| X | Y |
+--------------+--------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 4 |
| 1 | 6 |
| 2 | 1 |
| 2 | 2 |
| 2 | 4 |
| 2 | 6 |
| 3 | 2 |
| 3 | 3 |
| 3 | 5 |
| 4 | 2 |
| 4 | 3 |
| 4 | 5 |
| 5 | 2 |
| 5 | 3 |
| 5 | 6 |
+--------------+--------------+
I want the result to look something like:
+--------------+--------------+
| X | COUNT |
+--------------+--------------+
| 1 | 2 |
| 3 | 2 |
| 5 | 1 |
+--------------+--------------+

Based on your description (but not on your sample data) next query should do:
select X, count(distinct Y)
from TBL
group by X

Thanks for trying to help. I realize that it might have been hard to understand what I was trying to do.
Anyway, I ended up solving it with the checksum_agg aggregate function.

Related

Oracle SQL unpivot and keep rows with null values [duplicate]

This question already has an answer here:
oracle - querying NULL values in unpivot query
(1 answer)
Closed 2 years ago.
I'm currently doing an unpivot for a Oracle Data Source (v.12.2) like this:
SELECT *
FROM some_table
UNPIVOT (
(X,Y,Val)
FOR SITE
IN (
(SITE1_X, SITE1_Y, SITE1_VAL) AS '1',
(SITE2_X, SITE2_Y, SITE2_VAL) AS '2',
(SITE3_X, SITE3_Y, SITE3_VAL) AS '3'
))
This works totally fine so far. There is only one exception - I have another column, let's say extend_info, ... if this column has the value y, there will be only one row of this column and all the site columns will be null. Nevertheless I would like to keep this row and not drop it.
I'm not really sure how to do this or what would be a nice way to do this. Any recommendations?
Example:
Original Table:
ID | SITE1_X | SITE1_Y |SITE1_VAL | SITE2_X | SITE2_Y | SITE2_VAL | ... | extend_info
-------
1 | 0 | 0 | 5 | 1 | 1 | 10 | ... | n
2 | 0 | 0 | 3 | null | null | null | ... | n
3 | null | null | null | null | null | null | ... | y
current output:
ID | SITE | X | Y | VAL | extend_info
-------
1 | 1 | 0 | 0 | 5 | n
2 | 1 | 1 | 1 | 10 | n
3 | 2 | 0 | 0 | 3 | n
desired output:
ID | SITE | X | Y | VAL | extend_info
-------
1 | 1 | 0 | 0 | 5 | n
2 | 1 | 1 | 1 | 10 | n
3 | 2 | 0 | 0 | 3 | n
4 | | | | | y
I don't really care what is in SITE|X|Y|VAL in that case, can be 0 for everything or null.
Bonus question:
If extend_info is y I would like to join another table with this ID. The other table looks like this:
ID | F_ID | X | Y | VAL
-----
1 | 4 | 1 | 1 | 8
2 | 4 | 2 | 2 | 9
and in that case my final output table should look like:
ID | SITE | X | Y | VAL | X_OTHER_TABLE | Y_OTHER_TABLE
-------
1 | 1 | 0 | 0 | 5 |
2 | 1 | 1 | 1 | 10 |
3 | 2 | 0 | 0 | 3 |
4 | | | | 8 | 1 | 1
5 | | | | 9 | 2 | 2
I know... the database structure is super ugly but that is what a vendor provides us and we are trying to create a View to make it easier to perform some data analysis tasks on it.
It doesn't have to look 1:1 like my final example - but maybe my itention gets clear = I want to have one single table/view with all the information in a single format.
Thanks for any help!
I would recommend a lateral join:
SELECT s.id, u.*
FROM some_table s CROSS JOIN LATERAL
(SELECT s.SITE1_X as SITE_X, s.SITE1_Y as SITE_Y, s.SITE1_VAL as SITE_VAL FROM DUAL UNION ALL
SELECT s.SITE2_X, s.SITE2_Y, s.SITE2_VAL FROM DUAL UNION ALL
SELECT s.SITE3_X, s.SITE3_Y, s.SITE3_VAL FROM DUAL
) u;
You can just join additional tables to this as you like.

SQL - Create number of categories based on pre-defined number of splits

I am using BigQuery, and trying to assign categorical values to each of my records, based on the number of 'splits' assigned to it.
The table has a cumulative count of records, grouped at the STR level - i.e., if there are 4 SKUs at 2 STR, the SKUs will be labeled 1,2,3,4. Each STR is assigned a SPLIT value, so if the STR has a SPLIT value of 2, I want it to split its SKUs into 2 categories. I want to create another column that would assign SKUs labeled 1-2 as '1', and SKUs labeled 3-4 as '2'. (The actual data is on a much larger scale, but thought this would be easier.)
+-----+------+---------------+--------+
| STR | SKU | SKU_ROW_COUNT | SPLITS |
+-----+------+---------------+--------+
| 1 | 1230 | 1 | 3 |
| 1 | 1231 | 2 | 3 |
| 1 | 1232 | 3 | 3 |
| 1 | 1233 | 4 | 3 |
| 1 | 1234 | 5 | 3 |
| 1 | 1235 | 6 | 3 |
| 2 | 1310 | 1 | 2 |
| 2 | 1311 | 2 | 2 |
| 2 | 1312 | 3 | 2 |
| 2 | 1313 | 4 | 2 |
| 3 | 2345 | 1 | 1 |
| 3 | 2346 | 2 | 1 |
| 3 | 2347 | 3 | 1 |
+-----+------+---------------+--------+
The SPLITS column is dynamic, ranging from 1 to 3. The number of SKUs in each category should be relatively equal, but that's not a priority as much as just the number of groups that are created. Ideally, the final table with the new column (HOST_NUMBER) would look something like this:
+-----+------+---------------+--------+-------------+
| STR | SKU | SKU_ROW_COUNT | SPLITS | HOST_NUMBER |
+-----+------+---------------+--------+-------------+
| 1 | 1230 | 1 | 3 | 1 |
| 1 | 1231 | 2 | 3 | 1 |
| 1 | 1232 | 3 | 3 | 2 |
| 1 | 1233 | 4 | 3 | 2 |
| 1 | 1234 | 5 | 3 | 3 |
| 1 | 1235 | 6 | 3 | 3 |
| 2 | 1310 | 1 | 2 | 1 |
| 2 | 1311 | 2 | 2 | 1 |
| 2 | 1312 | 3 | 2 | 2 |
| 2 | 1313 | 4 | 2 | 2 |
| 3 | 2345 | 1 | 1 | 1 |
| 3 | 2346 | 2 | 1 | 1 |
| 3 | 2347 | 3 | 1 | 1 |
+-----+------+---------------+--------+-------------+
You can use window functions and arithmetics:
select
t.*,
1 + floor((sku_row_count - 1) * splits / count(*) over(partition by str)) host_number
from mytable t
order by sku
Actually, ntile() seems to do exactly what you want - and you don't even need the sku_row_count column (which basically mimics row_number() anyway):
select
t.*,
ntile(splits) over(partition by str order by sku) host_number
from mytable t
order by sku
If the ordering of the values in the groups doesn't matter, just use modulo arithmetic:
select t.*, (SKU_ROW_COUNT % SPLITS) as split_group
from t
Below is for BigQuery Standard SQL
#standardSQL
SELECT *, 1 + MOD(SKU_ROW_COUNT, SPLITS) AS HOST_NUMBER
FROM `project.dataset.table`

SQL - aggregation with column value as column name

For a table like below need to do an aggregation such that for each unique field in one column, need to find the count of occurrences of a discrete value in another column
input table is:
id model datetime driver distance
---|-----|------------|--------|---------
1 | S | 04/03/2009 | john | 399
2 | X | 04/03/2009 | juliet | 244
3 | 3 | 04/03/2009 | borat | 555
4 | 3 | 03/03/2009 | john | 300
5 | X | 03/03/2009 | juliet | 200
6 | X | 03/03/2009 | borat | 500
7 | S | 24/12/2008 | borat | 600
8 | X | 01/01/2009 | borat | 700
Output required
model john juliet | borat
-----|--------|-------|------
S | 1 | 0 | 1
X | 0 | 2 | 2
3 | 1 | 0 | 1
one potential way to do is to group by model with an aggregation like
SUM (CASE WHEN driver = 'value' THEN 1 ELSE 0 END) AS value for each discrete value of driver column. But the challenge is sometimes the number of discrete values is too many ( around 50 in my case) or in some cases do not even know all possible discrete values - I was wondering if there is an alternate way to do this.
The aggregation part need a litle more work.
Here the details:
Need calculate first what are all the combinations
Then use LEFT JOIN to get which combination doesnt have data.
DEMO
WITH "allDrivers" as (
SELECT DISTINCT "driver"
FROM Table1
),
"allModels" as (
SELECT DISTINCT "model"
FROM Table1
),
"source" as (
SELECT d."driver", m."model"
FROM "allDrivers" d
CROSS JOIN "allModels" m
)
SELECT s."model", s."driver", COUNT(t."datetime")
FROM "source" s
LEFT JOIN table1 t
ON s."model" = t."model"
AND s."driver" = t."driver"
GROUP BY s."model", s."driver"
OUTPUT
| model | driver | count |
|-------|--------|-------|
| 3 | borat | 1 |
| 3 | john | 1 |
| 3 | juliet | 0 |
| S | borat | 1 |
| S | john | 1 |
| S | juliet | 0 |
| X | borat | 2 |
| X | john | 0 |
| X | juliet | 2 |
Then you can do the dynamic pivot

Limit a sorted number of rows joined

I have two tables, A and B, and a join table M. I want to, for each A.id, get the top 2 B.id's sorting on the value in table M, producing the results below. This is running on an Azure SQL database
Table A Table M Table B
+-----+ +-----+-----+-------+ +-----+
| Id | | AId | BId | Value | | Id |
+-----+ +-----+-----+-------+ +-----+
| 1 | | 1 | 3 | 4 | | 1 |
| 2 | | 1 | 2 | 3 | | 2 |
| 3 | | 3 | 2 | 3 | | 3 |
| 4 | | 3 | 5 | 6 | | 4 |
+-----+ | 3 | 3 | 4 | | 5 |
| 4 | 1 | 2 | +-----+
| 4 | 2 | 1 |
| 4 | 4 | 3 |
+-----+-----+-------+
Result
+-----+-----+-------+
| AId | BId | Value |
+-----+-----+-------+
| 1 | 3 | 4 |
| 1 | 2 | 3 |
| 3 | 5 | 6 |
| 3 | 3 | 4 |
| 4 | 1 | 2 |
| 4 | 4 | 3 |
+-----+-----+-------+
I know that I can select all the M.AId rows where they equal 1, sort it, and limit by 2, but I need to do this for every row in Table A. I've made an attempt to use group by, but I wasn't sure how to sort and limit it. I've also tried to search for resources associated with this issue but I couldn't find any resources.
(I also wasn't sure how to word the title for this issue)
You can just use ROW_NUMBER:
SELECT
AId, BId, Value
FROM (
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY AId ORDER BY Value DESC)
FROM M
) t
WHERE Rn <= 2

SQL - only display rows that have the max value

I have this table that is already sorted but I want it to only display the maximum values... so instead of this table:
+------+-------+
| id | value |
+------+-------+
| 1 | 3 |
| 5 | 3 |
| 4 | 3 |
| 9 | 2 |
| 8 | 2 |
| 3 | 2 |
| 2 | 1 |
| 6 | 1 |
| 7 | 1 |
+------+-------+
I want this:
+------+-------+
| id | value |
+------+-------+
| 1 | 3 |
| 5 | 3 |
| 4 | 3 |
+------+-------+
I'm using SQLite. thanks for any help.
You can do this using a subquery. Here is one way:
select t.*
from t
where t.value = (select max(value) from t);