Select unique combinations (unique on both sides) - sql

EDIT: added a link to Fiddle for a more comprehensive sample (actual dataset)
I wonder whether the following is possible in SQL, in BigQuery in particular, in a single SELECT statement.
Consider the following input:
Key | Value
-----|-------
a | 2
a | 3
b | 2
b | 3
b | 5
c | 2
c | 5
c | 7
Logic: select the lowest "available" value for each key, where "available" means not yet assigned to (used by) an earlier key. See below.
Key | Value | Rule
-----|-------|--------------------------------------------
a | 2 | keep
a | 3 | ignore because key "a" has a value already
b | 2 | ignore because value "2" was already used
b | 3 | keep
b | 5 | ignore because key "b" has a value already
c | 2 | ignore because value "2" was already used
c | 5 | keep
c | 7 | ignore because key "c" has a value already
Hence the expected outcome:
Key | Value
-----|-------
a | 2
b | 3
c | 5
Here is the SQL to create the dummy table:
with t as ( select
'a' key, 2 value UNION ALL select 'a', 3
UNION ALL select 'b', 2 UNION ALL select 'b', 3 UNION ALL select 'b', 5
UNION ALL select 'c', 2 UNION ALL select 'c', 5 UNION ALL select 'c', 7
)
select * from t
EDIT: here is another dataset
I'm not sure which combination of FULL JOIN, DISTINCT, ARRAY, or window functions to use.
Any guidance is appreciated.
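The "lowest available value" rule is sequential: each key's pick depends on what the earlier keys already took, so a single GROUP BY or window function does not obviously capture it. A minimal sketch, assuming an engine with recursive CTEs: walk the keys in ascending order and carry the used values along as a delimited string. It runs through Python's built-in sqlite3 module as a stand-in (it is not BigQuery syntax), and the table and column names t, k, v are hypothetical replacements for Key/Value.

```python
import sqlite3

# Hypothetical names: table t(k, v) stands in for the question's Key/Value.
GREEDY_SQL = """
WITH RECURSIVE
  ranked_keys(k, rn) AS (
    -- number the distinct keys 1, 2, 3, ... in ascending order
    SELECT d.k, (SELECT COUNT(DISTINCT t2.k) FROM t t2 WHERE t2.k <= d.k)
    FROM (SELECT DISTINCT k FROM t) d
  ),
  pick(rn, k, v, used) AS (
    -- anchor: no key processed yet; 'used' is a list like ',2,3,'
    SELECT 0, NULL, NULL, ','
    UNION ALL
    -- step: add the previous pick to 'used', move to the next key,
    -- and take that key's lowest value that is not in 'used'
    SELECT rk.rn, rk.k,
           (SELECT MIN(t.v) FROM t
             WHERE t.k = rk.k
               AND instr(p.used || IFNULL(p.v || ',', ''),
                         ',' || t.v || ',') = 0),
           p.used || IFNULL(p.v || ',', '')
    FROM pick p
    JOIN ranked_keys rk ON rk.rn = p.rn + 1
  )
SELECT k, v FROM pick
WHERE rn > 0 AND v IS NOT NULL  -- drop the anchor and keys with nothing left
ORDER BY k
"""


def lowest_available(rows):
    """Load (key, value) pairs into an in-memory table and run the query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (k TEXT, v INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return con.execute(GREEDY_SQL).fetchall()
    finally:
        con.close()


sample = [("a", 2), ("a", 3), ("b", 2), ("b", 3), ("b", 5),
          ("c", 2), ("c", 5), ("c", 7)]
print(lowest_available(sample))  # [('a', 2), ('b', 3), ('c', 5)]
```

Each recursion step handles exactly one key, so the work is a sequential pass over the distinct keys; that is the cost of a greedy rule where every choice depends on the previous ones.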

EDIT: This is an incorrect answer that worked with the original example dataset but has issues, as the comprehensive sample shows. I'm leaving it here for now to maintain the comment history.
I don't have a specific BigQuery answer, but here is one SQL solution using a Common Table Expression and recursion.
WITH MyCTE AS
(
/* ANCHOR SUBQUERY */
SELECT MyKey, MyValue
FROM MyTable t
WHERE t.MyKey = (SELECT MIN(MyKey) FROM MyTable)
UNION ALL
/* RECURSIVE SUBQUERY */
SELECT t.MyKey, t.MyValue
FROM MyTable t
INNER JOIN MyCTE c
ON c.MyKey < t.MyKey
AND c.MyValue < t.MyValue
)
SELECT MyKey, MIN(MyValue)
FROM MyCTE
GROUP BY MyKey
;
Results:
Key | Value
-----|-------
a | 2
b | 3
c | 5

Related

Is there a version of 'CONTAINS' function in SQLITE other than 'LIKE'?

I'm trying to find totals for each number in the range 1 to 7, but the data contains different combinations of these numbers, e.g. 1; 2; 3,7; 1,2,3 and so on. I want to find the total number of times each number pops up. What I essentially want is SQLite code that goes like:
select <fields>, count(*)
from tablexyz
where <field> contains '2' (and '3','4',... individually)
When I use "where <field> like '2%'" and similar, it only returns the series that start with 2 and misses series that start with 1 but contain 2.
Any help would be appreciated!
I want to find the total number of times each number pops up
Your sample code and the solution you say you want don't exactly align. The closest I can think of is:
with t (txt) as -- a sample record from your table
(select '1; 2; 3,7; 1,2,3'),
t2 (num) as -- a lookup table we can create for range of numbers 1-7
(select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7)
select t2.num, length(t.txt) - length(replace(t.txt,t2.num,'')) as num_occurence
from t2
left join t on t.txt like '%' || t2.num || '%'
Output:
+-----+---------------+
| num | num_occurence |
+-----+---------------+
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
| 4 | NULL |
| 5 | NULL |
| 6 | NULL |
| 7 | 1 |
+-----+---------------+
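The length/replace trick counts occurrences because deleting every copy of a one-character digit shortens the string by exactly the number of copies. A minimal sketch of the same idea summed over several rows, run through Python's sqlite3 module; it assumes each series sits in its own row of a hypothetical table series(txt) and that all numbers are single digits (a search for 1 would also hit the 1 in 12).

```python
import sqlite3

COUNT_OCCURRENCES_SQL = """
WITH RECURSIVE nums(num) AS (      -- the numbers 1..7
  SELECT 1
  UNION ALL
  SELECT num + 1 FROM nums WHERE num < 7
)
SELECT nums.num,
       -- characters removed by replace() = occurrences of a one-digit number
       SUM(length(s.txt) - length(replace(s.txt, CAST(nums.num AS TEXT), '')))
FROM nums CROSS JOIN series s
GROUP BY nums.num
ORDER BY nums.num
"""


def count_occurrences(series_rows):
    """Return (num, total occurrences) for num in 1..7 across all rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE series (txt TEXT)")
        con.executemany("INSERT INTO series VALUES (?)",
                        [(s,) for s in series_rows])
        return con.execute(COUNT_OCCURRENCES_SQL).fetchall()
    finally:
        con.close()


print(count_occurrences(["1", "2", "3,7", "1,2,3"]))
# [(1, 2), (2, 2), (3, 2), (4, 0), (5, 0), (6, 0), (7, 1)]
```

With CROSS JOIN every number gets a row, so numbers that never occur show 0 rather than NULL.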
Using the solution below, you can build a "table" of the numbers 1 to 7, cross join it to your source data table, flag whether each number occurs in each row, and then sum the flags per number.
Query
WITH
sample_data (nums)
AS
(SELECT '1,2,3,4,5,6'
UNION ALL
SELECT '3,4,5,6'
UNION ALL
SELECT '1,2,7,6'
UNION ALL
SELECT '6' ),
search_nums (search_num)
AS
(VALUES(1)
UNION ALL
SELECT search_num+1 FROM search_nums WHERE search_num<7)
select search_num, sum(count_of_num) from (
SELECT s.nums,
n.search_num,
case
instr(s.nums, n.search_num)
when 0 then 0
else 1
end as count_of_num
FROM sample_data s, search_nums n
) group by search_num;
Result
search_num | sum(count_of_num)
-----------|------------------
1 | 2
2 | 2
3 | 2
4 | 2
5 | 2
6 | 4
7 | 1
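The instr() check counts rows that contain a number, which is safe for 1 to 7 only because every value is a single digit; with multi-digit values, instr('12', '1') would also match. A sketch of a stricter variant (an assumption, not part of the answer above) pads each list with commas and searches for ',<n>,' so only whole items match. It uses Python's sqlite3 module and the answer's sample data.

```python
import sqlite3

ROWS_CONTAINING_SQL = """
WITH RECURSIVE search_nums(search_num) AS (
  SELECT 1
  UNION ALL
  SELECT search_num + 1 FROM search_nums WHERE search_num < 7
)
SELECT n.search_num,
       -- ',1,2,3,' contains ',1,' but ',12,' does not: whole-item match only
       SUM(instr(',' || replace(s.nums, ' ', '') || ',',
                 ',' || n.search_num || ',') > 0)
FROM search_nums n CROSS JOIN sample_data s
GROUP BY n.search_num
ORDER BY n.search_num
"""


def rows_containing(lists):
    """Return (search_num, number of rows containing it) for 1..7."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sample_data (nums TEXT)")
        con.executemany("INSERT INTO sample_data VALUES (?)",
                        [(x,) for x in lists])
        return con.execute(ROWS_CONTAINING_SQL).fetchall()
    finally:
        con.close()


print(rows_containing(["1,2,3,4,5,6", "3,4,5,6", "1,2,7,6", "6"]))
# [(1, 2), (2, 2), (3, 2), (4, 2), (5, 2), (6, 4), (7, 1)]
```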

SQL/HIVE - How do I query from a horizontal output to vertical output?

I have a query that returns two rows with the information I need:
SELECT src_file_dt, a, b ,c FROM my_table WHERE src_file_dt IN ('1531675040', '1531675169');
It returns:
src_file_dt | a | b | c
1531675040 | 2 | 6 | 9
1531675169 | 8 | 2 | 0
Now I need the data in the following layout. How do I get this output?
fields | prev (1531675040) | curr (1531675169)
a | 2 | 8
b | 6 | 2
c | 9 | 0
There is probably an easier way to achieve this, but I built the result with the explode function, selecting the data for the prev and next columns separately:
select t1.keys, t1.vals as prev, t2.vals as next from (
SELECT explode(map('a', a, 'b', b, 'c',c)) as (keys, vals)
FROM my_table
WHERE src_file_dt = '1531675040'
) t1,
(
SELECT explode(map('a', a, 'b', b, 'c',c)) as (keys, vals)
FROM my_table
WHERE src_file_dt = '1531675169'
) t2
where t1.keys = t2.keys
;
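A possible alternative that avoids the self-join (a different technique from the explode join above, sketched here as an assumption): unpivot the columns with UNION ALL, then pivot the two timestamps back into columns with conditional aggregation. The sketch runs through Python's sqlite3 module rather than Hive; UNION ALL, CASE, and MAX are standard SQL, but check them against your Hive version before relying on this.

```python
import sqlite3

PIVOT_SQL = """
WITH unpivoted AS (            -- one row per (timestamp, field)
  SELECT src_file_dt, 'a' AS field, a AS val FROM my_table
  UNION ALL SELECT src_file_dt, 'b', b FROM my_table
  UNION ALL SELECT src_file_dt, 'c', c FROM my_table
)
SELECT field,
       -- MAX ignores the NULLs produced for the other timestamp
       MAX(CASE WHEN src_file_dt = '1531675040' THEN val END) AS prev,
       MAX(CASE WHEN src_file_dt = '1531675169' THEN val END) AS curr
FROM unpivoted
WHERE src_file_dt IN ('1531675040', '1531675169')
GROUP BY field
ORDER BY field
"""


def fields_prev_curr(rows):
    """Load (src_file_dt, a, b, c) rows and return (field, prev, curr)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE my_table (src_file_dt TEXT, a INT, b INT, c INT)")
        con.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
        return con.execute(PIVOT_SQL).fetchall()
    finally:
        con.close()


print(fields_prev_curr([("1531675040", 2, 6, 9), ("1531675169", 8, 2, 0)]))
# [('a', 2, 8), ('b', 6, 2), ('c', 9, 0)]
```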

Group by 3 columns: "Each group by expression must contain at least one column that is not an outer reference"

I know questions regarding this error message have been asked already, but I couldn't find any that really fit my problem.
I have a table with three columns (A, B, C) containing different values, and I need to identify all the identical combinations. For example, from "TABLE A" below:
| A | B | C |
| 1 | 2 | 3 |
| 1 | 3 | 3 |
| 1 | 2 | 3 |
| 2 | 2 | 2 |
| 1 | 3 | 3 |
... I would like to get "TABLE B" below:
| A | B | C | count |
| 1 | 2 | 3 | 1 |
| 1 | 3 | 3 | 1 |
| 2 | 2 | 2 | 1 |
(I need the last column "count" with 1 in each row for later usage)
When I try with "group by A,B,C" I get the error mentioned in the title. Any help would be greatly appreciated!
FYI, I don't think it really changes anything, but "TABLE A" is obtained from another table, "SOURCE_TABLE", with a query of the form:
select (case when ... ),(case when ...),(case when ...) from SOURCE_TABLE
and I need to build "TABLE B" with only one query.
I think what you are after is DISTINCT:
select distinct A,B,C, 1 [count] -- where 1 is a static value for later use
from (select ... from sourcetable) X
Sounds like you have the right idea. My guess is that the error occurs because of an outer reference in your CASE expressions. Wrapping your first query in another query may resolve this. Try:
SELECT A, B, C, COUNT(*) AS [UniqueRowCount]
FROM (
SELECT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C FROM SOURCE_TABLE
) AS Subquery
GROUP BY A, B, C
After re-reading your question, it seems that you're not counting at all, just putting a "1" after each distinct row. If that's the case, then you can try:
SELECT DISTINCT A, B, C, [Count]
FROM (
SELECT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C, 1 AS [Count] FROM SOURCE_TABLE
) AS Subquery
If the outer-reference errors occurred only in your aggregations, you could also simply try:
SELECT DISTINCT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C, 1 AS [Count] FROM SOURCE_TABLE
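The two suggestions return different counts: SELECT DISTINCT with a constant 1 matches the requested TABLE B, while GROUP BY with COUNT(*) reports how often each combination occurs. A small sketch comparing both on TABLE A via Python's sqlite3 module; the inner SELECT stands in for the (case when ...) query, the table name table_a is hypothetical, and cnt replaces the T-SQL-style [count] alias.

```python
import sqlite3

TABLE_A = [(1, 2, 3), (1, 3, 3), (1, 2, 3), (2, 2, 2), (1, 3, 3)]

# The derived table "sub" plays the role of the (case when ...) query, so the
# outer query de-duplicates or groups on its computed columns A, B, C.
DISTINCT_SQL = """
SELECT DISTINCT A, B, C, 1 AS cnt
FROM (SELECT A, B, C FROM table_a) AS sub
ORDER BY A, B, C
"""
GROUP_SQL = """
SELECT A, B, C, COUNT(*) AS cnt
FROM (SELECT A, B, C FROM table_a) AS sub
GROUP BY A, B, C
ORDER BY A, B, C
"""


def run(sql):
    """Create TABLE A in memory and return the rows produced by sql."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE table_a (A INT, B INT, C INT)")
        con.executemany("INSERT INTO table_a VALUES (?, ?, ?)", TABLE_A)
        return con.execute(sql).fetchall()
    finally:
        con.close()


print(run(DISTINCT_SQL))  # [(1, 2, 3, 1), (1, 3, 3, 1), (2, 2, 2, 1)]
print(run(GROUP_SQL))     # [(1, 2, 3, 2), (1, 3, 3, 2), (2, 2, 2, 1)]
```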

SQL Select a group when attributes match at least a list of values

Given a table with a (non-distinct) identifier and a value:
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |
How can I select the identifiers whose values include every item of a given list (e.g. ('B', 'C'))?
This list might also be the result of another query, such as SELECT Value FROM Table1 WHERE ID = '2', to find all IDs whose values are a superset of those of ID 2 (apart from ID 2 itself, only ID 1 in this example).
Result
| ID |
|----|
| 1 |
| 2 |
1 and 2 are part of the result, as they have both B and C in their Value column. 3 is not included, as it is missing C.
Thanks to the answer to the question "SQL Select only rows where exact multiple relationships exist", I created a query that works for a fixed list. However, I need to be able to use the results of another query without changing the query. (It also requires the Access-specific IIF function.)
SELECT ID FROM Table1
GROUP BY ID
HAVING SUM(Value NOT IN ('A', 'B')) = 0
AND SUM(IIF(Value='A', 1, 0)) = 1
AND SUM(IIF(Value='B', 1, 0)) = 1
In case it matters: the SQL is run on an Excel table via VBA and ADODB.
In the WHERE clause, filter on the list of values you would like to see, group by id, and in the HAVING clause keep only those ids that have 3 matching rows (one per value in the list):
select id from table1
where value in ('A', 'B', 'C') --you can use a result of another query here
group by id
having count(*)=3
If the same id-value pair can occur more than once, alter the having clause slightly: having count(distinct value)=3
If you want to make it completely dynamic based on a subquery, then:
select id, min(valcount) as minvalcount from table1
cross join (select count(*) as valcount from table1 where id=2) as t1
where value in (select value from table1 where id=2) --you can use a result of another query here
group by id
having count(*)=min(valcount) --repeat the aggregate: many engines reject a select-list alias in HAVING
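A sketch of the same relational-division idea with the list coming from another query, run through Python's sqlite3 module: keep the rows whose value is in the list, group by id, and require as many distinct matches as the list has distinct values. Comparing COUNT(DISTINCT ...) on both sides also covers repeated id-value pairs. The helper name and the string-built CTE are illustrative assumptions, and the SQL has not been checked against the Excel/ADODB dialect the question uses.

```python
import sqlite3

TABLE1 = [(1, "A"), (1, "B"), (1, "C"), (1, "D"),
          (2, "A"), (2, "B"), (2, "C"),
          (3, "A"), (3, "B")]


def ids_having_all(wanted_query):
    """IDs whose values include every value returned by wanted_query."""
    # wanted_query is pasted into the SQL text: only use trusted input here.
    sql = f"""
    WITH wanted(value) AS ({wanted_query})
    SELECT t.id
    FROM table1 t
    WHERE t.value IN (SELECT value FROM wanted)
    GROUP BY t.id
    HAVING COUNT(DISTINCT t.value) = (SELECT COUNT(DISTINCT value) FROM wanted)
    ORDER BY t.id
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE table1 (id INTEGER, value TEXT)")
        con.executemany("INSERT INTO table1 VALUES (?, ?)", TABLE1)
        return [row[0] for row in con.execute(sql)]
    finally:
        con.close()


print(ids_having_all("SELECT 'B' UNION ALL SELECT 'C'"))        # [1, 2]
print(ids_having_all("SELECT value FROM table1 WHERE id = 2"))  # [1, 2]
```

ID 2 trivially matches its own list; adding AND t.id <> 2 to the WHERE clause would leave only ID 1.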

Remove partial duplicates sql server

I am altering an existing view within SQL Server. My union statement creates something along the lines of:
Row | Col1 | C2 | C3   | C4
----|------|----|------|-----
1   | A    | B  | NULL | NULL
2   | A    | B  | C    | NULL
3   | A    | B  | C    | D
4   | E    | F  | NULL | NULL
5   | E    | F  | G    | NULL
However, in this scenario I only want rows 3 and 5. I need to omit rows 1 and 2 because they contain duplicate information: everything in them also appears in row 3, which is the most 'complete' row. Row 5 is kept over row 4 for the same reason.
Is this an outer join / intersect issue? How the heck do you create a view in this manner?
Assuming that Col1 is not NULL, we can use ROW_NUMBER partitioned by Col1 and ordered by the concatenated value of all 4 columns (descending), then keep the first row of each group:
; with cte
AS
(
select ROW_NUMBER() over ( partition by col1 order by (coalesce(Col1,'')+
coalesce([C2],'') +
coalesce([C3],'') +
coalesce([C4],'') ) desc) as seq,
*
FROM Table1
)
select * from cte
where seq =1
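A minimal check of the ROW_NUMBER approach on the question's five rows, using Python's sqlite3 module (window functions need SQLite 3.25 or later, and || replaces T-SQL's + for string concatenation). Ordering the concatenated string descending works here because each shorter row is a prefix of the longer one in its group, so the longest string sorts first; the table and column names mirror the question.

```python
import sqlite3

ROWS = [("A", "B", None, None), ("A", "B", "C", None), ("A", "B", "C", "D"),
        ("E", "F", None, None), ("E", "F", "G", None)]

MOST_COMPLETE_SQL = """
WITH cte AS (
  SELECT ROW_NUMBER() OVER (
           PARTITION BY Col1
           ORDER BY COALESCE(Col1, '') || COALESCE(C2, '')
                 || COALESCE(C3, '') || COALESCE(C4, '') DESC
         ) AS seq,
         Col1, C2, C3, C4
  FROM Table1
)
SELECT Col1, C2, C3, C4 FROM cte
WHERE seq = 1          -- keep only the most complete row per Col1
ORDER BY Col1
"""


def most_complete(rows):
    """Load the rows into Table1 and return the most complete row per Col1."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Table1 (Col1 TEXT, C2 TEXT, C3 TEXT, C4 TEXT)")
        con.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", rows)
        return con.execute(MOST_COMPLETE_SQL).fetchall()
    finally:
        con.close()


print(most_complete(ROWS))  # [('A', 'B', 'C', 'D'), ('E', 'F', 'G', None)]
```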