How to select columns by row and value in Postgres? - sql

I have a table like this. All values are booleans except col1, which holds the row names (the primary key):
col1 | col2 | col3 | col4 | col5 ...
--------------------------------
row1 | f | t | t | t
row2 | f | f | f | t
row3 | t | f | t | f
:
And I want a query like this: select all columns for row3 where value = t, or, more precisely: select all column names for row3 where the value is t.
In this example the answer should be:
col2
col4
Because I know all the column names, I can do it by iterating in the caller, e.g. by calling the Postgres client from bash and looping over the columns for each row I'm interested in. But is there a solution in PostgreSQL SQL itself?

That is not really how SQL works. SQL works on rows, not columns.
What this suggests is that your data structure is wrong. If, instead, you stored the values in rows like this:
col1 | name | value
row1 | 'col2' | f
. . .
Then, to get the columns that are true for row3, you would just do:
select name
from t
where col1 = 'row3'
group by name
having count(*) = sum(case when value then 1 else 0 end);
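A minimal sketch of the long (name/value) layout, run with Python's built-in sqlite3 module so it needs no PostgreSQL server. The table name `t` and the sample rows mirror the question; SQLite has no boolean type, so 1/0 stand in for t/f and `value = 1` replaces the PostgreSQL boolean test.

```python
import sqlite3

# Sample data copied from the question's wide table (1 = t, 0 = f).
WIDE_ROWS = {
    "row1": (0, 1, 1, 1),
    "row2": (0, 0, 0, 1),
    "row3": (1, 0, 1, 0),
}


def true_columns_for(row_name):
    """Return the column names whose value is true for row_name (long layout)."""
    conn = sqlite3.connect(":memory:")
    # Long layout: one row per (row name, column name, boolean value).
    conn.execute(
        "CREATE TABLE t (col1 TEXT, name TEXT, value INTEGER, PRIMARY KEY (col1, name))"
    )
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [
            (row, f"col{i + 2}", value)  # values start at col2; col1 is the key
            for row, values in WIDE_ROWS.items()
            for i, value in enumerate(values)
        ],
    )
    # "Which columns are true for this row" is now an ordinary row filter.
    cur = conn.execute(
        "SELECT name FROM t WHERE col1 = ? AND value = 1 ORDER BY name", (row_name,)
    )
    names = [name for (name,) in cur]
    conn.close()
    return names


print(true_columns_for("row3"))  # ['col2', 'col4']
```

Once the data is in rows, the question needs no per-column SQL at all.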
With your structure, you need to do a separate subquery for each column. Something like this:
select 'col2'
from yourtable
where col1 = 'row3'
having count(*) = sum(case when col2 then 1 else 0 end)
union all
select 'col3'
from yourtable
where col1 = 'row3'
having count(*) = sum(case when col3 then 1 else 0 end)
union all
. . .
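With the wide table, the per-column UNION ALL can be generated from the known column list instead of being typed by hand. This sketch uses sqlite3 again; it filters each branch with WHERE rather than the HAVING comparison, because SQLite versions before 3.39 reject HAVING without GROUP BY. `yourtable` and the column names are the question's placeholders.

```python
import sqlite3

# The known boolean columns; the question says all column names are known up front.
COLUMNS = ["col2", "col3", "col4", "col5"]


def true_columns_wide(row_name):
    """Return the true columns for row_name using one UNION ALL branch per column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourtable (col1 TEXT PRIMARY KEY,"
        " col2 INTEGER, col3 INTEGER, col4 INTEGER, col5 INTEGER)"
    )
    conn.executemany(
        "INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)",
        [("row1", 0, 1, 1, 1), ("row2", 0, 0, 0, 1), ("row3", 1, 0, 1, 0)],
    )
    # Each branch yields the column's name if that column is true for the row.
    # Names are interpolated only from the fixed COLUMNS list, never from user input.
    branches = [
        f"SELECT '{c}' AS name FROM yourtable WHERE col1 = ? AND {c} = 1"
        for c in COLUMNS
    ]
    sql = "\nUNION ALL\n".join(branches) + "\nORDER BY name"
    names = [name for (name,) in conn.execute(sql, [row_name] * len(COLUMNS))]
    conn.close()
    return names


print(true_columns_wide("row3"))  # ['col2', 'col4']
```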

I'm not trying to answer your question directly here; I want to show you what database structure would be appropriate for the task described.
You have a book table with a book id. Each record contains one book.
You have a word table with a word id. Each record contains one word.
Now you want to have a list of all existing book-word combinations.
The table you would create for this relation is called a bridge table. One book can contain many words, and one word can be contained in many books: an n:m relation. The table has two columns, the book id and the word id. The two combined are the table's primary key (a composite key). Each record contains one existing combination of book and word.
Here are some examples of how to use this table:
To find all words contained in a book:
select word
from words
where word_id in
(
select word_id
from book_word
where book_id =
(
select book_id
from books
where name = 'Peter Pan'
)
);
(That's just an example; the same can be got with joins instead of subqueries.)
To select words that occur in two particular books:
select word
from words
where word_id in
(
select word_id
from book_word
where book_id in
(
select book_id
from books
where name in ('Peter Pan', 'Treasure Island')
)
group by word_id
having count(*) = 2
);
To find words that occur in only one book:
select w.word, min(b.name) as book_name
from words w
join book_word bw on bw.word_id = w.word_id
join books b on b.book_id = bw.book_id
group by w.word_id
having count(*) = 1;
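The three queries above can be tried against a tiny bridge-table schema with Python's sqlite3 module. The table and column names follow the answer; the rows are hypothetical sample data. SQLite, like PostgreSQL with a primary key on `word_id`, accepts `w.word` in the last query even though only `w.word_id` is grouped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE words (word_id INTEGER PRIMARY KEY, word TEXT);
    -- Bridge table: one row per existing (book, word) combination.
    CREATE TABLE book_word (
        book_id INTEGER REFERENCES books (book_id),
        word_id INTEGER REFERENCES words (word_id),
        PRIMARY KEY (book_id, word_id)
    );
    -- Hypothetical sample rows.
    INSERT INTO books VALUES (1, 'Peter Pan'), (2, 'Treasure Island');
    INSERT INTO words VALUES (1, 'pirate'), (2, 'fairy'), (3, 'island');
    INSERT INTO book_word VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
    """
)

# Words contained in one book.
in_peter_pan = [w for (w,) in conn.execute(
    """
    SELECT word FROM words
    WHERE word_id IN (SELECT word_id FROM book_word
                      WHERE book_id = (SELECT book_id FROM books WHERE name = 'Peter Pan'))
    ORDER BY word
    """
)]

# Words in both books: exactly 2 bridge rows per word among these two books.
in_both = [w for (w,) in conn.execute(
    """
    SELECT word FROM words
    WHERE word_id IN (SELECT word_id FROM book_word
                      WHERE book_id IN (SELECT book_id FROM books
                                        WHERE name IN ('Peter Pan', 'Treasure Island'))
                      GROUP BY word_id
                      HAVING COUNT(*) = 2)
    ORDER BY word
    """
)]

# Words that occur in only one book, with that book's name.
only_one = conn.execute(
    """
    SELECT w.word, MIN(b.name) AS book_name
    FROM words w
    JOIN book_word bw ON bw.word_id = w.word_id
    JOIN books b ON b.book_id = bw.book_id
    GROUP BY w.word_id
    HAVING COUNT(*) = 1
    """
).fetchall()

print(in_peter_pan)  # ['fairy', 'island', 'pirate']
print(in_both)       # ['island', 'pirate']
print(only_one)      # [('fairy', 'Peter Pan')]
```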

Related

Match and merge when comparing data in 4 tables

I need to create a SQL Server user-defined function or stored procedure (either a scalar or a table-valued function) that meets the following requirements:
The data across 4 tables (Table_A, Table_B, Table_C, Table_D) should be matched on fixed attributes (Name in the example below).
If the data matches in all 4 tables, it gets the highest score and a UniqueID is created. For example, Match Type = ABCD.
If the data matches in a combination of 3 tables, it gets a lower score and a different UniqueID. For example, Match Type = ABC, ABD, BCD, ACD.
If the data matches in a combination of 2 tables, it gets a lower score again and a different UniqueID. For example, Match Type = AB, AC, AD, BC, BD, CD.
Records that don't match get a score of 0 and their own UniqueID, and are stored in the same table.
Table_A
AID | Name | ZipCode
Table_B
BID | Name | ZipCode
Table_C
CID | Name | ZipCode
Table_D
DID | Name | ZipCode
Matching is done on the Name and ZipCode attributes.
Final (match-and-merge) table:
UID | AID | BID | CID | DID | Match_Score
Please suggest how we can create a function or stored procedure for the above requirements. It would be better if it were robust and extensible, i.e. if another table gets added, the logic should work with minimal code changes.
Really appreciate your help in this case.
I can think of the approach below, but I'm not sure whether it can be coded:
ABCD (output where the records match in all four tables)
UNION ALL
ABC (runs only on the records that are not part of the ABCD result)
UNION ALL
ACD (runs only on the records that are not part of the two results above)
UNION ALL
and so on
Break it down into smaller sections, using a temp table for each section, and then do your final merge.
In the final step, rank the rows by how many matches they have.
The general MERGE syntax is shown below. Remember that MERGE can have only one target table, although the source can combine several tables:
MERGE TOP (value) <target_table>
USING <table_source>
ON <merge_search_condition>
[ WHEN MATCHED [ AND <clause_search_condition> ]
THEN <merge_matched> ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND <clause_search_condition> ]
THEN <merge_not_matched> ]
[ WHEN NOT MATCHED BY SOURCE [ AND <clause_search_condition> ]
THEN <merge_matched> ]
[ <output_clause> ]
[ OPTION ( <query_hint> ) ]
;
As a simple sample, find the distinct names and join each table to them. You can start with this and refine it to your needs, for example by declaring a counter such as DECLARE @MyNumberOfMatchesVariable INT and updating it as you get matches:
select *
from (
select someUniqueValueToMatch as Username from table1
union
select someUniqueValueToMatch from table2
union
-- ...
select someUniqueValueToMatch from table..
) distinct_usernames
left join table1 on table1.someUniqueValueToMatch = distinct_usernames.Username
left join table2 on table2.someUniqueValueToMatch = distinct_usernames.Username
-- ...
left join table... on table....someUniqueValueToMatch = distinct_usernames.Username
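A minimal, runnable sketch of the union-of-keys plus LEFT JOIN idea, using Python's sqlite3 so it needs no SQL Server instance. It fills the placeholders with the question's Table_A to Table_D, uses (Name, ZipCode) as the match key, and adds the score and match-type columns the question asks for. The sample rows are hypothetical. Adding a table means one more UNION branch, one more LEFT JOIN and one more CASE term; note that duplicate keys inside one table would multiply the output rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows: (id, Name, ZipCode) for each source table.
samples = {
    "Table_A": ("AID", [(1, "Ann", "10001"), (2, "Bob", "20002")]),
    "Table_B": ("BID", [(10, "Ann", "10001"), (11, "Cy", "30003")]),
    "Table_C": ("CID", [(20, "Ann", "10001"), (21, "Bob", "20002")]),
    "Table_D": ("DID", [(30, "Ann", "10001")]),
}
for table, (id_col, rows) in samples.items():
    conn.execute(f"CREATE TABLE {table} ({id_col} INTEGER, Name TEXT, ZipCode TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

query = """
SELECT k.Name, k.ZipCode, a.AID, b.BID, c.CID, d.DID,
       -- score = number of tables in which the key was found
       CASE WHEN a.AID IS NULL THEN 0 ELSE 1 END
     + CASE WHEN b.BID IS NULL THEN 0 ELSE 1 END
     + CASE WHEN c.CID IS NULL THEN 0 ELSE 1 END
     + CASE WHEN d.DID IS NULL THEN 0 ELSE 1 END AS Match_Score,
       -- match type = letters of the matching tables (|| in SQLite; + in SQL Server)
       CASE WHEN a.AID IS NULL THEN '' ELSE 'A' END
    || CASE WHEN b.BID IS NULL THEN '' ELSE 'B' END
    || CASE WHEN c.CID IS NULL THEN '' ELSE 'C' END
    || CASE WHEN d.DID IS NULL THEN '' ELSE 'D' END AS Match_Type
FROM (
    -- every distinct (Name, ZipCode) key that appears in any table
    SELECT Name, ZipCode FROM Table_A
    UNION SELECT Name, ZipCode FROM Table_B
    UNION SELECT Name, ZipCode FROM Table_C
    UNION SELECT Name, ZipCode FROM Table_D
) AS k
LEFT JOIN Table_A a ON a.Name = k.Name AND a.ZipCode = k.ZipCode
LEFT JOIN Table_B b ON b.Name = k.Name AND b.ZipCode = k.ZipCode
LEFT JOIN Table_C c ON c.Name = k.Name AND c.ZipCode = k.ZipCode
LEFT JOIN Table_D d ON d.Name = k.Name AND d.ZipCode = k.ZipCode
ORDER BY Match_Score DESC, k.Name
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

A UniqueID per output row could then be assigned when inserting these rows into the final match-and-merge table.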
Joining the 4 tables using FULL JOINs will give you all the various combinations:
SELECT AID, BID, CID, DID,
CASE WHEN AID IS NULL THEN 0 ELSE 1 END
+ CASE WHEN BID IS NULL THEN 0 ELSE 1 END
+ CASE WHEN CID IS NULL THEN 0 ELSE 1 END
+ CASE WHEN DID IS NULL THEN 0 ELSE 1 END AS Match_Score /*,
CASE WHEN AID IS NULL THEN '' ELSE 'A' END
+ CASE WHEN BID IS NULL THEN '' ELSE 'B' END
+ CASE WHEN CID IS NULL THEN '' ELSE 'C' END
+ CASE WHEN DID IS NULL THEN '' ELSE 'D' END AS Match_Type */
FROM Table_A a
FULL JOIN Table_B b ON a.Name = b.Name AND a.ZipCode = b.ZipCode
FULL JOIN Table_C c ON (a.Name = c.Name AND a.ZipCode = c.ZipCode)
OR (b.Name = c.Name AND b.ZipCode = c.ZipCode)
FULL JOIN Table_D d ON (a.Name = d.Name AND a.ZipCode = d.ZipCode)
OR (b.Name = d.Name AND b.ZipCode = d.ZipCode)
OR (c.Name = d.Name AND c.ZipCode = d.ZipCode);

SQL grouping by distinct values in a multi-value string column

I want to perform a GROUP BY on the distinct values in a string column that holds multiple values.
The column contains a list of strings in a standard format, separated by commas. The only possible values are a, b, c, d.
For example the column collection (type: String) contains:
Row 1: ["a","b"]
Row 2: ["b","c"]
Row 3: ["b","c","a"]
Row 4: ["d"]
The expected output is a count of unique values:
collection | count
a | 2
b | 3
c | 2
d | 1
For all the examples below I used this table:
create table tmp (
id INT auto_increment,
test VARCHAR(255),
PRIMARY KEY (id)
);
insert into tmp (test) values
("a,b"),
("b,c"),
("b,c,a"),
("d")
;
If the possible values are only a, b, c, d, you can try one of these.
Note that this only works if no value is contained in another one (for example test and test_new): otherwise test would also be joined with every test_new row and the count would be wrong.
select collection, COUNT(*) as count from tmp JOIN (
select CONCAT("%", tb.collection, "%") as like_collection, collection from (
select "a" COLLATE utf8_general_ci as collection
union select "b" COLLATE utf8_general_ci as collection
union select "c" COLLATE utf8_general_ci as collection
union select "d" COLLATE utf8_general_ci as collection
) tb
) tb1
ON tmp.test LIKE tb1.like_collection
GROUP BY tb1.collection;
This gives the result you want:
collection | count
a | 2
b | 3
c | 2
d | 1
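The LIKE join can be checked in SQLite through Python's sqlite3 module. This sketch drops the MySQL-specific COLLATE utf8_general_ci and builds the pattern with SQLite's || operator instead of CONCAT; otherwise it is the same join against a fixed list of values, and the substring caveat above still applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (id INTEGER PRIMARY KEY AUTOINCREMENT, test TEXT)")
conn.executemany(
    "INSERT INTO tmp (test) VALUES (?)", [("a,b",), ("b,c",), ("b,c,a",), ("d",)]
)

query = """
SELECT tb.collection, COUNT(*) AS cnt
FROM tmp
JOIN (
    -- the fixed list of possible values
    SELECT 'a' AS collection UNION SELECT 'b' UNION SELECT 'c' UNION SELECT 'd'
) AS tb
  ON tmp.test LIKE '%' || tb.collection || '%'  -- plain substring match
GROUP BY tb.collection
ORDER BY tb.collection
"""
counts = conn.execute(query).fetchall()
print(counts)  # [('a', 2), ('b', 3), ('c', 2), ('d', 1)]
```

A possible refinement (not part of the original answer) is to wrap both sides in commas, `',' || tmp.test || ',' LIKE '%,' || tb.collection || ',%'`, so that test no longer matches test_new, assuming the values themselves contain no commas or LIKE wildcards.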
Or you can try this one:
SELECT
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%a%') as a_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%b%') as b_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%c%') as c_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%d%') as d_count
;
The result would look like this:
a_count | b_count | c_count | d_count
2 | 3 | 2 | 1
What you need to do is first explode the collection column into separate rows (like a flatMap operation). In Redshift the only way to generate new rows is a JOIN, so CROSS JOIN your input table with a static table of consecutive numbers and keep only the numbers that are less than or equal to the number of elements in the collection. Then use the SPLIT_PART function to read the item at that index. Once you have the exploded table, a simple GROUP BY does the rest.
If your items are stored as JSON array strings ('["a", "b", "c"]') then you can use JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT instead of REGEXP_COUNT and SPLIT_PART respectively.
with
index as (
select 1 as i
union all select 2
union all select 3
union all select 4 -- could be substituted with 'select row_number() over () as i from arbitrary_table limit 4'
),
agg as (
select 'a,b' as collection
union all select 'b,c'
union all select 'b,c,a'
union all select 'd'
)
select
split_part(collection, ',', i) as item,
count(*)
from index,agg
where regexp_count(agg.collection, ',') + 1 >= index.i -- keep only indices up to the number of items
group by 1;
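The same cross-join-and-split technique can be run locally with Python's sqlite3 module. SQLite has neither SPLIT_PART nor REGEXP_COUNT, so this sketch registers a small Python `split_part` that mimics Redshift's behaviour (1-based position, empty string past the last field) and counts items as commas + 1. The CTE is renamed `idx` because INDEX is a keyword in SQLite.

```python
import sqlite3


def split_part(s, delimiter, n):
    """Mimic Redshift SPLIT_PART: 1-based field n, '' when n is past the last field."""
    parts = s.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


conn = sqlite3.connect(":memory:")
conn.create_function("split_part", 3, split_part)

query = """
WITH idx(i) AS (
    SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
),
agg(collection) AS (
    SELECT 'a,b' UNION ALL SELECT 'b,c' UNION ALL SELECT 'b,c,a' UNION ALL SELECT 'd'
)
SELECT split_part(collection, ',', i) AS item, COUNT(*) AS cnt
FROM idx CROSS JOIN agg
-- number of items = number of commas + 1 (stands in for REGEXP_COUNT)
WHERE length(collection) - length(replace(collection, ',', '')) + 1 >= idx.i
GROUP BY 1
ORDER BY 1
"""
items = conn.execute(query).fetchall()
print(items)  # [('a', 2), ('b', 3), ('c', 2), ('d', 1)]
```

The index table must have at least as many rows as the longest collection has items; here four values suffice.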

SQL group data (find data family)

I need a SQL solution for grouping data in a SQL Server database.
I'm pretty sure it could be done in one SQL query, but I can't see the trick.
Here is the problem:
I have a two-column table (see the example below). I just want to add a new column containing a number or a string that indicates the group.
BEFORE :
Col1 | Col2
-----+-----
A | B
B | C
D | E
F | G
G | H
I | I
J | U
AFTER TRANSFORMATION :
Col1 | Col2 | Group
-----+------+------
A | B | 1
B | C | 1
D | E | 2
F | G | 3
G | H | 3
I | I | 4
J | U | 5
In other words: A, B, C are in the same group; D and E too; F, G, H in group 3 ....
Do you have any lookup table to get this group mapping?
Or, if you just have logic that decides the group, I would recommend adding a UDF that returns the group for the supplied values:
SELECT Col1, Col2, dbo.GetGroupID(Col1, Col2) AS [Group]
FROM Table
Your UDF would be something like the following:
CREATE FUNCTION dbo.GetGroupID
(
-- Add the parameters for the function here
@Col1 varchar(10),
@Col2 varchar(10)
)
RETURNS int
AS
BEGIN
DECLARE @groupID int
IF (@Col1 = 'A' AND @Col2 = 'B') OR (@Col1 = 'B' AND @Col2 = 'C')
BEGIN
SET @groupID = 1
END
IF @Col1 = 'D' AND @Col2 = 'E'
BEGIN
SET @groupID = 2
END
-- You can write several conditions in the same manner.
RETURN @groupID
END
However, in case you have this mapping defined somewhere in another table, let us know the structure of the table and we can then update the query to join with that table instead of using UDF.
As for query performance: if the table holds a lot of data, it is better to keep these mappings in a fixed table and join to that table in the query. A scalar UDF may hurt performance on large data sets.
There is absolutely no need for a UDF here. Regardless of whether you are looking to update the table with a new column or simply pull out the data with the grouping applied, you will be best off using a set-based solution, i.e. create and join to a table.
I am assuming here that you don't have messy data, such as a row with Col1 = 'A' and Col2 = 'F'.
If you are able to add new tables permanently you can use the following to create your lookup table:
create table Col1Groups(Col1 nvarchar(10), GroupNum int);
insert into Col1Groups(Col1,GroupNum) values ('A',1),('B',1),('C',1),('D',2),('E',2),('F',3),('G',3),('H',3),('I',4),('J',5);
and then join to it:
select t.Col1
,t.Col2
,g.GroupNum
from Table t
inner join Col1Groups g
on t.Col1 = g.Col1
If you can't, you can just create a derived table via a CTE:
with Col1Groups as
(
select Col1
,GroupNum
from (values('A',1),('B',1),('C',1),('D',2),('E',2),('F',3),('G',3),('H',3),('I',4),('J',5)) as x(Col1,GroupNum)
)
select t.Col1
,t.Col2
,g.GroupNum
from Table t
inner join Col1Groups g
on t.Col1 = g.Col1
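The CTE-based lookup join can be tried in SQLite via Python's sqlite3 module, which also accepts a VALUES list as a CTE body. The table is called `mytable` here because TABLE is a reserved word. The mapping includes I → 4 and J → 5, taken from the question's expected output; without them the inner join would silently drop the I|I and J|U rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Col1 TEXT, Col2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("D", "E"), ("F", "G"), ("G", "H"), ("I", "I"), ("J", "U")],
)

query = """
WITH Col1Groups(Col1, GroupNum) AS (
    -- hand-maintained mapping from a Col1 value to its group number
    VALUES ('A', 1), ('B', 1), ('C', 1), ('D', 2), ('E', 2),
           ('F', 3), ('G', 3), ('H', 3), ('I', 4), ('J', 5)
)
SELECT t.Col1, t.Col2, g.GroupNum
FROM mytable t
INNER JOIN Col1Groups g ON t.Col1 = g.Col1
ORDER BY g.GroupNum, t.Col1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```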
You get the first rows per group with
select col1, col2 from mytable where col1 not in (select col2 from mytable) or col1 = col2;
We can give these rows numbers with
rank() over (order by col1) as grp
Now we must iterate through the rows to find the ones belonging to those first ones, then the ones belonging to these, and so on. That calls for a recursive query:
with cte(col1, col2, grp) as
(
select col1, col2, rank() over (order by col1) as grp
from mytable where col1 not in (select col2 from mytable) or col1 = col2
union all
select mytable.col1, mytable.col2, cte.grp
from cte
join mytable on mytable.col1 = cte.col2
where mytable.col1 <> mytable.col2
)
select * from cte
order by grp, col1;
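The recursive chain-following query can be run in SQLite through Python's sqlite3 module. This sketch writes WITH RECURSIVE (as SQLite and PostgreSQL spell it; the SQL Server version above omits the keyword) and computes the ranked starting rows in a separate CTE before the recursion. That is a small restructuring of the same algorithm, not a different one. rank() needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("D", "E"), ("F", "G"), ("G", "H"), ("I", "I"), ("J", "U")],
)

query = """
WITH RECURSIVE
starts(col1, col2, grp) AS (
    -- first row of each chain: col1 never appears as a col2, or a self-pair like I|I
    SELECT col1, col2, rank() OVER (ORDER BY col1)
    FROM mytable
    WHERE col1 NOT IN (SELECT col2 FROM mytable) OR col1 = col2
),
cte(col1, col2, grp) AS (
    SELECT col1, col2, grp FROM starts
    UNION ALL
    -- follow the chain: the next row starts where the previous one ended
    SELECT m.col1, m.col2, cte.grp
    FROM cte
    JOIN mytable m ON m.col1 = cte.col2
    WHERE m.col1 <> m.col2
)
SELECT col1, col2, grp FROM cte ORDER BY grp, col1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This follows chains forward only; a cycle longer than a self-pair (such as A|B, B|A) would make it recurse without end.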
Additional answer for a more flexible approach
Originally you asked for chains A|B -> B|C, F|G -> G|H etc., but in your comment to my other answer you introduced forks like A|B -> B|C, B|D and I've adjusted my answer.
If you want to go one step further and introduce net-like relations such as A|B -> B|C, D|C, we can no longer follow chains forward only (in the example, D belongs to the A group: though A doesn't lead to D directly, it leads to C, and D also leads to C). Here is a way to solve this:
Get all letters from the table (whether they appear in col1 or col2). Then, for each of them, find the related letters (again whether in col1 or col2). For these, again find the related letters, and so on. That gives you complete groups, but with duplicates (as D is in the A group, A is also in the D group), which you can get rid of by simply taking the smallest (or greatest) group key per letter. Then join the groups to the table.
The query (in Oracle syntax: ROWNUM and the CYCLE clause are Oracle-specific):
with cte(col, grp) as
(
select col, rownum as grp from
(select col1 as col from mytable union select col2 from mytable)
union all
select case when mytable.col1 = cte.col then mytable.col2 else mytable.col1 end, cte.grp
from cte
join mytable on cte.col in (mytable.col1, mytable.col2)
where mytable.col1 <> mytable.col2
)
cycle col set is_cycle to 'y' default 'n'
select mytable.col1, mytable.col2, x.grp
from mytable
join (select col, min(grp) as grp from cte group by col) x on x.col = mytable.col1
order by grp, col;
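A sketch of the same letters-reach-letters idea in SQLite via Python's sqlite3, since ROWNUM and the CYCLE clause do not exist there. It swaps in two things, named plainly: ROW_NUMBER() instead of ROWNUM, and a recursive UNION (SQLite skips rows that were already produced) instead of the CYCLE clause to stop the recursion. The final DENSE_RANK() only renumbers the groups 1, 2, 3. The sample data is hypothetical and includes the net-like D|C case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 TEXT)")
# Hypothetical net-like sample: D|C links D into the A group via C.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("D", "C"), ("F", "G"), ("G", "H"), ("I", "I")],
)

query = """
WITH RECURSIVE
letters(col) AS (
    -- every letter, whether it appears in col1 or col2 (UNION removes duplicates)
    SELECT col1 FROM mytable UNION SELECT col2 FROM mytable
),
seeds(col, grp) AS (
    -- give each letter its own starting group key
    SELECT col, row_number() OVER (ORDER BY col) FROM letters
),
reach(col, grp) AS (
    SELECT col, grp FROM seeds
    UNION  -- not UNION ALL: repeated rows are dropped, so cycles cannot loop forever
    SELECT CASE WHEN m.col1 = r.col THEN m.col2 ELSE m.col1 END, r.grp
    FROM reach r
    JOIN mytable m ON r.col IN (m.col1, m.col2)
    WHERE m.col1 <> m.col2
),
comp(col, grp) AS (
    -- keep the smallest group key each letter can reach
    SELECT col, MIN(grp) FROM reach GROUP BY col
)
SELECT m.col1, m.col2, dense_rank() OVER (ORDER BY c.grp) AS group_no
FROM mytable m
JOIN comp c ON c.col = m.col1
ORDER BY group_no, m.col1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```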

Determine source on COALESCE fields

I have two tables that are identical in structure but belong to different schemas (schemas A and B). All rows in question always appear in A.table but may or may not appear in B.table. B.table is essentially an override for the defaults in A.table.
As such my query uses a COALESCE on each field similar to:
SELECT COALESCE(B.id, A.id) as id,
COALESCE(B.foo, A.foo) as foo,
COALESCE(B.bar, A.bar) as bar
FROM A.table A LEFT JOIN B.table B ON (A.id = B.id)
WHERE A.id in (1, 2, 3)
This works great, but I also want to add the source of the data. In the example above, assuming id=2 existed in B.table but not 1 or 3, I would want to include some indication that A is the source for 1 and 3 and B is the source for 2.
So the data might look like the following:
+---------------------------------+
| id | foo | bar | source |
+---------------------------------+
| 1 | a | b | A |
| 2 | c | d | B |
| 3 | e | f | A |
+---------------------------------+
I don't really care what the value of source is as long as I can distinguish A from B.
I am no pgsql expert (not by a long shot). I have tinkered with EXISTS and a subquery, but have had no luck so far.
As records showing the default value (from A.table) have NULLs for B.id, all you need is to add this column specification to your query:
CASE WHEN B.id IS NULL THEN 'A' ELSE 'B' END AS Source
The USING clause would simplify the query you have:
SELECT id
, COALESCE(B.foo, A.foo) AS foo
, COALESCE(B.bar, A.bar) AS bar
, CASE WHEN b.id IS NULL THEN 'A' ELSE 'B' END AS source -- as @Terje provided
FROM a
LEFT JOIN b USING (id)
WHERE a.id IN (1, 2, 3);
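The COALESCE-plus-CASE approach can be checked with Python's sqlite3 module. The rows for ids 1 to 3 follow the question's expected output; the defaults for id 2 in `a` ('x', 'y') are hypothetical, since the question only shows the overriding values. This sketch joins with ON rather than USING and selects `a.id` explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT);  -- defaults
    CREATE TABLE b (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT);  -- overrides
    INSERT INTO a VALUES (1, 'a', 'b'), (2, 'x', 'y'), (3, 'e', 'f');
    INSERT INTO b VALUES (2, 'c', 'd');
    """
)

query = """
SELECT a.id,
       COALESCE(b.foo, a.foo) AS foo,
       COALESCE(b.bar, a.bar) AS bar,
       -- b.id is NULL exactly when the LEFT JOIN found no override row
       CASE WHEN b.id IS NULL THEN 'A' ELSE 'B' END AS source
FROM a
LEFT JOIN b ON b.id = a.id
WHERE a.id IN (1, 2, 3)
ORDER BY a.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'a', 'b', 'A'), (2, 'c', 'd', 'B'), (3, 'e', 'f', 'A')]
```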
But typically, this alternative query should serve you better:
SELECT x.* -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT *, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT *, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
Advantages:
You don't have to add another COALESCE construct for every column you want to add to the result.
The same query works for any number of columns in a and b.
The query even works if the column names are not identical. Only number and data types of columns must match.
Of course, you can always list selected, compatible columns as well:
SELECT * -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT foo, bar, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT foo2, bar17, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
The first SELECT determines names, data types and number of columns.
This query doesn't break if columns in b are not defined NOT NULL.
COALESCE cannot tell the difference between b.foo IS NULL and no row with matching id in b. So the source of any result column (except id) can still be 'A', even if the result row says 'B' - if any relevant column in b can be NULL.
My alternative returns all values from b if the row exists - including NULL values. So the result can be different if columns in b can be NULL. It depends on your requirements which behavior is desirable.
Either query assumes that id is defined as primary key (so exactly 1 or 0 rows per given id value).
Related:
Select first record if none match
What is the difference between LATERAL and a subquery in PostgreSQL?

SQL Server : Nested Select Query

I have a SQL query returning results based on a where clause.
I would like to include some more results, from the same table, dependent on what is found in the first select.
My select returns rows with IDs that meet the WHERE criteria. Sometimes the table has more rows with the same ID that do not meet the initial WHERE criteria. Rather than re-querying the DB with a separate call, I would like to use one SELECT statement to also get these extra rows with the same ID. (ID is not the index/primary key; it's just a naming convention I am using here.)
Pseudo: (two steps)
1: select * from table where condition=xxx
2: for each row returned, (select * from table where id=row.id)
I want to do:
select
id as thisID, field1, field2,
(select id, field1, field2 from table where id = thisID)
from
table
where
condition=xxx
I have multiple joins in my real query and just can't get the above to work. Unfortunately I cannot supply the real query, but I get this error:
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS. Invalid column name 'thisID'
My query works fine with the multiple joins, without the above. I am trying to retrieve these extra records as part of the current working query.
Example:
TABLE
select * from table where col3 = 'green'
id, col1, col2, col3
123 | blue | red | green
-------------------------
567 | blue | red | green
-------------------------
123 | blue | red | blue
-------------------------
890 | blue | red | green
-------------------------
I want to return all 4 rows: although row 3 fails the WHERE condition, it has the same id value as row 1 (123), and I need to include it, as it is part of a "set" that I need to locate / import, identified by id=123.
What I am doing manually now, is getting row one, and then running another query based on row 1's ID, to get row 3 as well.
You can use WHERE ... IN:
select id as thisID, field1, field2 from table
where id in
(select id from table where condition=xxx)
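The WHERE ... IN pattern can be checked against the question's example data with Python's sqlite3 module. The table is named `t` here because TABLE is a reserved word, and condition=xxx is filled in with the question's own filter, col3 = 'green'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (123, "blue", "red", "green"),
        (567, "blue", "red", "green"),
        (123, "blue", "red", "blue"),  # fails the filter but shares id 123
        (890, "blue", "red", "green"),
    ],
)

query = """
SELECT id, col1, col2, col3
FROM t
-- every row whose id appears among the rows that pass the original filter
WHERE id IN (SELECT id FROM t WHERE col3 = 'green')
ORDER BY id, col3
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

All four rows come back, including the 123/blue row that fails the original filter.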
Try this.
Let's say your table is the one below and is called #Temp:
Id Col1 Col2 Col3
123 blue red green
567 blue red green
123 blue red blue
890 blue red green
First, get the ids into a temp table:
Create Table #T1(Id int)
Insert Into #T1
Select Id
From #Temp
Where Col3='green'
Then
Select distinct *
From #Temp
Where Id in (select Id from #T1) Or Col3='Green'
This returns all the relevant rows from the main table.
Update
If you want to keep the approach you are currently using, try something like the following:
select
id as thisID, field1, field2,
(select top 1 id from table where id = t.id) as Id,
(select top 1 field1 from table where id = t.id) as field1,
(select top 1 field2 from table where id = t.id) as field2
from
table t
where
condition=xxx