Merging records based on condition in bigquery

I have multiple rows for members and want to merge them based on the values of two columns by giving priority to the value 'Yes'.
Name | Status1 | Status2
Jon | Yes | No
Jon | No | Yes
I want the query to return
Name | Status1 | Status2
Jon | Yes | Yes
So, if the column has Yes even once, it has to assign Yes for the person and No otherwise.

Below is for BigQuery Standard SQL
#standardSQL
SELECT Name, MAX(Status1) AS Status1, MAX(Status2) AS Status2
FROM `project.dataset.table`
GROUP BY Name
You can test and play with it using sample data:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Jon' Name, 'Yes' Status1, 'No' Status2 UNION ALL
SELECT 'Jon', 'No', 'Yes'
)
SELECT Name, MAX(Status1) AS Status1, MAX(Status2) AS Status2
FROM `project.dataset.table`
GROUP BY Name
with result
Row Name Status1 Status2
1 Jon Yes Yes
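MAX works here because 'Yes' sorts after 'No' in string comparison, so it relies on the columns holding exactly those two spellings. If that is not guaranteed (for example a lowercase 'yes' or some other value), a more explicit variant could use BigQuery's LOGICAL_OR aggregate. This is only a sketch; the extra 'Ann' row is hypothetical sample data added to show the all-'No' case.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'Jon' AS Name, 'Yes' AS Status1, 'No' AS Status2 UNION ALL
  SELECT 'Jon', 'No', 'Yes' UNION ALL
  SELECT 'Ann', 'No', 'No'   -- hypothetical row, not in the question
)
SELECT
  Name,
  -- 'Yes' if any row for this Name has 'Yes', otherwise 'No'
  IF(LOGICAL_OR(Status1 = 'Yes'), 'Yes', 'No') AS Status1,
  IF(LOGICAL_OR(Status2 = 'Yes'), 'Yes', 'No') AS Status2
FROM `project.dataset.table`
GROUP BY Name
```

With this data Jon should come back as Yes / Yes and Ann as No / No.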

In addition to Mikhail's answer, I am adding another solution with MsSQL. Syntax may be different but the logic would be similar:
create table test
(id int , name1 varchar(10), name2 varchar(10))
insert into test values (1,'yes','no')
insert into test values (2,'no','no')
insert into test values (3,'yes','yes')
declare @searchKey varchar(10) = 'yes'
declare @cols varchar(255) = (SELECT STUFF((
SELECT ', ' + c.name
FROM sys.columns c
JOIN sys.types AS t ON c.user_type_id=t.user_type_id
WHERE t.name != 'int' AND t.name != 'bit' AND t.name !='date' AND t.name !='datetime'
AND object_id =(SELECT object_id FROM sys.tables WHERE name='test')
FOR XML PATH('')),1,2,''))
declare @sql nvarchar(max) = 'SELECT * from test where '''+@searchKey+''' in ('+@cols+')'
exec sp_executesql @sql
Edit: Please note that this solution checks all the columns of a table if a specific value is included by any column. Assume the OP needs to check 100 columns, until status100, then I believe a dynamic solution like that would be more handy.

Related

Find the non null columns in SQL Server in a table

I have read many answers but they are all for PL/SQL or Oracle, I could not find anything for Microsoft SQL-Server.
My table :
CREATE TABLE StudentScore
(
Student_ID INT PRIMARY KEY,
Student_Name NVARCHAR (50),
Student_Score INT
)
GO
INSERT INTO StudentScore VALUES (1,'Ali', NULL)
INSERT INTO StudentScore VALUES (2,'Zaid', 770)
INSERT INTO StudentScore VALUES (3,'Mohd', 1140)
INSERT INTO StudentScore VALUES (4,NULL, 770)
INSERT INTO StudentScore VALUES (5,'John', 1240)
INSERT INTO StudentScore VALUES (6,'Mike', 1140)
INSERT INTO StudentScore VALUES (7,'Goerge', NULL)
How to find the names of all the non-null columns.
Return table containing only non null columns
EDIT based on comments:
I am aware of IS_NULLABLE attribute of Information_schema . But just because a column allows null values does not mean it will actually have null values. How to find out columns which actually have null values.
I am looking for some num_nulls equivalent for microsoft SQL-SERVER.

You could achieve it by issuing:
SELECT
FORMATMESSAGE('SELECT col = ''%s.%s.%s'' FROM %s.%s HAVING COUNT(*) != COUNT(%s)',
QUOTENAME(TABLE_SCHEMA),
QUOTENAME(TABLE_NAME),
QUOTENAME(COLUMN_NAME),
QUOTENAME(TABLE_SCHEMA),
QUOTENAME(TABLE_NAME),
QUOTENAME(COLUMN_NAME)
)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES';
It generates a script that checks each column individually.
HAVING COUNT(*) != COUNT(col_name) -- the column contains at least one NULL
HAVING COUNT(col_name) = 0 AND COUNT(*) != 0 -- all values in the column are NULL
This approach can be polished by using STRING_AGG to build a single query per table, and with dynamic SQL you avoid having to copy and run the generated query by hand.
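For the StudentScore table from the question, the generated checks would look roughly like the following. This is a hand-written sketch, assuming the table lives in the dbo schema; Student_ID is left out because a primary key column cannot be NULL.

```sql
-- Each branch returns a row only if its column contains at least one NULL.
SELECT col = 'Student_Name'
FROM dbo.StudentScore
HAVING COUNT(*) <> COUNT(Student_Name)
UNION ALL
SELECT col = 'Student_Score'
FROM dbo.StudentScore
HAVING COUNT(*) <> COUNT(Student_Score);
```

COUNT(*) counts all rows while COUNT(column) skips NULLs, so the two differ exactly when the column holds a NULL. With the sample rows above, both columns should be reported.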
EDIT:
Fully baked solution:
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = STRING_AGG(
FORMATMESSAGE('SELECT table_schema = ''%s''
,table_name = ''%s''
,table_col_name = ''%s''
,row_num = COUNT(*)
,row_num_non_nulls = COUNT(%s)
,row_num_nulls = COUNT(*) - COUNT(%s)
FROM %s.%s',
QUOTENAME(TABLE_SCHEMA),
QUOTENAME(TABLE_NAME),
QUOTENAME(COLUMN_NAME),
QUOTENAME(COLUMN_NAME),
QUOTENAME(COLUMN_NAME),
QUOTENAME(TABLE_SCHEMA),
QUOTENAME(TABLE_NAME),
QUOTENAME(COLUMN_NAME)), ' UNION ALL' + CHAR(13)
) WITHIN GROUP(ORDER BY TABLE_SCHEMA, TABLE_NAME)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES'
AND TABLE_NAME = ? -- filter by table name
AND TABLE_SCHEMA = ?; -- filter by schema name
SELECT @sql;
EXEC(@sql);
Output:
+---------------+-----------------+------------------+----------+--------------------+---------------+
| table_schema | table_name | table_col_name | row_num | row_num_non_nulls | row_num_nulls |
+---------------+-----------------+------------------+----------+--------------------+---------------+
| [dbo] | [StudentScore] | [Student_Name] | 7 | 6 | 1 |
| [dbo] | [StudentScore] | [Student_Score] | 7 | 5 | 2 |
+---------------+-----------------+------------------+----------+--------------------+---------------+

Perhaps you want to look at INFORMATION_SCHEMA.COLUMNS. The column IS_NULLABLE provides this information.
Note that the INFORMATION_SCHEMA tables (well, they are really views) are Standard SQL, so this information is available in most databases. Oracle has not (yet?) adopted them.
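As a minimal sketch, the lookup for the StudentScore table from the question could be written like this. Keep in mind that it only reports whether a column is allowed to hold NULLs, not whether it currently contains any.

```sql
-- Lists each column of StudentScore with its nullability flag ('YES' / 'NO').
SELECT COLUMN_NAME, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'StudentScore'
ORDER BY ORDINAL_POSITION;
```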

Using column value as column name in subquery

I'm working with a legacy DB that has a table that houses field names from other tables.
So I have this structure:
Field_ID | Field_Name
*********************
1 | Col1
2 | Col2
3 | Col3
4 | Col4
and I need to pull a list of this field metadata along with the values of that field for a given user. So I need:
Field_ID | Field_Name | Value
1 | Col1 | ValueOfCol1onADiffTable
2 | Col2 | ValueOfCol2onADiffTable
3 | Col3 | ValueOfCol3onADiffTable
4 | Col4 | ValueOfCol4onADiffTable
I'd like to use the Field_Name in a subquery to pull that value, but can't figure out how to get SQL to evaluate Field_Name as a column in the sub-query.
So something like this:
select
Field_ID
,Field_Name
,(SELECT f.Field_Name from tblUsers u
where u.User_ID = @userId) as value
from
dbo.tblFields f
But that just returns Field_Name in the values column, not the value of it.
Do I need to put the sub-query in a separate function and evaluate that? Or some kind of dynamic SQL?

In SQL Server this requires dynamic SQL together with UNPIVOT:
create table tblFields (Field_ID int ,Field_Name varchar(10));
insert into tblFields values
(1,'Col1')
,(2,'Col2')
,(3,'Col3')
,(4,'Col4');
declare @userId int
set @userId=1
create table tblUsers (User_ID int, col1 varchar(10),col2 varchar(10));
insert into tblUsers values
(1, 10,100),
(2,20,200);
declare @collist varchar(max)
declare @sqlquery varchar(max)
select @collist= COALESCE(@collist + ', ', '') + Field_Name
from dbo.tblFields
where exists (
select * from sys.columns c join sys.tables t
on c.object_id=t.object_id and t.name='tblUsers'
and c.name =Field_Name)
select @sqlquery=
' select Field_ID ,Field_Name, value '+
' from dbo.tblFields f Join '+
' ( select * from '+
'( select * '+
' from tblUsers u '+
' where u.User_ID = '+ cast(@userId as varchar(max)) +
' ) src '+
'unpivot ( Value for field in ('+ @collist+')) up )t'+
' on t.field =Field_Name'
exec(@sqlquery)
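If the set of columns is small and known in advance, a static query avoids dynamic SQL altogether. The sketch below swaps UNPIVOT for CROSS APPLY with a VALUES list, which is a different technique from the one in the answer. It assumes the tblFields and tblUsers tables created above and hard-codes the two columns that exist in tblUsers.

```sql
DECLARE @userId int = 1;

SELECT f.Field_ID, f.Field_Name, v.Value
FROM dbo.tblUsers AS u
-- Turn each listed column into its own (name, value) row.
CROSS APPLY (VALUES ('Col1', u.col1),
                    ('Col2', u.col2)) AS v (Field_Name, Value)
JOIN dbo.tblFields AS f
  ON f.Field_Name = v.Field_Name
WHERE u.User_ID = @userId;
```

The trade-off is that every new field requires editing the VALUES list, which is exactly what the dynamic version is meant to avoid.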

Convert row to column when data are not numbers

I have a Question table, which has an unknown number of questions (first table in the figure).
I also have an AnswerSheet table, which records each student's answer to a question (second table in the figure).
Create table Question
(
Id int,
Text nvarchar(50),
PRIMARY KEY (Id)
)
Create table AnswerSheet
(
StudentId int,
QuestionId int,
Answer nvarchar(50),
PRIMARY KEY (StudentId,QuestionId),
FOREIGN KEY (QuestionId) REFERENCES Question (Id)
)
insert into Question
values(1,'What''s your age'),
(2,'What''s your gender'),
(3,'When do you go home'),
....
insert into AnswerSheet
values(500,1,'20'),
(500,2,'Male'),
(500,3,'5:00pm'),
(501,1,'50'),
(502,2,'I don''t know##'),
....
How do I write a SQL to generate a table like this?
StudentId What's your age What's your gender When do you go home ...
--------- ---------------- ------------------- -------------------
500 20 Male 5:00pm ...
501 50 NULL NULL
502 NULL I don''t know## NULL ...
I feel PIVOT is promising, but I'm not sure how to use it, especially since PIVOT requires an aggregation function and my data are not numbers.

Assuming you wanted to go Dynamic
Example
Declare @SQL varchar(max) = Stuff((Select ',' + QuoteName(Text) From Question Order by ID For XML Path('')),1,1,'')
Select @SQL = '
Select *
From (
Select StudentID
,Col = B.Text
,Value = A.Answer
From AnswerSheet A
Join Question B on A.QuestionID=B.ID
) A
Pivot (max(Value) For [Col] in (' + @SQL + ') ) p'
Exec(@SQL);
Returns
StudentID What's your age What's your gender When do you go home
500 20 Male 5:00pm
501 50 NULL NULL
502 NULL I don't know## NULL
If it Helps, the Generated SQL Looks Like This
Select *
From (
Select StudentID
,Col = B.Text
,Value = A.Answer
From AnswerSheet A
Join Question B on A.QuestionID=B.ID
) A
Pivot (max(Value) For [Col] in ([What's your age],[What's your gender],[When do you go home]) ) p

I know this question already has an accepted answer, but I hope this approach helps others.
You can achieve the same result without PIVOT by using GROUP BY with conditional aggregation:
Select b.StudentId,
Min(Case a.text When 'What''s your age' Then b.answer End) 'What''s your age',
Min(Case a.text When 'What''s your gender' Then b.answer End) 'What''s your gender',
Min(Case a.text When 'When do you go home' Then b.answer End) 'When do you go home'
from Question a inner join AnswerSheet b
on a.id = b.Questionid
Group By StudentId
Since you mentioned an unknown number of questions, here is a dynamic version:
DECLARE @DynamicQuestions NVARCHAR(MAX)
SELECT @DynamicQuestions = Stuff(
(SELECT N' Min(Case a.text When ''' + replace (Text,'''','''''')
+ ''' Then b.answer End) '''
+ replace (Text,'''','''''') + ''','
FROM Question FOR XML PATH(''),TYPE)
.value('text()[1]','nvarchar(max)'),1,1,N'')
select @DynamicQuestions =
left(@DynamicQuestions,len(@DynamicQuestions)-1) -- remove the trailing comma
exec ('Select b.StudentId, '+ @DynamicQuestions +
' from Question a inner join AnswerSheet b
on a.id = b.Questionid
Group By StudentId' )
Result:-
StudentId What's your age What's your gender When do you go home
500 20 Male 5:00pm
501 50 NULL NULL
502 NULL I don't know## NULL

T-Sql Query with dynamic (unknown) number of columns

We have a project where users must be able to add their own custom columns to various tables.
Edit: these are 2 tables, not one.
**Products**
ProductId
Name
Price
Date
UserId
**ProductsCustomColumns**
ProductId
ColumnName
ColumnValue
EDIT: Please note that the dynamic columns are recorded as values and we don't know the count of these...it can be 0 or 200 or any.
Here is an example:
Now when we query the products tables we want to show all the predefined columns and after them all custom columns.
Obviously each user can have own number of columns with values and names.
SELECT *, (and the custom columns) FROM Products WHERE UserId = 3 AND ProductId = 1
Here are 2 questions:
Would that be a good solution from a performance point of view, or is there a better approach to the dynamic columns requirement?
How can I create a query that could read all records from ProductsCustomColumns for given userId and productId and append the records as columns to the query?
Thanks.

You need to write a dynamic query:
DECLARE @COLUMNS VARCHAR(MAX)='', @QRY VARCHAR(MAX);
SELECT @COLUMNS = @COLUMNS +COLUMN_NAME +',' FROM
INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='Products'
SELECT @COLUMNS =SUBSTRING (@COLUMNS,1 ,LEN(@COLUMNS)-1)
SELECT @QRY ='SELECT '+@COLUMNS + ' FROM Products WHERE UserId = 3 AND ProductId = 1'
EXEC (@QRY)
EDIT: From your Comments & Edited Question
Schema I assumed from your Question
CREATE TABLE Products (
ProductId INT,
Name VARCHAR(250),
Price DECIMAL(18,2),
DateS DATETIME,
UserId INT)
INSERT INTO Products
SELECT 1,'Oil Product', 2000, GETDATE(), 3
UNION ALL
SELECT 2,'Amway', 600, GETDATE(), 2
UNION ALL
SELECT 3,'Thermal', 5000, GETDATE(), 1
UNION ALL
SELECT 4,'Oil Product', 500, GETDATE(), 4
CREATE TABLE ProductsCustomColumns
(
ProductId INT ,
ColumnName VARCHAR(200),
ColumnValue VARCHAR(15))
INSERT INTO ProductsCustomColumns
SELECT 1, 'Licence_No', '1545'
UNION ALL
SELECT 1, 'Location ', 'Atlanta'
UNION ALL
SELECT 2, 'Qty ', '5'
UNION ALL
SELECT 3, 'Gross', '80000'
Now your Dynamic Code goes here
DECLARE @COLUMN_PCC VARCHAR(MAX)='', @PRODUCT_ID INT=1,@USER_ID INT=3, @QRY VARCHAR(MAX) ;
--preparing Custom Column Name List with comma ','
SELECT @COLUMN_PCC = @COLUMN_PCC+ [COLUMNNAME] +',' FROM ProductsCustomColumns
WHERE ProductId= @PRODUCT_ID
SELECT @COLUMN_PCC =SUBSTRING(@COLUMN_PCC,1,LEN(@COLUMN_PCC)-1)
--Preparing Dynamic Query
SELECT @QRY =' SELECT P.*, AV.* FROM Products P
INNER JOIN
(
SELECT * FROM (
SELECT * FROM ProductsCustomColumns WHERE ProductId= '+CAST(@PRODUCT_ID AS VARCHAR(50))+'
)
AS A
PIVOT
(
MAX (COLUMNVALUE)
FOR [COLUMNNAME] IN ('+@COLUMN_PCC +')
)AS PVT
)AS AV ON P.ProductId= AV.ProductId
AND P.UserId='+CAST(@USER_ID AS VARCHAR(50))+'
'
EXEC ( @QRY)
And the Result will be
+-----------+-------------+---------+-------------------------+--------+-----------+------------+----------+
| ProductId | Name | Price | DateS | UserId | ProductId | Licence_No | Location |
+-----------+-------------+---------+-------------------------+--------+-----------+------------+----------+
| 1 | Oil Product | 2000.00 | 2016-12-09 18:06:24.090 | 3 | 1 | 1545 | Atlanta |
+-----------+-------------+---------+-------------------------+--------+-----------+------------+----------+

You need dynamic SQL; there is no other way to do this.
DECLARE @sql VARCHAR(max),
@cust_col VARCHAR(max)
SET @cust_col = (SELECT Quotename(CustomColumns) + ','
FROM ProductsCustomColumns
FOR xml path(''))
SELECT @cust_col = LEFT(@cust_col, Len(@cust_col) - 1)
SET @sql = 'SELECT *, ' + @cust_col + ' FROM Products WHERE UserId = 3 AND ProductId = 1'
EXEC (@sql)

In general it is a very bad idea to add custom data as additional columns of your main table. Just imagine 100 customers using this. All of them have differing table schemas, and you want to write an update script for all of them?
It is a pain in the neck to deal with result sets whose structure you don't know in advance!
You have several choices:
Add one column of type XML. The advantage: the result set is fixed. You just need a customer-specific rule for how to interpret the XML. You can solve this with an inline table-valued function: pass in the XML and get a derived table back. Call this with CROSS APPLY and you are done.
Add a new table with the customerID and key-value pairs.
If the additional data is not completely different between customers, add some of the columns to your main table as SPARSE columns.
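The second and third options could be sketched as follows. All table and column names here are hypothetical and chosen only for illustration; the column sizes are assumptions as well.

```sql
-- Option 2: key-value pairs in a separate table (hypothetical names).
CREATE TABLE dbo.CustomerCustomValues (
    CustomerId  INT          NOT NULL,
    ProductId   INT          NOT NULL,
    ColumnName  VARCHAR(200) NOT NULL,
    ColumnValue VARCHAR(200) NULL,
    PRIMARY KEY (CustomerId, ProductId, ColumnName)
);

-- Option 3: rarely filled attributes as SPARSE columns on the main table.
-- SPARSE columns must be nullable; NULLs then take no storage space.
CREATE TABLE dbo.ProductsSparseDemo (
    ProductId  INT          NOT NULL PRIMARY KEY,
    Name       VARCHAR(250) NOT NULL,
    Licence_No VARCHAR(15)  SPARSE NULL,
    Location   VARCHAR(15)  SPARSE NULL
);
```

The key-value table keeps the schema identical for every customer, at the cost of needing a pivot when the values must be shown as columns.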

Pivoting rows into columns in SQL Server

I have a set of data that looks like this:
Before
FirstName LastName Field1 Field2 Field3 ... Field27
--------- -------- ------ ------ ------ -------
Mark Smith A B C D
John Baptist X T Y G
Tom Dumm R B B U
However, I'd like the data to look like this:
After
FirstName LastName Field Value
--------- -------- ----- -----
Mark Smith 1 A
Mark Smith 2 B
Mark Smith 3 C
Mark Smith 4 D
John Baptist 1 X
John Baptist 2 T
John Baptist 3 Y
John Baptist 4 G
Tom Dumm 1 R
Tom Dumm 2 B
Tom Dumm 3 B
Tom Dumm 4 U
I have looked at the PIVOT function. It may work. I am not too sure. I couldn't make sense of how to use it. But, I am not sure that the pivot could place a '4' in the 'Field' column. From my understanding, the PIVOT function would simply transpose the values of Field1...Field27 into the 'Value' column.
I have also considered iterating over the table with a Cursor and then looping over the field columns, and then INSERTing into another table the 'Field's and 'Value's. However, I know this will impact performance since it's a serial-based operation.
Any help would be greatly appreciated! As you can tell, I'm quite new to T-SQL (or SQL in general) and SQL Server.

You can do this with UNPIVOT. There are two ways:
1) In a Static Unpivot you would hard-code your Field columns in your query.
select firstname
, lastname
, replace(field, 'field', '') as field
, value
from test
unpivot
(
value
for field in (field1, field2, field3, field27)
) u
2) Or you could use a Dynamic Unpivot, which builds the list of columns to UNPIVOT when you run the SQL. The dynamic version is great if you have a large number of fields to unpivot.
create table mytest
(
firstname varchar(5),
lastname varchar(10),
field1 varchar(1),
field2 varchar(1),
field3 varchar(1),
field27 varchar(1)
)
insert into mytest values('Mark', 'Smith', 'A', 'B', 'C', 'D')
insert into mytest values('John', 'Baptist', 'X', 'T', 'Y', 'G')
insert into mytest values('Tom', 'Dumm', 'R', 'B', 'B', 'U')
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX);
select @cols = stuff((select ','+quotename(C.name)
from sys.columns as C
where C.object_id = object_id('mytest') and
C.name like 'Field%'
for xml path('')), 1, 1, '')
set @query = 'SELECT firstname, lastname, replace(field, ''field'', '''') as field, value
from mytest
unpivot
(
value
for field in (' + @cols + ')
) p '
execute(@query)
drop table mytest
Both will produce the same results.

If you want to do it in a single query, the quick and dirty way is a UNION:
Select FirstName, LastName, 1 as Field, Field1 as Value
from table
UNION ALL
Select FirstName, LastName, 2, Field2
from table
.
.
and so on for the remaining field columns.
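Written out in full against the four-column mytest table from the earlier answer (an assumption; the real table has more field columns), the UNION ALL approach might look like this sketch:

```sql
-- One SELECT per field column; the literal supplies the field number.
SELECT FirstName, LastName, 1 AS Field, Field1 AS Value FROM mytest
UNION ALL
SELECT FirstName, LastName, 2, Field2 FROM mytest
UNION ALL
SELECT FirstName, LastName, 3, Field3 FROM mytest
UNION ALL
SELECT FirstName, LastName, 27, Field27 FROM mytest
ORDER BY FirstName, LastName, Field;
```

One difference worth knowing: UNPIVOT silently drops rows whose value is NULL, whereas this version keeps them.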

Rather than using pivot, use unpivot like this:
select firstname, lastname, substring(field,6,2) as field, value
from <yourtablename>
unpivot(value for field in (field1,field2,field3,field4,field5,field6,field7,field8,field9,field10,field11,field12,field13,field14,field15,field16,field17,field18,field19,field20,field21,field22,field23,field24,field25,field26,field27)) as unpvt;
