How to combine multiple rows into one in BigQuery? - sql

I have a BigQuery table with data as shown in the image below.
I want to create a table from this data as shown in the image below.
So here I want to:
remove the email column
combine the emp_type column values into a comma-separated value
have just one row per id
I tried using BigQuery's STRING_AGG function but was unable to achieve what I described above.
The table actually has more than 30 columns, but for the sake of explaining the issue I reduced it to 7 columns.
How do I combine multiple rows as one in a query?

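Consider the approach below:
select
-- every column except email and emp_type, taken from any row in the group
any_value((select as struct * except(email, emp_type) from unnest([t]))).*,
-- the group's emp_type values joined with ', '
string_agg(emp_type, ', ') emp_type
from data t
-- group by the JSON text of all columns except email and emp_type
group by to_json_string((select as struct * except(email, emp_type) from unnest([t])))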
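If applied to the sample data in your question, this returns one row per distinct combination of the columns other than email and emp_type, with their emp_type values joined into a comma-separated string.
As you can see, it works no matter how many columns you have, 30+ or 100+. You don't even need to type them out!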
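BigQuery's SELECT AS STRUCT * EXCEPT(...) trick is what avoids typing the 30+ column names. SQLite has no * EXCEPT, so the sketch below swaps in a different technique for the same goal: it reads the column names from the schema with PRAGMA table_info and builds the GROUP BY list in Python. The table data and its rows are hypothetical, and SQLite's group_concat stands in for STRING_AGG.

```python
import sqlite3

# Hypothetical sample table shaped like the question's (the real one has 30+ columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE data (id INTEGER, name TEXT, email TEXT, status TEXT,'
    ' "count" INTEGER, is_hybrid INTEGER, emp_type TEXT)'
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Ann", "ann@a.example", "active", 3, 1, "full-time"),
        (1, "Ann", "ann@b.example", "active", 3, 1, "contract"),
        (2, "Bob", "bob@a.example", "inactive", 1, 0, "part-time"),
    ],
)

EXCLUDED = {"email", "emp_type"}

# Read the column names from the schema instead of typing them out.
# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
keep = [row[1] for row in conn.execute("PRAGMA table_info(data)")
        if row[1] not in EXCLUDED]
# Quote each identifier so names such as "count" are safe to splice in.
cols = ", ".join(f'"{c}"' for c in keep)

# Group by every kept column; collapse emp_type into one comma-separated string.
sql = (
    f"SELECT {cols}, group_concat(emp_type, ', ') AS emp_type "
    f"FROM data GROUP BY {cols} ORDER BY id"
)
rows = conn.execute(sql).fetchall()
for r in rows:
    print(r)
```

As with STRING_AGG without an ORDER BY, the order of the combined values is not guaranteed; BigQuery's STRING_AGG accepts an ORDER BY inside the call if you need a stable order.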

I see two possible options. If you want a unique row per combination of all columns except email and emp_type:
SELECT id, name, status, `count`, is_hybrid, STRING_AGG(emp_type, ', ') AS emp_type
FROM data
GROUP BY id, name, status, `count`, is_hybrid
If you want just one row per id, you can group by id and select an arbitrary value (from the rows with that id) for the other columns:
SELECT id, ANY_VALUE(name) AS name, ANY_VALUE(status) AS status, ANY_VALUE(`count`) AS `count`, ANY_VALUE(is_hybrid) AS is_hybrid, STRING_AGG(emp_type, ', ') AS emp_type
FROM data
GROUP BY id
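Both options can be tried locally with the sketch below, which runs them through Python's built-in sqlite3 module. It assumes a hypothetical sample in which id 1 has two different status values, so the two options return different row counts. SQLite has no ANY_VALUE, so MIN() is swapped in as a stand-in that picks one value per group (the smallest, not an arbitrary one), and group_concat plays the role of STRING_AGG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE data (id INTEGER, name TEXT, email TEXT, status TEXT,'
    ' "count" INTEGER, is_hybrid INTEGER, emp_type TEXT)'
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Ann", "ann@a.example", "active", 3, 1, "full-time"),
        (1, "Ann", "ann@b.example", "inactive", 3, 1, "contract"),  # status differs within id 1
        (2, "Bob", "bob@a.example", "inactive", 1, 0, "part-time"),
    ],
)

# Option 1: one row per combination of the non-aggregated columns.
per_combination = conn.execute("""
    SELECT id, name, status, "count", is_hybrid,
           group_concat(emp_type, ', ') AS emp_type
    FROM data
    GROUP BY id, name, status, "count", is_hybrid
    ORDER BY id, status
""").fetchall()

# Option 2: exactly one row per id; MIN() stands in for ANY_VALUE().
per_id = conn.execute("""
    SELECT id, MIN(name) AS name, MIN(status) AS status,
           MIN("count") AS "count", MIN(is_hybrid) AS is_hybrid,
           group_concat(emp_type, ', ') AS emp_type
    FROM data
    GROUP BY id
    ORDER BY id
""").fetchall()

print(per_combination)  # 3 rows: id 1 appears twice because its status differs
print(per_id)           # 2 rows: one per id
```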

Related

duplicates to be removed sql

I have these records in the database:
My SQL:
SELECT
DISTINCT name, date(mod_wr)
FROM
test.object_stg
WHERE
ir = '4552724'
GROUP BY
name, date(mod_wr)
ORDER BY name
The last record is the same as the second-to-last one; only the date differs.
Is it possible to write a query that returns only the records where the value in the "name" column has changed?
Records 4 and 5 have the same name and differ only in date. I would like the query to return only one record for 4 and 5, because the name did not change.
If you don't want to remove rows where a value is reused (e.g. your line #2), you can use LAG() and include only the rows whose value differs from the previous row's. For example:
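select name, date(mod_wr) from
(
-- LAG() needs window-function support (e.g. MySQL 8.0+)
SELECT
name, mod_wr, lag(name) over(order by mod_wr) as prev_name
FROM
test.object_stg
WHERE
ir = '4552724'
) t
-- keep the first row and every row whose name differs from the previous row's
WHERE prev_name IS NULL OR name <> prev_name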
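A runnable sketch of the LAG() approach, using Python's sqlite3 module as a stand-in for MySQL and assuming the bundled SQLite is 3.25 or newer (the first release with window functions). The rows are hypothetical, built so that BMW reappears after a change (kept) and 1.0 GLS repeats on consecutive rows (second copy dropped).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE object_stg (ir TEXT, name TEXT, mod_wr TEXT)")
conn.executemany(
    "INSERT INTO object_stg VALUES (?, ?, ?)",
    [
        ("4552724", "BMW", "2020-01-01 10:00:00"),
        ("4552724", "1.0 GL", "2020-02-01 10:00:00"),
        ("4552724", "BMW", "2020-03-01 10:00:00"),      # name reused after a change: kept
        ("4552724", "1.0 GLS", "2020-04-01 10:00:00"),
        ("4552724", "1.0 GLS", "2020-05-01 10:00:00"),  # same name as previous row: dropped
        ("9999999", "Audi", "2020-01-15 10:00:00"),     # different ir: filtered out
    ],
)

changes = conn.execute("""
    SELECT name, date(mod_wr) AS mod_date
    FROM (
        -- prev_name is the name on the previous row in mod_wr order (NULL for the first)
        SELECT name, mod_wr, LAG(name) OVER (ORDER BY mod_wr) AS prev_name
        FROM object_stg
        WHERE ir = '4552724'
    ) AS t
    WHERE prev_name IS NULL OR name <> prev_name
    ORDER BY mod_wr
""").fetchall()
print(changes)
```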
From your sample data, you have 3 distinct names. However, you cannot use DISTINCT in your SELECT here, because it applies to every column listed, and since the dates differ, no two rows match exactly.
Instead, you can use GROUP BY to collapse the rows that share a name.
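-- MySQL 5.6 statement: the date shown for each name comes from an indeterminate row,
-- and MySQL 5.7.5 and later reject this query under the default ONLY_FULL_GROUP_BY mode
select name, date(mod_wr) from object_stg group by name;
-- MSSQL 2017 statement
select name, max(mod_wr) from object_stg group by name;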
Both statements return 3 rows (BMW, 1.0 GL and 1.0 GLS), each with a single date.
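The GROUP BY approach, sketched with Python's sqlite3 module on hypothetical rows. It uses MAX(mod_wr), like the MSSQL statement, so the date returned for each name is well defined (the latest one). Unlike LAG(), this collapses the reused BMW name into a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE object_stg (name TEXT, mod_wr TEXT)")
conn.executemany(
    "INSERT INTO object_stg VALUES (?, ?)",
    [
        ("BMW", "2020-01-01 10:00:00"),
        ("1.0 GL", "2020-02-01 10:00:00"),
        ("BMW", "2020-03-01 10:00:00"),
        ("1.0 GLS", "2020-04-01 10:00:00"),
        ("1.0 GLS", "2020-05-01 10:00:00"),
    ],
)

# One row per name; MAX picks the latest timestamp, date() trims it to a day.
latest = conn.execute("""
    SELECT name, date(MAX(mod_wr)) AS last_date
    FROM object_stg
    GROUP BY name
    ORDER BY name
""").fetchall()
print(latest)
```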

Converting multiple rows into single row with multiple columns

I have a table with multiple rows per CARD_ID, each listing a different role assigned to that CARD_ID. I'd like a query that creates a single row for each distinct CARD_ID, with multiple columns listing the different roles. See the image for an example of the current table; duplicates are highlighted.
So, I'd like one row for CARD_IDs 1-10, with columns in each row for Cardholder, Reconciler, and Approver.
If a particular CARD_ID doesn't have one of those roles, I'm ok with that field being null or having some other type of indicator.
One method is conditional aggregation:
select card_id,
max(iif(role = 'Reconciler', col, NULL)) as reconciler_col,
max(iif(role = 'Approver', col, NULL)) as approver_col,
max(iif(role = 'Cardholder', col, NULL)) as cardholder_col
from t
group by card_id;
col is the column you want to pivot. You can pivot more than one column by adding more max(iif(...)) expressions to the select list.
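A sketch of conditional aggregation through Python's sqlite3 module. It uses the standard CASE WHEN ... THEN col END form, which is equivalent to iif(cond, col, NULL) and also works where IIF is unavailable (IIF arrived in SQL Server 2012 and SQLite 3.32). The table t and the col values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (card_id, role); col is the value to pivot.
conn.execute("CREATE TABLE t (card_id INTEGER, role TEXT, col TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "Cardholder", "Alice"),
        (1, "Reconciler", "Bob"),
        (1, "Approver", "Carol"),
        (2, "Cardholder", "Dan"),
        (2, "Approver", "Erin"),  # card 2 has no Reconciler
    ],
)

# Each CASE yields col only on rows with the matching role and NULL elsewhere;
# MAX ignores NULLs, so it returns that role's value, or NULL if the role is missing.
pivot = conn.execute("""
    SELECT card_id,
           MAX(CASE WHEN role = 'Reconciler' THEN col END) AS reconciler_col,
           MAX(CASE WHEN role = 'Approver'   THEN col END) AS approver_col,
           MAX(CASE WHEN role = 'Cardholder' THEN col END) AS cardholder_col
    FROM t
    GROUP BY card_id
    ORDER BY card_id
""").fetchall()
for row in pivot:
    print(row)
```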

find unique rows using SQL?

I want to return all the rows from a table that are unique, i.e. if two rows contain the same name in a certain field, that name shouldn't be shown at all.
Since you want only the names that appear once (and not one row per name, as DISTINCT would give you), you have to use GROUP BY with HAVING (instead of WHERE, because the condition is on the result of an aggregate function, not on a column value):
SELECT name FROM myTable GROUP BY name HAVING COUNT(name) = 1
SELECT DISTINCT column_name FROM table
If you want the complete rows, then use row_number() or PostgreSQL's distinct on (note that distinct on keeps one row per name rather than dropping repeated names):
select distinct on (name) t.*
from table t
order by name;
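The sketch below runs the HAVING query through Python's sqlite3 module on hypothetical rows and contrasts it with DISTINCT. To get the complete rows for names that occur exactly once, it swaps in a window-function variant, COUNT(*) OVER (PARTITION BY name), instead of DISTINCT ON. It assumes SQLite 3.25+ for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, "Ann", "Oslo"), (2, "Bob", "Rome"), (3, "Ann", "Lima"), (4, "Cid", "Kyiv")],
)

# Names that occur exactly once (Ann appears twice, so it is excluded).
unique_names = conn.execute(
    "SELECT name FROM myTable GROUP BY name HAVING COUNT(name) = 1 ORDER BY name"
).fetchall()

# Complete rows for those names: count rows per name with a window function,
# then keep only the rows whose name count is 1.
unique_rows = conn.execute("""
    SELECT id, name, city
    FROM (
        SELECT id, name, city, COUNT(*) OVER (PARTITION BY name) AS name_count
        FROM myTable
    ) AS counted
    WHERE name_count = 1
    ORDER BY id
""").fetchall()

# For contrast: DISTINCT still returns Ann, just once.
distinct_names = conn.execute("SELECT DISTINCT name FROM myTable ORDER BY name").fetchall()
print(unique_names, unique_rows, distinct_names)
```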

How to set/serialize values based on results from multiple rows / multiple columns in postgresql

I have a table in which I want to calculate two column values based on results from multiple rows / multiple columns. The primary key is set on the first two columns (tag,qid).
I would like to set the values of two fields (serial and total).
The "serial" value numbers the rows within each tag, so each (tag,qid) gets its own serial: if I have 2 records with the same tag, the first must get serial 1 and the second serial 2, and so on. The serial must follow the priority field, with higher priority values numbered first.
The "total" column is the number of rows with each tag in the table.
I would like to do this in plain SQL instead of creating a stored procedure, cursors, etc.
The table below shows the complete, valid values.
                                 
+-----+-----+----------+--------+-------+
| tag | qid | priority | serial | total |
+-----+-----+----------+--------+-------+
| abc |  87 |       99 |      1 |     2 |
| abc |  56 |       11 |      2 |     2 |
| xyz |  89 |       80 |      1 |     1 |
| pfm |  28 |       99 |      1 |     3 |
| pfm |  17 |       89 |      2 |     3 |
| pfm |  64 |       79 |      3 |     3 |
+-----+-----+----------+--------+-------+
  
Many Thanks
You can readily return a result set with this information using window functions:
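-- partition by tag only: (tag, qid) is the primary key, so partitioning by both
-- would put every row in its own partition and give serial = 1 and total = 1 everywhere
select tag, qid, priority,
row_number() over (partition by tag order by priority desc) as serial,
count(*) over (partition by tag) as total
from your_table t;  -- replace your_table with the actual table name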
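A runnable check of the window-function answer with Python's sqlite3 module (assuming SQLite 3.25+), using the rows from the question's table under a hypothetical table name tags. The second query shows why the partition must be tag alone: with (tag, qid) as the primary key, partitioning by both gives every row serial 1 and total 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (tag TEXT, qid INTEGER, priority INTEGER, PRIMARY KEY (tag, qid))"
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [("abc", 87, 99), ("abc", 56, 11), ("xyz", 89, 80),
     ("pfm", 28, 99), ("pfm", 17, 89), ("pfm", 64, 79)],
)

# Number rows within each tag by descending priority, and count rows per tag.
result = conn.execute("""
    SELECT tag, qid, priority,
           ROW_NUMBER() OVER (PARTITION BY tag ORDER BY priority DESC) AS serial,
           COUNT(*)     OVER (PARTITION BY tag)                        AS total
    FROM tags
    ORDER BY tag, priority DESC
""").fetchall()
for row in result:
    print(row)

# Partitioning by the whole primary key: one row per partition, so all 1s.
per_key = conn.execute("""
    SELECT ROW_NUMBER() OVER (PARTITION BY tag, qid ORDER BY priority DESC),
           COUNT(*)     OVER (PARTITION BY tag, qid)
    FROM tags
""").fetchall()
```

To store the values rather than just return them, PostgreSQL's UPDATE ... FROM can join the table to this query on (tag, qid).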

SQL Server Sum multiple rows into one - no temp table

I would like to see the most concise way to do what is outlined in this SO question: Sum values from multiple rows into one row
that is, combine multiple rows while summing a column.
But how do I then delete the duplicates? In other words, I have data like this:
Person Value
--------------
1 10
1 20
2 15
And I want to sum the values for any duplicates (on the Person col) into a single row and get rid of the other duplicates on the Person value. So my output would be:
Person Value
-------------
1 30
2 15
And I would like to do this without using a temp table. I think I'll need to use OVER (PARTITION BY ...), but I'm not sure. I'm just trying to challenge myself by not doing it the temp table way. I'm working with SQL Server 2008 R2.
Simply put, give me a concise statement that gets from my input to my output in the same table. So if my table is named People and I run select * from People before the operation I'm asking about, I get the first set of data above, and when I run select * from People after the operation, I get the second set.
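I'm not sure why you want to avoid a temp table, but here's one way to do it (though in my opinion this is overkill):
-- Step 1: set every row's Value to the total for its Person
UPDATE MyTable SET VALUE = (SELECT SUM(Value) FROM MyTable MT WHERE MT.Person = MyTable.Person);
-- Step 2: number the rows within each Person and delete all but the first
-- (all rows of a Person now hold the same Value, so it doesn't matter which one survives)
WITH DUP_TABLE AS
(SELECT ROW_NUMBER()
OVER (PARTITION BY Person ORDER BY Person) As ROW_NO
FROM MyTable)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
The first statement sets every row's Value to the total for its Person. The second deletes all but one row per Person, which leaves a single summed row for each.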
Demo: http://sqlfiddle.com/#!3/db7aa/11
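A sketch of the same update-then-delete idea in SQLite via Python's sqlite3 module. SQLite does not support deleting through a CTE the way SQL Server does, so the rowid column stands in for ROW_NUMBER() to pick the surviving row per Person. This adaptation updates only the surviving row, so the result does not depend on whether the SUM subquery sees rows changed earlier in the same statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Person INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", [(1, 10), (1, 20), (2, 15)])

# Step 1: write each Person's total into one surviving row per Person
# (the row with the lowest rowid). Only that row is updated, so each SUM
# reads only rows of its Person that this statement has not changed.
conn.execute("""
    UPDATE MyTable
    SET Value = (SELECT SUM(MT.Value) FROM MyTable AS MT
                 WHERE MT.Person = MyTable.Person)
    WHERE rowid IN (SELECT MIN(rowid) FROM MyTable GROUP BY Person)
""")

# Step 2: delete every other row for each Person.
conn.execute("""
    DELETE FROM MyTable
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM MyTable GROUP BY Person)
""")
conn.commit()

rows = conn.execute("SELECT Person, Value FROM MyTable ORDER BY Person").fetchall()
print(rows)  # [(1, 30), (2, 15)]
```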
All you're asking for is a simple SUM() aggregate function and a GROUP BY:
SELECT Person, SUM(Value)
FROM myTable
GROUP BY Person
SUM() by itself sums all the values in a column. When you add another column and GROUP BY it, SQL returns one row per distinct value of that column and computes the aggregate separately for each group.
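The plain GROUP BY query, run through Python's sqlite3 module on the question's sample rows. It also makes the limitation relative to the question visible: the SELECT returns the summed rows but leaves the table itself unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (Person INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [(1, 10), (1, 20), (2, 15)])

# One output row per distinct Person, with SUM computed within each group.
totals = conn.execute(
    "SELECT Person, SUM(Value) AS Value FROM myTable GROUP BY Person ORDER BY Person"
).fetchall()
print(totals)  # [(1, 30), (2, 15)]

# The SELECT only reads: the table still holds all three original rows.
row_count = conn.execute("SELECT COUNT(*) FROM myTable").fetchone()[0]
print(row_count)  # 3
```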