BigQuery - Concatenate multiple rows into a single row - google-bigquery

I have a BigQuery table with 2 columns:
id|name
1|John
1|Tom
1|Bob
2|Jack
2|Tim
Expected output: Concatenate names grouped by id
id|Text
1|John,Tom,Bob
2|Jack,Tim

For BigQuery Standard SQL:
#standardSQL
--WITH yourTable AS (
-- SELECT 1 AS id, 'John' AS name UNION ALL
-- SELECT 1, 'Tom' UNION ALL
-- SELECT 1, 'Bob' UNION ALL
-- SELECT 2, 'Jack' UNION ALL
-- SELECT 2, 'Tim'
--)
SELECT
id,
STRING_AGG(name ORDER BY name) AS Text
FROM yourTable
GROUP BY id
Optional ORDER BY name within STRING_CONCAT allows you to get out sorted list of names as below
id Text
1 Bob,John,Tom
2 Jack,Tim
For Legacy SQL
#legacySQL
SELECT
id,
GROUP_CONCAT(name) AS Text
FROM yourTable
GROUP BY id
If you would need to output sorted list here, you can use below (formally - it is not guaranteed by BigQuery Legacy SQL to get sorted list - but for most practical cases I had - it worked)
#legacySQL
SELECT
id,
GROUP_CONCAT(name) AS Text
FROM (
SELECT id, name
FROM yourTable
ORDER BY name
)
GROUP BY id

You can use GROUP_CONCAT
SELECT id, GROUP_CONCAT(name) AS Text FROM <dataset>.<table> GROUP BY id

Related

SQL : How to find the count of an particular category values from an column with string values

I have a SQL Table called "category" looks like this
id | category
--------------
1 | 3,2
2 | 1
3 | 4,3,2
4 | 2,1
5 | 1,4
6 | 2,3,4
There are multiple category id's in the column "category", I need to find the count of an particular category values.
Current method I am using is:
select count(distinct(Category)) AS coldatacount from table_name
It gives the count of all the distinct values WHERE as I need to get
the count of all the particular category_id's separately.
if you are trying to get the Category Ids in comma delimited, you can use the string_split function to get distinct category_id
with cte as (
select 1 as id, '3,2' as category union all
select 2, '1' union all
select 3, '4,3,2' union all
select 4, '2,1' union all
select 5, '1,4' union all
select 6, '2,3,4'
)select count(distinct(value)) as category from cte
cross apply string_split(cte.category, ',');
I assumed that #neeraj04 may be looking for count of all Id in the category, continuing with #METAL code:
CREATE TABLE YourTable
(
Id INT IDENTITY,
[Category] VARCHAR(50)
);
INSERT YourTable VALUES ('3,2'), ('1'), ('4,3,2'), ('2,1'), ('1,4'), ('2,3,4');
SELECT CAST(value AS INT) AS category -- Value is string ouptut
, COUNT([value]) AS IdCount
FROM YourTable yt
CROSS APPLY string_split(yt.Category, ',')
GROUP BY [value]
ORDER BY category;
This is a horrible data model. You should not be storing multiple values in a string. You should not be storing numbers as strings.
Sometimes we are stuck with other people's really, really bad decisions. One approach is to split the string and count:
select t.*, cnt
from t cross apply
(select count(*) as cnt
from string_split(t.category) s
) s;
The other is to count commas:
select t.*,
(1 + len(t.category) - len(replace(t.category, ',', '')) as num_elements
Select Count(Value) from (Select Value from Table_Name a Cross Apply
string_split(a.Category, ','))ab Where Value=1

SQL Server - Sorting column based on string and number

I have a table with one column which contains combination of string and number like as shown below. I need to sort the name column in descending or ascending but the problem is when I use ORDER BY it is not sorting as expected
My query is like as shown below
SELECT * FROM test ORDER BY `name` ASC
My expected result is like as shown blow
employee1
employee2
employee3
employee6
employee6
employee10
employee11
employee12
employee17
employee82
employee100
employee111
employee129
employee299
Can anyone please help me on this
You can try below by only extracting the number part from the field
SELECT * FROM test
ORDER BY cast(replace(`name`,'employee','') as int) ASC
Please try the code below:
-- ASSUMES NUMBERS ARE IN THE LAST CHARACTER
-- WONT WORK IF NUMBER IS THERE IN THE MIDDLE
DROP TABLE IF EXISTS #data
select
Name = REPLACE(name,' ','')
,NameWithNum = REPLACE(name,' ','') + cast(object_id as varchar(100))
INTO #data
from sys.tables
SELECT
NameWithNum
,NameRemovedNumbers= SUBSTRING(NameWithNum, PATINDEX('%[0-9]%', NameWithNum), LEN(NameWithNum))
from #data
ORDER BY Name,SUBSTRING(NameWithNum, PATINDEX('%[0-9]%', NameWithNum), LEN(NameWithNum))
Since the number always in the end, you can do as
WITH CTE AS
(
SELECT 'employee1' Name
union all select 'employee2'
union all select 'employee3'
union all select 'employee6'
union all select 'employee6'
union all select 'employee10'
union all select 'employee11'
union all select 'employee12'
union all select 'employee17'
union all select 'employee82'
union all select 'employee100'
union all select 'employee111'
union all select 'employee129'
union all select 'employee299'
)
SELECT Name
FROM CTE
ORDER BY CAST(SUBSTRING(Name, PATINDEX('%[^a-z, '' '']%', Name), LEN(Name)) AS INT)
--OR PATINDEX('%[0-9]%', Name)
Or
WITH CTE AS
(
SELECT 'The First Employee 1' Name
union all select 'The Second One 2'
union all select 'Employee Number 4'
union all select 'Employee Number 3'
)
SELECT Name
FROM CTE
ORDER BY CAST(SUBSTRING(Name, PATINDEX('%[0-9]%', Name), LEN(Name)) AS INT)
Here is a Live Demo where you can change the strings and see the results.
Finnaly, I would note to the real problem, which is a table with just one column and no PK.
The "base" part of the name is the same. So, you can order by the length and then the name:
SELECT t.*
FROM test t
ORDER BY LEN(name), name;

Select a third column based on two distant rows within the same table

I want to select a third column based on two distant columns within the same table.
I could only think of this:
select tl.thirdcolumn
from table1 t1
WHERE
EXISTS
(
Select distinct tl.firstcolumn , t1.secondcolumn
From t1
)
This:
select distinct tl.thirdcolumn
from table t1
won't work as I don't want the distinct thirdrow. I want the thirdrow to be based on the first two rows being distinct.
I guess its a kind of nested sql statment with a select top 1... idk
CATEGORY NAME Query
---------------------------------------------------
STUDENTS NUMBER_OF_CHAPTERS QueryA
STUDENTS NUMBER_OF_STUDENT_MEMBERS QueryB
STUDENTS NUMBER_OF_STUDENT_MEMBERS QueryB
MEMBERS NUMBER_OF_MEMBERS_WORLDWIDE QueryC
MEMBERS NUMBER_OF_MEMBERS_WORLDWIDE QueryC
Your question is rather hard to follow, but I think you might simply want group by:
select tl.firstcolumn , t1.secondcolumn, max(tl.thirdcolumn)
from table1 t1
group by tl.firstcolumn , t1.secondcolumn;
If you want rows where the pair of values only appears once, then add having count(*) = 1:
select tl.firstcolumn , t1.secondcolumn, max(tl.thirdcolumn)
from table1 t1
group by tl.firstcolumn , t1.secondcolumn
having count(*) = 1;
Query -
SELECT
CATEGORY,NAME,QUERY
FROM
(
WITH TAB AS (
SELECT
'STUDENTS' AS CATEGORY,
'NUMBER_OF_CHAPTERS' AS NAME,
'QUERYA' AS QUERY
FROM
DUAL
UNION ALL
SELECT
'STUDENTS' AS CATEGORY,
'NUMBER_OF_STUDENT_MEMBERS' AS NAME,
'QUERYB' AS QUERY
FROM
DUAL
UNION ALL
SELECT
'STUDENTS' AS CATEGORY,
'NUMBER_OF_STUDENT_MEMBERS' AS NAME,
'QUERYB' AS QUERY
FROM
DUAL
UNION ALL
SELECT
'MEMBERS' AS CATEGORY,
'NUMBER_OF_MEMBERS_WORLDWIDE' AS NAME,
'QUERYC' AS QUERY
FROM
DUAL
UNION ALL
SELECT
'MEMBERS' AS CATEGORY,
'NUMBER_OF_MEMBERS_WORLDWIDE' AS NAME,
'QUERYC' AS QUERY
FROM
DUAL
) SELECT
CATEGORY,
NAME,
QUERY,
COUNT(*) OVER(PARTITION BY
CATEGORY,
NAME
ORDER BY
CATEGORY,
NAME,
QUERY
) AS RNK
FROM
TAB
)
WHERE
RNK = 1;
Output -
"CATEGORY","NAME","QUERY"
"STUDENTS","NUMBER_OF_CHAPTERS","QueryA"

summarize data in group by in sql?

I'm not sure what should I write in the following SQL query to show the following result:
Data:
Color is unique column...
Result:
select color as [name/color], value
from your_table
union all
select name, sum(value)
from your_table
group by name
And if you need a specific order then you can do
select [name/color], value
from
(
select color as [name/color], value, name as order_column
from your_table
union all
select name, sum(value), name
from your_table
group by name
) x
order by order_column

Counting the rows of a column where the value of a different column is 1

I am using a select count distinct to count the number of records in a column. However, I only want to count the records where the value of a different column is 1.
So my table looks a bit like this:
Name------Type
abc---------1
def----------2
ghi----------2
jkl-----------1
mno--------1
and I want the query only to count abc, jkl and mno and thus return '3'.
I wasn't able to do this with the CASE function, because this only seems to work with conditions in the same column.
EDIT: Sorry, I should have added, I want to make a query that counts both types.
So the result should look more like:
1---3
2---2
SELECT COUNT(*)
FROM dbo.[table name]
WHERE [type] = 1;
If you want to return the counts by type:
SELECT [type], COUNT(*)
FROM dbo.[table name]
GROUP BY [type]
ORDER BY [type];
You should avoid using keywords like type as column names - you can avoid a lot of square brackets if you use a more specific, non-reserved word.
I think you'll want (assuming that you wouldn't want to count ('abc',1) twice if it is in your table twice):
select count(distinct name)
from mytable
where type = 1
EDIT: for getting all types
select type, count(distinct name)
from mytable
group by type
order by type
select count(1) from tbl where type = 1
;WITH MyTable (Name, [Type]) AS
(
SELECT 'abc', 1
UNION
SELECT 'def', 2
UNION
SELECT 'ghi', 2
UNION
SELECT 'jkl', 1
UNION
SELECT 'mno', 1
)
SELECT COUNT( DISTINCT Name)
FROM MyTable
WHERE [Type] = 1