I have this type of data structure and am trying to eliminate the duplicated values in the Type column in Postgres.
Initial Table
Index Type
1 A, B
2 A, A
3 B, B
Expected Table
Index Type
1 A, B
2 A
3 B
Thanks for the help!
You could use a CTE to split the comma-separated values into rows using STRING_TO_ARRAY and UNNEST, then put the distinct values back together using STRING_AGG:
WITH Types AS (
SELECT DISTINCT Index, UNNEST(STRING_TO_ARRAY(Type, ', ')) AS Type
FROM Data
)
SELECT Index, STRING_AGG(Type, ', ' ORDER BY Type) AS Type
FROM Types
GROUP BY Index
ORDER BY Index
Output:
Index Type
1 A, B
2 A
3 B
Demo on SQLFiddle
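To try the split → DISTINCT → re-aggregate idea without a Postgres server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no STRING_TO_ARRAY or UNNEST, so a recursive CTE does the splitting instead (a swapped-in technique, not the Postgres query above). The table and column names data, idx and type are assumptions standing in for Data, Index and Type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the question's Data(Index, Type) table.
conn.execute("CREATE TABLE data (idx INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, "A, B"), (2, "A, A"), (3, "B, B")],
)

# The recursive CTE peels off one ', '-separated item per step:
# item = text before the first separator, rest = text after it.
QUERY = """
WITH RECURSIVE parts(idx, item, rest) AS (
    SELECT idx, '', type || ', ' FROM data
    UNION ALL
    SELECT idx,
           substr(rest, 1, instr(rest, ', ') - 1),
           substr(rest, instr(rest, ', ') + 2)
    FROM parts
    WHERE rest <> ''
),
types AS (
    -- one row per distinct (idx, item) pair, like SELECT DISTINCT ... UNNEST
    SELECT DISTINCT idx, item FROM parts WHERE item <> ''
)
SELECT idx, group_concat(item, ', ') AS type
FROM (SELECT idx, item FROM types ORDER BY idx, item)
GROUP BY idx
ORDER BY idx
"""

rows = conn.execute(QUERY).fetchall()
for idx, type_ in rows:
    print(idx, type_)
```

SQLite does not formally guarantee the order in which group_concat visits rows, so the order of items within "A, B" should not be relied on.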
Here is an alternative approach that doesn't use aggregation over the entire table:
SELECT Index,
(SELECT STRING_AGG(DISTINCT t, ', ')
FROM UNNEST(STRING_TO_ARRAY(Type, ', ')) AS t
) as types
FROM Data;
Here is a db<>fiddle.
Although I would expect avoiding the outer aggregation to improve performance on larger data sets, it doesn't appear to.
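The per-row variant (a correlated subquery instead of a GROUP BY over the whole table) can be sketched in SQLite as well. This assumes SQLite's JSON functions are available: each string is rewritten into a JSON array literal ('A, B' becomes '["A","B"]') and expanded with json_each. It also assumes the values contain no double quotes or backslashes, which would break the hand-built JSON. The names data, idx and type are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (idx INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, "A, B"), (2, "A, A"), (3, "B, B")],
)

# For each outer row, the scalar subquery expands that row's list and
# re-aggregates only its distinct items. SQLite's group_concat(DISTINCT ...)
# accepts just one argument, so the default ',' separator is used.
QUERY = """
SELECT d.idx,
       (SELECT group_concat(DISTINCT j.value)
        FROM json_each('["' || replace(d.type, ', ', '","') || '"]') AS j
       ) AS types
FROM data AS d
ORDER BY d.idx
"""

rows = conn.execute(QUERY).fetchall()
for idx, types in rows:
    print(idx, types)
```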
Related
I have a table where the column data contains a combination of values separated by ';'. I would like to split them into one row per value.
Table data
Now I would like to split them into multiple rows, one for each value, like this:
I have tried the SQL statement below:
SELECT DISTINCT COL_NAME FROM "DB"."SCHEMA"."TABLE",
LATERAL FLATTEN(INPUT=>SPLIT(COL_NAME,';'))
But the output is not as expected. Attaching the query output below.
Basically the query does nothing to my data.
This can be achieved with the SPLIT_TO_TABLE table function:
This table function splits a string (based on a specified delimiter) and flattens the results into rows.
SELECT *
FROM tab, LATERAL SPLIT_TO_TABLE(column_name, ';')
I was able to resolve this by using a LATERAL table function (SPLIT_TO_TABLE) like a joined table and selecting the value from it.
SELECT DISTINCT A.VALUE AS COL_NAME
FROM "DB"."SCHEMA"."TABLE",
LATERAL SPLIT_TO_TABLE(COL_NAME, ';') A
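For readers without a Snowflake account, the same "split on ';' into rows, then DISTINCT" logic can be sketched in SQLite through Python's sqlite3 module. SQLite has no SPLIT_TO_TABLE, so a recursive CTE stands in for it; the table fake_data and its sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fake_data (col_name TEXT)")
conn.executemany(
    "INSERT INTO fake_data VALUES (?)",
    [
        ("Greensboro-High Point-Winston-Salem;Norfolk-Portsmouth-Newport News;"
         "Washington, D.C.;Roanoke-Lynchburg;Richmond-Petersburg",),
        ("Knoxville",),
        ("Knoxville;Memphis;Nashville",),
    ],
)

# Each recursion step emits the text before the next ';' (item) and keeps
# the remainder (rest); DISTINCT then collapses repeats such as Knoxville.
QUERY = """
WITH RECURSIVE parts(item, rest) AS (
    SELECT '', col_name || ';' FROM fake_data
    UNION ALL
    SELECT substr(rest, 1, instr(rest, ';') - 1),
           substr(rest, instr(rest, ';') + 1)
    FROM parts
    WHERE rest <> ''
)
SELECT DISTINCT item AS col_name
FROM parts
WHERE item <> ''
"""

values = [row[0] for row in conn.execute(QUERY)]
print(values)
```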
It looks like your data has multiple delimiters. You can leverage the STRTOK_SPLIT_TO_TABLE function, which splits on any of several delimiter characters:
STRTOK_SPLIT_TO_TABLE
WITH data AS (
SELECT *
FROM VALUES
('Greensboro-High Point-Winston-Salem;Norfolk-Portsmouth-Newport News Washington, D.C. Roanoke-Lynchburg Richmond-Petersburg')
v( cities))
select *
from data, lateral strtok_split_to_table(cities, ';-')
order by seq, index;
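The tokenizing rule behind STRTOK_SPLIT_TO_TABLE can be illustrated in plain Python. This is not Snowflake code; it assumes strtok-style behaviour, where every character in the delimiter string is its own delimiter and empty tokens are never returned. The helper name strtok_split is hypothetical.

```python
import re


def strtok_split(text: str, delimiters: str) -> list[str]:
    """Split text on ANY character in `delimiters`, dropping empty tokens."""
    if not delimiters:
        return [text] if text else []
    # Character class built from the delimiter set; re.escape keeps '-', ']'
    # and similar characters literal inside the brackets.
    pattern = "[" + re.escape(delimiters) + "]"
    return [token for token in re.split(pattern, text) if token]


tokens = strtok_split("Greensboro-High Point-Winston-Salem;Norfolk-Portsmouth", ";-")
print(tokens)  # ['Greensboro', 'High Point', 'Winston', 'Salem', 'Norfolk', 'Portsmouth']
```

Note the trade-off this exposes: with ';-' as the delimiter set, hyphenated names such as Winston-Salem are split apart too, so ';' alone is the right choice when the hyphens belong to the values.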
Result:
Your first attempt was very close; you just need to access the output of the FLATTEN instead of its input.
So, using this CTE for the data:
WITH fake_data AS (
SELECT *
FROM VALUES
('Greensboro-High Point-Winston-Salem;Norfolk-Portsmouth-Newport News;Washington, D.C.;Roanoke-Lynchburg;Richmond-Petersburg'),
('Knoxville'),
('Knoxville;Memphis;Nashville')
v( COL_NAME)
)
If you alias your tables and access the parts of the FLATTEN output, it works:
SELECT DISTINCT f.value::text as col_name
FROM fake_data d,
LATERAL FLATTEN(INPUT=>SPLIT(COL_NAME,';')) f
;
This is what you did in your own answer, but via SPLIT_TO_TABLE:
SELECT DISTINCT f.value as col_name
FROM fake_data d,
TABLE(SPLIT_TO_TABLE(COL_NAME,';')) f
;
STRTOK_SPLIT_TO_TABLE works the same way here:
SELECT DISTINCT f.value as col_name
FROM fake_data d,
TABLE(strtok_split_to_table(COL_NAME,';')) f
;
The same can also be done with STRTOK_TO_ARRAY and FLATTEN:
SELECT DISTINCT f.value as col_name
FROM fake_data d,
TABLE(FLATTEN(input=>STRTOK_TO_ARRAY(COL_NAME,';'))) f
;
COL_NAME
Greensboro-High Point-Winston-Salem
Norfolk-Portsmouth-Newport News
Washington, D.C.
Roanoke-Lynchburg
Richmond-Petersburg
Knoxville
Memphis
Nashville
I have a SQL table with about 50 columns: the first represents unique users, and the other columns represent categories that are scored 1-10.
Here is an idea of what I'm working with
user a b c
abc 5 null null
xyz null 6 null
I am interested in counting the number of non-null values per column.
Currently, my queries are:
SELECT col_name, COUNT(col_name) AS count
FROM table
WHERE col_name IS NOT NULL
Is there a way to count non-null values for each column in one query, without having to manually enter each column name?
The desired output would be:
column count
a 1
b 1
c 0
Consider the approach below (no knowledge of the column names is required at all, with the exception of user):
select column, countif(value != 'null') non_null_count
from your_table t,
unnest(array(
select as struct trim(arr[offset(0)], '"') column, trim(arr[offset(1)], '"') value
from unnest(split(trim(to_json_string(t), '{}'))) kv,
unnest([struct(split(kv, ':') as arr)])
where trim(arr[offset(0)], '"') != 'user'
)) rec
group by column
Applied to the sample data in your question, this returns a count of 1 for a, 1 for b, and 0 for c.
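A different technique reaches the same result without parsing JSON: read the column list at run time and generate one COUNT(col) per column, since COUNT(col) ignores NULLs. The sketch below uses SQLite through Python's sqlite3; the table name scores and the helper non_null_counts are hypothetical, and the identifiers are interpolated into SQL, so it assumes trusted table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the question's sample (user + category columns).
conn.execute('CREATE TABLE scores ("user" TEXT, a INTEGER, b INTEGER, c INTEGER)')
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?)",
    [("abc", 5, None, None), ("xyz", None, 6, None)],
)


def non_null_counts(conn, table, skip=("user",)):
    # Column names come from an empty result set instead of being typed by hand.
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    columns = [d[0] for d in cur.description if d[0] not in skip]
    # One COUNT(col) per column, stacked into (column, count) rows.
    parts = [
        f"SELECT '{col}' AS column_name, COUNT(\"{col}\") AS non_null_count FROM \"{table}\""
        for col in columns
    ]
    return conn.execute(" UNION ALL ".join(parts)).fetchall()


counts = non_null_counts(conn, "scores")
print(counts)
```

Unlike a row-to-JSON trick, this approach reports columns whose values are all NULL with a count of 0, because every generated branch returns exactly one row.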
I didn't do this in BigQuery but in SQL Server; however, BigQuery has the concept of UNPIVOT as well. Basically, you're trying to transpose your columns to rows and then do a simple aggregate to see how many records have data in each column. My example is below and should work in BigQuery with little or no tweaking.
Here is the table I created:
CREATE TABLE example(
user_name char(3),
a integer,
b integer,
c integer
);
INSERT INTO example(user_name, a, b, c)
VALUES('abc', 5, null, null);
INSERT INTO example(user_name, a, b, c)
VALUES('xyz', null, 6, null);
INSERT INTO example(user_name, a, b, c)
VALUES('tst', 3, 6, 1);
And here is the UNPIVOT I did:
select count(*) as amount, col
from
(select user_name, a, b, c from example) e
unpivot
(blah for col in (a, b, c)
) as unpvt
group by col
Here's an example of the output (note: I added an extra record to the table to make sure it was working properly):
Again, the syntax may be slightly different in BigQuery, but I think this should get you most of the way there.
Here's a link to my db-fiddle - https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=deaa0e92a4ef1de7d4801e458652816b
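SQLite has no UNPIVOT, but the same transpose-then-count idea can be sketched with Python's sqlite3 by turning each column into a (col, val) branch of a UNION ALL and dropping NULLs, which is what UNPIVOT does by default. The data is the example table from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (user_name TEXT, a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?, ?)",
    [("abc", 5, None, None), ("xyz", None, 6, None), ("tst", 3, 6, 1)],
)

# UNION ALL emulates UNPIVOT; the WHERE clause mimics its NULL filtering.
QUERY = """
SELECT col, COUNT(*) AS amount
FROM (
    SELECT 'a' AS col, a AS val FROM example
    UNION ALL SELECT 'b', b FROM example
    UNION ALL SELECT 'c', c FROM example
) AS unpvt
WHERE val IS NOT NULL
GROUP BY col
ORDER BY col
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('a', 2), ('b', 2), ('c', 1)]
```

Because NULLs are removed before grouping, a column with no non-NULL values (c, without the extra tst row) would be missing from the result entirely rather than reported with a count of 0.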
How could you convert or transpose a range of data into a single column as shown above? Values may be repeated in the data, but the output should contain unique values only.
(Updated after more information was provided in comments)
If your initial data comes from a query you could use a common table expression to do this:
with query_results (a,b,c) as (
... your original query that you have not shown us goes here ...
)
select a
from query_results
union
select b
from query_results
union
select c
from query_results
order by 1
The UNION operator will remove duplicates from the output
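The deduplicating behaviour of UNION can be checked quickly with SQLite through Python's sqlite3. The query_results table and its rows below are hypothetical, since the asker's original query was not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the asker's query results, with overlapping values.
conn.execute("CREATE TABLE query_results (a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO query_results VALUES (?, ?, ?)",
    [("x", "y", "z"), ("y", "z", "w"), ("x", "x", "v")],
)

# UNION (unlike UNION ALL) removes duplicates across all three branches.
QUERY = """
SELECT a FROM query_results
UNION
SELECT b FROM query_results
UNION
SELECT c FROM query_results
ORDER BY 1
"""

values = [row[0] for row in conn.execute(QUERY)]
print(values)  # ['v', 'w', 'x', 'y', 'z']
```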
You can use UNPIVOT:
SELECT DISTINCT value
FROM your_table
UNPIVOT ( value FOR type IN ( a, b, c ) );
I have two row values from table C:
Select Name FROM Table C Where AccountID = 123
COL1
Row 1 |Ricky|
Row 2 |Roxy |
I want to be able to select both of these two values in a SubQuery that will be used in a larger query. So that it displays "Ricky, Roxy"
How can this be done without declaring a variable?
SELECT COL1 = STUFF((SELECT ', ' + COL1 FROM tableC WHERE AccountID = 123
                     FOR XML PATH(''), TYPE).value('.[1]', 'nvarchar(max)'),
                    1, 2, '')
This will return all account 123 COL1 values as one column, with commas separating values.
Here is a SQL Fiddle
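On SQL Server 2017 and later, STRING_AGG(COL1, ', ') does the same job without the XML trick. As a portable sketch of the same idea (a concatenating scalar subquery embedded in a larger query), here is the SQLite equivalent with group_concat, run through Python's sqlite3; the second account's row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableC (AccountID INTEGER, COL1 TEXT)")
conn.executemany(
    "INSERT INTO tableC VALUES (?, ?)",
    [(123, "Ricky"), (123, "Roxy"), (456, "Other")],
)

# The correlated scalar subquery concatenates one account's names and can
# be dropped into any larger SELECT, with no variable declared.
QUERY = """
SELECT a.AccountID,
       (SELECT group_concat(c.COL1, ', ')
        FROM tableC AS c
        WHERE c.AccountID = a.AccountID) AS names
FROM (SELECT DISTINCT AccountID FROM tableC) AS a
ORDER BY a.AccountID
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

As with FOR XML PATH, the concatenation order is not guaranteed unless the database offers an explicit ordering option.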
I have a table (see the image below, red box) whose columns are A, B, C, and D. The data structure will always be like this: if col A is Type_1, only col B has content, while if col A is Type_2, cols C and D have content and col B is NULL.
Now, the table enclosed in the green box is my desired output.
My experience on building a select statement is not very extensive and I'm almost leaning towards creating two separate tables to get my desired result (like 1 table for Type_1 data only and another table for Type_2 data only).
My question is: is it possible to query two rows and combine them into a single output row using a SELECT query, considering that both rows are in the same table?
Thanks.
Something like this:
SELECT
Table2Id,
MAX(B) B,
MAX(C) C,
MAX(D) D
FROM tbl
WHERE A != 'Type_3'
GROUP BY Table2Id
Assuming that there is only one row of data for type1 and one row of data for type 2, you can use the following:
SELECT Id, MAX(B) AS B, MAX(C) AS C, MAX(D) AS D
FROM Table2
WHERE A IN ('Type_1','Type_2')
GROUP BY Id
Example in this SQL Fiddle
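Since the fiddle is not reproduced here, a small SQLite sketch (via Python's sqlite3) shows why GROUP BY with MAX collapses the two rows: MAX ignores NULLs, so each column picks up the one non-NULL value in its group. The table name tbl and the sample values are hypothetical; the Type_3 row shows why the WHERE filter matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Id INTEGER, A TEXT, B TEXT, C TEXT, D TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Type_1", "b1", None, None),
        (1, "Type_2", None, "c1", "d1"),
        (1, "Type_3", "zz", "zz", "zz"),  # would win MAX() if not filtered out
        (2, "Type_1", "b2", None, None),
        (2, "Type_2", None, "c2", "d2"),
    ],
)

QUERY = """
SELECT Id, MAX(B) AS B, MAX(C) AS C, MAX(D) AS D
FROM tbl
WHERE A IN ('Type_1', 'Type_2')
GROUP BY Id
ORDER BY Id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 'b1', 'c1', 'd1'), (2, 'b2', 'c2', 'd2')]
```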
You can write subqueries by enclosing them in parentheses, as in:
SELECT t.Id, (SELECT TOP 1 b.B FROM tbl b WHERE b.Id = t.Id AND b.B IS NOT NULL) AS B, t.C, t.D
FROM tbl t WHERE t.A = 'Type_2'
The queries inside the parentheses can reference any table, and can use data from the main query both in computing the selected values and in filters.
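A correlated scalar subquery of this kind can be sketched in SQLite through Python's sqlite3, where LIMIT 1 plays the role of SQL Server's TOP 1. The outer query walks the Type_2 rows and the subquery fetches B from the matching Type_1 row; the table name tbl and the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Id INTEGER, A TEXT, B TEXT, C TEXT, D TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Type_1", "b1", None, None),
        (1, "Type_2", None, "c1", "d1"),
        (2, "Type_1", "b2", None, None),
        (2, "Type_2", None, "c2", "d2"),
    ],
)

# The inner query refers to t.Id from the outer query (a correlated subquery).
QUERY = """
SELECT t.Id,
       (SELECT b.B FROM tbl AS b
        WHERE b.Id = t.Id AND b.A = 'Type_1' AND b.B IS NOT NULL
        LIMIT 1) AS B,
       t.C,
       t.D
FROM tbl AS t
WHERE t.A = 'Type_2'
ORDER BY t.Id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 'b1', 'c1', 'd1'), (2, 'b2', 'c2', 'd2')]
```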