SQL: how to convert a multi-select field to rows with totals

I have a table with a field whose contents are a concatenated list of selections from a multi-select form. I would like to convert the data in this field into another table where each row has the text of a selection and a count of the number of times that selection was made.
For example:
Original table:
id selections
1 A;B
2 B;D
3 A;B;D
4 C
I would like to get the following out:
selection count
A 2
B 3
C 1
D 2
I could easily do this with split and maps in JavaScript, but I'm not sure how to approach it in SQL (I use PostgreSQL). The goal is to use the second table to plot a graph in Google Data Studio.

A much simpler solution:
select regexp_split_to_table(selections, ';'), count(*)
from test_table
group by 1
order by 1;

You can use a lateral join and the handy set-returning function regexp_split_to_table() to unnest the strings to rows, then aggregate and count:
select x.selection, count(*) cnt
from mytable t
cross join lateral regexp_split_to_table(t.selections, ';') x(selection)
group by x.selection
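To sanity-check the numbers outside the database, the same split-then-count logic can be sketched in Python. This is only an illustration of what the queries above compute; the `count_selections` helper is a made-up name, and the sample rows are copied from the question.

```python
from collections import Counter

# Sample rows copied from the question: (id, selections)
rows = [(1, "A;B"), (2, "B;D"), (3, "A;B;D"), (4, "C")]


def count_selections(rows, sep=";"):
    """Split each delimited field into items and tally how often each item occurs."""
    counts = Counter()
    for _id, selections in rows:
        # Skip empty fragments so a trailing separator does not create a blank item.
        counts.update(item for item in selections.split(sep) if item)
    return dict(sorted(counts.items()))


result = count_selections(rows)
print(result)
```

The tally matches the table requested in the question (A 2, B 3, C 1, D 2).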

Related

I am having Issues counting values in a row with separators using SQL

I am new to Snowflake and am trying to count the number of values in a row with separators using SQL. I am not sure how to go about it. I've googled for solutions but have not been able to find one.
Table name: Lee_tab
user | names
id01 | Jon;karl;lee;
id02 | Abi;jackson;
id03 | don;
id04 |
What I want to achieve:
user | names | name_count
id01 | Jon;karl;lee; | 3
id02 | Abi;jackson; | 2
id03 | don; | 1
id04 | | 0
Here are three solutions using REGEXP_COUNT, SPLIT with ARRAY_SIZE, and STRTOK_TO_ARRAY with ARRAY_SIZE (I would use the REGEXP_COUNT one):
SELECT
column1,
column2,
regexp_count(column2, ';')+1 as solution_1,
ARRAY_SIZE(split(column2, ';')) as solution_2,
ARRAY_SIZE(strtok_to_array(column2, ';')) as solution_3
FROM VALUES
('id01','Jon;karl;lee'),
('id02','Abi;jackson'),
('id03','don');
which gives
COLUMN1 | COLUMN2 | SOLUTION_1 | SOLUTION_2 | SOLUTION_3
id01 | Jon;karl;lee | 3 | 3 | 3
id02 | Abi;jackson | 2 | 2 | 2
id03 | don | 1 | 1 | 1
It depends on which database you're using, because the syntax differs between them. I tried your example in SQLite Browser and got a result with a query like this one:
SELECT SUM(length(names) - length(replace(names, ';', '')) + 1)
AS TotalCount
FROM Lee_tab WHERE user = 'id01'; -- replace 'id01' with the user you want
Note that SQL Server has no length() function and uses len() instead, so pay attention to which database you are on.
The query is just a formula for counting values separated by ;
To get the exact result you want, you will also need to learn how to join.
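The separator-counting formula can be tried per row with Python's built-in sqlite3 module. This is a minimal sketch, assuming the table holds the values exactly as shown in the question, trailing ';' included. Under that assumption the number of separators already equals the number of names, so the "+ 1" is dropped here; it only applies when values have no trailing delimiter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Lee_tab ("user" TEXT, names TEXT)')
conn.executemany(
    "INSERT INTO Lee_tab VALUES (?, ?)",
    [("id01", "Jon;karl;lee;"), ("id02", "Abi;jackson;"), ("id03", "don;"), ("id04", "")],
)

# Each name is followed by ';', so counting separators counts names directly.
# An empty string has no separators and therefore yields 0.
query = """
SELECT "user", names,
       length(names) - length(replace(names, ';', '')) AS name_count
FROM Lee_tab
ORDER BY "user"
"""
counts = {user: n for user, _names, n in conn.execute(query)}
print(counts)
```

If the stored values had no trailing ';', adding 1 for non-empty strings would be needed instead.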
Here is a different answer, using the Snowflake SPLIT_TO_TABLE function. This function splits the string on the delimiter and creates a row for each value. We lateral join those rows back to the CTE, then COUNT and GROUP BY using standard SQL syntax:
with cte as (
select 'id01' as user, 'Jon;karl;lee' as names union all
select 'id02' as user, 'Abi;jackson' as names union all
select 'id03' as user, 'don' as names
)
select user, names, count(value) as count_names
from cte, lateral split_to_table(cte.names, ';')
group by user, names;
Rewriting json_stattham's answer using Snowflake syntax. Basically, we are just counting the number of separators (semicolons) in the string and adding 1. There is no need to use the SUM() function as in json_stattham's answer.
with cte as (
select 'id01' as user, 'Jon;karl;lee' as names union all
select 'id02' as user, 'Abi;jackson' as names union all
select 'id03' as user, 'don' as names
)
SELECT user, names, (length(names) - length(replace(names, ';'))) + 1 AS name_count
FROM cte;
This query answers your question:
select user,names,(len(names) - len(replace(names, ';',''))+1) names_count from Lee_tab;
For a complete worked example, see this fiddle:
https://www.db-fiddle.com/f/BQuEjw2pthMDb1z8NTdHv/0

Postgres union of queries in loop

I have a table with two columns. Let's call them
array_column and text_column
I'm trying to write a query to find out, for K ranging from 1 to 10, in how many rows does the value in text_column appear in the first K elements of array_column
I'm expecting results like:
k | count
________________
1 | 70
2 | 85
3 | 90
...
I did manage to get these results by simply repeating the query 10 times and uniting the results, which looks like this:
SELECT 1 AS k, count(*) FROM table WHERE array_column[1:1] @> ARRAY[text_column]
UNION ALL
SELECT 2 AS k, count(*) FROM table WHERE array_column[1:2] @> ARRAY[text_column]
UNION ALL
SELECT 3 AS k, count(*) FROM table WHERE array_column[1:3] @> ARRAY[text_column]
...
But that doesn't look like the correct way to do it. What if I wanted a very large range for K?
So my question is, is it possible to perform queries in a loop, and unite the results from each query? Or, if this is not the correct approach to the problem, how would you do it?
Thanks in advance!
You could use array_positions() which returns an array of all positions where the argument was found in the array, e.g.
select t.*,
array_positions(array_column, text_column)
from the_table t;
This returns a different result but is a lot more efficient as you don't need to increase the overall size of the result. To only consider the first ten array elements, just pass a slice to the function:
select t.*,
array_positions(array_column[1:10], text_column)
from the_table t;
To limit the result to only rows that actually contain the value you can use:
select t.*,
array_positions(array_column[1:10], text_column)
from the_table t
where text_column = any(array_column[1:10]);
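To get counts by position, you could use unnest() to turn that into rows. Note that this counts the rows where the value sits at exactly position k; a running sum over k gives the "first K elements" totals as long as the value appears at most once per array: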
select k, count(*)
from the_table t, unnest(array_positions(array_column[1:10], text_column)) as k
where text_column = any(array_column[1:10])
group by k
order by k;
You can use the generate_series function to generate a table with the expected number of rows with the expected values and then join to it within the query, like so:
SELECT t.k AS k, count(d.text_column)
FROM the_table d
-- the right join keeps every k; counting a column from the_table (rather than *) yields 0 when no rows meet the criteria
RIGHT JOIN (SELECT generate_series(1,10) AS k) t
ON d.array_column[1:t.k] @> ARRAY[d.text_column]
GROUP BY t.k
ORDER BY t.k;
This is probably the closest thing to looping over the results without writing an actual loop in a user-defined function with something like PL/pgSQL.
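To make the target numbers concrete, here is a small Python sketch of the computation the question describes: for each k, count the rows whose text value appears in the first k array elements. The sample rows and the `prefix_hit_counts` name are hypothetical and only serve as a reference to compare a query's output against.

```python
# Hypothetical sample data: (array_column, text_column) pairs.
rows = [
    (["a", "b", "c"], "a"),  # value found at position 1
    (["x", "y", "z"], "z"),  # value found at position 3
    (["p", "q"], "q"),       # value found at position 2
    (["m", "n"], "k"),       # value never found
]


def prefix_hit_counts(rows, max_k):
    """For each k in 1..max_k, count rows whose text value is in the first k array elements."""
    return {
        k: sum(1 for arr, text in rows if text in arr[:k])
        for k in range(1, max_k + 1)
    }


counts = prefix_hit_counts(rows, 4)
print(counts)
```

The counts never decrease as k grows, which is a quick check to apply to any SQL version of the same query.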

How can I convert 2 row into column in tsql?

I have 2 rows of data that I want to turn into 2 columns.
I tried the union syntax but it didn't work.
Here is the data I have:
breed1
breed2
I tried to convert it with this sql
select a.breed union a.breed
but it didn't work.
Here is what you want from the SQL:
breed1,breed2
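SELECT
[breed1],
[breed2]
FROM
(
-- the source needs a value column for PIVOT to aggregate; 1 and 2 are placeholder values
SELECT 'breed1' myColumn, 1 mySecondColumn
union
select 'breed2', 2
) AS SourceTable
PIVOT
(
AVG(mySecondColumn) FOR
myColumn IN ([breed1], [breed2])
) AS PivotTable;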
You can use a self join. This needs a way to pair rows together (so if you have four rows you get 1 and 2 in one result and 3 and 4 in the other rather than another combination).
I'm going to assume you have sequentially numbered rows in an Id column and an odd numbered row is paired with the one greater even Id:
select odd.Data as 'First', even.Data as 'Second'
from TheData odd
inner join TheData even on odd.Id+1 = even.Id
where odd.Id % 2 = 1;
More generally, for more columns, using PIVOT is more flexible.
How about an aggregation query?
select min(breed) as breed1, max(breed) as breed2
from t;

How to split delimited String to multiple rows in Hive using lateral view explode

I have a table in Hive as below -
create table somedf
(sellers string,
orders int
);
insert into somedf values
('1--**--2--**--3',50),
('1--**--2', 10);
The table has a column called sellers and it is delimited by the characters described in the insert statement. I would like to split the sellers into multiple rows so that it looks like below -
exploded_sellers orders
1 50
2 50
3 50
1 10
2 10
I am trying to use the lateral view explode() function in Hive but am unable to get the desired results. I am using the query below:
select exploded_sellers, orders
from somedf
lateral view outer explode(split(sellers,'\\--*.*\\*.*--')) t1 as exploded_sellers
which gives me below results as output -
exploded_sellers orders
1 50
3 50
1 10
2 10
This result does not split row 1 ('1--**--2--**--3', 50) as desired, producing only 2 rows for it instead of 3.
Is there any other function that is needed for this task?
Does lateral view explode() only work on arrays?
The pattern passed into split is incorrect. The * character needs to be escaped. There is no need to escape -.
Use
select exploded_sellers, orders
from somedf
lateral view outer explode(split(sellers,'--\\*\\*--')) t1 as exploded_sellers
This would work too. It expects two occurrences of * in the middle.
select exploded_sellers, orders
from somedf
lateral view outer explode(split(sellers,'--\\*{2}--')) t1 as exploded_sellers;
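The difference between the two patterns can be reproduced outside Hive. The sketch below uses Python's re module as a stand-in for Hive's Java regex engine, on the assumption that these particular patterns behave the same in both; the `explode` helper is a hypothetical name that mimics lateral view explode(split(...)).

```python
import re

# Rows copied from the question: (sellers, orders)
rows = [("1--**--2--**--3", 50), ("1--**--2", 10)]

# The regex that the Hive literal '\\--*.*\\*.*--' denotes. Its greedy .* parts
# let a single match stretch from the first delimiter to the last one.
greedy_pattern = r"\--*.*\*.*--"
# The corrected regex: two dashes, two literal asterisks, two dashes.
literal_pattern = r"--\*\*--"


def explode(rows, pattern):
    """Emit one (seller, orders) pair per piece produced by splitting on the pattern."""
    return [
        (seller, orders)
        for sellers, orders in rows
        for seller in re.split(pattern, sellers)
    ]


broken = explode(rows, greedy_pattern)
fixed = explode(rows, literal_pattern)
print(broken)
print(fixed)
```

With the greedy pattern, the middle seller of the first row is swallowed by the match, which is the missing row reported in the question.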

Pull specific numbers from a column that has numbers and words

Column
1
7
f
3
2
c
1
d
6
4
e
g
b
I want to be able to filter this using the IN() operator in the where clause and pull out only the numbers. The column is a varchar so it is coming back as an error in postgres
select substring(colname FROM '[0-9]+') from tablename
You can filter the numbers using the ISNUMERIC() function in the WHERE clause.
Something like this:
SELECT *
FROM Table1
WHERE ISNUMERIC(column_name)=1
As mentioned in the comments, this is for SQL Server, but you can create your own ISNUMERIC function in PostgreSQL following this example:
isnumeric() with PostgreSQL
I ended up using a subquery. In its SELECT I used cast(substring(column FROM '[0-9]+') as int), and in its WHERE I used column ~ '^\d+$'. I put that subquery in the FROM as its own table and pulled just the integers I needed from it with IN (1,2,3).
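The order of operations in that approach (keep digit-only strings, cast, then apply IN) can be sketched in Python. The values are copied from the question; `numeric_values_in` is a hypothetical helper name, and the snippet only illustrates the logic rather than replacing the SQL.

```python
import re

# Column values copied from the question, all stored as strings (varchar).
values = ["1", "7", "f", "3", "2", "c", "1", "d", "6", "4", "e", "g", "b"]


def numeric_values_in(values, wanted):
    """Keep strings made only of digits, cast them to int, then apply the IN-style filter."""
    digits_only = re.compile(r"\d+")
    # fullmatch plays the role of the anchored ^\d+$ test in the WHERE clause.
    numbers = [int(v) for v in values if digits_only.fullmatch(v)]
    return [n for n in numbers if n in wanted]


selected = numeric_values_in(values, {1, 2, 3})
print(selected)
```

Filtering before casting is the point: the cast only ever sees strings that are safe to convert, which avoids the error mentioned in the question.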