How to concatenate multiple columns vertically efficiently - sql

For the end goal, I want to create a table that looks something like this:
Table 1
option_ID person_ID option
1 1 B
2 1
3 2 C
4 2 A
5 3 A
6 3 B
The idea is that a person can choose up to 2 options out of 3 (in this case, person 1 chose only 1 option). However, my raw data puts all of a person's chosen options into one single column, i.e.:
Table 2
person_ID option
1 B
2 C,A
3 A,B
What I usually do is use Excel's 'Text to Columns' function with ',' as the delimiter, and then manually stack the resulting 2 columns vertically. However, this method becomes impractical when there are more options (say 10 or even 20). Is there a way to get from Table 2 to Table 1 efficiently using PostgreSQL or some other method?

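Use the string_agg() function. Note that it goes in the opposite direction, from Table 1 (one row per option) back to Table 2 (one comma-separated row per person):
select person_ID, string_agg(option, ',') as option
from table1
group by person_ID;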

You can use regexp_split_to_table():
select row_number() over (order by t.person_id, v.option) as id,
t.person_id, v.option
from t cross join lateral
regexp_split_to_table(t.option, ',') as v(option)
order by t.person_id, v.option;
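Actually, if you want exactly two rows per person_id:
select row_number() over (order by t.person_id, v.pos) as id, t.person_id, v.option
from t cross join lateral
(values (1, split_part(t.option, ',', 1)), (2, split_part(t.option, ',', 2))) v(pos, option)
order by t.person_id, v.pos;
split_part() returns an empty string when there is no second option, which gives person 1 the blank row shown in Table 1.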
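To try the split outside PostgreSQL, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. SQLite has no regexp_split_to_table(), so a recursive CTE is swapped in to peel off one comma-separated item per step; the table name t follows the answer above and the rows are the question's Table 2.

```python
import sqlite3

# In-memory stand-in for the question's Table 2; "t" is the table name used in the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (person_id INTEGER, option TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "B"), (2, "C,A"), (3, "A,B")],
)

# Recursive CTE: each step peels one item off the front of "rest".
# A trailing comma is appended so that every item, including the last, ends in ','.
SPLIT_SQL = """
WITH RECURSIVE split(person_id, pos, item, rest) AS (
    SELECT person_id, 0, NULL, option || ',' FROM t
    UNION ALL
    SELECT person_id,
           pos + 1,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT person_id, item
FROM split
WHERE pos > 0
ORDER BY person_id, pos
"""

# Number the resulting rows to get an option_ID column like Table 1.
table1 = [
    (option_id, person_id, item)
    for option_id, (person_id, item) in enumerate(conn.execute(SPLIT_SQL), start=1)
]
for row in table1:
    print(*row)
```

Ordering by the item's position keeps person 2's options in their original order (C, then A). Unlike Table 1, person 1 gets only one row here; padding every person to a fixed number of rows is what the split_part() variant handles.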

Related

Snowflake: Repeating rows based on column value

How can I repeat rows based on a column value in Snowflake using SQL?
I tried a few methods, such as DUAL and CONNECT BY, but they did not work.
I have two columns: Id and Quantity.
Each ID has a different Quantity value.
If you have a count, you can join to a generator that produces enough rows (here 10, so any count above 10 would be cut off):
with ten_rows as (
select row_number() over (order by null) as rn
from table(generator(ROWCOUNT=>10))
), data(id, count) as (
select * from values
(1,2),
(2,4)
)
SELECT
d.*
,r.rn
from data as d
join ten_rows as r
on d.count >= r.rn
order by 1,3;
ID COUNT RN
1 2 1
1 2 2
2 4 1
2 4 2
2 4 3
2 4 4
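The same "join to a numbers table" idea can be sketched with Python's sqlite3 module standing in for Snowflake. Assumptions: a recursive CTE replaces TABLE(GENERATOR(ROWCOUNT=>10)), and the count column is renamed quantity (matching the question's Quantity column); the sample rows are the ones from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [(1, 2), (2, 4)])

# The recursive CTE plays the role of Snowflake's GENERATOR: it yields rn = 1..10.
# Quantities above this upper bound would be truncated, just as with ROWCOUNT=>10.
REPEAT_SQL = """
WITH RECURSIVE ten_rows(rn) AS (
    SELECT 1
    UNION ALL
    SELECT rn + 1 FROM ten_rows WHERE rn < 10
)
SELECT d.id, d.quantity, r.rn
FROM data AS d
JOIN ten_rows AS r ON d.quantity >= r.rn   -- each row matches rn = 1..quantity
ORDER BY d.id, r.rn
"""

rows = conn.execute(REPEAT_SQL).fetchall()
for row in rows:
    print(*row)
```

Each input row is matched once for every rn from 1 up to its quantity, which reproduces the ID/COUNT/RN result above.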
OK, let's start by generating some data: 10 rows, each with a QTY randomly chosen as 1 or 2.
Next, we want to duplicate the rows with a QTY of 2 and leave the rows with QTY = 1 as they are.
Simply stack SPLIT_TO_TABLE() on top of REPEAT() with a LATERAL join: REPEAT() builds a string of QTY - 1 delimiters, splitting that string returns QTY pieces, and so each row comes back QTY times.
You can change all of the parameters to suit your needs. In my opinion, this solution is fast and better than table generation.
WITH TEN_ROWS AS (
SELECT ROW_NUMBER() OVER (ORDER BY NULL) AS SOME_ID,
UNIFORM(1, 2, RANDOM()) AS QTY
FROM TABLE(GENERATOR(ROWCOUNT => 10))
)
SELECT TEN_ROWS.*
FROM TEN_ROWS,
LATERAL SPLIT_TO_TABLE(REPEAT(',', QTY - 1), ',') AS ALTERNATIVE_APPROACH;
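Why the REPEAT()/SPLIT_TO_TABLE() pair yields QTY rows can be checked with Python's str.split, which likewise turns n delimiters into n + 1 (possibly empty) pieces. This is only an illustration, not Snowflake code: it assumes SPLIT_TO_TABLE also returns one row per piece, empty pieces included (the answer's approach relies on this), and that QTY is at least 1. The sample rows and the helper name pieces_for are hypothetical.

```python
def pieces_for(qty: int, delimiter: str = ",") -> int:
    # REPEAT(delimiter, qty - 1): a string made of qty - 1 delimiters.
    repeated = delimiter * (qty - 1)
    # Splitting n delimiters yields n + 1 (empty) pieces, i.e. qty pieces.
    return len(repeated.split(delimiter))


# Hypothetical sample rows: (SOME_ID, QTY).
ten_rows = [(1, 1), (2, 2), (3, 1), (4, 2)]

# The LATERAL join emits each input row once per piece.
expanded = [
    (some_id, qty)
    for some_id, qty in ten_rows
    for _ in range(pieces_for(qty))
]
print(expanded)
```

Rows with QTY = 2 appear twice and rows with QTY = 1 once, which is exactly the duplication the answer describes.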

duplication of rows in table

I have a table with many rows that are the same except for the id column. How can I show only one row for each set of duplicates?
id name roll_number
1 a 1
2 b 2
3 a 1
4 b 2
5 c 3
6 d 4
7 d 4
I want output like this:
id name roll_number
1 a 1
2 b 2
5 c 3
6 d 4
In PostgreSQL, we can use DISTINCT ON here:
SELECT DISTINCT ON (name) id, name, roll_number
FROM yourTable
ORDER BY name, id;
This query selects the record with the lowest id from each group of records that share the same name.
A simple aggregation using MIN():
select min(id) as id, name, roll_number
from t
group by name, roll_number;
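The MIN() aggregation can be checked end to end with a minimal sketch that uses Python's sqlite3 module as a stand-in engine (any SQL database with GROUP BY behaves the same here); the table name t and the rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, roll_number INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "a", 1), (2, "b", 2), (3, "a", 1), (4, "b", 2),
     (5, "c", 3), (6, "d", 4), (7, "d", 4)],
)

# One row per (name, roll_number) group, keeping the smallest id in each group.
rows = conn.execute(
    "SELECT MIN(id) AS id, name, roll_number "
    "FROM t GROUP BY name, roll_number "
    "ORDER BY name, roll_number"
).fetchall()
for row in rows:
    print(*row)
```

Because every non-id column is in the GROUP BY, two rows collapse only when they are duplicates apart from their id.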
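If the data is in a NumPy array instead (without the id column, since the ids make every row unique), you could use numpy.unique() with axis=0 to keep one copy of each row:
>>> import numpy as np
>>> np.unique(array, axis=0)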
This problem requires filtering out tuples during the projection based on groups. The solution is to use DISTINCT ON:
SELECT DISTINCT ON (name, roll_number) id, name, roll_number
FROM yourTable
ORDER BY name, roll_number, id;
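It groups the rows by the expressions listed in DISTINCT ON and outputs only the first row of each group according to the ORDER BY clause. The DISTINCT ON expressions must match the leftmost ORDER BY expressions, and if the ORDER BY does not fully determine that first row, which row is returned is unpredictable.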
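To make that rule concrete, here is a plain-Python sketch (not PostgreSQL itself) of the DISTINCT ON semantics: sort by the ORDER BY keys, then keep the first row of each DISTINCT ON group. The helper name distinct_on is hypothetical; rows are (id, name, roll_number) tuples from the question.

```python
from itertools import groupby
from operator import itemgetter

rows = [(1, "a", 1), (2, "b", 2), (3, "a", 1), (4, "b", 2),
        (5, "c", 3), (6, "d", 4), (7, "d", 4)]


def distinct_on(rows, key, order):
    """Mimic DISTINCT ON (key) ... ORDER BY order.

    `key` must be a prefix of `order` (the same rule PostgreSQL enforces),
    so that rows with equal keys end up next to each other after sorting.
    """
    ordered = sorted(rows, key=order)
    # groupby only groups consecutive rows; next(group) is the first row per key.
    return [next(group) for _, group in groupby(ordered, key=key)]


# DISTINCT ON (name, roll_number) ... ORDER BY name, roll_number, id
result = distinct_on(rows, key=itemgetter(1, 2), order=itemgetter(1, 2, 0))
print(result)
```

Sorting by id after the grouping columns is what makes the lowest id the "first" row, so the choice is deterministic.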

How to count the number of entries in a comma-separated column

Currently I have a table in my database, and I want to create a bar chart of the reasons column. This is an example of my table:
Table Name: Survey
id reasons
1 a,b,c
2 a,d,e
3 b,c,d
How can I count the total for each reason, as in the table below?
reasons total
a 2
b 2
c 2
d 2
e 1
In SQL Server 2016 or later, you would use string_split():
select s.value as reason, count(*) as total
from Survey cross apply
string_split(reasons, ',') s
group by s.value
order by s.value;
That said, you should fix your data model. You should have a separate table with one row per reason.
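The split-then-count idea behind string_split() plus GROUP BY can be sketched in plain Python with collections.Counter; this is an illustration outside SQL Server, using the question's sample rows.

```python
from collections import Counter

# Sample rows from the question's Survey table: (id, reasons).
survey = [(1, "a,b,c"), (2, "a,d,e"), (3, "b,c,d")]

# Split each comma-separated value, then count every reason
# (the same split-then-GROUP BY idea as string_split + COUNT(*)).
totals = Counter(
    reason.strip()
    for _, reasons in survey
    for reason in reasons.split(",")
)

for reason, total in sorted(totals.items()):
    print(reason, total)
```

This also confirms that d appears in two rows (2 and 3), so its total is 2.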

columns to rows change in oracle sql

I have columns a and b in table x, and I want to turn the data in these columns into rows.
The table may contain duplicate values, but only distinct values should appear in the result.
For example:
a b
1 2
1 11
3 4
5 6
7 8
9 10
...
Query 1 should return 1-2,1-11,3-4,5-6,7-8,9-10,...
Query 2 should return 1,3,5,7,9,... (1 must appear only once, even though column a contains it twice).
How can I achieve this in Oracle SQL?
In Oracle 11g Release 2 and later, use the listagg() function: in the first query, concatenate the columns; in the second, select the distinct values first.
Query 1:
select listagg(a||'-'||b, ',') within group (order by a, b) result from t
RESULT
------------------------------
1-2,1-11,3-4,5-6,7-8,9-10
Query 2:
select listagg(a, ',') within group (order by a) result
from (select distinct a from t)
RESULT
------------------------------
1,3,5,7,9
For older versions, you can use the undocumented wmsys.wm_concat function (it was removed in Oracle 12c).
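To check the expected strings, here is a plain-Python sketch of what the two listagg() queries compute. It is not Oracle code, only an illustration using the question's sample rows (the "..." rows are left out).

```python
# Sample (a, b) pairs from the question, including the duplicate a = 1.
rows = [(1, 2), (1, 11), (3, 4), (5, 6), (7, 8), (9, 10)]

# Query 1: listagg(a || '-' || b, ',') within group (order by a, b)
result1 = ",".join(f"{a}-{b}" for a, b in sorted(rows))

# Query 2: listagg(a, ',') within group (order by a) over (select distinct a ...)
result2 = ",".join(str(a) for a in sorted({a for a, _ in rows}))

print(result1)
print(result2)
```

Taking the distinct values before aggregating is what keeps 1 from appearing twice in the second result.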

Select values in SQL that do not have other corresponding values except those that I search for

I have a table in my database:
Name | Element
1 2
1 3
4 2
4 3
4 5
I need a query that, given a set of arguments, selects each Name whose Element values are exactly those arguments and no others.
For example:
if the arguments are 2 and 3, the query should return only 1 and not 4 (because 4 also has 5). For arguments 2, 3, 5 it should return 4.
My query looks like this:
SELECT name FROM aggregations WHERE (element=2 and name in (select name from aggregations where element=3))
What do I have to add to this query to make it not return 4?
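A simple way to find the names that have all of the requested elements (note that because the WHERE clause filters out every other element, this alone still returns 4 for arguments 2 and 3):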
SELECT name
FROM aggregations
WHERE element IN (2,3)
GROUP BY name
HAVING COUNT(element) = 2
If you want to search for more elements, you'll need to change both the IN (2,3) part and the count in the HAVING part:
SELECT name
FROM aggregations
WHERE element IN (2,3,5)
GROUP BY name
HAVING COUNT(element) = 3
A more robust way, which also excludes names that have any element outside your set, is to check that no such element exists:
SELECT name
FROM aggregations
WHERE NOT EXISTS (
SELECT DISTINCT a.element
FROM aggregations a
WHERE a.element NOT IN (2,3,5)
AND a.name = aggregations.name
)
GROUP BY name
HAVING COUNT(element) = 3
It's not very efficient, though.
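A common alternative (swapped in here, not taken from the answers above) does the exact-match check in a single GROUP BY: the total count must equal the size of the set, and so must the count of matching elements. This minimal sketch uses Python's sqlite3 module as a stand-in engine; the helper name names_with_exactly is hypothetical, and it assumes there are no duplicate (name, element) rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aggregations (name INTEGER, element INTEGER)")
conn.executemany(
    "INSERT INTO aggregations VALUES (?, ?)",
    [(1, 2), (1, 3), (4, 2), (4, 3), (4, 5)],
)


def names_with_exactly(wanted):
    """Return names whose elements are exactly `wanted` (distinct values, no duplicate rows)."""
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT name
        FROM aggregations
        GROUP BY name
        HAVING COUNT(*) = ?                 -- no extra elements
           AND SUM(CASE WHEN element IN ({placeholders})
                        THEN 1 ELSE 0 END) = ?   -- every wanted element present
        ORDER BY name
    """
    params = [len(wanted), *wanted, len(wanted)]
    return [name for (name,) in conn.execute(sql, params)]


print(names_with_exactly([2, 3]))
print(names_with_exactly([2, 3, 5]))
```

Because there is no WHERE clause, the extra element 5 still counts toward COUNT(*), which is what excludes name 4 for the arguments 2 and 3.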
Create a temporary table, fill it with your values, and query it like this:
SELECT name
FROM (
SELECT DISTINCT name
FROM aggregations
) n
WHERE NOT EXISTS
(
SELECT 1
FROM (
SELECT element
FROM aggregations aii
WHERE aii.name = n.name
) ai
FULL OUTER JOIN
temptable tt
ON tt.element = ai.element
WHERE ai.element IS NULL OR tt.element IS NULL
)
This is more efficient than using COUNT(*), because it stops checking a name as soon as it finds the first row without a match (either in aggregations or in temptable).
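The FULL OUTER JOIN check above amounts to "the symmetric difference between a name's elements and the wanted set is empty". Here is a plain-Python sketch of that logic, not a database implementation; the helper name exact_matches is hypothetical and the rows are the question's sample data.

```python
from collections import defaultdict

# (name, element) rows from the question.
rows = [(1, 2), (1, 3), (4, 2), (4, 3), (4, 5)]


def exact_matches(rows, wanted):
    elements_by_name = defaultdict(set)
    for name, element in rows:
        elements_by_name[name].add(element)
    target = set(wanted)
    # Elements on only one side correspond to FULL OUTER JOIN rows with a NULL side;
    # a name qualifies only when there are none.
    return sorted(
        name
        for name, elements in elements_by_name.items()
        if not (elements ^ target)
    )


print(exact_matches(rows, [2, 3]))
print(exact_matches(rows, [2, 3, 5]))
```

A name is rejected as soon as it has an element missing from the set or lacks one of the wanted elements, matching the two IS NULL conditions in the query.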
This isn't tested, but for small amounts of data I would usually do this with a subquery in the WHERE clause. Note that this is not efficient for large record counts.
SELECT ag1.Name FROM aggregations ag1
WHERE ag1.Element IN (2,3)
AND 0 = (SELECT COUNT(ag2.Name)
FROM aggregations ag2
WHERE ag1.Name = ag2.Name
AND ag2.Element NOT IN (2,3)
)
GROUP BY ag1.Name
HAVING COUNT(DISTINCT ag1.Element) = 2;
This says "Give me all of the names that have the elements I want, but have no records with elements I don't want"