Hello, I am kinda new to SQL. I just want to know if this is possible via SQL:
Table (multiple values are stored in a single cell):
COLUMN 1                                    COLUMN 2
"2023-01-01", "2023-01-02", "2023-01-03"    "User A, User B, User C"
Needed output:
COLUMN 1      COLUMN 2
2023-01-01    User A
2023-01-02    User A
2023-01-03    User A
2023-01-01    User B
2023-01-02    User B
2023-01-03    User B
2023-01-01    User C
2023-01-02    User C
2023-01-03    User C
Basically, each date from the row is assigned to all users in that same row. Any help or tip will be appreciated.
Thank you!
I have no idea yet on how to go around this
You can use the string_to_array function to get all parts of a string as elements of an array, then use the unnest function on that array to get the desired result, check the following:
select col1,
unnest(string_to_array(replace(replace(COLUMN2,'"',''),', ',','), ',')) as col2
from
(
select unnest(string_to_array(replace(replace(COLUMN1,'"',''),', ',','), ',')) as col1
, COLUMN2
from table_name
) T
order by col1, col2
We can use a combination of STRING_TO_ARRAY with UNNEST and LATERAL JOIN here:
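-- Split both columns per source row, so dates only pair with users from the same row.
-- TRIM removes the space that follows each comma.
SELECT TRIM(d.column1) AS column1, TRIM(u.column2) AS column2
FROM test t
LEFT JOIN LATERAL
UNNEST(STRING_TO_ARRAY(t.column1, ',')) AS d(column1)
ON true
LEFT JOIN LATERAL
UNNEST(STRING_TO_ARRAY(t.column2, ',')) AS u(column2)
ON true
ORDER BY 2, 1; -- by user, then by date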
STRING_TO_ARRAY will split the different dates and the different users into separate items.
UNNEST will write those items in separate rows.
LATERAL JOIN will put the three dates together with the three users (or, of course, fewer or more, depending on your data) and so creates the nine rows shown in your question. It works similarly to the CROSS APPLY approach that would be used on a SQL Server DB.
The ORDER BY clause just creates the same order as shown in your question, we can remove it if not required. The question doesn't really tell us if it's needed.
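For readers who find the row-wise logic easier to follow outside SQL, here is a minimal Python sketch of what the split, unnest and join steps compute for one row. The function name `explode_row` is made up for illustration, and the stripping of quotes and spaces is an assumption that mirrors the replace() calls in the first answer.

```python
from itertools import product

def explode_row(dates_cell: str, users_cell: str) -> list[tuple[str, str]]:
    """Pair every date in a row with every user in the same row."""
    # string_to_array + unnest: split on commas, then drop quotes and padding
    dates = [d.strip().strip('"').strip() for d in dates_cell.split(",")]
    users = [u.strip().strip('"').strip() for u in users_cell.split(",")]
    # the join: users vary slowest, matching the order shown in the question
    return [(d, u) for u, d in product(users, dates)]

rows = explode_row('"2023-01-01", "2023-01-02", "2023-01-03"',
                   '"User A, User B, User C"')
for date_value, user in rows:
    print(date_value, user)
```

Because the function works on one row at a time, dates from one row are never paired with users from another row.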
Because implementation details can change between different DBMSs, here is an example of how to do it in MySQL (8.0+):
WITH column1 as (
SELECT TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(column1,',',x),',',-1)) as Value
FROM test
CROSS JOIN (select 1 as x union select 2 union select 3 union select 4) x
WHERE x <= LENGTH(Column1)-LENGTH(REPLACE(Column1,',',''))+1
),
column2 as (
SELECT TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(column2,',',x),',',-1)) as Value
FROM test
CROSS JOIN (select 1 as x union select 2 union select 3 union select 4) x
WHERE x <= LENGTH(Column2)-LENGTH(REPLACE(Column2,',',''))+1
)
SELECT *
FROM column1, column2;
NOTE:
The CROSS JOIN subquery generates only 4 values and should be expanded when a cell can hold more than 4 items.
No data type is attached to the values that are fetched. This implementation does not know that "2023-01-08" is (or rather, can be) a date. It just sticks to strings.
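The nested SUBSTRING_INDEX calls can be hard to read at first, so here is a rough Python sketch of the same idea. The helper `substring_index` is a hand-written approximation of the MySQL function for non-zero counts only, and `nth_item` is a hypothetical name; neither is a real library call.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Approximate MySQL SUBSTRING_INDEX for a non-zero count."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    return delim.join(parts[count:])       # everything after the count-th delimiter from the right

def nth_item(cell: str, n: int) -> str:
    # the inner call keeps the first n items, the outer call keeps the last of those
    return substring_index(substring_index(cell, ",", n), ",", -1).strip()

cell = "2023-01-01, 2023-01-02, 2023-01-03"
# same bound as the WHERE clause: number of commas plus one
item_count = len(cell) - len(cell.replace(",", "")) + 1
items = [nth_item(cell, n) for n in range(1, item_count + 1)]
print(items)
```

The list of numbers 1..item_count plays the role of the small numbers table built by the CROSS JOIN, which is why that table has to be at least as long as the longest cell.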
In sql server this can be done using string_split
select x.value as date_val,y.value as user_val
from test a
CROSS APPLY string_split(Column1,',')x
CROSS APPLY string_split(Column2,',')y
order by y.value,x.value
date_val user_val
2023-01-01 User A
2023-01-02 User A
2023-01-03 User A
2023-01-03 User B
2023-01-02 User B
2023-01-01 User B
2023-01-01 User C
2023-01-02 User C
2023-01-03 User C
db fiddle link
https://dbfiddle.uk/YNJWDPBq
In mysql you can do it as follows :
WITH dates as (
select TRIM(SUBSTRING_INDEX(_date, ',', 1)) AS 'dates'
from _table
union
select TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(_date, ',', 2), ',', -1)) AS 'dates'
from _table
union
select TRIM(SUBSTRING_INDEX(_date, ',', -1)) AS 'dates'
from _table
),
users as
( select TRIM(SUBSTRING_INDEX(user, ',', 1)) AS 'users'
from _table
union
select TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(user, ',', 2), ',', -1)) AS 'users'
from _table
union
select TRIM(SUBSTRING_INDEX(user, ',', -1)) AS 'users'
from _table
)
select *
from dates, users
order by dates, users;
check it here : https://dbfiddle.uk/_oGix9PD
Related
I'm a little stumped on this. I have a table that looks like the following:
Group_Key Trigger_Type Event_Type Result_Id
1 A A 1
2 B B 2
3 C C 3
3 C C 4
4 E E 5
5 F F 6
5 F F 7
Some rows share the same survey key (all columns are the same apart from Result_Id) but have a different Result_Id. Is it possible to write a select that, instead of returning 2 rows because of the Result_Id, groups the rows that have dupes into a single row with the Result_Id being a concatenated string? So for instance, return this:
Group_Key Trigger_Type Event_Type Result_Id
1 A A 1
2 B B 2
3 C C 3,4
4 E E 5
5 F F 6,7
Is this possible?
Thank you,
Here's an example using a recursive CTE to replicate the functionality of string_agg(). This example is from the upsert scripts for execsql, and was written by Elizabeth Shea. It will have to be modified for your particular use, substituting your own column names for the execsql variable references.
if object_id('tempdb..#agg_string') is not null drop table #agg_string;
with enum as
(
select
cast(!!#string_col!! as varchar(max)) as agg_string,
row_number() over (order by !!#order_col!!) as row_num
from
!!#table_name!!
),
agg as
(
select
one.agg_string,
one.row_num
from
enum as one
where
one.row_num=1
UNION ALL
select
agg.agg_string + '!!#delimiter!!' + enum.agg_string as agg_string,
enum.row_num
from
agg, enum
where
enum.row_num=agg.row_num+1
)
select
agg_string
into #agg_string
from agg
where row_num=(select max(row_num) from agg);
Using Gordon Linoff's hint you can GROUP BY the values that should be the same and concatenate the other values in one row using STRING_AGG:
SELECT Group_Key, Trigger_Type, Event_Type, STRING_AGG(Result_Id, ',') as ResultId
FROM myTable
GROUP BY Group_Key, Trigger_Type, Event_Type
I would add the ordering of values to MicSim's solutions as follows:
select Group_Key, Trigger_Type, Event_Type,
string_agg(Result_Id, ',') within group (order by Result_Id)
from survey
group by Group_Key, Trigger_Type, Event_Type
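As a cross-check on what grouping plus ordered string aggregation computes, here is a small Python sketch over the sample rows from the question. The function name `aggregate` is hypothetical, and this only illustrates the semantics; it is not how any database implements the aggregate.

```python
from collections import defaultdict

rows = [
    (1, "A", "A", 1), (2, "B", "B", 2), (3, "C", "C", 3), (3, "C", "C", 4),
    (4, "E", "E", 5), (5, "F", "F", 6), (5, "F", "F", 7),
]

def aggregate(rows):
    """Group on the first three columns and join the sorted Result_Id values."""
    groups = defaultdict(list)
    for group_key, trigger_type, event_type, result_id in rows:
        groups[(group_key, trigger_type, event_type)].append(result_id)
    return [
        key + (",".join(str(r) for r in sorted(ids)),)  # sorted() stands in for the ORDER BY
        for key, ids in sorted(groups.items())
    ]

result = aggregate(rows)
for row in result:
    print(row)
```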
SQL Server versions before 2017 do not recognise the STRING_AGG function. On those versions, try STUFF() with FOR XML PATH as shown below:
SELECT Group_Key, Trigger_Type, Event_Type,
STUFF(
-- build ',3,4' for each group; the delimiter goes first so STUFF can strip it
(SELECT CONCAT(',', t2.Result_Id) AS [text()]
FROM [dbo].[TestTable] t2
WHERE t1.Group_Key = t2.Group_Key
AND t1.Trigger_Type = t2.Trigger_Type
AND t1.Event_Type = t2.Event_Type
ORDER BY t2.Result_Id
FOR XML PATH('')), 1, 1, '') AS Result_Id -- remove the single leading comma
FROM [dbo].[TestTable] t1
GROUP BY Group_Key, Trigger_Type, Event_Type
Hope this helps.
I need help on how to use BigQuery UNNEST function. My query:
I have table as shown in the image and I want to unnest the field "domains" (string type) currently separated by comma, so that I get each comma separated domain into a different row for each "acname". The output needed is also enclosed in the image:
I tried this logic, but it did not work:
select acc.acname,acc.amount,acc.domains as accdomains from project.dataset.dummy_account as acc
CROSS JOIN UNNEST(acc.domains)
But this gave the error "Values referenced in UNNEST must be arrays. UNNEST contains expression of type STRING". The error makes complete sense, but I did not understand how to convert a string to an array.
Can someone please help with a solution and also explain a bit how it actually works? Thank you.
Below is for BigQuery Standard SQL
#standardSQL
SELECT acname, amount, domain
FROM `project.dataset.dummy`,
UNNEST(SPLIT(domains)) domain
You can test, play with above using dummy data from your question as in example below
#standardSQL
WITH `project.dataset.dummy` AS (
SELECT 'abc' acname, 100 amount, 'a,b,c' domains UNION ALL
SELECT 'pqr', 300, 'p,q,r' UNION ALL
SELECT 'lmn', 500, 'l,m,n'
)
SELECT acname, amount, domain
FROM `project.dataset.dummy`,
UNNEST(SPLIT(domains)) domain
with output
Row acname amount domain
1 abc 100 a
2 abc 100 b
3 abc 100 c
4 pqr 300 p
5 pqr 300 q
6 pqr 300 r
7 lmn 500 l
8 lmn 500 m
9 lmn 500 n
The source table project.dataset.dummy has comma-separated values in the field "domains", but each comma is followed by a space (e.g. 'a, b, c'). This results in a space before the values b, c, q, r, m and n in the field "domains" of the "Output After Unnest" table. Now I'm joining this table with "salesdomain" as a key, but because of the space before b, c, q, r, m and n, the output received is not correct.
To address this, you can simply use the TRIM function to remove all leading and trailing spaces, as in the example below
#standardSQL
WITH `project.dataset.dummy` AS (
SELECT 'abc' acname, 100 amount, 'a, b, c' domains UNION ALL
SELECT 'pqr', 300, 'p, q, r' UNION ALL
SELECT 'lmn', 500, 'l, m, n'
)
SELECT acname, amount, TRIM(domain, ' ') domain
FROM `project.dataset.dummy`,
UNNEST(SPLIT(domains)) domain
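To make the split, unnest and trim steps concrete, here is a minimal Python sketch over the same dummy data. The function name `unnest_domains` is made up for illustration; the comments mark which line stands in for which SQL function.

```python
accounts = [
    ("abc", 100, "a, b, c"),
    ("pqr", 300, "p, q, r"),
    ("lmn", 500, "l, m, n"),
]

def unnest_domains(rows):
    """Emit one output row per comma-separated domain, with spaces trimmed."""
    out = []
    for acname, amount, domains in rows:
        for domain in domains.split(","):                    # SPLIT(domains) turns the string into an array
            out.append((acname, amount, domain.strip(" ")))  # TRIM(domain, ' ')
    return out

flat = unnest_domains(accounts)
for row in flat:
    print(row)
```

Without the strip call the second and third domains of each account would keep their leading space, which is exactly what breaks an equality join on that column.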
As the end goal, I want to create a table that looks something like this:
Table 1
option_ID person_ID option
1 1 B
2 1
3 2 C
4 2 A
5 3 A
6 3 B
The idea is that a person can choose up to 2 options out of 3 (in this case person 1 chose only 1 option). However, my raw data format puts all of a person's options into one single column, i.e.:
Table 2
person_ID option
1 B
2 C,A
3 A,B
What I usually do is use the 'Text to Columns' function with the ',' delimiter in Excel, and manually concatenate the 2 columns vertically. However, I find this method becomes impractical when faced with more options (say 10 or even 20). Is there a way for me to get from Table 2 to Table 1 efficiently using PostgreSQL or some other method?
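Use the string_agg() function. Note that this aggregates rows in the Table 1 layout into the Table 2 layout, which is the reverse of what the question asks: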
select person_ID, string_agg(option, ',') as option
from table1
group by person_ID
You can use regexp_split_to_table():
select row_number() over () as id,
t.person_id, v.option
from t cross join lateral
regexp_split_to_table(t.option, ',') as v(option)
order by t.person_id, v.option;
Actually, if you want exactly two rows per person_id:
select row_number() over () as id, t.person_id, v.option
from t cross join lateral
(values (1, split_part(t.option, ',', 1)), (2, split_part(t.option, ',', 2))) v(pos, option)
order by person_id, pos;
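The fixed two-rows-per-person variant can be sketched in Python as follows. The helper `split_part` is a hand-written stand-in for the PostgreSQL function of the same name (for positive positions only), and the running id is an assumption that replaces row_number().

```python
def split_part(text: str, delim: str, n: int) -> str:
    """Return the n-th field (1-based), or an empty string if there is none."""
    parts = text.split(delim)
    return parts[n - 1] if n <= len(parts) else ""

table2 = [(1, "B"), (2, "C,A"), (3, "A,B")]

table1 = []
for person_id, options in table2:
    for pos in (1, 2):                    # the VALUES list with positions 1 and 2
        option_id = len(table1) + 1       # running id in output order
        table1.append((option_id, person_id, split_part(options, ",", pos)))

for row in table1:
    print(row)
```

Person 1 gets a second row with an empty option, which matches the blank cell in Table 1.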
Sample data
CREATE TEMP TABLE a AS
SELECT id, adate::date, name
FROM ( VALUES
(1,'1/1/1900','test'),
(1,'3/1/1900','testing'),
(1,'4/1/1900','testinganother'),
(1,'6/1/1900','superbtest'),
(2,'1/1/1900','thebesttest'),
(2,'3/1/1900','suchtest'),
(2,'4/1/1900','test2'),
(2,'6/1/1900','test3'),
(2,'7/1/1900','test4')
) AS t(id,adate,name);
CREATE TEMP TABLE b AS
SELECT id, bdate::date, score
FROM ( VALUES
(1,'12/31/1899', 7 ),
(1,'4/1/1900' , 45),
(2,'12/31/1899', 19),
(2,'5/1/1900' , 29),
(2,'8/1/1900' , 14)
) AS t(id,bdate,score);
What I want
What I need to do is aggregate column text from table a where the id matches table b and the date from table a is between the two closest dates from table b. Desired output:
id date score textagg
1 12/31/1899 7 test, testing
1 4/1/1900 45 testinganother, superbtest
2 12/31/1899 19 thebesttest, suchtest, test2
2 5/1/1900 29 test3, test4
2 8/1/1900 14
My thoughts are to do something like this:
create table date_join
select a.id, string_agg(a.text, ','), b.*
from tablea a
left join tableb b
on a.id = b.id
*having a.date between b.date and b.date*;
but I am really struggling with the last line: figuring out how to aggregate only where the date in table a is between the closest two dates in table b. Any guidance is much appreciated.
I can't promise it's the best way to do it, but this is a way to do it.
with b_values as (
select
id, date as from_date, score,
lead (date, 1, '3000-01-01')
over (partition by id order by date) - 1 as thru_date
from b
)
select
bv.id, bv.from_date, bv.score,
string_agg (a.text, ',')
from
b_values as bv
left join a on
a.id = bv.id and
a.date between bv.from_date and bv.thru_date
group by
bv.id, bv.from_date, bv.score
order by
bv.id, bv.from_date
I'm presupposing you will never have a date in your table greater than 12/31/2999, so if you're still running this query after that date, please accept my apologies.
Here is the output I got when I ran this:
id from_date score string_agg
1 0 7 test,testing
1 92 45 testinganother,superbtest
2 0 19 thebesttest,suchtest,test2
2 122 29 test3,test4
2 214 14
I might also note that BETWEEN in a join is a performance killer. If you have large data volumes, there might be better ideas on how to approach this, but that depends largely on what your actual data looks like.
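The from/thru date logic can be checked against the sample data with a short Python sketch. The function name `aggregate_between` is hypothetical, the far-future default mirrors the '3000-01-01' fallback in the query, and an empty string stands in for the NULL that string_agg returns when nothing matches.

```python
from datetime import date, timedelta

a = [
    (1, date(1900, 1, 1), "test"), (1, date(1900, 3, 1), "testing"),
    (1, date(1900, 4, 1), "testinganother"), (1, date(1900, 6, 1), "superbtest"),
    (2, date(1900, 1, 1), "thebesttest"), (2, date(1900, 3, 1), "suchtest"),
    (2, date(1900, 4, 1), "test2"), (2, date(1900, 6, 1), "test3"),
    (2, date(1900, 7, 1), "test4"),
]
b = [
    (1, date(1899, 12, 31), 7), (1, date(1900, 4, 1), 45),
    (2, date(1899, 12, 31), 19), (2, date(1900, 5, 1), 29),
    (2, date(1900, 8, 1), 14),
]

def aggregate_between(a, b, far_future=date(3000, 1, 1)):
    b_sorted = sorted(b)  # by id, then by date
    out = []
    for i, (bid, from_date, score) in enumerate(b_sorted):
        # lead(): the next date for the same id, or the far-future default
        nxt = b_sorted[i + 1] if i + 1 < len(b_sorted) else None
        next_date = nxt[1] if nxt is not None and nxt[0] == bid else far_future
        thru_date = next_date - timedelta(days=1)
        # left join + string_agg: names for this id whose date falls in the window
        names = [name for aid, adate, name in a
                 if aid == bid and from_date <= adate <= thru_date]
        out.append((bid, from_date, score, ",".join(names)))
    return out

result = aggregate_between(a, b)
for row in result:
    print(row)
```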
Fair warning: I'm new to using SQL. I do so on an Oracle server either via AQT or with SQL Developer.
As I haven't been able to think or search my way to an answer, I put myself in your able hands...
I'd like to combine data from table A (high quality data) with data from table B (fresh data) such that the entries from B are only included when their date stamps are later than those available from table A.
Both tables include entries from multiple entities, and the latest date stamp varies with those entities.
On the 4th of January, the tables may look something like:
Table A
entity date type value
X 1.jan 1 1
X 1.jan 0 1
X 2.jan 1 1
Y 1.jan 1 1
Y 3.jan 1 1
Table B
entity date type value
X 1.jan 1 2
X 1.jan 0 2
X 2.jan 1 2
X 3.jan 1 1 (new entry)
Y 1.jan 1 2
Y 3.jan 1 2
Y 4.jan 1 1 (new entry)
I have made an attempt at some code that I hope clarifies my need:
WITH
AA AS
(SELECT entity, date, SUM(value)
FROM table_A
GROUP BY
entity,
date),
BB AS
(SELECT entity, date, SUM(value)
FROM table_B
WHERE date > ALL (SELECT date FROM AA)
GROUP BY
entity,
date
)
SELECT * FROM (SELECT * FROM AA UNION ALL SELECT * FROM BB)
Now, if the WHERE date > ALL (SELECT date FROM AA) would work separately for each entity, I think I would have what I need.
That is, for each entity I want all entries from A, and only newer entries from B.
As the data in table A often differs from that of B (values are often corrected), I don't think I can use something like: table A UNION ALL (table B MINUS table A)?
Thanks
Essentially you are looking for entries in BB which do not exist in AA. When you are doing date > ALL (SELECT date FROM AA) this will not take into consideration the entity in question and you will not get the correct records.
An alternative is to use a JOIN and filter out all entries that match AA.
Something like below.
WITH
AA AS
(SELECT entity, date, SUM(value) AS value
FROM table_A
GROUP BY
entity,
date),
BB AS
(SELECT b.entity, b.date, SUM(b.value) AS value
FROM table_B b
-- anti-join: keep only B rows with no matching (entity, date) in AA
LEFT OUTER JOIN AA
ON AA.entity = b.entity
AND AA.date = b.date
WHERE AA.date IS NULL
GROUP BY
b.entity,
b.date
)
SELECT * FROM (SELECT * FROM AA UNION ALL SELECT * FROM BB)
I find your question confusing, because I don't know where the aggregation is coming from.
The basic idea on getting newer rows from table_b uses conditions in the where clause, something like this:
select . . .
from table_a a
union all
select . . .
from table_b b
where b.date > (select max(a.date) from table_a a where a.entity = b.entity);
You can, of course, run this on your CTEs, if those are what you really want to combine.
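Here is a small Python sketch of the per-entity filter, using the sample rows from the question. Dates are written as plain day numbers in January and the function name `combine` is made up; both are simplifications for illustration only.

```python
# (entity, day of January, type, value)
table_a = [("X", 1, 1, 1), ("X", 1, 0, 1), ("X", 2, 1, 1),
           ("Y", 1, 1, 1), ("Y", 3, 1, 1)]
table_b = [("X", 1, 1, 2), ("X", 1, 0, 2), ("X", 2, 1, 2), ("X", 3, 1, 1),
           ("Y", 1, 1, 2), ("Y", 3, 1, 2), ("Y", 4, 1, 1)]

def combine(table_a, table_b):
    """All rows from A, plus rows from B that are newer than A's latest date for that entity."""
    latest = {}
    for entity, day, _type, _value in table_a:
        latest[entity] = max(day, latest.get(entity, day))  # max(a.date) per entity
    # an entity missing from A is dropped, as a comparison with NULL would do in SQL
    newer = [row for row in table_b
             if row[0] in latest and row[1] > latest[row[0]]]
    return table_a + newer  # UNION ALL

combined = combine(table_a, table_b)
for row in combined:
    print(row)
```

Only the two rows marked "(new entry)" in the question survive the filter on table B.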
Use UNION instead of UNION ALL; it will remove the duplicate records:
SELECT * FROM (
SELECT *
FROM AA
UNION
SELECT *
FROM BB )