How to get dates as column names in SQL BigQuery? - sql

I would like to get dates (monthYear, weekYear, etc.) as columns, with the value being the count of activities per user, but I can't get it to work :(
Some example:
My table:
userid | activityId | activityStatus | activityDate
-------|------------|----------------|--------------------
A1     | z1         | finished       | 2022-08-01T15:00:00
A2     | z2         | finished       | 2022-08-07T20:00:00
A2     | z3         | finished       | 2022-08-08T10:00:00
A1     | z4         | finished       | 2022-09-17T16:00:00
A1     | z5         | finished       | 2022-09-20T17:00:00
A3     | z6         | finished       | 2022-08-19T13:00:00
What I'm trying to do is something like this (but I know now that it doesn't work):
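SELECT
userid,
COUNT(activityId) AS doneActivities,
CONCAT(EXTRACT(WEEK FROM activityDate), '-', EXTRACT(YEAR FROM activityDate)) AS weekYear
FROM my_table -- placeholder name for the table shown above
GROUP BY userid, weekYear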
And the result of this attempt looks like:
userid | doneActivities | weekYear
-------|----------------|---------
A1     | 1              | 31-2022
A1     | 2              | 33-2022
A2     | 1              | 31-2022
A2     | 1              | 32-2022
A3     | 1              | 33-2022
Expected result would be something like this:
userid | 31-2022 | 32-2022 | 33-2022
-------|---------|---------|--------
A1     | 1       | 0       | 2
A2     | 1       | 1       | 0
A3     | 0       | 0       | 1
I know how to do it in Power BI, but I want to automate this for future queries.
If it's not clear, please let me know and I'll try to explain again. It has been a while since I practiced my English.
Thanks in advance!

BigQuery does not support dynamic pivoting at this point (the column list in PIVOT cannot come from variables or a subquery, possibly for security reasons), so a workaround is needed.
The basic idea is to first write the query with a static PIVOT, and then build that query as a string and run it with EXECUTE IMMEDIATE.
PIVOT
WITH
dataset AS (
SELECT 'A1' as userid, 'z1' as activityId, 'finished' as activityStatus, DATETIME('2022-08-01T15:00:00') as activityDate,
UNION ALL SELECT 'A2', 'z2', 'finished', '2022-08-07T20:00:00'
UNION ALL SELECT 'A2', 'z3', 'finished', '2022-08-08T10:00:00'
UNION ALL SELECT 'A1', 'z4', 'finished', '2022-09-17T16:00:00'
UNION ALL SELECT 'A1', 'z5', 'finished', '2022-09-20T17:00:00'
UNION ALL SELECT 'A3', 'z6', 'finished', '2022-08-19T13:00:00'
),
preprocess_dataset AS (
WITH
_dataset AS (
SELECT *, CONCAT('_', EXTRACT(YEAR from activityDate), '_', EXTRACT(WEEK from activityDate)) as year_week,
FROM dataset
)
SELECT userid, year_week, COUNT(DISTINCT activityId) as done_activities,
FROM _dataset
GROUP BY userid, year_week
)
SELECT *,
FROM preprocess_dataset
PIVOT (
SUM(done_activities) FOR year_week IN (
'_2022_31', '_2022_32', '_2022_33', '_2022_37', '_2022_38'
)
)
ORDER BY userid
;

Just to add to the great answer by @JihoChoi: if you would like to implement dynamic pivoting, see the approach below:
create temp table my_table (userid string,doneActivities int64,weekYear string);
insert into my_table
select
userid,
count(distinct activityId) as doneActivities,
weekYear
from
(
select
*,
concat('_',extract(YEAR from activityDate),'_',extract(WEEK from activityDate)) as weekYear
from `project-id.dataset_id.table_id`
)
group by userid,weekYear;
execute immediate (
select '''
select * from my_table
pivot(sum(doneActivities) for weekYear in ("''' || string_agg(weekYear,'", "') || '''"))
'''
from (select * from my_table order by weekYear)
)
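The string-building step can also be tried locally. The following is a minimal sketch in Python using the standard-library sqlite3 module, not BigQuery itself: SQLite has no PIVOT, so the generated query uses one SUM(CASE ...) column per week instead. The rows are the pre-aggregated values from the question's attempt, and the helper names quote_ident and quote_literal are hypothetical.

```python
import sqlite3

# In-memory stand-in for the pre-aggregated temp table (assumption: SQLite, not BigQuery).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (userid TEXT, doneActivities INTEGER, weekYear TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        ("A1", 1, "31-2022"),
        ("A1", 2, "33-2022"),
        ("A2", 1, "31-2022"),
        ("A2", 1, "32-2022"),
        ("A3", 1, "33-2022"),
    ],
)


def quote_ident(name):
    """Quote a value for use as a column name."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a value for use as a string literal."""
    return "'" + value.replace("'", "''") + "'"


# Step 1: read the distinct pivot values from the data.
weeks = [row[0] for row in conn.execute(
    "SELECT DISTINCT weekYear FROM my_table ORDER BY weekYear")]

# Step 2: build one aggregated column per value (the dynamic part).
pivot_columns = ", ".join(
    f"SUM(CASE WHEN weekYear = {quote_literal(w)} THEN doneActivities ELSE 0 END)"
    f" AS {quote_ident(w)}"
    for w in weeks
)
sql = f"SELECT userid, {pivot_columns} FROM my_table GROUP BY userid ORDER BY userid"

# Step 3: run the generated statement (the EXECUTE IMMEDIATE equivalent).
cursor = conn.execute(sql)
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()

print(header)
for row in rows:
    print(row)
```

Because the column names come from the data, quoting them before splicing them into the statement keeps an unusual value from breaking the generated query.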

Related

Consolidate information (time series) from two tables

MS SQL Server
I have two tables with different accounts from the same customer:
Table1:
ID | ACCOUNT | FROM       | TO
---|---------|------------|-----------
1  | A       | 01.10.2019 | 01.12.2019
1  | A       | 01.02.2020 | 09.09.9999
and table2:
ID | ACCOUNT | FROM       | TO
---|---------|------------|-----------
1  | B       | 01.12.2019 | 01.01.2020
As a result I want a table that summarizes the history of this customer and shows when he had an active account and when he didn't.
Result:
ID | FROM       | TO         | ACTIV Y/N
---|------------|------------|----------
1  | 01.10.2019 | 01.01.2020 | Y
1  | 02.01.2020 | 31.01.2020 | N
1  | 01.02.2020 | 09.09.9999 | Y
Can someone help me with some ideas how to proceed?
This is a typical gaps-and-islands problem, and those are usually not easy to solve.
You can achieve your goal using the query below; I will explain it a little bit.
You can test it on this db<>fiddle.
First of all, I have combined your two tables into one to simplify the query.
-- ##table1
select 1 as ID, 'A' as ACCOUNT, convert(date,'2019-10-01') as F, convert(date,'2019-12-01') as T into ##table1
union all
select 1 as ID, 'A' as ACCOUNT, convert(date,'2020-02-01') as F, convert(date,'9999-09-09') as T
-- ##table2
select 1 as ID, 'B' as ACCOUNT, convert(date,'2019-12-01') as F, convert(date,'2020-01-01') as T into ##table2
-- ##table3
select * into ##table3 from ##table1 union all select * from ##table2
You can then get your gaps and island using, for example, a query like this.
It combines recursive cte to generate a calendar (cte_cal) and lag and lead operations to get the previous/next record information to build the gaps.
with
cte_cal as (
select min(F) as D from ##table3
union all
select dateadd(day,1,D) from cte_cal where D <= '2021-01-01'
),
table4 as (
select t1.ID, t1.ACCOUNT, t1.F, isnull(t2.T, t1.T) as T, lag(t2.F, 1,null) over (order by t1.F) as SUP
from ##table3 t1
left join ##table3 t2
on t1.T=t2.F
)
select
ID,
case when T = D then F else D end as "FROM",
isnull(dateadd(day,-1,lead(D,1,null) over (order by D)),'9999-09-09') as "TO",
case when case when T = D then F else D end = F then 'Y' else 'N' end as "ACTIV Y/N"
from (
select *
from cte_cal c
cross apply (
select t.*
from table4 t
where t.SUP is null
and (
c.D = t.T or
c.D = dateadd(day,1,t.T)
)
) t
union all
select F, * from table4 where T = '9999-09-09'
) p
order by 1
option (maxrecursion 0)
Dates like '9999-09-09' must be treated as exceptions; otherwise the calendar would have to be generated all the way up to that date, and the query would take a long time to run.
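To make the gaps-and-islands idea concrete outside SQL, here is a minimal Python sketch. It assumes the periods of a single customer ID have already been read from both tables, treats periods that touch or overlap as one active period, and marks the days in between as inactive. The function name activity_timeline is hypothetical. It needs no calendar table, so the open-ended date '9999-09-09' is not a special case here.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def activity_timeline(periods):
    """Merge overlapping or adjacent (start, end) periods and fill the gaps between them."""
    merged = []
    for start, end in sorted(periods):
        # A period that starts no later than the day after the current island extends it.
        if merged and start - ONE_DAY <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    timeline = []
    for index, (start, end) in enumerate(merged):
        if index > 0:
            previous_end = merged[index - 1][1]
            # The gap runs from the day after the previous island to the day before this one.
            timeline.append((previous_end + ONE_DAY, start - ONE_DAY, "N"))
        timeline.append((start, end, "Y"))
    return timeline


# Periods for customer ID 1, taken from Table1 and table2 in the question.
periods = [
    (date(2019, 10, 1), date(2019, 12, 1)),   # account A
    (date(2020, 2, 1), date(9999, 9, 9)),     # account A, open-ended
    (date(2019, 12, 1), date(2020, 1, 1)),    # account B
]

timeline = activity_timeline(periods)
for start, end, active in timeline:
    print(start, end, active)
```

For several customers, the same function could be applied once per ID after grouping the rows.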

SQL query to get column names if it has specific value

I have a table where each column holds a flag ('Y' or empty) for a given name. For a particular row, I need to select the names of the columns that have a specific value.
My Table:
Name|sub-1|sub-2|sub-3|sub-4|sub-5|sub-6|
-----------------------------------------
Tom | Y | | Y | Y | | Y |
Jim | Y | Y | | | Y | Y |
Ram | | Y | | Y | Y | |
So I need to find out which subs have the 'Y' flag for a particular Name.
For example:
If I select Tom, I need to get the list of column names flagged 'Y' in the query output.
Subs
____
sub-1
sub-3
sub-4
sub-6
Your help is much appreciated.
The problem is that your database model is not normalized. If it was properly normalized the query would be easy. So the workaround is to normalize the model "on-the-fly" to be able to make the query:
select col_name
from (
select name, sub_1 as val, 'sub_1' as col_name
from the_table
union all
select name, sub_2, 'sub_2'
from the_table
union all
select name, sub_3, 'sub_3'
from the_table
union all
select name, sub_4, 'sub_4'
from the_table
union all
select name, sub_5, 'sub_5'
from the_table
union all
select name, sub_6, 'sub_6'
from the_table
) t
where name = 'Tom'
and val = 'Y'
The above is standard SQL and should work on any (relational) DBMS.
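As a quick way to try this on the sample data, here is a minimal sketch using Python's standard-library sqlite3 module (an assumption; the question does not name the DBMS). Unlike the query above, it keeps the hyphenated column names from the question, which therefore have to be written as quoted identifiers, and it generates the six UNION ALL branches in a loop instead of typing them out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = [f"sub-{i}" for i in range(1, 7)]

# Hyphenated column names must be quoted, otherwise sub-1 is parsed as "sub minus 1".
column_defs = ", ".join(f'"{c}" TEXT' for c in columns)
conn.execute(f"CREATE TABLE the_table (name TEXT, {column_defs})")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Tom", "Y", None, "Y", "Y", None, "Y"),
        ("Jim", "Y", "Y", None, None, "Y", "Y"),
        ("Ram", None, "Y", None, "Y", "Y", None),
    ],
)

# One SELECT per column turns the wide row into (name, val, col_name) rows.
unpivot_sql = " UNION ALL ".join(
    f"""SELECT name, "{c}" AS val, '{c}' AS col_name FROM the_table"""
    for c in columns
)
query = (
    f"SELECT col_name FROM ({unpivot_sql}) AS t "
    "WHERE name = ? AND val = 'Y' ORDER BY col_name"
)

subs = [row[0] for row in conn.execute(query, ("Tom",))]
print(subs)
```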
Below code works for me.
select t.subs from (select name, u.subs, u.val
from TableName s
unpivot
(
val
for subs in ([sub-1], [sub-2], [sub-3], [sub-4], [sub-5], [sub-6])
) u where u.val='Y') t
where t.name='Tom'
I am close to the solution: I can get the columns for all rows (I used just 2 columns here).
select col from ( select col, case s.col when 'sub-1' then sub-1 when 'sub-2' then sub-2 end AS val from mytable cross join ( select 'sub-1' AS col union all select 'sub-2' ) s ) s where val ='Y'
It gives the columns for all rows. I need the same data for a single row: if I select "Tom", I need the column names that have the 'Y' value.
I'm answering this under a few assumptions. The first is that you KNOW the names of the columns of the table in question. The second is that this is SQL Server. Oracle and MySQL have ways of doing this too, but I don't know the syntax for those.
Anyway, what I'd do is perform an 'UNPIVOT' on the data.
There are a lot of parentheses here, so to explain: the actual 'unpivot' clause (aliased as unpvt) takes the data and twists the columns into rows, and the SELECT associated with it provides the data that is returned. Here I used the 'Name' column, placed the column names under the 'Subs' column, and put the corresponding values into the 'Val' column. To be precise, I'm talking about this part of the code:
SELECT [Name], [Subs], [Val]
FROM
(SELECT [Name], [Sub-1], [Sub-2], [Sub-3], [Sub-4], [Sub-5], [Sub-6]
FROM pvt) p
UNPIVOT
([Val] FOR [Subs] IN
([Sub-1], [Sub-2], [Sub-3], [Sub-4], [Sub-5], [Sub-6])
) AS unpvt
My next step was to make that a 'sub-select' where I could find the specific name and val that was being hunted for. That would leave you with a SQL Statement that looks something along these lines
SELECT [Name], [Subs], [Val]
FROM (
SELECT [Name], [Subs], [Val]
FROM
(SELECT [Name], [Sub-1], [Sub-2], [Sub-3], [Sub-4], [Sub-5], [Sub-6]
FROM pvt) p
UNPIVOT
([Val] FOR [Subs] IN
([Sub-1], [Sub-2], [Sub-3], [Sub-4], [Sub-5], [Sub-6])
) AS unpvt
) AS pp
WHERE 1 = 1
AND pp.[Val] = 'Y'
AND pp.[Name] = 'Tom'
select col from (
select col,
case s.col
when 'sub-1' then sub-1
when 'sub-2' then sub-2
when 'sub-3' then sub-3
when 'sub-4' then sub-4
when 'sub-5' then sub-5
when 'sub-6' then sub-6
end AS val
from mytable
cross join
(
select 'sub-1' AS col union all
select 'sub-2' union all
select 'sub-3' union all
select 'sub-4' union all
select 'sub-5' union all
select 'sub-6'
) s on name="Tom"
) s
where val ='Y'
I included the join condition as
on name="Tom"

Prevent query from returning duplicate results

I have a database containing information for a telecommunication company, with the following tables:
SUBSCRIBERS (SUB_ID , F_NAME , L_NAME , DATE_OF_BIRTH , COUNTRY)
LINES (LINE_ID , LINE_NUMBER)
SUBSCRIBERS_LINES (SUB_LINE_ID , SUB_ID "foreign key", LINE_ID "foreign key", ACTIVATION_DATE)
CALLS (CALL_ID , LINE_FROM "foreign key", LINE_TO "foreign key" , START_DATE_CALL, END_DATE_CALL)
I want to retrieve the names of the top 3 subscribers who made the highest number of calls (each lasting less than 60 seconds) on a specific given day.
So I wrote the following query:
with TEMPRESULT AS
(
select * from
(
select CALLS.LINE_FROM , count(*) totalcount
from CALLS
where (((END_DATE_CALL-START_DATE_CALL)*24*60*60)<=60 and to_char(START_DATE_CALL,'YYYY-MM-DD')='2015-12-12')
group by CALLS.LINE_FROM
order by totalcount DESC
)
where rownum <= 3
)
select F_NAME,L_NAME
from TEMPRESULT inner join SUBSCRIBERS_LINES on TEMPRESULT.LINE_FROM=SUBSCRIBERS_LINES.line_id inner join SUBSCRIBERS on SUBSCRIBERS_LINES.SUB_ID=SUBSCRIBERS.SUB_ID;
But this query will not work if one of the subscribers has more than one line,
for example:
(X1 has L1 and L2 lines
X2 has L3
X3 has L4)
if X1 makes 20 calls from L1 and 19 calls from L2,
X2 makes 15 calls from L3,
X3 makes 10 calls from L4,
my query will return the following output:
X1
X1
X2
it must return :
X1
X2
X3
How can I modify the query so that it does not return duplicate names?
The subquery must GROUP BY SUB_ID (not LINE_FROM). This gives the total calls per subscriber rather than the calls of the top lines.
In other words, move the joins into the subquery, and group and order by SUB_ID.
DISTINCT in the main query comes too late: you will get no duplicates, but fewer results.
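Grouping by subscriber can be checked on the example from the question with a minimal sketch in Python's standard-library sqlite3 module. This is an assumption-laden simplification: the tables are trimmed to the columns needed, the duration and date filters are left out, and SQLite's LIMIT stands in for Oracle's rownum wrapper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUBSCRIBERS (SUB_ID INTEGER PRIMARY KEY, F_NAME TEXT);
CREATE TABLE SUBSCRIBERS_LINES (SUB_ID INTEGER, LINE_ID TEXT);
CREATE TABLE CALLS (CALL_ID INTEGER PRIMARY KEY, LINE_FROM TEXT);
INSERT INTO SUBSCRIBERS VALUES (1, 'X1'), (2, 'X2'), (3, 'X3');
INSERT INTO SUBSCRIBERS_LINES VALUES (1, 'L1'), (1, 'L2'), (2, 'L3'), (3, 'L4');
""")

# Call counts from the question: X1 uses two lines, X2 and X3 one line each.
calls_per_line = {"L1": 20, "L2": 19, "L3": 15, "L4": 10}
conn.executemany(
    "INSERT INTO CALLS (LINE_FROM) VALUES (?)",
    [(line,) for line, count in calls_per_line.items() for _ in range(count)],
)

# Join first, then group by subscriber, so both of X1's lines count towards one row.
top_subscribers = conn.execute("""
    SELECT s.F_NAME, COUNT(*) AS totalcount
    FROM CALLS c
    JOIN SUBSCRIBERS_LINES sl ON sl.LINE_ID = c.LINE_FROM
    JOIN SUBSCRIBERS s ON s.SUB_ID = sl.SUB_ID
    GROUP BY s.SUB_ID, s.F_NAME
    ORDER BY totalcount DESC
    LIMIT 3
""").fetchall()

print(top_subscribers)
```

Grouping by LINE_FROM on the same data would rank L1, L2 and L3 on top, which is exactly the X1, X1, X2 output described in the question.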
Could you try adding the DISTINCT keyword to the SELECT query at the bottom?
Something like this:
with TEMPRESULT AS
(
select * from
(
select CALLS.LINE_FROM , count(*) totalcount
from CALLS
where (((END_DATE_CALL-START_DATE_CALL)*24*60*60)<=60 and to_char(START_DATE_CALL,'YYYY-MM-DD')='2015-12-12')
group by CALLS.LINE_FROM
order by totalcount DESC
)
where rownum <= 3
)
select DISTINCT F_NAME,L_NAME
from TEMPRESULT
inner join SUBSCRIBERS_LINES on TEMPRESULT.LINE_FROM = SUBSCRIBERS_LINES.line_id
inner join SUBSCRIBERS on SUBSCRIBERS_LINES.SUB_ID = SUBSCRIBERS.SUB_ID;
In theory (I haven't tested it by creating this database) this should show:
X1
X2
X3
How about something like this?
(T represents the result from your query)
WITH t AS
(SELECT 1 id, 'x1' subscriber, 'l1' line FROM dual
UNION ALL
SELECT 2, 'x1', 'l1' FROM dual
UNION ALL
SELECT 3, 'x1', 'l1' FROM dual
UNION ALL
SELECT 4, 'x1', 'l2' FROM dual
UNION ALL
SELECT 5, 'x1', 'l2' FROM dual
UNION ALL
SELECT 6, 'x1', 'l2' FROM dual
UNION ALL
SELECT 6, 'x1', 'l2' FROM dual
UNION ALL
SELECT 7, 'x2', 'l3' FROM dual
UNION ALL
SELECT 8, 'x2', 'l3' FROM dual
UNION ALL
SELECT 9, 'x3', 'l4' FROM dual
),
t1 AS
(SELECT COUNT(subscriber) totalcount,
line,
MAX(subscriber) keep (dense_rank last
ORDER BY line ) subscribers
FROM t
GROUP BY line
ORDER BY 1 DESC
)
SELECT subscribers,
listagg(line
||' had '
|| totalcount
|| ' calls ', ',') within GROUP (
ORDER BY totalcount) AS lines
FROM t1
GROUP BY subscribers
The results:
subscribers lines
x1 l1 had 3 calls, l2 had 4 calls
x2 l3 had 2 calls
x3 l4 had 1 calls

Remove + - value records in SQL where clause

I need to remove the records whose + and - values cancel each other out. In other words,
I need only the two records highlighted in blue in the output window.
I hope it's clear what exactly I want.
User5 | -15
User6 | -10
The idea is to find the rows whose value (Val in my example) is not cancelled out by a row with the opposite value. You can do this by taking the absolute value and assigning a row number partitioned by the absolute value and the value itself. The rows whose row number has no match on the opposite side are the result.
WITH SampleData(UserID, Val) AS(
SELECT 'User1', -10 UNION ALL
SELECT 'User2', 10 UNION ALL
SELECT 'User3', -15 UNION ALL
SELECT 'User4', -10 UNION ALL
SELECT 'User5', -15 UNION ALL
SELECT 'User6', -10 UNION ALL
SELECT 'User7', 10 UNION ALL
SELECT 'User8', 15
)
,Numbered AS(
SELECT
UserID,
Val,
BaseVal = ABS(Val),
RN = ROW_NUMBER() OVER(PARTITION BY ABS(Val), Val ORDER BY UserId)
FROM SampleData
)
SELECT
n1.UserID,
n1.Val
FROM Numbered n1
LEFT JOIN Numbered n2
ON n2.BaseVal = n1.BaseVal
AND n2.RN = n1.rn
AND n2.UserID <> n1.UserID
WHERE n2.UserID IS NULL
ORDER BY n1.UserID
It appears that you want rows where the total does not equal 0?
select
userName,
userValue
from
yourTable
where
userName in (
select userName from yourTable
group by userName
having sum (userValue) <> 0
)

sql select to start with a particular record

Is there any way to write a select that starts with a particular record? Suppose I have a table with the following data:
SNO ID ISSUE
----------------------
1 A1 unknown
2 A2 some_issue
3 A1 unknown2
4 B1 some_issue2
5 B3 ISSUE4
6 B1 ISSUE4
Can I write a select to start showing records starting with B1 and then the remaining records? The output should be something like this:
4 B1 some_issue2
6 B1 ISSUE4
1 A1 unknown
2 A2 some_issue
3 A1 unknown2
5 B3 ISSUE4
It doesn't matter if B3 is last, just that B1 should be displayed first.
Couple of different options depending on what you 'know' ahead of time (i.e. the id of the record you want to be first, the sno, etc.):
Union approach:
select 1 as sortOrder, SNO, ID, ISSUE
from tableName
where ID = 'B1'
union all
select 2 as sortOrder, SNO, ID, ISSUE
from tableName
where ID <> 'B1'
order by sortOrder;
Case statement in order by:
select SNO, ID, ISSUE
from tableName
order by case when ID = 'B1' then 1 else 2 end;
You could also consider using temp tables, cte's, etc., but those approaches would likely be less performant...try a couple different approaches in your environment to see which works best.
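The CASE-in-ORDER-BY variant can be tried on the sample data with a minimal sketch in Python's standard-library sqlite3 module (an assumption; the question does not name the DBMS). The extra SNO sort key is not part of the query above; it is added here only so that the order within each group is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (SNO INTEGER, ID TEXT, ISSUE TEXT)")
conn.executemany(
    "INSERT INTO tableName VALUES (?, ?, ?)",
    [
        (1, "A1", "unknown"),
        (2, "A2", "some_issue"),
        (3, "A1", "unknown2"),
        (4, "B1", "some_issue2"),
        (5, "B3", "ISSUE4"),
        (6, "B1", "ISSUE4"),
    ],
)

# Rows with ID = 'B1' get sort key 1, everything else gets 2; SNO breaks ties.
ordered = conn.execute("""
    SELECT SNO, ID, ISSUE
    FROM tableName
    ORDER BY CASE WHEN ID = 'B1' THEN 1 ELSE 2 END, SNO
""").fetchall()

snos = [row[0] for row in ordered]
for row in ordered:
    print(row)
```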
Assuming you are using MySQL, you could either use IF() in an ORDER BY clause...
SELECT SNO, ID, ISSUE FROM table ORDER BY IF( ID = 'B1', 0, 1 );
... or you could define a function that imposes your sort order...
DELIMITER $$
CREATE FUNCTION my_sort_order( ID VARCHAR(2), EXPECTED VARCHAR(2) )
RETURNS INT
BEGIN
RETURN IF( ID = EXPECTED, 0, 1 );
END$$
DELIMITER ;
SELECT SNO, ID, ISSUE FROM table ORDER BY my_sort_order( ID, 'B1' );
select * from table1
where id = 'B1'
union all
select * from table1
where id <> 'B1'