I am working on a SQL query in an Azure Databricks environment, using the following dataset:
CREATE OR REPLACE TABLE tb_user_info
(
clientid INT,
visitid STRING,
channel STRING,
conversion INT,
index INT,
value STRING
);
INSERT INTO tb_user_info VALUES
(123, 'abc123', 'google', 1, 11, '1250'),
(123, 'abc123', 'google', 1, 22, '25000'),
(123, 'abc123', 'google', 1, 33, '1K and 3K'),
(456, 'def456', 'facebook', 3, 11, '2860'),
(456, 'def456', 'facebook', 3, 22, '78000'),
(456, 'def456', 'facebook', 3, 33, '3K and 5K');
SELECT * FROM tb_user_info ORDER BY clientid, index
clientid  visitid  channel   conversion  index  value
------------------------------------------------------
123       abc123   google    1           11     1250
123       abc123   google    1           22     25000
123       abc123   google    1           33     1K and 3K
456       def456   facebook  3           11     2860
456       def456   facebook  3           22     78000
456       def456   facebook  3           33     3K and 5K
I want to get the following output:
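clientid  visitid  channel   conversion  salary (index=11)  savings (index=22)  salary range (index=33)
--------------------------------------------------------------------------------------------------------
123       abc123   google    1           1250               25000               1K and 3K
456       def456   facebook  3           2860               78000               3K and 5K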
where the columns clientid, visitid, channel and conversion are grouped and the columns index and value are the columns that are pivoted.
I've tried using the PIVOT clause and I have read the documentation, but I haven't been successful.
Could you help me solve this task?
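I am not sure what actual problem you have encountered. I wrote the query below and it seems to work normally. In Databricks SQL the IN list takes plain values (optionally with aliases) rather than bracketed names:
SELECT * FROM (
SELECT clientid, visitid, channel, conversion, index, value
FROM tb_user_info
)
PIVOT (
max(value)
FOR index IN (
11 AS salary, 22 AS savings, 33 AS salary_range
)
)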
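If the PIVOT clause keeps failing, the same reshaping can be written as conditional aggregation, which most SQL engines accept. The sketch below is only an illustration: it uses Python's built-in sqlite3 module (an assumption made so the example runs anywhere, not the asker's Databricks environment), and the column aliases salary, savings and salary_range are names chosen here.

```python
import sqlite3

# In-memory SQLite database standing in for the Databricks table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_user_info (
    clientid INT, visitid TEXT, channel TEXT,
    conversion INT, "index" INT, value TEXT
);
INSERT INTO tb_user_info VALUES
    (123, 'abc123', 'google',   1, 11, '1250'),
    (123, 'abc123', 'google',   1, 22, '25000'),
    (123, 'abc123', 'google',   1, 33, '1K and 3K'),
    (456, 'def456', 'facebook', 3, 11, '2860'),
    (456, 'def456', 'facebook', 3, 22, '78000'),
    (456, 'def456', 'facebook', 3, 33, '3K and 5K');
""")

# One MAX(CASE ...) per pivoted index value; GROUP BY keeps the other columns.
pivot_sql = """
SELECT clientid, visitid, channel, conversion,
       MAX(CASE WHEN "index" = 11 THEN value END) AS salary,
       MAX(CASE WHEN "index" = 22 THEN value END) AS savings,
       MAX(CASE WHEN "index" = 33 THEN value END) AS salary_range
FROM tb_user_info
GROUP BY clientid, visitid, channel, conversion
ORDER BY clientid
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

The SELECT statement itself uses only standard SQL, so it should carry over to other engines with little change; the double quotes around index are there because INDEX is a keyword in SQLite.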
I have two tables, a comments table and a responses table, in Azure data warehouse.
Comments table
commenid  comment
------------------
1         Hi aaa
2         Hi xxx
3         Hi yyy
Responses table
Responseid  response      linkid                                  createddate
------------------------------------------------------------------------------
123         open ticket   1                                       10-25-2021
124         Activate      123 (this is my previous responseid)    10-26-2021
3452        Close         124                                     10-30-2021
532         reply to xxx  2                                       10-25-2021
3214        closed        532                                     10-29-2021
654         hold          3                                       11-14-2021
The comments table and the responses table are related through the first response: the linkid of the first response to a comment is that comment's id.
For every later response, linkid is the Responseid of the previous response.
Now I need a query in Azure SQL that returns my responses table with the comment id on every row, like below.
I tried joins, but they do not return the results below:
Responseid  response      linkid  comments
-------------------------------------------
123         open ticket   1       1
124         Activate      123     1
3452        Close         124     1
532         reply to xxx  2       2
3214        closed        532     2
654         hold          3       3
Please help me. Thank you in advance.
With the following tables:
CREATE TABLE T_COMMENTS
(commenid INT, comment VARCHAR(256));
INSERT INTO T_COMMENTS VALUES
(1, 'Hi aaa'),
(2, 'Hi xxx'),
(3, 'Hi yyy');
CREATE TABLE T_RESPONSES
(Responseid int, response VARCHAR(256), linkid int, createddate DATE);
INSERT INTO T_RESPONSES VALUES
(123, 'open ticket', 1, '10-25-2021'),
(124, 'Activate', 123, '10-26-2021'),
(3452, 'Close', 124, '10-30-2021'),
(532, 'reply to xxx', 2, '10-25-2021'),
(3214, 'closed', 532, '10-29-2021'),
(654, 'hold', 3, '11-14-2021');
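The following query does the job. It is a recursive CTE: the anchor part selects the responses whose linkid is not a Responseid (so it points at a comment), and the recursive part carries that comment id down each chain of responses: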
WITH
T AS
(
SELECT Responseid, linkid, linkid AS link_comment
FROM T_RESPONSES
WHERE linkid NOT IN (SELECT Responseid
FROM T_RESPONSES)
UNION ALL
SELECT R.Responseid, R.linkid, link_comment
FROM T_RESPONSES AS R
JOIN T ON T.Responseid = R.linkid
)
SELECT R.Responseid, response, R.linkid, T.link_comment, comment
FROM T
JOIN T_RESPONSES AS R ON T.Responseid = R.Responseid
JOIN T_COMMENTS AS C ON T.link_comment = C.commenid;
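To see the recursion at work, the same query can be run against the sample rows. This is a sketch under assumptions: it uses Python's built-in sqlite3 module instead of Azure SQL, so the CTE needs the RECURSIVE keyword and createddate is stored as plain text; the final ORDER BY is added here only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for Azure SQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_COMMENTS (commenid INT, comment TEXT);
INSERT INTO T_COMMENTS VALUES (1, 'Hi aaa'), (2, 'Hi xxx'), (3, 'Hi yyy');
CREATE TABLE T_RESPONSES (Responseid INT, response TEXT, linkid INT, createddate TEXT);
INSERT INTO T_RESPONSES VALUES
    (123,  'open ticket',  1,   '10-25-2021'),
    (124,  'Activate',     123, '10-26-2021'),
    (3452, 'Close',        124, '10-30-2021'),
    (532,  'reply to xxx', 2,   '10-25-2021'),
    (3214, 'closed',       532, '10-29-2021'),
    (654,  'hold',         3,   '11-14-2021');
""")

chain_sql = """
WITH RECURSIVE T AS (
    -- Anchor: first responses, whose linkid is a comment id.
    SELECT Responseid, linkid, linkid AS link_comment
    FROM T_RESPONSES
    WHERE linkid NOT IN (SELECT Responseid FROM T_RESPONSES)
    UNION ALL
    -- Recursive step: follow-up responses inherit the comment id.
    SELECT R.Responseid, R.linkid, T.link_comment
    FROM T_RESPONSES AS R
    JOIN T ON T.Responseid = R.linkid
)
SELECT R.Responseid, R.response, R.linkid, T.link_comment, C.comment
FROM T
JOIN T_RESPONSES AS R ON T.Responseid = R.Responseid
JOIN T_COMMENTS AS C ON T.link_comment = C.commenid
ORDER BY T.link_comment, R.Responseid
"""
chain_rows = conn.execute(chain_sql).fetchall()
comment_of = {row[0]: row[3] for row in chain_rows}  # Responseid -> comment id
for row in chain_rows:
    print(row)
```

Note that the anchor condition relies on comment ids never colliding with response ids, as in the sample data.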
I have a table named "customer" that looks like this:
ID ALPHA BRAVO CHARLIE DATE
-------------------------------------------------
1 111 222 333 02/02/2019
2 333 444 555 11/11/2019
3 666 555 777 12/12/2019
4 777 888 999 05/05/2020
5 100 101 110 12/25/2020
and I need to get the following output:
ID ALPHA BRAVO CHARLIE DATE NEW_COL ROW_NUM
-----------------------------------------------------------------------
1 111 222 333 02/02/2019 333 4
2 333 444 555 11/11/2019 333 3
3 666 555 777 12/12/2019 333 2
4 777 888 999 05/05/2020 333 1
5 100 101 110 12/25/2020 010 1
The ALPHA, BRAVO, and CHARLIE columns represent customer IDs. A given customer can have multiple IDs in the system. Records 1-4 represent IDs belonging to the same customer, let's say John. As per the table, John has 12 IDs, and his latest ID is 999. Record 5 represents another customer, let's say Jane. Jane has three IDs, and her last ID is 110.
The purpose of the ROW_NUM column is to get the last CUSTOMER.CHARLIE value. The idea is to use the first CHARLIE value as the partition. Basically, the goal is to get one parent:many children mapping. In this case, the ID 333 should be tied to 555, 777, and 999.
Here is the DDL/DML:
CREATE TABLE CUSTOMER
(ID NUMBER(20) NOT NULL,
ALPHA NUMBER(20) NOT NULL,
BRAVO NUMBER(20) NOT NULL,
CHARLIE NUMBER(20) NOT NULL,
CREATEDDATE DATE
);
INSERT INTO CUSTOMER
VALUES
(1, 111, 222, 333, to_date('02-FEB-19','DD-MON-RR'));
INSERT INTO CUSTOMER
VALUES
(2, 333, 444, 555, to_date('11-NOV-19','DD-MON-RR'));
INSERT INTO CUSTOMER
VALUES
(3, 666, 555, 777, to_date('12-DEC-19','DD-MON-RR'));
INSERT INTO CUSTOMER
VALUES
(4, 777, 888, 999, to_date('05-MAY-20','DD-MON-RR'));
INSERT INTO CUSTOMER
VALUES
(5, 100, 101, 110, to_date('25-DEC-20','DD-MON-RR'));
COMMIT;
I have tried the following query, but it fails to populate the partition column correctly:
WITH
charlies
AS
(SELECT DISTINCT charlie
FROM customer),
mult_customers
AS
(SELECT c.*, c.charlie AS NEW_COL
FROM customer c
UNION
SELECT c.*,
CASE WHEN c.alpha = e.charlie THEN c.alpha ELSE c.bravo END AS NEW_COL
FROM customer c
JOIN charlies e ON e.charlie = c.alpha OR e.charlie = c.bravo),
ranked
AS
(SELECT mc.*,
ROW_NUMBER ()
OVER (PARTITION BY NEW_COL ORDER BY createddate DESC) AS row_num
FROM mult_customers mc)
SELECT *
FROM ranked
ORDER BY ID;
Thanks for any help provided.
Your task is known as finding connected components. About 7-8 years ago I wrote a solution for this, and even a PL/SQL package: http://orasql.org/2017/09/29/connected-components/
This PL/SQL solution is much more efficient than pure SQL solutions: http://orasql.org/2014/02/28/straight-sql-vs-sql-and-plsql/
Let me know if you need help adapting it to your task.
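To illustrate what "connected components" means for the sample rows, here is a minimal union-find sketch in Python. It is not the linked PL/SQL package, only an illustration of the idea: two rows belong to the same customer when they share any ID value, directly or through other rows. Taking NEW_COL from the lowest-ID row of each component is an assumption that happens to match the date order of the sample data.

```python
from collections import defaultdict

# (ID, ALPHA, BRAVO, CHARLIE) from the question's sample data.
rows = [
    (1, 111, 222, 333),
    (2, 333, 444, 555),
    (3, 666, 555, 777),
    (4, 777, 888, 999),
    (5, 100, 101, 110),
]

parent = {}

def find(x):
    """Return the representative of x's set, halving paths on the way."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    """Merge the sets containing a and b."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_b] = root_a

# All three customer IDs on one row belong to the same customer.
for _row_id, alpha, bravo, charlie in rows:
    union(alpha, bravo)
    union(alpha, charlie)

# Group row IDs by component.
groups = defaultdict(list)
for row in rows:
    groups[find(row[1])].append(row)

components = sorted([r[0] for r in members] for members in groups.values())

# NEW_COL: CHARLIE of the lowest-ID row in each component (assumption).
new_col = {}
for members in groups.values():
    first_charlie = min(members)[3]
    for r in members:
        new_col[r[0]] = first_charlie

print(components)
print(new_col)
```

With the sample data, rows 1 to 4 fall into one component and row 5 into another, which is the grouping the question asks for.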
With the following schema and table, showing example output of the table, I'm trying to "PIVOT" this data to have rows consisting of the SERIAL and AREA, with each column being the COMPLIANCE for each "NIGHTOF" record. Ideal output shown below as well. I can not wrap my head around the PIVOT syntax no matter how many times I read examples...
Schema and table:
SELECT [CONTID], [AREA], [NIGHTOF], [COMPLIANCE] FROM ComplianceScores
Regular Output:
CONTID NIGHTOF AREA COMPLIANCE
001 2014-01-01 Room 2 28
001 2014-01-01 Room 2 18
001 2014-01-01 Room 2 20
003 2014-01-02 Room 1 18
003 2014-01-02 Room 1 70
003 2014-01-02 Room 1 80
008 2014-01-03 Room 1 0
009 2014-01-04 Room 1 35
Ideal output:
CONTID AREA 2014-01-01 2014-01-02 2014-01-03
001 Room 2 28 18 20
003 Room 1 18 70 80
I agree with @Sparky that your data and output don't match.
I think your data was meant to be like:
create table ComplianceScores (
CONTID char(3),
NIGHTOF date,
AREA varchar(10),
COMPLIANCE int
)
insert ComplianceScores (CONTID, NIGHTOF, AREA, COMPLIANCE)
values ('001', '2014-01-01', 'Room 2', 28)
, ('001', '2014-01-02', 'Room 2', 18)
, ('001', '2014-01-03', 'Room 2', 20)
, ('003', '2014-01-01', 'Room 1', 18)
, ('003', '2014-01-02', 'Room 1', 70)
, ('003', '2014-01-03', 'Room 1', 80)
And I think you're looking for this SQL:
select ContId, Area, [2014-01-01], [2014-01-02], [2014-01-03]
from (select Area, ContId, NightOf, Compliance
from ComplianceScores) SourceTable
pivot
(
sum(Compliance)
for NightOf in ([2014-01-01], [2014-01-02], [2014-01-03])
) pt;
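The PIVOT above needs the dates listed by hand. When the dates are not known in advance, one common workaround is to build the column list from the data first. The sketch below shows that idea with conditional aggregation in Python's built-in sqlite3 module; this is an assumption chosen to keep the example runnable, not SQL Server, where the usual equivalent is dynamic SQL. The helper quote_ident is a hypothetical name introduced here.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ComplianceScores (CONTID TEXT, NIGHTOF TEXT, AREA TEXT, COMPLIANCE INT);
INSERT INTO ComplianceScores (CONTID, NIGHTOF, AREA, COMPLIANCE) VALUES
    ('001', '2014-01-01', 'Room 2', 28),
    ('001', '2014-01-02', 'Room 2', 18),
    ('001', '2014-01-03', 'Room 2', 20),
    ('003', '2014-01-01', 'Room 1', 18),
    ('003', '2014-01-02', 'Room 1', 70),
    ('003', '2014-01-03', 'Room 1', 80);
""")

def quote_ident(name):
    """Quote a value for use as a column alias (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'

# Step 1: discover the pivot columns from the data.
nights = [r[0] for r in conn.execute(
    "SELECT DISTINCT NIGHTOF FROM ComplianceScores ORDER BY NIGHTOF")]

# Step 2: one SUM(CASE ...) per date; date values are passed as parameters.
pivot_columns = ", ".join(
    f"SUM(CASE WHEN NIGHTOF = ? THEN COMPLIANCE END) AS {quote_ident(night)}"
    for night in nights
)
dynamic_sql = (
    f"SELECT CONTID, AREA, {pivot_columns} "
    "FROM ComplianceScores GROUP BY CONTID, AREA ORDER BY CONTID"
)

cursor = conn.execute(dynamic_sql, nights)
header = [col[0] for col in cursor.description]
compliance_rows = cursor.fetchall()
print(header)
for row in compliance_rows:
    print(row)
```

The date values reach the query only as bound parameters or as quoted aliases, which avoids pasting raw data into the SQL text.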
Hi all, sorry about my poorly worded title; I am unsure how to phrase exactly what I need, but I will try to explain it better below.
I have a dataset that looks like this:
DECLARE @TestDATA TABLE (PERSON_ID int, START_DATE date, END_DATE date, SERVICE_RANK int)
INSERT INTO @TestDATA
VALUES
(123, '2018-01-31', '2018-02-14', 7),
(123, '2018-03-28', '2018-04-11', 4),
(123, '2018-04-12', '2018-04-30', 4),
(123, '2018-05-25', '2018-06-08', 7),
(123, '2018-06-08', '2018-06-15', 7),
(123, '2018-06-19', '2018-06-26', 7),
(123, '2018-06-26', '2018-09-28', 4),
(123, '2018-10-10', '2018-11-07', 7),
(123, '2018-11-27', '2018-12-11', 7),
(123, '2018-12-11', '2018-12-24', 7)
This shows a date range and "service rank" for each person (there is only one person in this example, but there are tens of thousands in the database).
For each person_id, I would like to group consecutive rows with the same service_rank into periods, to identify how many distinct periods they have had. For the example above, this is what I am looking for:
PERSON ID, START_DATE, END_DATE, SERVICE_RANK, SERVICE_PERIOD
123 2018-01-31 2018-02-14 7 1
123 2018-03-28 2018-04-11 4 2
123 2018-04-12 2018-04-30 4 2
123 2018-05-25 2018-06-08 7 3
123 2018-06-08 2018-06-15 7 3
123 2018-06-19 2018-06-26 7 3
123 2018-06-26 2018-09-28 4 4
123 2018-10-10 2018-11-07 7 5
123 2018-11-27 2018-12-11 7 5
123 2018-12-11 2018-12-24 7 5
I have tried row_number, rank, dense_rank and even had a go at the dreaded CURSOR FOR, but I cannot get anything to work. The window functions treat rows with the same service rank as one group, so for the example above they see two service ranks when there are actually five periods that just share the same rank values.
Also in the dataset not every person will jump from one service_rank to another and back. They may go from one to another (eg 4 -> 7) and stay there or they may only have one service_rank over multiple rows.
Any ideas??
This is a gaps-and-islands problem. For this purpose, one method is lag() and a cumulative sum:
select t.*,
sum(case when prev_service_rank = service_rank then 0 else 1 end) over (partition by person_id order by start_date) as service_period
from (select t.*,
lag(service_rank) over (partition by person_id order by start_date) as prev_service_rank
from t
) t;
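The query above can be checked against the sample data. The sketch below runs it in Python's built-in sqlite3 module (an assumption for runnability; it needs SQLite 3.25 or later for window functions, and dates are stored as ISO text). The table name t follows the answer, and the subquery alias s is introduced here.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (person_id INT, start_date TEXT, end_date TEXT, service_rank INT);
INSERT INTO t VALUES
    (123, '2018-01-31', '2018-02-14', 7),
    (123, '2018-03-28', '2018-04-11', 4),
    (123, '2018-04-12', '2018-04-30', 4),
    (123, '2018-05-25', '2018-06-08', 7),
    (123, '2018-06-08', '2018-06-15', 7),
    (123, '2018-06-19', '2018-06-26', 7),
    (123, '2018-06-26', '2018-09-28', 4),
    (123, '2018-10-10', '2018-11-07', 7),
    (123, '2018-11-27', '2018-12-11', 7),
    (123, '2018-12-11', '2018-12-24', 7);
""")

islands_sql = """
SELECT person_id, start_date, end_date, service_rank,
       -- Running count of rank changes = period number.
       SUM(CASE WHEN prev_service_rank = service_rank THEN 0 ELSE 1 END)
           OVER (PARTITION BY person_id ORDER BY start_date) AS service_period
FROM (
    -- Previous row's rank; NULL on each person's first row, which counts as a change.
    SELECT t.*,
           LAG(service_rank) OVER (PARTITION BY person_id ORDER BY start_date)
               AS prev_service_rank
    FROM t
) AS s
ORDER BY person_id, start_date
"""
island_rows = conn.execute(islands_sql).fetchall()
service_periods = [row[4] for row in island_rows]
print(service_periods)
```

Each time the rank differs from the previous row the flag is 1, so the running sum increases exactly at the start of a new period.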
I have in my database (SQL Server 2008 R2) a table like this:
ID   Team         Name     Age
102  Barcelona    Mike     15
103  Barcelona    Peter    10
104  Barcelona    Jacke    10
105  Barcelona    Jonas    10
106  Real Madrid  Michael  20
107  Real Madrid  Terry    26
108  Chelsea      James    26
109  Chelsea      Arthur   23
110  Chelsea      Spence   22
How can I loop over the field 'Team' and find out that there are records for Barcelona, Real Madrid and Chelsea?
After that, I want to calculate the sum of the players' ages for each team.
For Barcelona: -> 10 + 10 + 10 + 15 = 45
For Real Madrid: -> 20 + 26 = 46
For Chelsea: -> 26 + 23 + 22 = 71
Fill each result in a separate variable.
The whole calculation should be done in a stored procedure.
The second thing, if I have a table like this:
ID   Team         Name     HeaderGoal  FreeKickGoal
104  Barcelona    Mike     2           1
105  Barcelona    Peter    1           0
106  Real Madrid  Michael  0           1
107  Real Madrid  Terry    0           1
108  Chelsea      James    0           0
109  Chelsea      Arthur   2           3
110  Chelsea      Spence   4           0
How can I loop over the field 'Team' and find out that there are records for Barcelona, Real Madrid and Chelsea?
After that, I want to calculate the sum of all goals for each team, adding the goal types HeaderGoal and FreeKickGoal.
Example for
-> Barcelona: 2+1+1 = 4
-> Real Madrid: 1+1 = 2
-> Chelsea: 2 + 3 + 4 = 9
Fill each result in a separate variable.
The whole calculation should be done in a stored procedure.
I hope you can help me!
BK_
If I understood your question correctly it looks like what you want are aggregates for each group, something that is easily accomplished with the GROUP BY clause.
For the first query you would use:
SELECT team, SUM(age) AS 'Sum of the team'
FROM table
GROUP BY team
This will give this result:
Team Sum of the team
-------------------- ---------------
Barcelona 45
Chelsea 71
Real Madrid 46
and for the second:
SELECT team, SUM(headergoal + freekickgoal) AS 'Sum of goals'
FROM table
GROUP BY team
which will give this result:
Team Sum of goals
-------------------- ------------
Barcelona 4
Chelsea 9
Real Madrid 2
In your example data you list the desired result for the first part for Chelsea as 45, but I guess that is just a typo, as you omitted one of Chelsea's rows in the calculation?
As for turning it into a stored procedure I can just tell you that it's easy and refer you to the documentation as I won't do all the work for you...
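The two GROUP BY queries above can be checked against the sample data, and the per-team totals collected in one place instead of one variable per team. This is a sketch under assumptions: it uses Python's built-in sqlite3 module rather than SQL Server 2008 R2, and the table names players and goals are hypothetical, since the question does not name its tables.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INT, team TEXT, name TEXT, age INT);
INSERT INTO players VALUES
    (102, 'Barcelona',   'Mike',    15),
    (103, 'Barcelona',   'Peter',   10),
    (104, 'Barcelona',   'Jacke',   10),
    (105, 'Barcelona',   'Jonas',   10),
    (106, 'Real Madrid', 'Michael', 20),
    (107, 'Real Madrid', 'Terry',   26),
    (108, 'Chelsea',     'James',   26),
    (109, 'Chelsea',     'Arthur',  23),
    (110, 'Chelsea',     'Spence',  22);
CREATE TABLE goals (id INT, team TEXT, name TEXT, headergoal INT, freekickgoal INT);
INSERT INTO goals VALUES
    (104, 'Barcelona',   'Mike',    2, 1),
    (105, 'Barcelona',   'Peter',   1, 0),
    (106, 'Real Madrid', 'Michael', 0, 1),
    (107, 'Real Madrid', 'Terry',   0, 1),
    (108, 'Chelsea',     'James',   0, 0),
    (109, 'Chelsea',     'Arthur',  2, 3),
    (110, 'Chelsea',     'Spence',  4, 0);
""")

# GROUP BY finds the distinct teams and aggregates per team in one pass,
# so no explicit loop over the Team column is needed.
age_by_team = dict(conn.execute(
    "SELECT team, SUM(age) FROM players GROUP BY team"))
goals_by_team = dict(conn.execute(
    "SELECT team, SUM(headergoal + freekickgoal) FROM goals GROUP BY team"))

print(age_by_team)
print(goals_by_team)
```

Keeping the totals keyed by team name means the code does not change when a new team appears in the table.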
Edit: added MERGE INTO as a response to a comment:
To insert the result of the second query into an existing table you can use either a simple INSERT statement like this:
INSERT table_with_team_and_goals
SELECT team, SUM(headergoal + freekickgoal)
FROM table
GROUP BY team
or MERGE INTO, which might be better if you intend to run the query many times (the target table will then be updated if the team already exists in it):
MERGE INTO table_with_team_and_goals AS target
USING (SELECT Team, SUM(headergoal + freekickgoal) AS goals FROM table_with_goals GROUP BY team) AS source
ON target.team=source.team
WHEN MATCHED THEN
UPDATE SET goals = source.goals
WHEN NOT MATCHED THEN
INSERT (Team, Goals)
VALUES (source.team, source.goals);
SELECT TEAM , Name, COUNT(TEAM) As GoalsPerTeam, COUNT(NAME) As GoalPerPlayer
FROM TABLE
GROUP BY TEAM , Name
This query will give you the total goals scored per player and per team.
-- Sum age by team
SELECT Team, SUM(Age) SumAge
FROM
(
SELECT Id, Team, Name, Age FROM
(
VALUES
(102, 'Barcelona' , 'Mike' , 15),
(103, 'Barcelona' , 'Peter' , 10),
(104, 'Barcelona' , 'Jacke' , 10),
(105, 'Barcelona' , 'Jonas' , 10),
(106, 'Real Madrid', 'Michael', 20),
(107, 'Real Madrid', 'Terry' , 26),
(108, 'Chelsea' , 'James' , 26),
(109, 'Chelsea' , 'Arthur' , 23),
(110, 'Chelsea' , 'Spence' , 22)
) AS X(Id, Team, Name, Age)
) X
GROUP BY Team
-- Sum goals by team
SELECT Team, SUM(HeaderGoal + FreeKickGoal) Goals
FROM
(
SELECT Id, Team, Name, HeaderGoal, FreeKickGoal FROM
(
VALUES
(104, 'Barcelona' , 'Mike' , 2, 1),
(105, 'Barcelona' , 'Peter' , 1, 0),
(106, 'Real Madrid', 'Michael', 0, 1),
(107, 'Real Madrid', 'Terry' , 0, 1),
(108, 'Chelsea' , 'James' , 0, 0),
(109, 'Chelsea' , 'Arthur' , 2, 3),
(110, 'Chelsea' , 'Spence' , 4, 0)
) AS X(Id, Team, Name, HeaderGoal, FreeKickGoal)
) X
GROUP BY Team