How can I convert rows to columns in SQL

I've been referencing this question a lot, but my case is a little different so I haven't quite figured it out.
I have a set of data that looks something like this:
--------------------------------------
| Id | Answer | Question | EntryBy |
--------------------------------------
| 1  | John   | Name?    | User1   |
| 2  | 2.4    | Raiting? | User1   |
| 3  | ZH1E4A | UserId?  | User1   |
| 4  | Paul   | Name?    | User1   |
| 5  | 2.3    | Raiting? | User1   |
| 6  | Ron    | Name?    | User2   |
| 7  | 857685 | UserId?  | User2   |
--------------------------------------
I need to pivot the data so that it's structured like so:
----------------------------------------------------------
| Category | Name? | Raiting? | UserId? | EntryBy |
----------------------------------------------------------
| Category1| John | 2.4 | ZH1E4A | User1 |
| Category1| Paul | 2.3 | NULL | User1 |
| Category1| Ron | NULL | 857685 | User2 |
As you can see, there are multiple "Questions" but they don't always have an answer/value. I know the exact number of questions that may be asked/answered so I'm assuming that may help if I used a CASE expression?
Note: The 'Category' column in the last table is just another value similar to 'EntryBy' in the first. I've attempted the pivot approach in the cited question, but the results I get are not correct. I also tried the CASE statement but it resulted in an error since the Questions are titled the same.

Since you're on 2008, we lose the ordered sum() over() window function, but we can simulate it with a cross apply that creates a Grp indicator.
This also assumes that ID is sequential (risky) and that Name? is the group key.
Also, check the spelling of RAITING.
Also, I have no idea where Category is coming from.
Example
Select [Name?] = max(case when Question = 'Name?' then Answer end)
,[Raiting?] = max(case when Question = 'Raiting?' then Answer end)
,[UserId?] = max(case when Question = 'UserId?' then Answer end)
,[EntryBy?] = max([EntryBy])
From (
Select A.*
,B.Grp
From YourTable A
Cross Apply (Select Grp=count(*) from YourTable where Question='Name?' and ID<=A.ID) B
) A
Group By Grp
Returns
Name? Raiting? UserId? EntryBy?
John 2.4 ZH1E4A User1
Paul 2.3 NULL User1
Ron NULL 857685 User2
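To experiment with the grouping idea outside SQL Server, here is a minimal sketch that runs the same logic in SQLite through Python's sqlite3 module. The table name answers and the lower-case output aliases are made up for illustration, and a correlated subquery stands in for the cross apply; it relies on the same assumption that Id is sequential and that every group starts with a 'Name?' row.

```python
import sqlite3

# Sample rows copied from the question; the table name "answers" is hypothetical.
ROWS = [
    (1, "John", "Name?", "User1"),
    (2, "2.4", "Raiting?", "User1"),
    (3, "ZH1E4A", "UserId?", "User1"),
    (4, "Paul", "Name?", "User1"),
    (5, "2.3", "Raiting?", "User1"),
    (6, "Ron", "Name?", "User2"),
    (7, "857685", "UserId?", "User2"),
]

PIVOT_SQL = """
SELECT max(CASE WHEN Question = 'Name?'    THEN Answer END) AS name,
       max(CASE WHEN Question = 'Raiting?' THEN Answer END) AS raiting,
       max(CASE WHEN Question = 'UserId?'  THEN Answer END) AS user_id,
       max(EntryBy)                                         AS entry_by
FROM (
    -- Grp = how many 'Name?' rows have been seen up to this Id,
    -- so every row between two 'Name?' rows shares one group number.
    SELECT a.*,
           (SELECT count(*) FROM answers b
             WHERE b.Question = 'Name?' AND b.Id <= a.Id) AS Grp
    FROM answers a
)
GROUP BY Grp
ORDER BY Grp
"""


def pivot_answers(rows):
    """Load the rows into an in-memory table and return the pivoted result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE answers (Id INTEGER, Answer TEXT, Question TEXT, EntryBy TEXT)"
    )
    conn.executemany("INSERT INTO answers VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result


pivoted = pivot_answers(ROWS)
for row in pivoted:
    print(row)
```

Missing answers come back as None, which corresponds to the NULLs in the result above.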

This only does a single parse of the table (or "Values Table Expression") for this one, compared to John's, which does 2:
WITH VTE AS (
SELECT *
FROM (VALUES
(1,'John ','Name? ','User1'),
(2,'2.4 ','Raiting?','User1'),
(3,'ZH1E4A','UserId? ','User1'),
(4,'Paul ','Name? ','User1'),
(5,'2.3 ','Raiting?','User1'),
(6,'Ron ','Name? ','User2'),
(7,'857685','UserId? ','User2'),
(8,'Steve ','Name? ','User3'),
(9,'2.5 ','Raiting?','User3'),
(10,'Jane ','Name? ','User3'),
(11,'GA18S1','UserId? ','User3'),
(12,'2.3 ','Raiting?','User3'),
(13,'ABH12D','UserId? ','User3')) V(ID, Answer, Question, EntryBy)),
Groups AS(
SELECT *,
ROW_NUMBER() OVER (ORDER BY ID ASC) -
ROW_NUMBER() OVER (PARTITION BY CASE WHEN Question = 'Name?' THEN 0 ELSE 1 END ORDER BY ID ASC) AS Grp
FROM VTE)
SELECT 'Category1' AS Category,
MAX(CASE Question WHEN 'Name?' THEN Answer ELSE NULL END) AS [Name?],
MAX(CASE Question WHEN 'Raiting?' THEN Answer ELSE NULL END) AS [Raiting?],
MAX(CASE Question WHEN 'UserID?' THEN Answer ELSE NULL END) AS [UserID?],
EntryBy
FROM Groups
GROUP BY CASE Grp WHEN 0 THEN Grp + 1 ELSE Grp END,
EntryBy
ORDER BY CASE Grp WHEN 0 THEN Grp + 1 ELSE Grp END;
I also added a few extra values to display what happens if the sequencing goes wrong.
Result set:
Category Name? Raiting? UserID? EntryBy
--------- ------- -------- ------- -------
Category1 John 2.4 ZH1E4A User1
Category1 Paul 2.3 NULL User1
Category1 Ron NULL 857685 User2
Category1 Steve 2.5 NULL User3
Category1 Jane 2.3 GA18S1 User3

Related

Identify Sequence of Events In BigQuery

I needed help with some logic for the following dataset:
ID | POST10 | EVENTS_TIMESTAMP |
1 | picked | 2022.11.06 1:00pm|
1 | profile| 2022.11.06 1:30pm|
1 | front | 2022.11.06 1:35pm|
2 | profile| 2022.11.06 1:00pm|
2 | profile| 2022.11.06 1:30pm|
2 | front | 2022.11.06 1:35pm|
2 | front | 2022.11.06 1:36pm|
3 | picked | 2022.11.06 1:00pm|
3 | front | 2022.11.06 1:30pm|
3 | profile| 2022.11.06 1:35pm|
3 | front | 2022.11.06 1:36pm|
LOGIC SHOULD BE:
FOR A PERSON, THE FIRST VALUE SHOULD BE "picked", THEN "profile", AND IN BETWEEN THOSE TWO VALUES "front" MUST NOT OCCUR. It can occur before or after those two (based on timestamp), but not in between.
ANSWER FOR THE DATASET ABOVE WOULD BE:
ID | ANSWER |
1 | SELECTED |
2 | NOT SELECTED|
3 | NOT SELECTED|
I wrote the SQL below, but the greater/less than (<, >) comparisons aren't working as expected. The second condition after the AND is evaluated on its own; I need it to look inside the same window between picked and profile:
(case when
(min(case when (post10) like '%picked%' then EVENTS_TIMESTAMP else null end) over (partition by (ID))
>=
min(case when (post10) like '%profile%' then EVENTS_TIMESTAMP else null end) over (partition by (ID)))
AND
(min(case when (post10) like '%profile%' then EVENTS_TIMESTAMP else null end) over (partition by (ID))
>=
min(case when (post10) like '%front%' then EVENTS_TIMESTAMP else null end) over (partition by (ID)))
then 'SELECTED'
else 'NOT SELECTED' end) as ANSWER
You might consider the query below:
SELECT ID, IF(COUNTIF(flag) > 0, 'SELECTED', 'NOT SELECTED') AS ANSWER
FROM (
SELECT *, POST10 = 'picked' AND LEAD(POST10) OVER w = 'profile' AS flag
FROM sample_table
WINDOW w AS (PARTITION BY ID ORDER BY PARSE_DATETIME('%Y.%m.%d %l:%M%p', EVENTS_TIMESTAMP))
)
GROUP BY ID;
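To check the logic without a BigQuery project, the same "picked immediately followed by profile" test can be sketched in plain Python. This is only an illustration of what LEAD does here: the function name classify is made up, and the timestamp format is taken from the sample data. Note that, like the LEAD version, any event between picked and profile (not only front) breaks the pair.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (ID, POST10, EVENTS_TIMESTAMP).
EVENTS = [
    (1, "picked", "2022.11.06 1:00pm"),
    (1, "profile", "2022.11.06 1:30pm"),
    (1, "front", "2022.11.06 1:35pm"),
    (2, "profile", "2022.11.06 1:00pm"),
    (2, "profile", "2022.11.06 1:30pm"),
    (2, "front", "2022.11.06 1:35pm"),
    (2, "front", "2022.11.06 1:36pm"),
    (3, "picked", "2022.11.06 1:00pm"),
    (3, "front", "2022.11.06 1:30pm"),
    (3, "profile", "2022.11.06 1:35pm"),
    (3, "front", "2022.11.06 1:36pm"),
]


def classify(events):
    """Return {ID: 'SELECTED' or 'NOT SELECTED'} for every ID in events."""
    by_id = defaultdict(list)
    for person_id, post10, ts in events:
        parsed = datetime.strptime(ts, "%Y.%m.%d %I:%M%p")
        by_id[person_id].append((parsed, post10))

    answers = {}
    for person_id, items in by_id.items():
        # Order each person's events by time (the PARTITION BY / ORDER BY step).
        items.sort(key=lambda item: item[0])
        names = [name for _, name in items]
        # Pair every event with the next one, which is what LEAD() provides.
        hit = any(
            cur == "picked" and nxt == "profile"
            for cur, nxt in zip(names, names[1:])
        )
        answers[person_id] = "SELECTED" if hit else "NOT SELECTED"
    return answers


answers = classify(EVENTS)
for person_id in sorted(answers):
    print(person_id, answers[person_id])
```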

Pivoting rows into columns Oracle SQL

I am currently working on a table where the format of the table is something like this (all the columns are of type VARCHAR except the INSERTDATE column which is of type DATE):
INSERTDATE | ID | PROPERTYNAME | PROPERTYVALUE
----------------------------------------------
date1 | 1000 | ItemNumber | 20.1A14
date1 | 1000 | ItemRev | 2
date1 | 1000 | BarCodeNumber | 3854981
date2 | 1001 | ItemNumber | 20.1B24
date2 | 1001 | ItemRev | 1
date2 | 1001 | BarCodeNumber | 3856539
What I want to do is to convert all PROPERTYNAME column values into separate columns with all of their respective PROPERTYVALUE column values into their respective columns, something like this:
INSERTDATE | ID | ItemNumber | ItemRev | BarCodeNumber
-------------------------------------------------------
date1 | 1000 | 20.1A14 | 2 | 3854981
date2 | 1001 | 20.1B24 | 1 | 3856539
I have been trying to solve this problem for days without any result. I looked up everything I could find on PIVOT, but none of the examples match my needs. I am not very familiar with PIVOT in SQL, so it would really help if someone could show me how to use it to solve my problem.
If you know the columns you want, you can use conditional aggregation:
select insertdate, id,
max(case when PROPERTYNAME = 'ItemNumber' then propertyvalue end) as ItemNumber,
max(case when PROPERTYNAME = 'ItemRev' then propertyvalue end) as ItemRev,
max(case when PROPERTYNAME = 'BarCodeNumber' then propertyvalue end) as BarCodeNumber
from t
group by insertdate, id;
If you don't know all the properties up-front, then you need to construct the query dynamically as a string and use execute immediate.
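As a rough illustration of building the query as a string, the sketch below does it from Python against SQLite instead of Oracle's execute immediate, so treat it as the shape of the idea and not as Oracle code. The helper names quote_ident and build_pivot_sql are made up, and the date1/date2 placeholders from the question are stored as plain text.

```python
import sqlite3


def quote_ident(name):
    """Quote a value for use as a column alias, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_pivot_sql(property_names):
    """Build one MAX(CASE ...) column per property name.

    The names are compared through ? placeholders and only appear in the
    SQL text as quoted aliases.
    """
    cols = ",\n       ".join(
        f"MAX(CASE WHEN propertyname = ? THEN propertyvalue END) AS {quote_ident(n)}"
        for n in property_names
    )
    return (
        f"SELECT insertdate, id,\n       {cols}\n"
        "FROM t\nGROUP BY insertdate, id\nORDER BY insertdate, id"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (insertdate TEXT, id TEXT, propertyname TEXT, propertyvalue TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("date1", "1000", "ItemNumber", "20.1A14"),
        ("date1", "1000", "ItemRev", "2"),
        ("date1", "1000", "BarCodeNumber", "3854981"),
        ("date2", "1001", "ItemNumber", "20.1B24"),
        ("date2", "1001", "ItemRev", "1"),
        ("date2", "1001", "BarCodeNumber", "3856539"),
    ],
)

# Step 1: discover the property names. Step 2: generate and run the pivot.
names = [r[0] for r in conn.execute(
    "SELECT DISTINCT propertyname FROM t ORDER BY propertyname")]
cursor = conn.execute(build_pivot_sql(names), names)
header = [d[0] for d in cursor.description]
pivot_rows = cursor.fetchall()
conn.close()

print(header)
for row in pivot_rows:
    print(row)
```

The trade-off is the usual one for dynamic SQL: the column list is only known at run time, so whatever consumes the result has to read the column names from the cursor.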
This use case is a good candidate for PIVOT.
Assuming you are using Oracle as your database, the query will look like this:
select insertdate,id,ItemNumber,ItemRev,BarCodeNumber
from mytable
pivot
(
max(propertyvalue)
for propertyname in ('ItemNumber' as ItemNumber
,'ItemRev' ItemRev
,'BarCodeNumber' BarCodeNumber)
)
order by insertdate;

Select Top User over a list of Pages

I have a table containing records of Users' internet history. The table's structure contains the User_ID, the Page Accessed, and the Date Accessed of the page. For Example:
+==========================================+
|User_ID | Page_Accessed | Date_Accessed |
+==========================================+
|Johh.Doe | Google | 1/1/2015 |
|Johh.Doe | Google | 1/1/2015 |
|Suzy.Lue | Google | 7/11/2015 |
|Suzy.Lue | Wikipedia | 4/23/2015 |
|Babe Ruth| StackOverflow | 9/1/2015 |
+==========================================+
I am currently trying to use a SQL query that uses:
RANK() OVER (PARTITION BY [Page Accessed] ORDER BY Count(DateAcc))
Then I use a PIVOT() by the various sites. However, after selecting the records WHERE (Num = 1) from the PIVOT() and a GROUP BY [Rank], I end up with a result similar to:
+=================================================+
|Rank | Google | Wikipedia | StackOverflow |
+=================================================+
| 1 | John Doe| NULL | NULL |
| 1 | NULL | Suzy Lue | NULL |
| 1 | NULL | NULL | Babe Ruth |
+=================================================+
Instead I need to reformat my output as:
+=================================================+
|Rank | Google | Wikipedia | StackOverflow |
+=================================================+
| 1 | John Doe| Suzy Lue | Babe Ruth |
+=================================================+
My Current Query:
SELECT Rank, Google, Wikipedia, StackOverflow
FROM(
SELECT TOP (100) PERCENT User_ID, Page_Accessed, COUNT(Date_Accessed) AS Views,
RANK() OVER (PARTITION BY Page_Accessed ORDER BY Count(Date_Accessed) DESC) AS Rank
FROM Record_Table
GROUP BY dbo.location_key.subSite, dbo.user_info_list_parse.Name
ORDER BY Views DESC) AS tb
PIVOT (
max(tb.User_ID) FOR
Page_Accessed IN ( Google, Wikipedia, StackOverflow)
) pvt
WHERE (Num = 1)
Are there any creative solutions to obtain this result?
I think you've already found a solution, but for your information and for others reading this, let me remove the noise from this query. There is no need for the ORDER BY or for TOP (100) PERCENT, and the Views column is redundant. I would simplify the query as follows:
CREATE TABLE InternetHistory
(
[User_ID] varchar(20),
[Page_Accessed] varchar(20),
[Date_Accessed] datetime
)
INSERT InternetHistory VALUES
('Johh.Doe', 'Google', '2015-01-01'),
('Johh.Doe', 'Google', '2015-01-01'),
('Suzy.Lue', 'Google', '2015-07-11'),
('Suzy.Lue', 'Wikipedia', '2015-04-23'),
('Babe Ruth', 'StackOverflow', '2015-09-01')
SELECT * FROM
(
SELECT [User_ID], [Page_Accessed], RANK() OVER (PARTITION BY [Page_Accessed] ORDER BY COUNT(*) DESC) Ranking
FROM InternetHistory
GROUP BY [User_ID], [Page_Accessed]
) AS Src
PIVOT
(
MAX([User_Id]) FOR [Page_Accessed] IN ([Google], [Wikipedia], [StackOverflow])
) AS Pvt
WHERE Ranking = 1
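For readers without SQL Server at hand, the sketch below reproduces the same rank-then-pivot idea in SQLite via Python's sqlite3 module. SQLite has no PIVOT, so conditional aggregation is swapped in for it, and RANK() needs SQLite 3.25 or newer; the CTE names views and ranked are made up. The point it illustrates is that the pivot source should contain only the ranking, the page, and the user, so that the ranking is the only grouping column.

```python
import sqlite3

# Rows from the question's sample table (dates as ISO text).
HISTORY = [
    ("Johh.Doe", "Google", "2015-01-01"),
    ("Johh.Doe", "Google", "2015-01-01"),
    ("Suzy.Lue", "Google", "2015-07-11"),
    ("Suzy.Lue", "Wikipedia", "2015-04-23"),
    ("Babe Ruth", "StackOverflow", "2015-09-01"),
]

TOP_USER_SQL = """
WITH views AS (              -- visits per user and page
    SELECT User_ID, Page_Accessed, COUNT(*) AS n
    FROM InternetHistory
    GROUP BY User_ID, Page_Accessed
),
ranked AS (                  -- rank users within each page by visits
    SELECT User_ID, Page_Accessed,
           RANK() OVER (PARTITION BY Page_Accessed ORDER BY n DESC) AS Ranking
    FROM views
)
SELECT Ranking,
       MAX(CASE WHEN Page_Accessed = 'Google'        THEN User_ID END) AS Google,
       MAX(CASE WHEN Page_Accessed = 'Wikipedia'     THEN User_ID END) AS Wikipedia,
       MAX(CASE WHEN Page_Accessed = 'StackOverflow' THEN User_ID END) AS StackOverflow
FROM ranked
WHERE Ranking = 1
GROUP BY Ranking             -- the only grouping column, so one row per rank
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE InternetHistory (User_ID TEXT, Page_Accessed TEXT, Date_Accessed TEXT)"
)
conn.executemany("INSERT INTO InternetHistory VALUES (?, ?, ?)", HISTORY)
top_users = conn.execute(TOP_USER_SQL).fetchall()
conn.close()

for row in top_users:
    print(row)
```

If two users tie for first place on a page, MAX() keeps only one of them, which is also what the PIVOT version does.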

Postgres recursive CTE or crosstab function

I am trying to generate some user statistics from a table that contains logging information.
**TABLE users**
user_id | user_name
-------------------
1 | julia
2 | bob
3 | sebastian
**TABLE logs**
user_id | action | timepoint
------------------------------------
1 | create_quote | 2015-01-01
1 | send_quote | 2015-02-03
1 | create_quote | 2015-02-02
1 | start_job | 2015-01-15
2 | start_job | 2015-02-23
2 | send_quote | 2015-03-04
2 | start_job | 2014-12-02
My desired output is the following table
user_id | username | create_quote | send_quote | start_job
-----------------------------------------------------------
1 | julia |2 | 1 | 1
2 | bob |0 | 1 | 1
3 | sebastian |0 | 0 | 0
It includes all users (even if there was nothing logged), but only the actions between date '2015-01-01' and '2015-05-31'. Actions are counted/grouped by action type and user.
The SQL statement could look something like
SELECT * FROM myfunction() WHERE to_char(timepoint, 'YY/MM') BETWEEN '15/01' AND '15/05';
Do you have any idea how to manage this? I've been experimenting with CTEs and recursion as well as with the crosstab function, but could not find a solution.
I think the crosstab function would be a lot more elegant, but in the case that you don't have the extension loaded or, like me, struggle with the syntax, this is a kind of clumsy, brute-force way you could do it:
CREATE OR REPLACE FUNCTION get_stats(
from_date date,
thru_date date)
RETURNS table (
user_id integer,
username text,
create_quote bigint,
send_quote bigint,
start_job bigint
) AS
$BODY$
select
l.user_id, u.user_name,
sum (case when action = 'create_quote' then 1 else 0 end) as create_quote,
sum (case when action = 'send_quote' then 1 else 0 end) as send_quote,
sum (case when action = 'start_job' then 1 else 0 end) as start_job
from
logs l
join users u on l.user_id = u.user_id
where
l.timepoint between from_date and thru_date
group by
l.user_id, u.user_name
$BODY$
LANGUAGE sql VOLATILE
COST 100
ROWS 1000;
And then your query would be:
select * from get_stats('2015-01-01', '2015-05-31')
Personally, I would skip the function and just create it as a query, but it's conceivable there are reasons where you would want the function wrapper.
-- EDIT --
Based on an attempted edit, I see you may be okay with a query. Also, you wanted users that have no entries.
With all of that in mind, I think this might work:
select
u.user_id, u.user_name as username,
sum (case when action = 'create_quote' then 1 else 0 end) as create_quote,
sum (case when action = 'send_quote' then 1 else 0 end) as send_quote,
sum (case when action = 'start_job' then 1 else 0 end) as start_job
from
users u
left join logs l on
l.user_id = u.user_id and
l.timepoint between '2015-01-01' and '2015-05-31'
group by
u.user_id, u.user_name

SQL query filtering

Using SQL Server 2005, I have a table where certain events are being logged, and I need to create a query that returns only very specific results. There's an example below:
Log:
Log_ID | FB_ID | Date | Log_Name | Log_Type
7 | 4 | 2007/11/8 | Nina | Critical
6 | 4 | 2007/11/6 | John | Critical
5 | 4 | 2007/11/6 | Mike | Critical
4 | 4 | 2007/11/6 | Mike | Critical
3 | 3 | 2007/11/3 | Ben | Critical
2 | 3 | 2007/11/1 | Ben | Critical
The query should do the following: return ONLY one row per each FB_ID, but this needs to be the one where Log_Name has changed for the first time, or if the name never changes, then the first dated row.
In layman's terms I need this to browse through a DB to check for each instance where the responsibility of a case (FB_ID) has been moved to another person, and in case it never has, then just get the original logger's name.
In the example above, I should get rows (Log_ID) 2 and 6.
Is this even possible? Right now there's a discussion going on whether the DB was just made the wrong way. :)
I imagine I need to somehow be able to store the first resulting Log_Name into a variable and then compare it with an IF condition etc. I have no idea how to do such a thing with SQL though.
Edit: Updated the date. And to clarify on this, the correct result would look like this:
Log_ID | FB_ID | Date | Log_Name | Log_Type
6 | 4 | 2007/11/6 | John | Critical
2 | 3 | 2007/11/1 | Ben | Critical
It's not the first date per FB_ID I'm after, but the row where the Log_Name is changed from the original.
Originally FB_ID 4 belongs to Mike, but the query should return the row where it moves on to John. However, it should NOT return the row where it moves further on to Nina, because the first responsibility change already happened when John got it.
In the case of Ben with FB_ID 3, the logger is never changed, so the first row for Ben should be returned.
I guess that there is a better and more performant way, but this one seems to work:
SELECT *
FROM log
WHERE log_id IN
( SELECT MIN(log_id)
FROM log
WHERE
( SELECT COUNT(DISTINCT log_name)
FROM log log2
WHERE log2.fb_id = log.fb_id ) = 1
OR log.log_name <> ( SELECT log_name
FROM log log_3
WHERE log_3.log_id =
( SELECT MIN(log_id)
FROM log log4
WHERE log4.fb_id = log.fb_id ) )
GROUP BY fb_id )
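The rule from the question (first row whose Log_Name differs from the original one, otherwise the first row) can also be written with a COALESCE over two lookups. The sketch below is an alternative formulation, not the query above, run in SQLite through Python's sqlite3 module for illustration. It assumes, as the sample data suggests, that Log_ID increases with time; the dates are rewritten as ISO text and the column is named log_date here.

```python
import sqlite3

# Sample rows from the question: (Log_ID, FB_ID, Date, Log_Name, Log_Type).
LOG_ROWS = [
    (7, 4, "2007-11-08", "Nina", "Critical"),
    (6, 4, "2007-11-06", "John", "Critical"),
    (5, 4, "2007-11-06", "Mike", "Critical"),
    (4, 4, "2007-11-06", "Mike", "Critical"),
    (3, 3, "2007-11-03", "Ben", "Critical"),
    (2, 3, "2007-11-01", "Ben", "Critical"),
]

FIRST_CHANGE_SQL = """
SELECT l.*
FROM log l
WHERE l.log_id = COALESCE(
        -- earliest row whose name differs from the case's original name
        (SELECT MIN(c.log_id)
           FROM log c
          WHERE c.fb_id = l.fb_id
            AND c.log_name <> (SELECT f.log_name
                                 FROM log f
                                WHERE f.fb_id = l.fb_id
                                ORDER BY f.log_id
                                LIMIT 1)),
        -- no change ever happened: fall back to the case's first row
        (SELECT MIN(f.log_id) FROM log f WHERE f.fb_id = l.fb_id))
ORDER BY l.log_id DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (log_id INTEGER, fb_id INTEGER, log_date TEXT,"
    " log_name TEXT, log_type TEXT)"
)
conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?, ?)", LOG_ROWS)
first_changes = conn.execute(FIRST_CHANGE_SQL).fetchall()
conn.close()

for row in first_changes:
    print(row)
```

On SQL Server the LIMIT 1 would become TOP 1; everything else is plain correlated subqueries.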
This will efficiently use an index on (fb_id, cdate, log_id); here the table is called t_log and the Date column is called cdate:
SELECT lo4.*
FROM
(
SELECT CASE WHEN ln.log_id IS NULL THEN lo2.log_id ELSE ln.log_id END AS log_id,
ROW_NUMBER() OVER (PARTITION BY lo2.fb_id ORDER BY lo2.cdate) AS rn
FROM (
SELECT
lo.*,
(
SELECT TOP 1 log_id
FROM t_log li
WHERE li.fb_id = lo.fb_id
AND li.cdate >= lo.cdate
AND li.log_id <> lo.log_id
AND li.log_name <> lo.log_name
ORDER BY
cdate, log_id
) AS next_id
FROM t_log lo
) lo2
LEFT OUTER JOIN
t_log ln
ON ln.log_id = lo2.next_id
) lo3, t_log lo4
WHERE lo3.rn = 1
AND lo4.log_id = lo3.log_id
If I've understood the problem correctly, the following SQL should do the trick:
SELECT l.Log_ID, l.FB_ID, l.[Date], l.Log_Name, l.Log_Type
FROM Log l
WHERE l.Log_ID = (SELECT TOP 1 Log_ID FROM Log
WHERE FB_ID = l.FB_ID
ORDER BY [Date], Log_ID)
The SQL will select the row with the earliest date for each FB_ID.