Convert 2 rows with multiple columns into 2 columns with multiple rows - sql

I often run ad-hoc queries in SQL Server 2005/2008 where I would like to convert a two-row result (headers and values) with multiple columns into multiple rows with only two columns.
Given a query like this:
SELECT
SUM(CASE WHEN created_at IS NOT NULL THEN 1 END) AS 'TOTAL'
, SUM(CASE WHEN created_at > '2013-07-15' THEN 1 END) AS 'CREATED W/I LAST YEAR'
, SUM(CASE WHEN updated_at > '2013-07-15' THEN 1 END) AS 'MODIFIED W/I LAST YEAR'
, SUM(CASE WHEN updated_at < '2011-07-15' THEN 1 END) AS 'UNTOUCHED OVER 3 YEARS'
, SUM(CASE WHEN updated_at < '2009-07-15' THEN 1 END) AS 'UNTOUCHED OVER 5 YEARS'
-- , often there are more columns
FROM
mytable
WHERE
< filtering >
I would like it to display something like this:
TOTAL: 5000
CREATED W/I LAST YEAR: 500
MODIFIED W/I LAST YEAR: 1500
UNTOUCHED OVER 3 YEARS: 2000
UNTOUCHED OVER 5 YEARS: 1000
I want to keep DRY and not string together a bunch of SELECTs with UNIONs. I have never used PIVOT, UNPIVOT or CROSS APPLY. Most of the examples I have seen for UNPIVOT don't seem to apply to queries like the one above - or am I just missing something? It seems simple enough but "I'm just not getting it."

;WITH t AS (
SELECT
SUM(CASE WHEN created_at IS NOT NULL THEN 1 END) AS 'TOTAL'
, SUM(CASE WHEN created_at > '2013-07-15' THEN 1 END) AS 'CREATED W/I LAST YEAR'
, SUM(CASE WHEN updated_at > '2013-07-15' THEN 1 END) AS 'MODIFIED W/I LAST YEAR'
, SUM(CASE WHEN updated_at < '2011-07-15' THEN 1 END) AS 'UNTOUCHED OVER 3 YEARS'
, SUM(CASE WHEN updated_at < '2009-07-15' THEN 1 END) AS 'UNTOUCHED OVER 5 YEARS'
-- , often there are more columns
FROM
mytable
WHERE
< filtering >
)
SELECT name, value
FROM t
UNPIVOT(value FOR name IN (
[TOTAL]
, [CREATED W/I LAST YEAR]
, [MODIFIED W/I LAST YEAR]
, [UNTOUCHED OVER 3 YEARS]
, [UNTOUCHED OVER 5 YEARS]
)) p
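For comparison, here is a minimal sketch of the same reshaping done on the client side. It is not the UNPIVOT approach above: it assumes SQLite through Python's sqlite3 module (SQLite has no UNPIVOT), and the sample rows are made up for illustration. The aggregate query still runs once, and the column aliases from the cursor are paired with the single result row.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for "mytable";
# the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (created_at TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("2013-08-01", "2013-09-01"),
        ("2012-01-10", "2013-10-05"),
        ("2008-03-03", "2008-04-04"),
        ("2010-06-06", "2010-07-07"),
    ],
)

# The aggregate query from the question, run once (ISO date strings
# compare correctly as text in SQLite).
cur = conn.execute(
    """
    SELECT
        SUM(CASE WHEN created_at IS NOT NULL THEN 1 END) AS "TOTAL",
        SUM(CASE WHEN created_at > '2013-07-15' THEN 1 END) AS "CREATED W/I LAST YEAR",
        SUM(CASE WHEN updated_at > '2013-07-15' THEN 1 END) AS "MODIFIED W/I LAST YEAR",
        SUM(CASE WHEN updated_at < '2011-07-15' THEN 1 END) AS "UNTOUCHED OVER 3 YEARS",
        SUM(CASE WHEN updated_at < '2009-07-15' THEN 1 END) AS "UNTOUCHED OVER 5 YEARS"
    FROM mytable
    """
)

# Pair each column alias with its value: one (name, value) row per column.
names = [col[0] for col in cur.description]
row = cur.fetchone()
unpivoted = list(zip(names, row))

for name, value in unpivoted:
    print(f"{name}: {value}")
```

One difference to keep in mind: SQL Server's UNPIVOT drops columns whose value is NULL, while this sketch keeps them as None.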

Related

SQL Efficiency on Date Range or Separate Tables

I'm calculating historical amounts from a table by year range (e.g. 2015-2016, 2014-2015, etc.). I would like to know whether it is more efficient to do it in one batch or to repeat the query multiple times, each filtered by the required dates.
Thanks in advance!
OPTION 1:
select
id,
sum(case when year(getdate()) - year(txndate) between 5 and 6 then amt else 0 end) as amt_6_5,
...
sum(case when year(getdate()) - year(txndate) between 0 and 1 then amt else 0 end) as amt_1_0
from
mytable
group by
id
OPTION 2:
select
id, sum(amt) as amt_6_5
from
mytable
where
year(getdate()) - year(txndate) between 5 and 6
group by
id
...
select
id, sum(amt) as amt_1_0
from
mytable
where
year(getdate()) - year(txndate) between 0 and 1
group by
id
1. Unless you have resource issues, I would go with the CASE version.
Although it has no impact on the results, also filtering on the requested period in the WHERE clause might give a significant performance advantage.
2. Your period definitions overlap: consecutive ranges such as BETWEEN 5 AND 6 and BETWEEN 4 AND 5 both include a difference of 5, so those rows are counted twice.
select id
,sum(case when year(getdate()) - year(txndate) = 6 then amt else 0 end) as amt_6
-- ...
,sum(case when year(getdate()) - year(txndate) = 0 then amt else 0 end) as amt_0
from mytable
where txndate >= dateadd(year, datediff(year,0, getDate())-6, 0)
group by id
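To make the overlap point concrete, here is a small sketch, assuming SQLite via Python's sqlite3 rather than SQL Server. The reference year is passed as a parameter instead of calling getdate() so the result is reproducible, and the three sample rows are invented. With BETWEEN ranges that share an endpoint, the buckets add up to more than the real total; with equality per year they do not.

```python
import sqlite3

# Assumptions: SQLite instead of SQL Server, a fixed reference year
# instead of getdate(), and invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, txndate TEXT, amt INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "2016-03-01", 10), (1, "2015-06-01", 20), (1, "2014-09-01", 40)],
)
params = {"ref": 2016}

# Inner query computes the age in whole calendar years for each row.
ages = """
    (SELECT amt, :ref - CAST(strftime('%Y', txndate) AS INTEGER) AS age
     FROM mytable)
"""

# Ranges that share an endpoint: age 1 lands in both buckets.
overlapping = conn.execute(
    f"""
    SELECT SUM(CASE WHEN age BETWEEN 1 AND 2 THEN amt ELSE 0 END),
           SUM(CASE WHEN age BETWEEN 0 AND 1 THEN amt ELSE 0 END)
    FROM {ages}
    """,
    params,
).fetchone()

# One bucket per year: every row is counted exactly once.
exact = conn.execute(
    f"""
    SELECT SUM(CASE WHEN age = 2 THEN amt ELSE 0 END),
           SUM(CASE WHEN age = 1 THEN amt ELSE 0 END),
           SUM(CASE WHEN age = 0 THEN amt ELSE 0 END)
    FROM {ages}
    """,
    params,
).fetchone()

total = conn.execute("SELECT SUM(amt) FROM mytable").fetchone()[0]
print(overlapping, sum(overlapping), "vs total", total)
print(exact, sum(exact), "vs total", total)
```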
This may help you:
WITH CTE
AS
(
SELECT id,
(CASE WHEN year(getdate()) - year(txndate) BETWEEN 5 AND 6 THEN 'year_5-6'
WHEN year(getdate()) - year(txndate) BETWEEN 4 AND 5 THEN 'year_4-5'
...
END) AS my_year,
amt
FROM mytable
)
SELECT id,my_year,sum(amt)
FROM CTE
GROUP BY id,my_year
Here, inside the CTE, each record is assigned a year tag (based on your conditions); the outer query then sums amt grouped by id and that tag.

Limit SQL query to days

I use this SQL query to make status report by day:
CREATE TABLE TICKET(
ID INTEGER NOT NULL,
TITLE TEXT,
STATUS INTEGER,
LAST_UPDATED DATE,
CREATED DATE
)
;
Query:
SELECT t.created,
COUNT(CASE WHEN t.status = '1' THEN 1 END) as cnt_status1,
COUNT(CASE WHEN t.status = '2' THEN 1 END) as cnt_status2,
COUNT(CASE WHEN t.status = '3' THEN 1 END) as cnt_status3,
COUNT(CASE WHEN t.status = '4' THEN 1 END) as cnt_status4
FROM ticket t
GROUP BY t.created
How can I limit this query to the last 7 days?
Also, I would like to get the results split by day. For example, I would like to group the first 24 hours together, then the next 24 hours, and so on.
Expected result:
This might help:
SELECT TO_CHAR(t.created, 'YYYY-MM-DD') AS created_date,
COUNT(CASE WHEN t.status = '1' THEN 1 END) as cnt_status1,
COUNT(CASE WHEN t.status = '2' THEN 1 END) as cnt_status2,
COUNT(CASE WHEN t.status = '3' THEN 1 END) as cnt_status3,
COUNT(CASE WHEN t.status = '4' THEN 1 END) as cnt_status4
FROM ticket t
WHERE t.created >= SYSDATE-7
GROUP BY TO_CHAR(t.created, 'YYYY-MM-DD')
ORDER BY created_date;
I used the Oracle functions for date handling. I'm sure you'll find the corresponding ones for PostgreSQL.
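As a rough, runnable illustration of the same idea (filter to the last 7 days, then group by calendar day), here is a sketch that assumes SQLite via Python's sqlite3, not Oracle or PostgreSQL. The "today" value is passed in as a parameter so the output is reproducible, and the ticket rows are invented.

```python
import sqlite3

# Assumptions: SQLite stands in for the real database, "today" is a
# fixed parameter, and the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (id INTEGER, status INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO ticket VALUES (?, ?, ?)",
    [
        (1, 1, "2024-01-10 09:00:00"),
        (2, 2, "2024-01-10 17:30:00"),
        (3, 1, "2024-01-12 08:00:00"),
        (4, 1, "2023-12-25 12:00:00"),  # older than 7 days, filtered out
    ],
)

daily = conn.execute(
    """
    SELECT date(created) AS created_date,
           COUNT(CASE WHEN status = 1 THEN 1 END) AS cnt_status1,
           COUNT(CASE WHEN status = 2 THEN 1 END) AS cnt_status2
    FROM ticket
    WHERE created >= date(:today, '-7 days')   -- keep the last 7 days
    GROUP BY date(created)                     -- one row per calendar day
    ORDER BY created_date
    """,
    {"today": "2024-01-14"},
).fetchall()

for row in daily:
    print(row)
```

Days with no tickets produce no row here; filling those gaps would need a calendar table or a generated series of dates.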

Funnel query with Amazon Redshift / PostgreSQL

I'm trying to analyze a funnel using event data in Redshift and have difficulties finding an efficient query to extract that data.
For example, in Redshift I have:
timestamp         action        user id
----------------  ------------  -------
2015-05-05 12:00  homepage      1
2015-05-05 12:01  product page  1
2015-05-05 12:02  homepage      2
2015-05-05 12:03  checkout      1
I would like to extract the funnel statistics. For example:
homepage_count  product_page_count  checkout_count
--------------  ------------------  --------------
100             50                  25
Where homepage_count represents the distinct number of users who visited the homepage, product_page_count represents the distinct number of users who visited the product page after visiting the homepage, and checkout_count represents the number of users who checked out after visiting the homepage and the product page.
What would be the best query to achieve that with Amazon Redshift? Is it possible to do with a single query?
I think the best method might be to add flags to the data for the first visit of each type for each user and then use these for aggregation logic:
select sum(case when ts_homepage is not null then 1 else 0 end) as homepage_count,
sum(case when ts_productpage > ts_homepage then 1 else 0 end) as productpage_count,
sum(case when ts_checkout > ts_productpage and ts_productpage > ts_homepage then 1 else 0 end) as checkout_count
from (select userid,
min(case when action = 'homepage' then timestamp end) as ts_homepage,
min(case when action = 'product page' then timestamp end) as ts_productpage,
min(case when action = 'checkout' then timestamp end) as ts_checkout
from table t
group by userid
) t
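To try the first-visit approach on a tiny data set, here is a sketch that assumes SQLite via Python's sqlite3 instead of Redshift. The table name events, the column names ts and user_id, and the sample rows (including user 3, who sees the product page before the homepage) are all made up for illustration.

```python
import sqlite3

# Assumptions: SQLite instead of Redshift; table/column names and the
# sample events are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, action TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2015-05-05 12:00", "homepage", 1),
        ("2015-05-05 12:01", "product page", 1),
        ("2015-05-05 12:02", "homepage", 2),
        ("2015-05-05 12:03", "checkout", 1),
        ("2015-05-05 12:00", "product page", 3),  # before the homepage visit
        ("2015-05-05 12:05", "homepage", 3),
    ],
)

funnel = conn.execute(
    """
    SELECT SUM(CASE WHEN ts_homepage IS NOT NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN ts_productpage > ts_homepage THEN 1 ELSE 0 END),
           SUM(CASE WHEN ts_checkout > ts_productpage
                     AND ts_productpage > ts_homepage THEN 1 ELSE 0 END)
    FROM (
        -- One row per user with the first timestamp of each step;
        -- a step the user never reached stays NULL.
        SELECT user_id,
               MIN(CASE WHEN action = 'homepage' THEN ts END) AS ts_homepage,
               MIN(CASE WHEN action = 'product page' THEN ts END) AS ts_productpage,
               MIN(CASE WHEN action = 'checkout' THEN ts END) AS ts_checkout
        FROM events
        GROUP BY user_id
    ) first_visits
    """
).fetchone()

print(funnel)
```

Because only the first visit of each type is kept, user 3 is not counted at the product page step even if they return to it later; that is the limitation the window-function variant further down tries to address.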
The above answer is correct. I have modified it for people using AWS Mobile Analytics with Redshift.
select sum(case when ts_homepage is not null then 1 else 0 end) as homepage_count,
sum(case when ts_productpage > ts_homepage then 1 else 0 end) as productpage_count,
sum(case when ts_checkout > ts_productpage and ts_productpage > ts_homepage then 1 else 0 end) as checkout_count
from (select client_id,
min(case when event_type = 'App Launch' then event_timestamp end) as ts_homepage,
min(case when event_type = 'SignUp Success' then event_timestamp end) as ts_productpage,
min(case when event_type = 'Start Quiz' then event_timestamp end) as ts_checkout
from awsma.v_event
group by client_id
) ts;
In case a more precise model is required: the product page can be opened twice, once before the home page and once after it. This case should usually be counted as a conversion as well.
Redshift SQL query:
SELECT
COUNT(
DISTINCT CASE WHEN cur_homepage_time IS NOT NULL
THEN user_id END
) Step1,
COUNT(
DISTINCT CASE WHEN cur_homepage_time IS NOT NULL AND cur_productpage_time IS NOT NULL
THEN user_id END
) Step2,
COUNT(
DISTINCT CASE WHEN
cur_homepage_time IS NOT NULL AND cur_productpage_time IS NOT NULL AND cur_checkout_time IS NOT NULL
THEN user_id END
) Step3
FROM (
SELECT
user_id,
timestamp,
COALESCE(homepage_time,
LAG(homepage_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_homepage_time,
COALESCE(productpage_time,
LAG(productpage_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_productpage_time,
COALESCE(checkout_time,
LAG(checkout_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_checkout_time
FROM
(
SELECT
timestamp,
user_id,
(CASE WHEN event = 'homepage'
THEN timestamp END) homepage_time,
(CASE WHEN event = 'product page'
THEN timestamp END) productpage_time,
(CASE WHEN event = 'checkout'
THEN timestamp END) checkout_time
FROM events
WHERE timestamp > '2016-05-01' AND timestamp < '2017-01-01'
ORDER BY user_id, timestamp
) event_times
ORDER BY user_id, timestamp
) event_windows
This query fills each row's cur_homepage_time, cur_productpage_time and cur_checkout_time with the most recent timestamp of the corresponding event. So if an event has occurred by the time of a given row, the corresponding column is not NULL.

Week interval query starting on mondays

I need to do a JasperReport. What I need to display is the total number of accounts processed, broken down into weekly intervals with the number of activated and declined accounts.
For the weekly interval query I got thus far:
SELECT *
FROM account_details
WHERE DATE date_opened = DATE_ADD(2014-01-01, INTERVAL(1-DAYOFWEEK(2014-01-01)) +1 DAY)
This seems to be correct, but not POSTGRES correct. It keeps complaining about the 1-DAYOFWEEK. Here is what I will hopefully achieve:
UPDATE
It is pretty ugly, but I don't know of anything better. It does the job, though. I don't know if it can be refactored to at least look better. I also don't know how to handle division by zero at the moment.
SELECT to_char(d.day, 'YYYY/MM/DD - ') || to_char(d.day + 6, 'YYYY/MM/DD') AS Month
, SUM(CASE WHEN LOWER(situation) LIKE '%active%' THEN 1 ELSE 0 END) AS Activated
, SUM(CASE WHEN LOWER(situation) LIKE '%declined%' THEN 1 ELSE 0 END) AS Declined
, SUM(CASE WHEN LOWER(situation) LIKE '%declined%' OR LOWER(situation) LIKE '%active%' THEN 1 ELSE 0 END) AS Total
, to_char( 100.0 *( (SUM(CASE WHEN LOWER(situation) LIKE '%active%' THEN 1 ELSE 0 END)) / (SUM(CASE WHEN LOWER(situation) LIKE '%declined%' OR LOWER(situation) LIKE '%active%' THEN 1 ELSE 0 END))::real) , '99.9') AS percent_activated
, to_char( 100.0 *( (SUM(CASE WHEN LOWER(situation) LIKE '%declined%' THEN 1 ELSE 0 END)) / (SUM(CASE WHEN LOWER(situation) LIKE '%declined%' OR LOWER(situation) LIKE '%active%' THEN 1 ELSE 0 END))::real) , '99.9') AS percent_declined
FROM (
SELECT day::date
FROM generate_series('2014-08-01'::date, '2014-09-14'::date, interval '1 week') day
) d
JOIN account_details a ON a.date_opened >= d.day
AND a.date_opened < d.day + 6
GROUP BY d.day;
SELECT to_char(d.day, 'YYYY/MM/DD" - "')
|| to_char(d.day + 6, 'YYYY/MM/DD') AS week
, count(situation ILIKE '%active%' OR NULL) AS activated
, ...
FROM (
SELECT day::date
FROM generate_series('2014-08-11'::date
, '2014-09-14'::date
, '1 week'::interval) day
) d
LEFT JOIN account_details a ON a.date_opened >= d.day
AND a.date_opened < d.day + 7 -- 7, not 6!
GROUP BY d.day;
Related answers:
Weekly total sums
Calculate working hours between 2 dates in PostgreSQL
Best way to count records by arbitrary time intervals in Rails+Postgres
More about counting specific values:
For absolute performance, is SUM faster or COUNT?
SQL Query to Transpose Column Counts to Row Counts
Aside: You would typically use an enum or a look-up table and just store an ID for situation, not a lengthy text redundantly.
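The "7, not 6" comment in the answer above is easy to check on a few rows. The sketch below assumes SQLite via Python's sqlite3, so a recursive CTE stands in for PostgreSQL's generate_series and date(day, '+N days') stands in for date + integer; the helper name weekly_counts and the sample accounts are hypothetical. An account opened on the Sunday of a week falls out of every bucket when the half-open interval is only 6 days wide.

```python
import sqlite3

# Assumptions: SQLite instead of PostgreSQL, invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account_details (date_opened TEXT, situation TEXT)")
conn.executemany(
    "INSERT INTO account_details VALUES (?, ?)",
    [
        ("2014-08-11", "active"),    # Monday of week 1
        ("2014-08-17", "declined"),  # Sunday of week 1
        ("2014-08-18", "active"),    # Monday of week 2
    ],
)


def weekly_counts(width_days):
    """Count accounts per week using the interval [day, day + width_days)."""
    sql = """
        WITH RECURSIVE weeks(day) AS (
            SELECT '2014-08-11'
            UNION ALL
            SELECT date(day, '+7 days') FROM weeks WHERE day < '2014-08-18'
        )
        SELECT w.day, COUNT(a.date_opened)
        FROM weeks w
        LEFT JOIN account_details a
               ON a.date_opened >= w.day
              AND a.date_opened < date(w.day, :width)
        GROUP BY w.day
        ORDER BY w.day
    """
    return conn.execute(sql, {"width": f"+{width_days} days"}).fetchall()


seven = weekly_counts(7)  # full weeks, Monday through Sunday
six = weekly_counts(6)    # silently drops every Sunday
print(seven)
print(six)
```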

SQL get number of entries satisfying different conditions

the goal is to retrieve the number of users in one table which have:
field EXPIREDATE > CURRENT_TIMESTAMP as nUsersActive
field EXPIREDATE < CURRENT_TIMESTAMP as nUsersExpired
field EXPIREDATE IS NULL as nUsersPreregistered
all with one query, and the result should for example be
nUsersActive nUsersExpired nUsersPreregistered
10 2 15
this will later be json_encoded and passed to an ExtJS script for displaying.
Any hints? I tried several times without succeeding. I tried with the UNION statement and got the right numbers, but of course in a column, while I need them in a row.
Thanks for your support.
Something like the following should work, you may need to adjust for the specific database that you are using.
To get them in columns:
select
count(case when EXPIREDATE > CURRENT_TIMESTAMP then 1 end) AS nUsersActive,
count(case when EXPIREDATE < CURRENT_TIMESTAMP then 1 end) AS nUsersExpired,
count(case when EXPIREDATE IS NULL then 1 end) AS nUsersPreregistered
from users_table
And in rows (this is not as efficient!):
select
'nUsersActive' AS Param,
count(case when EXPIREDATE > CURRENT_TIMESTAMP then 1 end) AS Value
from users_table
UNION ALL
select 'nUsersExpired',
count(case when EXPIREDATE < CURRENT_TIMESTAMP then 1 end)
from users_table
UNION ALL
select 'nUsersPreregistered',
count(case when EXPIREDATE IS NULL then 1 end)
from users_table
I'm assuming you are using SQL Server. You should be able to get what you're looking for by using a CASE statement. Make sure you return something (anything) if the condition is true and NULL if the condition is false. Here is the msdn documentation: http://msdn.microsoft.com/en-us/library/ms181765.aspx
Your query would look something like this:
select COUNT(CASE WHEN #ThingToCheck = 'Value' THEN 1 ELSE NULL END) as Count1, COUNT(CASE WHEN #ThingToCheck = 'Value' THEN 1 ELSE NULL END) FROM ....
SELECT COUNT(CASE WHEN EXPIREDATE > CURRENT_TIMESTAMP THEN 1 END) AS nUsersActive,
COUNT(CASE WHEN EXPIREDATE < CURRENT_TIMESTAMP THEN 1 END) AS nUsersExpired,
COUNT(CASE WHEN EXPIREDATE IS NULL THEN 1 END) AS nUsersPreregistered
FROM Users
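Since the question mentions passing the result on as JSON, here is a minimal end-to-end sketch of the single-row approach. It assumes SQLite via Python's sqlite3 (the original stack appears to be PHP with an unspecified database), and the sample rows are invented with far-past and far-future expiry dates so the counts do not depend on when the code runs.

```python
import json
import sqlite3

# Assumptions: SQLite instead of the original database, invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_table (expiredate TEXT)")
conn.executemany(
    "INSERT INTO users_table VALUES (?)",
    [
        ("2999-01-01 00:00:00",),  # far future: active
        ("2999-06-01 00:00:00",),  # far future: active
        ("2000-01-01 00:00:00",),  # long past: expired
        (None,),                   # no expiry date: preregistered
        (None,),
        (None,),
    ],
)

# One pass over the table; COUNT ignores the NULL that a CASE without
# ELSE returns when its condition is not met.
cur = conn.execute(
    """
    SELECT COUNT(CASE WHEN expiredate > CURRENT_TIMESTAMP THEN 1 END) AS nUsersActive,
           COUNT(CASE WHEN expiredate < CURRENT_TIMESTAMP THEN 1 END) AS nUsersExpired,
           COUNT(CASE WHEN expiredate IS NULL THEN 1 END) AS nUsersPreregistered
    FROM users_table
    """
)

# Turn the single result row into a dict keyed by column alias.
counts = dict(zip([col[0] for col in cur.description], cur.fetchone()))
payload = json.dumps(counts)
print(payload)
```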