Having trouble with a query in SQL Server - sql

Let's say I've got a SQL Server table that logs user activity. Let's say it has user ID, user name, activity date, and activity type columns. I want to print out a list of all user activity, with one row for each month of activity, and a column for each activity type summing up the number of times that activity occurred in that month. I'm trying to do this with the following query:
SELECT
user_id,
user_name,
CONVERT(VARCHAR(7), activity_date, 120),
SUM(CASE WHEN activity_type = 'Log In' THEN 1 ELSE 0 END),
SUM(CASE WHEN activity_type = 'Save Document' THEN 1 ELSE 0 END),
SUM(CASE WHEN activity_type = 'Create Document' THEN 1 ELSE 0 END)
FROM UserActivity
WHERE activity_date BETWEEN '11-1-2010 00:00:00' AND '12-31-2010 23:59:59'
GROUP BY user_id, user_name, CONVERT(VARCHAR(7), activity_date, 120)
The problem is, this query is essentially giving me a separate row for each activity--lots and lots of rows, no counting. I think that the problem is with the way I'm doing the dates, because if I change the query to not select the date, I get a table that looks "mostly correct."
Any thoughts?

You can't have a SUM without a GROUP BY, at least not with other non-aggregates in the SELECT. Do your GROUP BY clause properly.
SELECT
user_id,
user_name,
CONVERT(VARCHAR(7), activity_date, 120),
SUM(CASE WHEN activity_type = 'Log In' THEN 1 ELSE 0 END),
SUM(CASE WHEN activity_type = 'Save Document' THEN 1 ELSE 0 END),
SUM(CASE WHEN activity_type = 'Create Document' THEN 1 ELSE 0 END)
FROM UserActivity
WHERE activity_date BETWEEN '11-1-2010 00:00:00' AND '12-31-2010 23:59:59'
GROUP BY user_id,
user_name,
CONVERT(VARCHAR(7), activity_date, 120)
For what it's worth, for date ranges, I prefer to use
WHERE activity_date >= '20101101'
AND activity_date < '20110101'
I'm sure losing a few records with a timestamp of '12-31-2010 23:59:59.997' won't matter, but it's just more logically correct to use a < next_date test. And to be pedantic, the format YYYYMMDD is the most robust regardless of regional/language/dateformat settings.
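To see the difference between the two range tests, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows are made up, timestamps are stored as ISO text (an assumption of this sketch, not of the original table), and substr plays the role of CONVERT(VARCHAR(7), ..., 120).

```python
import sqlite3

# SQLite stands in for SQL Server; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserActivity (user_id INTEGER, user_name TEXT, "
    "activity_date TEXT, activity_type TEXT)"
)
conn.executemany(
    "INSERT INTO UserActivity VALUES (?, ?, ?, ?)",
    [
        (1, "ann", "2010-11-03 09:00:00.000", "Log In"),
        (1, "ann", "2010-11-03 09:05:00.000", "Save Document"),
        (1, "ann", "2010-12-31 23:59:59.997", "Log In"),  # later than 23:59:59
        (1, "ann", "2011-01-01 00:00:00.000", "Log In"),  # outside the range
    ],
)

# One row per user and month, one conditional SUM per activity type.
MONTHLY = """
SELECT user_id,
       substr(activity_date, 1, 7) AS activity_month,
       SUM(CASE WHEN activity_type = 'Log In' THEN 1 ELSE 0 END) AS log_ins,
       SUM(CASE WHEN activity_type = 'Save Document' THEN 1 ELSE 0 END) AS saves
FROM UserActivity
WHERE {predicate}
GROUP BY user_id, activity_month
ORDER BY activity_month
"""

half_open = "activity_date >= '2010-11-01' AND activity_date < '2011-01-01'"
between = "activity_date BETWEEN '2010-11-01 00:00:00' AND '2010-12-31 23:59:59'"

half_open_rows = conn.execute(MONTHLY.format(predicate=half_open)).fetchall()
between_rows = conn.execute(MONTHLY.format(predicate=between)).fetchall()

print(half_open_rows)  # December is present
print(between_rows)    # the 23:59:59.997 row is dropped, so December is missing
```

The half-open test keeps the late December row without having to spell out the last representable instant of the month.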

Related

Find difference between two rows in sql

I have a table that stores employee info across multiple rows that share a common name, including each user's login and logout times for a website. I would like to achieve the result below. The table may contain multiple names (N1, N2, N3, etc.).
Name,Key,Time,
N1,TotalExp,No
N1,TotalYears,5
N1,LoggedIn,10:00:00
N1,LoggedOut,20:00:00
The expected output looks like this:
N1,TotalExp,TotalYrs,LoggedDifference
N1,No,5,10
Can anyone help me achieve this?
Although the design of your database doesn't look good, you can query your data this way:
with your_data as (
select 'N1' as Name,'TotalExp' as [Key],'No' as Time union all
select 'N1','TotalYears','5' union all
select 'N1','LoggedIn','10:00:00' union all
select 'N1','LoggedOut','20:00:00'
)
select
Name,
max(case when [Key] = 'TotalExp' then Time else null end) as TotalExp,
max(case when [Key] = 'TotalYears' then Time else null end) as TotalYrs,
datediff(
hour,
max(case when [Key] = 'LoggedIn' then convert(time, Time) else null end),
max(case when [Key] = 'LoggedOut' then convert(time, Time) else null end)
) as LoggedDifference
from your_data
group by Name
You can test it here.
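For readers without a SQL Server instance, the same key/value pivot can be tried with Python's built-in sqlite3 module. This is only a sketch: SQLite has no datediff, so the hour difference is computed from strftime('%s', ...) instead, and the N2 rows are invented to show that grouping by Name handles several names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Key" is quoted because KEY is a keyword in SQLite.
conn.execute('CREATE TABLE your_data (Name TEXT, "Key" TEXT, "Time" TEXT)')
conn.executemany(
    "INSERT INTO your_data VALUES (?, ?, ?)",
    [
        ("N1", "TotalExp", "No"),
        ("N1", "TotalYears", "5"),
        ("N1", "LoggedIn", "10:00:00"),
        ("N1", "LoggedOut", "20:00:00"),
        ("N2", "TotalExp", "Yes"),        # hypothetical second employee
        ("N2", "TotalYears", "2"),
        ("N2", "LoggedIn", "09:30:00"),
        ("N2", "LoggedOut", "17:30:00"),
    ],
)

# MAX(CASE ...) picks the single non-NULL value for each key within a Name group.
pivot_rows = conn.execute(
    """
    SELECT Name,
           MAX(CASE WHEN "Key" = 'TotalExp' THEN "Time" END) AS TotalExp,
           MAX(CASE WHEN "Key" = 'TotalYears' THEN "Time" END) AS TotalYrs,
           (CAST(strftime('%s', MAX(CASE WHEN "Key" = 'LoggedOut' THEN "Time" END)) AS INTEGER)
            - CAST(strftime('%s', MAX(CASE WHEN "Key" = 'LoggedIn' THEN "Time" END)) AS INTEGER)
           ) / 3600 AS LoggedDifference
    FROM your_data
    GROUP BY Name
    ORDER BY Name
    """
).fetchall()

print(pivot_rows)
```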

SQL output that comes in multiple rows converted to a single row

This query returns multiple rows, which need to be shown in a single row. Please help.
SELECT blng_serv_code,
COUNT(blng_serv_code) AS total,
DECODE(package_trx_yn, 'Y', 'PKG', 'N', 'NPKG') pkg_status
FROM bl_patient_charges_folio
WHERE operating_facility_id = 'MC'
AND trx_date >= TO_DATE('10/10/2019 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
AND blng_serv_code = 'LBSB000015'
GROUP BY blng_serv_code, package_trx_yn
If you want the value in a single row, leave out the package status:
SELECT blng_serv_code, COUNT(*) AS total
FROM bl_patient_charges_folio
WHERE operating_facility_id = 'MC' AND
trx_date >= DATE '2019-10-10' AND
blng_serv_code = 'LBSB000015'
GROUP BY blng_serv_code;
If you do want the package status, then you need to explain the logic for including it "on a single row".
EDIT:
It sounds like you want the values in separate columns:
SELECT blng_serv_code, COUNT(*) AS total,
SUM(CASE WHEN package_trx_yn = 'Y' THEN 1 ELSE 0 END) as pkg_cnt,
SUM(CASE WHEN package_trx_yn = 'N' THEN 1 ELSE 0 END) as npkg_cnt
FROM bl_patient_charges_folio
WHERE operating_facility_id = 'MC' AND
trx_date >= DATE '2019-10-10' AND
blng_serv_code = 'LBSB000015'
GROUP BY blng_serv_code;

MSSQL Group by and Select rows from grouping

I'm trying to figure out if what I'm trying to do is possible. Instead of resorting to multiple queries on a table, I wanted to group the records by business date and id then group by the id and select one date for a field and another date for the other field.
SELECT
*
{AMOUNT FROM DATE}
{AMOUNT FROM OTHER DATE}
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
) AS subquery
GROUP BY id
It seems that you're looking to do a pivot query. I usually use cross tabs for this. Based on the query you posted, it could look like:
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
) AS subquery
GROUP BY id;
You could also use a CTE.
WITH CTE AS(
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
Or even be a rebel and do the operation directly.
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM table
GROUP BY id;
However, some people have tested for performance and found that pre-aggregating can improve performance.
If I understand you correctly, then you're just trying to pivot, but only with two particular dates:
select id,
date1 = sum(iif(date = '2000-01-01', amount, null)),
date2 = sum(iif(date = '2000-01-02', amount, null))
from [table]
group by id
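The answers above differ in one detail: ELSE 0 versus NULL inside the conditional SUM. A small sketch with Python's built-in sqlite3 module (standing in for SQL Server, with a hypothetical amounts table and invented rows) shows what each choice returns when an id has no rows for one of the dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "amounts" is a hypothetical table name; the question only calls it "table".
conn.execute("CREATE TABLE amounts (id INTEGER, date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO amounts VALUES (?, ?, ?)",
    [
        (1, "2019-09-01", 10),
        (1, "2019-09-01", 5),
        (1, "2019-10-01", 7),
        (2, "2019-09-01", 3),   # id 2 has nothing on 2019-10-01
    ],
)

cross_tab = conn.execute(
    """
    SELECT id,
           SUM(CASE WHEN date = '2019-09-01' THEN amount ELSE 0 END) AS sept_amount,
           SUM(CASE WHEN date = '2019-10-01' THEN amount ELSE 0 END) AS oct_with_zero,
           SUM(CASE WHEN date = '2019-10-01' THEN amount END)        AS oct_with_null
    FROM amounts
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(cross_tab)
```

With ELSE 0 a missing date shows up as 0; without it the SUM over only NULLs is NULL (None in Python), which can be useful when "no rows" and "rows summing to zero" need to be told apart.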

sum of new columns within a query

I have the below query:
SELECT distinct COUNT(Status) AS [Transactions], sending_organisation AS [Supplier],
DATENAME(mm, Date_Reported) AS Month, DATENAME(yyyy, Date_Reported) AS Year,
Sum(Case When Status = 'Defect' Then 1 Else 0 End) As Defect,
Sum(Case When Status = 'Failed' Then 1 Else 0 End) As Failed,
Sum(Case When Status = 'Success' Then 1 Else 0 End) As Success
FROM [Tx]
where Channel_Partner = 'CAT'
and DATENAME(yyyy, Date_Reported) = '2018'
and DATENAME(mm, Date_Reported) = 'March'
GROUP BY DATENAME(mm, Date_Reported), DATENAME(yyyy, Date_Reported), sending_organisation
ORDER BY sending_organisation ASC
As you can see I have created the new columns of defect, success, failed.
I want to add another column to this same query that gives the combined total of failed + success.
Any ideas how this could be done without creating a table or running more than one query?
thanks in advance
Just add a further column with another conditional SUM:
SUM(CASE WHEN [Status] IN ('Success','Failed') THEN 1 ELSE 0 END) AS SuccessFailed
Further to the OP's comments, (on a totally unrelated matter), to return the data for the current month I would use:
AND Date_Reported >= DATEADD(MONTH, DATEDIFF(MONTH,0,GETDATE()),0)
AND Date_Reported < DATEADD(MONTH, DATEDIFF(MONTH,0,GETDATE()) + 1,0)
You should be able to alter this slightly for your own needs, if needed. Using something like DATENAME on your column isn't a good idea; it makes the query non-SARGable.
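The same idea, computing the month boundaries once and comparing the bare column against them, can be sketched outside SQL Server. The example below uses Python with the built-in sqlite3 module; month_bounds is a hypothetical helper, the Tx table is reduced to three columns, and the rows are invented.

```python
import sqlite3
from datetime import date


def month_bounds(day):
    """Return (first day of day's month, first day of the following month)."""
    start = day.replace(day=1)
    if start.month == 12:
        next_start = start.replace(year=start.year + 1, month=1)
    else:
        next_start = start.replace(month=start.month + 1)
    return start, next_start


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tx (sending_organisation TEXT, Date_Reported TEXT, Status TEXT)")
conn.executemany(
    "INSERT INTO Tx VALUES (?, ?, ?)",
    [
        ("CAT", "2018-03-05", "Success"),
        ("CAT", "2018-03-10", "Defect"),
        ("CAT", "2018-03-31", "Failed"),
        ("CAT", "2018-04-01", "Success"),  # next month, must be excluded
    ],
)

start, next_start = month_bounds(date(2018, 3, 15))

# The column is compared directly against the two bounds; no function wraps it.
march_rows = conn.execute(
    """
    SELECT sending_organisation,
           COUNT(Status) AS Transactions,
           SUM(CASE WHEN Status = 'Defect' THEN 1 ELSE 0 END) AS Defect,
           SUM(CASE WHEN Status IN ('Success', 'Failed') THEN 1 ELSE 0 END) AS SuccessFailed
    FROM Tx
    WHERE Date_Reported >= ? AND Date_Reported < ?
    GROUP BY sending_organisation
    """,
    (start.isoformat(), next_start.isoformat()),
).fetchall()

print(march_rows)
```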

Funnel query with Amazon Redshift / PostgreSQL

I'm trying to analyze a funnel using event data in Redshift and have difficulties finding an efficient query to extract that data.
For example, in Redshift I have:
timestamp         action        user id
----------------  ------------  -------
2015-05-05 12:00  homepage      1
2015-05-05 12:01  product page  1
2015-05-05 12:02  homepage      2
2015-05-05 12:03  checkout      1
I would like to extract the funnel statistics. For example:
homepage_count  product_page_count  checkout_count
--------------  ------------------  --------------
100             50                  25
Where homepage_count represents the distinct number of users who visited the homepage, product_page_count represents the distinct number of users who visited the product page after visiting the homepage, and checkout_count represents the number of users who checked out after visiting the homepage and the product page.
What would be the best query to achieve that with Amazon Redshift? Is it possible to do with a single query?
I think the best method might be to add flags to the data for the first visit of each type for each user and then use these for aggregation logic:
select sum(case when ts_homepage is not null then 1 else 0 end) as homepage_count,
sum(case when ts_productpage > ts_homepage then 1 else 0 end) as productpage_count,
sum(case when ts_checkout > ts_productpage and ts_productpage > ts_homepage then 1 else 0 end) as checkout_count
from (select userid,
min(case when action = 'homepage' then timestamp end) as ts_homepage,
min(case when action = 'product page' then timestamp end) as ts_productpage,
min(case when action = 'checkout' then timestamp end) as ts_checkout
from table t
group by userid
) t
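As a quick check of this approach, here is a sketch that runs the same first-visit logic on the four sample rows from the question, using Python's built-in sqlite3 module in place of Redshift. The table name events and the column names ts and user_id are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, action TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2015-05-05 12:00", "homepage", 1),
        ("2015-05-05 12:01", "product page", 1),
        ("2015-05-05 12:02", "homepage", 2),
        ("2015-05-05 12:03", "checkout", 1),
    ],
)

# Inner query: first timestamp of each step per user (NULL if never reached).
# Outer query: count users whose first timestamps appear in funnel order.
funnel = conn.execute(
    """
    SELECT SUM(CASE WHEN ts_homepage IS NOT NULL THEN 1 ELSE 0 END) AS homepage_count,
           SUM(CASE WHEN ts_productpage > ts_homepage THEN 1 ELSE 0 END) AS productpage_count,
           SUM(CASE WHEN ts_checkout > ts_productpage
                     AND ts_productpage > ts_homepage THEN 1 ELSE 0 END) AS checkout_count
    FROM (SELECT user_id,
                 MIN(CASE WHEN action = 'homepage' THEN ts END) AS ts_homepage,
                 MIN(CASE WHEN action = 'product page' THEN ts END) AS ts_productpage,
                 MIN(CASE WHEN action = 'checkout' THEN ts END) AS ts_checkout
          FROM events
          GROUP BY user_id) first_visits
    """
).fetchone()

print(funnel)
```

Comparisons against a NULL timestamp evaluate to NULL, so a user who never reached a step falls through to ELSE 0 for that step and every later one.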
The above answer is correct. I have modified it for people using AWS Mobile Analytics with Redshift.
select sum(case when ts_homepage is not null then 1 else 0 end) as homepage_count,
sum(case when ts_productpage > ts_homepage then 1 else 0 end) as productpage_count,
sum(case when ts_checkout > ts_productpage and ts_productpage > ts_homepage then 1 else 0 end) as checkout_count
from (select client_id,
min(case when event_type = 'App Launch' then event_timestamp end) as ts_homepage,
min(case when event_type = 'SignUp Success' then event_timestamp end) as ts_productpage,
min(case when event_type = 'Start Quiz' then event_timestamp end) as ts_checkout
from awsma.v_event
group by client_id
) ts;
A more precise model may be needed when the product page can be opened twice: once before the home page and once after it. That case should usually count as a conversion as well.
Redshift SQL query:
SELECT
COUNT(
DISTINCT CASE WHEN cur_homepage_time IS NOT NULL
THEN user_id END
) Step1,
COUNT(
DISTINCT CASE WHEN cur_homepage_time IS NOT NULL AND cur_productpage_time IS NOT NULL
THEN user_id END
) Step2,
COUNT(
DISTINCT CASE WHEN
cur_homepage_time IS NOT NULL AND cur_productpage_time IS NOT NULL AND cur_checkout_time IS NOT NULL
THEN user_id END
) Step3
FROM (
SELECT
user_id,
timestamp,
COALESCE(homepage_time,
LAG(homepage_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_homepage_time,
COALESCE(productpage_time,
LAG(productpage_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_productpage_time,
COALESCE(checkout_time,
LAG(checkout_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_checkout_time
FROM
(
SELECT
timestamp,
user_id,
(CASE WHEN event = 'homepage'
THEN timestamp END) homepage_time,
(CASE WHEN event = 'product page'
THEN timestamp END) productpage_time,
(CASE WHEN event = 'checkout'
THEN timestamp END) checkout_time
FROM events
WHERE timestamp > '2016-05-01' AND timestamp < '2017-01-01'
ORDER BY user_id, timestamp
) event_times
ORDER BY user_id, timestamp
) event_windows
This query fills each row's cur_homepage_time, cur_productpage_time and cur_checkout_time with the most recent timestamp at which that event occurred for the user. So if an event has occurred at or before a given row's time, the corresponding column in that row is not NULL.
More info here.