Apply condition for a particular group: SQL

I have to produce a report in BigQuery from Google Analytics data in the format:
ClientID, SessionID, Source, Medium, SessionNumber
The result has to be a table where, for every ClientID, SessionNumber contains the ordinal of each event, sorted by date.
The problem is that a conversion occurs during a session, and the SessionNumber values for that ClientID do not end at the event where the conversion occurs.
[For example, there are 30 events for ClientID = 1, and the conversion occurs at the 15th event (the ordinal from SessionNumber).]
My question is: how do I delete all the rows in the table that come after the SessionNumber ordinal at which the conversion occurs?
That would make it possible to see when the last click occurred and to see the assisted conversions.
How do I apply a condition to a particular group of data (to every ClientID in this case, keeping in mind that each ClientID has its own number of events and corresponding ordinals)?
Example:
Client ID | SessionID | Source | Medium | SessionNumber | Goal Achieved (1 if yes)
1 | 456| google | cpc | 1 | 0
1 | 456| google | cpc | 2 | 0
1 | 456| google | cpc | 3 | 1
1 | 456| google | cpc | 4 | 0
2 | 234| ... ... ... ...
... ... ... ... ... ...
So, within each ClientID, I have to get rid of all the rows after the SessionNumber at which the conversion occurred (a conversion occurs when the goal is achieved; I have already written the code for that). Is there any syntax in standard SQL that would let me apply such a condition to every ClientID?
The thing is that a conversion may occur within a session and the user can keep doing other things afterwards, but I only need the records from the start of the session until the conversion occurs.
I would like to return this table (from the example):
Client ID | SessionID | Source | Medium | SessionNumber | Goal Achieved (1 if yes)
1 | 456| google | cpc | 1 | 0
1 | 456| google | cpc | 2 | 0
1 | 456| google | cpc | 3 | 1
2 | 234| ... ... ... ...
... ... ... ... ... ...
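One way to approach this is a window function that finds, for each ClientID, the SessionNumber of the first conversion, followed by a filter that keeps only rows up to and including it. The sketch below is only an illustration: it runs the query through Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of BigQuery, the table name events and the snake_case column names are hypothetical, the rows for client 2 are invented, and keeping every row for a client that never converts is an assumption.

```python
import sqlite3

# In-memory stand-in for the real table; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (client_id INTEGER, session_id INTEGER, source TEXT, "
    "medium TEXT, session_number INTEGER, goal_achieved INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 456, "google", "cpc", 1, 0),
        (1, 456, "google", "cpc", 2, 0),
        (1, 456, "google", "cpc", 3, 1),
        (1, 456, "google", "cpc", 4, 0),
        # Invented rows for a client that never converts.
        (2, 234, "example", "referral", 1, 0),
        (2, 234, "example", "referral", 2, 0),
    ],
)

query = """
SELECT client_id, session_id, source, medium, session_number, goal_achieved
FROM (
    SELECT e.*,
           -- Ordinal of the first conversion per client (NULL if none).
           MIN(CASE WHEN goal_achieved = 1 THEN session_number END)
               OVER (PARTITION BY client_id) AS first_conversion
    FROM events AS e
) AS marked
-- Assumption: clients without a conversion keep all of their rows.
WHERE first_conversion IS NULL OR session_number <= first_conversion
ORDER BY client_id, session_number
"""
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

The inner SELECT ... OVER (PARTITION BY ...) part uses only standard window syntax, so it should carry over to BigQuery standard SQL with the real table and column names substituted.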

Related

Attempt to find total unique occurrences column as well as return the max value in that column

I am attempting to find the number of occurrences of the "message to user" event per user session, as well as the maximum number of sessions per user. In other words, I would like to keep track of how many user sessions contain the "message to user" event, while eliminating duplicates that occur within the same session.
I am also looking for the total number of user sessions across all users.
I was unsuccessful in finding these values. My table looks like this:
user_id | event | user_session_id
1 | message to user | 1
1 | message to user | 1
1 | message from user | 1
1 | message to user | 1
1 | message from user | 2
1 | message to user | 2
1 | message to user | 3
2 | message to user | 1
2 | message to user | 1
2 | message from user | 1
2 | message to user | 1
2 | message from user | 2
2 | message to user | 2
My expected output would be something like this:
user_id | event | user_session_id | max_session_by_user | total_sessions
1 | message to user | 1 | 3 | 5
1 | message to user | 2 | 3 | 5
1 | message to user | 3 | 3 | 5
2 | message to user | 1 | 2 | 5
2 | message to user | 2 | 2 | 5
Thank you
EDIT: I added more clarification about what I am looking for with regard to the event column.
First filter for the desired event and eliminate duplicates.
Then add counts using window functions.
SELECT user_id, event, user_session_id,
count(*) OVER (PARTITION BY user_id) AS max_session_by_user,
count(*) OVER () AS total_sessions
FROM (SELECT DISTINCT user_id, event, user_session_id
FROM events_table
WHERE event = 'message to user') AS q;
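As a quick check, the query above can be run against the sample rows from the question. This is a sketch that uses Python's sqlite3 module (SQLite 3.25 or later for window functions) as a stand-in for whatever database the asker uses; the ORDER BY is added here only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events_table (user_id INTEGER, event TEXT, user_session_id INTEGER)"
)
to_user, from_user = "message to user", "message from user"
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO events_table VALUES (?, ?, ?)",
    [
        (1, to_user, 1), (1, to_user, 1), (1, from_user, 1), (1, to_user, 1),
        (1, from_user, 2), (1, to_user, 2), (1, to_user, 3),
        (2, to_user, 1), (2, to_user, 1), (2, from_user, 1), (2, to_user, 1),
        (2, from_user, 2), (2, to_user, 2),
    ],
)

query = """
SELECT user_id, event, user_session_id,
       count(*) OVER (PARTITION BY user_id) AS max_session_by_user,
       count(*) OVER () AS total_sessions
FROM (SELECT DISTINCT user_id, event, user_session_id
      FROM events_table
      WHERE event = 'message to user') AS q
ORDER BY user_id, user_session_id  -- added only for a stable output order
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The inner DISTINCT collapses repeated "message to user" rows within a session, so each window count sees one row per (user, session) pair.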
I'm not sure where the event column comes from. But you seem to want:
select user_id, user_session_id,
count(*) over (partition by user_id) as max_sessions
from t
group by user_id, user_session_id;

Business Objects - Count Distinct Entries

I'm pretty new to Business Objects
I have data for requests, each of which has a number of statuses against it. I have used variables to create a flag (of 1) on those with a status of "First_Seen" and "Authorized".
Now, some RequestIDs only have an Authorized flag, some only have a First_Seen flag, and others have both flags. I need to know how many RequestIDs I have, irrespective of whether they have 1 or 2 flags (not the total number of flags).
Edit:
Note that some of the RequestIDs have multiple status.
RequestID | Status | First_Seen_Flag | Authorised_Flag |
:----------|:-------------:|:-----------------:|:-----------------:|
123456 | First_Seen | 1 | 1 |
123456 | Authorised | 1 | 0 |
345678 | First_Seen | 1 | 1 |
345678 | Authorised | 0 | 1 |
987654 | First_Seen | 1 | 0 |
765432 | Authorised | 0 | 0 |
I need to count unique RequestIDs where the First_Seen_Flag is 1, or the Authorised_Flag is 1, or both flags are 1, bearing in mind that not all RequestIDs have both statuses or multiple statuses. For example, 987654 has only a single status, and 765432 has only Authorised but no flag on it, because it did not meet the criteria to be flagged.
Your assistance is much appreciated.
Gareth
To do this you need to create a column doing an "AND" with both flags. Then you can count on this third column.
New column you insert should be something like below formula:
=([First_Seen] = 1) and ([Authorized] = 1)
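Note that the question asks for RequestIDs where either flag is set, which would correspond to combining the flags with OR rather than AND before counting distinct IDs. The sketch below illustrates only that counting logic, written as SQL and run through Python's sqlite3 module; it is not Business Objects formula syntax, and the table name requests and the snake_case column names are hypothetical. The rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (request_id INTEGER, status TEXT, "
    "first_seen_flag INTEGER, authorised_flag INTEGER)"
)
# Rows copied from the table in the question.
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?, ?)",
    [
        (123456, "First_Seen", 1, 1),
        (123456, "Authorised", 1, 0),
        (345678, "First_Seen", 1, 1),
        (345678, "Authorised", 0, 1),
        (987654, "First_Seen", 1, 0),
        (765432, "Authorised", 0, 0),
    ],
)

# Count each RequestID once if any of its rows has at least one flag set.
flagged_requests = conn.execute(
    "SELECT COUNT(DISTINCT request_id) FROM requests "
    "WHERE first_seen_flag = 1 OR authorised_flag = 1"
).fetchone()[0]
print(flagged_requests)
```

With the sample rows, 765432 is excluded because neither flag is set on its only row, and the other three RequestIDs are each counted once.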

SQL Query not Working on ORDER BY

I have a SQL Table that I'm trying to query and order the return. I am able to query just fine and the SQL Statement that I'm using is also working with the exception of the last ORDER BY statement that I need to execute. The sort order is as follows:
1. Sort the Status column so that 'open' is on top, 'closed' on bottom
2. Order the 'Flag' column so that empty (null) values are on bottom (above Status = Closed) and values on top
3. Order the results of items 1 and 2 by the Number column
Here is an example of the raw data:
| Flag | Number | Status |
|------------------------|
| a | 1 | open |
| | 5 | open |
| | 3 | closed |
| a | 4 | open |
| a | 2 | closed |
Here is what I'm going for:
| Flag | Number | Status |
|------------------------|
| a | 1 | open |
| a | 4 | open |
| | 5 | open |
| a | 2 | closed |
| | 3 | closed |
The query statement that I'm using is as follows:
sqlCom.CommandText = "SELECT * FROM Questions
WHERE Identifier = @identifier
AND Flag <> 'DELETED'
ORDER BY Status DESC
, (CASE WHEN Flag is null THEN 1 ELSE 0 END) ASC
, Flag DESC
, [Number] * 1 ASC";
Now, everything works fine, but the 3rd item above (sorting by Number column) doesn't work. Any ideas why?
What I'm currently getting:
| Flag | Number | Status |
|------------------------|
| a | 4 | open | <-- Out of order. Should be below the next record
| a | 1 | open | <-- Out of order. Should be one record up
| | 5 | open | <-- OK
| | 6 | open | <-- OK
| | 3 | closed | <-- OK
| a | 2 | closed | <-- OK
Thanks in advance for any helpful input. I have tried fiddling with the query in SSMS but no luck.
Your third sort expression is on Flag. Those values are being sorted alphabetically before the QNumber sort applies. And note that case matters in the ordering as well.
Here's how I would write it:
ORDER BY
Status DESC, -- might be better to use a case expression
CASE WHEN Flag IS NOT NULL THEN 0 ELSE 1 END,
QNumber
Since your data in the examples contradicts the data in the screenshot, it's not clear whether you needed to remove the third sort column entirely or just sort by ignoring the case of the text.
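To see the three-level sort in action, here is a small sketch that runs it on the raw sample rows from the question using Python's sqlite3 module rather than SQL Server. It follows the hint in the comment above and uses a CASE expression for Status instead of Status DESC; the table name questions and the lowercase column names are hypothetical, and the Identifier and 'DELETED' filters are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (flag TEXT, number INTEGER, status TEXT)")
# Raw sample rows from the question; None stands for an empty (NULL) flag.
conn.executemany(
    "INSERT INTO questions VALUES (?, ?, ?)",
    [
        ("a", 1, "open"),
        (None, 5, "open"),
        (None, 3, "closed"),
        ("a", 4, "open"),
        ("a", 2, "closed"),
    ],
)

query = """
SELECT flag, number, status
FROM questions
ORDER BY
    CASE WHEN status = 'open' THEN 0 ELSE 1 END,  -- open first, closed last
    CASE WHEN flag IS NOT NULL THEN 0 ELSE 1 END, -- flagged rows before empty ones
    number                                        -- then by number within each group
"""
ordered = conn.execute(query).fetchall()
for row in ordered:
    print(row)
```

Because the flag value itself is not a sort key here, rows with any non-null flag are tied at the second level and fall through to the number sort.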

SQL - Combining 3 rows per group in a logging scenario

I have reworked our API's logging system to use Azure Table Storage from using SQL storage for cost and performance reasons. I am now migrating our legacy logs to the new system. I am building a SQL query per table that will map the old fields to the new ones, with the intention of exporting to CSV then importing into Azure.
So far, so good. However, one artifact of the previous system is that it logged 3 times per request - call begin, call response and call end - and the new one logs the call as just one log (again, for cost and performance reasons).
Some fields are common to all three related logs, e.g. the Session which uniquely identifies the call.
Some fields I only want the first log's value, e.g. Date which may be a few seconds different in the second and third log.
Some fields are shared for the three different purposes, e.g. Parameters gives the Input Model for Call Begin, Output Model for Call Response, and HTTP response (e.g. OK) for Call End.
Some fields are unused for two of the purposes, e.g. ExecutionTime is -1 for Call Begin and Call Response, and a value in ms for Call End.
How can I "roll up" the sets of 3 rows into one row per set? I have tried using DISTINCT and GROUP BY, but the fact that some of the information collides is making it very difficult. I apologize that my SQL isn't really good enough to really explain what I'm asking for - so perhaps an example will make it clearer:
Example of what I have:
SQL:
SELECT * FROM [dbo].[Log]
Results:
+---------+---------------------+-------+------------+---------------+---------------+-----------------+--+
| Session | Date | Level | Context | Message | ExecutionTime | Parameters | |
+---------+---------------------+-------+------------+---------------+---------------+-----------------+--+
| 84248B7 | 2014-07-20 19:16:15 | INFO | GET v1/abc | Call Begin | -1 | {"Input":"xx"} | |
| 84248B7 | 2014-07-20 19:16:15 | INFO | GET v1/abc | Call Response | -1 | {"Output":"yy"} | |
| 84248B7 | 2014-07-20 19:16:15 | INFO | GET v1/abc | Call End | 123 | OK | |
| F76BCBB | 2014-07-20 19:16:17 | ERROR | GET v1/def | Call Begin | -1 | {"Input":"ww"} | |
| F76BCBB | 2014-07-20 19:16:18 | ERROR | GET v1/def | Call Response | -1 | {"Output":"vv"} | |
| F76BCBB | 2014-07-20 19:16:18 | ERROR | GET v1/def | Call End | 456 | BadRequest | |
+---------+---------------------+-------+------------+---------------+---------------+-----------------+--+
Example of what I want:
SQL:
[Need to write this query]
Results:
+---------------------+-------+------------+----------+---------------+----------------+-----------------+--------------+
| Date | Level | Context | Message | ExecutionTime | InputModel | OutputModel | HttpResponse |
+---------------------+-------+------------+----------+---------------+----------------+-----------------+--------------+
| 2014-07-20 19:16:15 | INFO | GET v1/abc | Api Call | 123 | {"Input":"xx"} | {"Output":"yy"} | OK |
| 2014-07-20 19:16:17 | ERROR | GET v1/def | Api Call | 456 | {"Input":"ww"} | {"Output":"vv"} | BadRequest |
+---------------------+-------+------------+----------+---------------+----------------+-----------------+--------------+
select L1.Session, L1.Date, L1.Level, L1.Context, 'Api Call' AS Message,
L3.ExecutionTime,
L1.Parameters as InputModel,
L2.Parameters as OutputModel,
L3.Parameters as HttpResponse
from Log L1
inner join Log L2 ON L1.Session = L2.Session
inner join Log L3 ON L1.Session = L3.Session
where L1.Message = 'Call Begin'
and L2.Message = 'Call Response'
and L3.Message = 'Call End'
This would work for your sample data.
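An alternative to the triple self-join is conditional aggregation: group by Session and pick each output field from the row whose Message matches. The sketch below is one possible way to write it, run through Python's sqlite3 module rather than SQL Server; the snake_case column names (including log_date for Date) are hypothetical, and taking MIN(log_date) as "the first log's date" assumes the Call Begin row is never later than the other two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (session TEXT, log_date TEXT, level TEXT, context TEXT, "
    "message TEXT, execution_time INTEGER, parameters TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("84248B7", "2014-07-20 19:16:15", "INFO", "GET v1/abc", "Call Begin", -1, '{"Input":"xx"}'),
        ("84248B7", "2014-07-20 19:16:15", "INFO", "GET v1/abc", "Call Response", -1, '{"Output":"yy"}'),
        ("84248B7", "2014-07-20 19:16:15", "INFO", "GET v1/abc", "Call End", 123, "OK"),
        ("F76BCBB", "2014-07-20 19:16:17", "ERROR", "GET v1/def", "Call Begin", -1, '{"Input":"ww"}'),
        ("F76BCBB", "2014-07-20 19:16:18", "ERROR", "GET v1/def", "Call Response", -1, '{"Output":"vv"}'),
        ("F76BCBB", "2014-07-20 19:16:18", "ERROR", "GET v1/def", "Call End", 456, "BadRequest"),
    ],
)

query = """
SELECT MIN(log_date) AS log_date,   -- earliest timestamp in the group
       MIN(level)    AS level,      -- identical across the three rows
       MIN(context)  AS context,
       'Api Call'    AS message,
       MAX(CASE WHEN message = 'Call End'      THEN execution_time END) AS execution_time,
       MAX(CASE WHEN message = 'Call Begin'    THEN parameters END)     AS input_model,
       MAX(CASE WHEN message = 'Call Response' THEN parameters END)     AS output_model,
       MAX(CASE WHEN message = 'Call End'      THEN parameters END)     AS http_response
FROM log
GROUP BY session
ORDER BY MIN(log_date)
"""
rolled_up = conn.execute(query).fetchall()
for row in rolled_up:
    print(row)
```

Unlike the inner joins, this form still returns a row for a session that is missing one of its three logs, with NULL in the corresponding fields.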

Updating a field in a table with a number aggregated from other table

I have a log table with web log entries which have a session ID. I also have a session table summarizing sessions from the previous table. So I have to run some update SQL statement but I don't get how to construct a SQL statement for a field named "session_length". In this field I hope to assign the number of events in that particular session.
Let's say I have the following log table:
| Session ID | Timestamp | Action | ...
| 1 | 00:00:00 | get | ...
| 2 | 00:00:00 | get | ...
| 1 | 00:00:01 | get | ...
| 1 | 00:00:02 | get | ...
| 2 | 00:00:02 | get | ...
In the session table, I would like to have the following values for session_length field:
| Session ID | session_length | ...
| 1 | 3 | ...
| 2 | 2 | ...
I am not sure whether this is possible, but I would like to do it with a single SQL UPDATE statement. In particular, I am using AWS Redshift, which is based on PostgreSQL.
You can do this with a correlated subquery in the update statement:
update sessions
set session_length = (select count(*)
from log
where log.sessionid = sessions.sessionid
)
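The correlated-subquery update can be tried out on the sample data from the question. The sketch below uses Python's sqlite3 module as a stand-in for Redshift, so it only demonstrates the logic of the statement; the column name action and the HH:MM:SS text timestamps are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (sessionid INTEGER, ts TEXT, action TEXT)")
conn.execute("CREATE TABLE sessions (sessionid INTEGER, session_length INTEGER)")
# Log rows copied from the question.
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [
        (1, "00:00:00", "get"),
        (2, "00:00:00", "get"),
        (1, "00:00:01", "get"),
        (1, "00:00:02", "get"),
        (2, "00:00:02", "get"),
    ],
)
# Session rows start without a length.
conn.executemany("INSERT INTO sessions VALUES (?, NULL)", [(1,), (2,)])

# For each session row, count the log rows that share its sessionid.
conn.execute(
    """
    UPDATE sessions
    SET session_length = (SELECT count(*)
                          FROM log
                          WHERE log.sessionid = sessions.sessionid)
    """
)
lengths = dict(conn.execute("SELECT sessionid, session_length FROM sessions"))
print(lengths)
```

A session with no matching log rows would get a session_length of 0, since count(*) over an empty set returns 0 rather than NULL.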