I am working on a SQL query (Azure Databricks environment) over the following dataset:
clientid  visited  channel    purchase  visit_order
123       abc133   google     0         1
123       efg446   facebook   0         2
123       gij729   instagram  1         3
456       klm183   google     0         1
456       nop496   linkedin   0         2
456       qrs729   pinterest  1         3
456       tuv894   google     0         1
456       wyz634   instagram  0         2
I want to get the following output:
clientid  user_journey                 conversion
123       google, facebook, instagram  1
456       google, linkedin, pinterest  1
456       google, instagram            0
where the user_journey column is composed of the channels that participated in a conversion journey. Note that the journey of a user who has not yet made a purchase is also built.
Looking for commands that can help with this task, I found concat_ws and wrote the code below:
select
    clientid,
    concat_ws(',', collect_list(channel)) as user_journey,
    sum(purchase) as conversion
from table_name
group by clientid;
I get this result:
clientid  user_journey                                    conversion
123       google, facebook, instagram                     1
456       google, linkedin, pinterest, google, instagram  1
Now I'm trying to add a condition that produces the desired result, but so far I haven't been able to find one.
Could you help me solve this task?
Note: you are missing a very important data point that is most likely available in your data: a timestamp, a date, or something else that determines the global order of visits.
With this in mind, consider the query below (ts refers to that column, which is missing from your question):
select clientid,
    string_agg(channel, ', ' order by visit_order) user_journey,
    sum(purchase) as conversion
from (
    select *, countif(visit_order = 1) over(partition by clientid order by ts) grp
    from your_table
)
group by clientid, grp
Applied to the sample data in your question (with such a ts column added), this returns one row per journey, which is the output you asked for.
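For anyone who wants to try the grouping idea without a BigQuery project, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not the exact query above: the table name visits and the ts values 1 to 8 are made up to stand in for the missing timestamp, SUM(CASE ...) stands in for countif, and a windowed GROUP_CONCAT stands in for string_agg(... order by ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (ts INTEGER, clientid INTEGER, visited TEXT, "
    "channel TEXT, purchase INTEGER, visit_order INTEGER)"
)
# ts is a hypothetical global ordering column; the question's data has none.
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 123, "abc133", "google", 0, 1),
        (2, 123, "efg446", "facebook", 0, 2),
        (3, 123, "gij729", "instagram", 1, 3),
        (4, 456, "klm183", "google", 0, 1),
        (5, 456, "nop496", "linkedin", 0, 2),
        (6, 456, "qrs729", "pinterest", 1, 3),
        (7, 456, "tuv894", "google", 0, 1),
        (8, 456, "wyz634", "instagram", 0, 2),
    ],
)

journeys = conn.execute(
    """
    WITH grouped AS (
        SELECT *,
               -- running count of journey starts = journey number per client
               SUM(CASE WHEN visit_order = 1 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY clientid ORDER BY ts) AS grp
        FROM visits
    )
    SELECT DISTINCT
           clientid,
           grp,
           GROUP_CONCAT(channel, ', ') OVER w AS user_journey,
           SUM(purchase) OVER w AS conversion
    FROM grouped
    WINDOW w AS (PARTITION BY clientid, grp ORDER BY visit_order
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY clientid, grp
    """
).fetchall()

for row in journeys:
    print(row)
```

With these assumed ts values, the two journeys of client 456 stay apart because the second visit_order = 1 row bumps grp from 1 to 2.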
I tried to reproduce your scenario.
In place of the original table I used a subquery that selects every column of the original table plus one extra column, RowNumber, which assigns a row number to every row, partitioned by the visit_order column and ordered by the visited column.
My Query:
select
    clientid,
    concat_ws(',', collect_list(channel)) as user_journey,
    sum(purchase) as conversion
from (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY visit_order ORDER BY visited) AS RowNumber
    FROM docs
) as docstb
group by clientid, RowNumber
order by clientid asc
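To see why grouping by (clientid, RowNumber) separates the journeys, it may help to look at the row numbers on their own. The sketch below runs only the inner ROW_NUMBER() query in SQLite through Python's sqlite3 module (an assumption made so it can be run anywhere; it is not Databricks). On the sample data every visit of one journey ends up with the same RowNumber, but as far as I can tell that relies on the visited IDs sorting in the same order as the journeys, which real data may not guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (clientid INTEGER, visited TEXT, channel TEXT, "
    "purchase INTEGER, visit_order INTEGER)"
)
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?, ?)",
    [
        (123, "abc133", "google", 0, 1),
        (123, "efg446", "facebook", 0, 2),
        (123, "gij729", "instagram", 1, 3),
        (456, "klm183", "google", 0, 1),
        (456, "nop496", "linkedin", 0, 2),
        (456, "qrs729", "pinterest", 1, 3),
        (456, "tuv894", "google", 0, 1),
        (456, "wyz634", "instagram", 0, 2),
    ],
)

# Number the rows within each visit_order value, ordered by the visited ID.
row_numbers = dict(
    conn.execute(
        """
        SELECT visited,
               ROW_NUMBER() OVER (PARTITION BY visit_order ORDER BY visited)
                   AS RowNumber
        FROM docs
        """
    ).fetchall()
)

for visited, rn in sorted(row_numbers.items()):
    print(visited, rn)
```

Here the first journey of client 456 (klm183, nop496, qrs729) gets RowNumber 2 and the second (tuv894, wyz634) gets RowNumber 3, so the outer group by keeps them apart.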
I have two tables with the following data.
Social_Tbl
ID  Name      Value
------------------------
1   Facebook  FB
2   Orkut     OR
3   Google    GL
4   Other     OT
And Organization_tbl
ID  Organization  Name
-----------------------------
1   1234          Facebook
2   1234          Google
3   146           Other
4   126           Other
5   126           Facebook
6   77            Google
Here, 'Name' is the foreign key (not ID).
I want to join these tables and get the 'Name' values that do not belong to organization id 1234, as follows:
Name
----
Orkut
Other
Here, 'Orkut' and 'Other' do not belong to organization 1234.
I tried the following query for this:
select *
from Social_Tbl st
join Organization_tbl ot
    on st.Name = ot.Name
where Organization = 1234
This query fetches the names related to 1234, i.e. Facebook and Google. I want the result to be Orkut and Other. If I replace Organization = 1234 with Organization != 1234, it returns all data from Organization_tbl.
Can somebody help me with this? It should be pretty simple; I'm just not able to figure it out.
Could be done with a subquery:
select st.Name
from Social_Tbl st
where not exists (
    select *
    from Organization_tbl ot
    where st.Name = ot.Name
        and ot.Organization = 1234
)
(This also returns names that don't have an entry in Organization_tbl at all.)
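A quick way to check the NOT EXISTS approach against the sample data is to load both tables into an in-memory SQLite database through Python's sqlite3 module. This is only a sketch: SQLite is an assumption here (the question does not name a database), and the comparison query with != 1234 is added to show why the join attempt in the question does not give the wanted names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Social_Tbl (ID INTEGER, Name TEXT, Value TEXT);
    INSERT INTO Social_Tbl VALUES
        (1, 'Facebook', 'FB'), (2, 'Orkut', 'OR'),
        (3, 'Google', 'GL'), (4, 'Other', 'OT');

    CREATE TABLE Organization_tbl (ID INTEGER, Organization INTEGER, Name TEXT);
    INSERT INTO Organization_tbl VALUES
        (1, 1234, 'Facebook'), (2, 1234, 'Google'), (3, 146, 'Other'),
        (4, 126, 'Other'), (5, 126, 'Facebook'), (6, 77, 'Google');
    """
)

# Names with no row at all for organization 1234 (anti-join via NOT EXISTS).
not_in_1234 = [
    name
    for (name,) in conn.execute(
        """
        SELECT st.Name
        FROM Social_Tbl st
        WHERE NOT EXISTS (
            SELECT 1
            FROM Organization_tbl ot
            WHERE ot.Name = st.Name
              AND ot.Organization = 1234
        )
        ORDER BY st.ID
        """
    )
]

# The join with != 1234 keeps any name that has at least one other organization.
joined_not_equal = sorted(
    {
        name
        for (name,) in conn.execute(
            """
            SELECT st.Name
            FROM Social_Tbl st
            JOIN Organization_tbl ot ON st.Name = ot.Name
            WHERE ot.Organization != 1234
            """
        )
    }
)

print(not_in_1234)
print(joined_not_equal)
```

The join version still returns Facebook and Google because each of them also has a row for some organization other than 1234, while Orkut disappears because it has no row in Organization_tbl at all.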
I'm writing an Oracle query but I'm stuck on the following problem.
The table looks like this:
Tool      Name      Gender
Facebook  Alice     F
Facebook  Alex      M
Facebook  Loong     M
Facebook  Jimmy     M
Twitter   James     M
Twitter   Jessica   F
Twitter   Sam       M
Twitter   Kathrine  F
Google    Rosa      F
Google    Lily      F
Google    Bob       M
What I want to get is the first female in each tool.
The result should look like:
Facebook  Alice
Twitter   Jessica
Google    Rosa
I'm trying to get this using a query, not functions or procedures.
Thanks for helping.
select *
from (
    select row_number() over (partition by tool order by name) as rn
         , Name
         , Tool
    from YourTable
    where Gender = 'F'
) SubQueryAlias
where rn = 1 -- Only first per tool
Example at SQL Fiddle.
This is another alternative.
select min(name), tool
from yourTable
where gender = 'F'
group by tool
I'd like to have a bit of a discussion on which is better, or which does what; this is the first time I've seen row_number(). Note that this one returns the first female in alphabetical order, and yours does the same by sorting in a window. What is the difference?
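One way to explore that question is to run both queries side by side. The sketch below does so in SQLite through Python's sqlite3 module (an assumption for the sake of a runnable example; the question is about Oracle). For the first name per tool both approaches agree. The window version generalises to "first N per tool" by changing rn = 1 to rn <= 2, which min() alone cannot express. Note also that both sort by name, so on the sample data they return Lily for Google rather than the Rosa shown in the question; getting Rosa would need a column that records the row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Tool TEXT, Name TEXT, Gender TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        ("Facebook", "Alice", "F"), ("Facebook", "Alex", "M"),
        ("Facebook", "Loong", "M"), ("Facebook", "Jimmy", "M"),
        ("Twitter", "James", "M"), ("Twitter", "Jessica", "F"),
        ("Twitter", "Sam", "M"), ("Twitter", "Kathrine", "F"),
        ("Google", "Rosa", "F"), ("Google", "Lily", "F"),
        ("Google", "Bob", "M"),
    ],
)

# Numbered females per tool, alphabetical by name; {limit} is filled in below.
window_sql = """
    SELECT Tool, Name
    FROM (
        SELECT Tool, Name,
               ROW_NUMBER() OVER (PARTITION BY Tool ORDER BY Name) AS rn
        FROM YourTable
        WHERE Gender = 'F'
    )
    WHERE rn <= {limit}
    ORDER BY Tool, rn
"""

first_by_window = conn.execute(window_sql.format(limit=1)).fetchall()
first_by_min = conn.execute(
    """
    SELECT Tool, MIN(Name)
    FROM YourTable
    WHERE Gender = 'F'
    GROUP BY Tool
    ORDER BY Tool
    """
).fetchall()
top_two_by_window = conn.execute(window_sql.format(limit=2)).fetchall()

print(first_by_window)
print(first_by_min)
print(top_two_by_window)
```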
This question is a continuation of this question: (Oracle APEX - SQL - Creating a Sequential History and Calculating Days Between Each Phase)
Some background:
So for one of my applications I had decided that I needed the ability to capture more detailed metrics than I was previously doing. My group creates documents and specifically I wanted to know exactly how long (in days) that document spent in each phase of its development. The table for capturing this data is structured like so:
TBL_DOC_TIMELINE
DOC_ENTRY_ID  DOC_ID  DOC_STATUS  DOC_CATEGORY  DOC_DATE
1             123     Planned     OPEN          06-05-2012
7             123     Draft       OPEN          06-15-2012
38            123     Approval    OPEN          06-20-2012
102           123     Published   CLOSED        06-30-2012
All of our documents use the same table for this function, so I could not simply key on the DOC_ENTRY_ID, though it could help. I needed to find the max DOC_ENTRY_ID for the DOC_ID and then calculate. I do this until I reach the DOC_CATEGORY of 'CLOSED', at which point '0' is inserted in the cell, as that is the end of that DOC_ID's lifecycle. Like so:
DOC_ENTRY_ID  DOC_ID  DOC_STATUS  DOC_CATEGORY  DOC_DATE    DOC_DURATION
1             123     Planned     OPEN          06-05-2012  10
7             123     Draft       OPEN          06-15-2012  5
38            123     Approval    OPEN          06-20-2012  10
102           123     Published   CLOSED        06-30-2012  0
This is all currently accomplished thanks to the brilliant Tony Andrews who provided the rough DRAFT for the View code that eventually became:
create or replace view DOC_TIMELINE as
select t.DOC_ENTRY_ID, t.DOC_ID, t.DOC_STATUS, t.DOC_CATEGORY, t.DOC_DATE
     , case when DOC_CATEGORY = 'CLOSED' then 0
            else lead(DOC_DATE) over (partition by DOC_ID order by DOC_ENTRY_ID)
                 - DOC_DATE
       end as duration
from TBL_DOC_TIMELINE t;
What I now need to do:
This all works perfectly except that in my initial pass at this I forgot one very important part of my requirements. My goal is to know how long documents are spending in each phase but I completely neglected to realize that I need to gather this information in real time and not only after a document is CLOSED. With this current set up the View will look like this in the middle of its lifecycle:
DOC_ENTRY_ID  DOC_ID  DOC_STATUS  DOC_CATEGORY  DOC_DATE    DOC_DURATION
1             123     Planned     OPEN          06-05-2012  10
7             123     Draft       OPEN          06-15-2012  5
38            123     Approval    OPEN          06-20-2012  -
See the problem? If I need to know how long that document has spent in the Approval state then I need to wait until it has left that state for the duration to be calculated. So even though it might have been in that state for 20 days my metrics will not reflect that.
What I need to do is find some way to tweak the View code above to calculate this value against the SYSDATE() if the MAX(DOC_ENTRY_ID) for a given DOC_ID has a DOC_CATEGORY = 'OPEN'.
assuming SYSDATE() = '06-29-2012'
DOC_ENTRY_ID  DOC_ID  DOC_STATUS  DOC_CATEGORY  DOC_DATE    DOC_DURATION
1             123     Planned     OPEN          06-05-2012  10
7             123     Draft       OPEN          06-15-2012  5
38            123     Approval    OPEN          06-20-2012  9
Does any of that make sense? I imagine that I need to add another CASE to the View code, but the syntax is giving me some trouble. It is probably a simple solution to you guys, but my familiarity with these kinds of cases and the syntax allowed is limited, and my research has not uncovered anything that relates.
I sincerely appreciate any and all help. Thanks guys!
Something like this?
create or replace view DOC_TIMELINE as
select t.DOC_ENTRY_ID, t.DOC_ID, t.DOC_STATUS, t.DOC_CATEGORY, t.DOC_DATE
     , case when DOC_CATEGORY = 'CLOSED' then 0
            when lead(DOC_DATE) over (partition by DOC_ID order by DOC_ENTRY_ID)
                 - DOC_DATE is null then trunc(sysdate) - trunc(DOC_DATE)
            else lead(DOC_DATE) over (partition by DOC_ID order by DOC_ENTRY_ID)
                 - DOC_DATE
       end as duration
from TBL_DOC_TIMELINE t;
Or, since lead(DOC_DATE) will always be null for the max DOC_ENTRY_ID (for a DOC_ID):
create or replace view DOC_TIMELINE as
select t.DOC_ENTRY_ID, t.DOC_ID, t.DOC_STATUS, t.DOC_CATEGORY, t.DOC_DATE
     , case when DOC_CATEGORY = 'CLOSED' then 0
            when lead(DOC_DATE) over (partition by DOC_ID order by DOC_ENTRY_ID)
                 is null then trunc(sysdate) - trunc(DOC_DATE)
            else lead(DOC_DATE) over (partition by DOC_ID order by DOC_ENTRY_ID)
                 - DOC_DATE
       end as duration
from TBL_DOC_TIMELINE t;
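The same idea can be tried outside Oracle with a small SQLite sketch run through Python's sqlite3 module. Several things here are assumptions made for illustration: dates are stored as ISO strings, julianday() (a SQLite function) does the date arithmetic, "today" is passed in as a fixed parameter instead of sysdate so the result is reproducible, COALESCE replaces the extra CASE branch, and DOC_ID 124 is a made-up closed document added to exercise the CLOSED branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TBL_DOC_TIMELINE (DOC_ENTRY_ID INTEGER, DOC_ID INTEGER, "
    "DOC_STATUS TEXT, DOC_CATEGORY TEXT, DOC_DATE TEXT)"
)
conn.executemany(
    "INSERT INTO TBL_DOC_TIMELINE VALUES (?, ?, ?, ?, ?)",
    [
        (1, 123, "Planned", "OPEN", "2012-06-05"),
        (7, 123, "Draft", "OPEN", "2012-06-15"),
        (38, 123, "Approval", "OPEN", "2012-06-20"),
        # Hypothetical second document that is already closed.
        (2, 124, "Planned", "OPEN", "2012-06-01"),
        (9, 124, "Published", "CLOSED", "2012-06-11"),
    ],
)

durations = conn.execute(
    """
    SELECT DOC_ID, DOC_STATUS,
           CASE WHEN DOC_CATEGORY = 'CLOSED' THEN 0
                ELSE CAST(
                    -- next phase's start date, or "today" if there is none yet
                    julianday(COALESCE(
                        LEAD(DOC_DATE) OVER (PARTITION BY DOC_ID
                                             ORDER BY DOC_ENTRY_ID),
                        :today))
                    - julianday(DOC_DATE) AS INTEGER)
           END AS duration
    FROM TBL_DOC_TIMELINE
    ORDER BY DOC_ID, DOC_ENTRY_ID
    """,
    {"today": "2012-06-29"},
).fetchall()

for row in durations:
    print(row)
```

With "today" fixed at 2012-06-29, as in the question, the open Approval phase of document 123 reports 9 days instead of a blank.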
I'm using Access in Office 10 in a mixed Windows 7 / Windows XP environment.
I need to be able to select the current address for employees from a list. The problem I have is that the address datefrom could be past, future or null.
Removing future is obviously easy in the criteria, i.e. WHERE datefrom <=date()
The problem I have is that in the initial import of address data, most addresses did not have this information, and so the field is null. An example of the data is below: (date format is dd/mm/yyyy)
ID  EmployeeID  Postcode  DateFrom
1   1           AB12 3CD  [null]
2   2           GH12 5RF  [null]
3   1           CD34 5EF  10/03/2012
4   3           HA25 3PO  [null]
5   3           HA4 7RT   04/06/2012
6   3           DB43 5YU  12/11/2011
My desired output would be: (order of employees not important)
ID  EmployeeID  Postcode  DateFrom
2   2           GH12 5RF  [null]
3   1           CD34 5EF  10/03/2012
5   3           HA4 7RT   04/06/2012
I've tried sorting by DateFrom DESC which does order the list as below:
ID  EmployeeID  Postcode  DateFrom
3   1           CD34 5EF  10/03/2012
1   1           AB12 3CD  [null]
2   2           GH12 5RF  [null]
5   3           HA4 7RT   04/06/2012
6   3           DB43 5YU  12/11/2011
4   3           HA25 3PO  [null]
So if I could then just take the first result for each employee I'd be fine. However I've tried (and failed) to do SQL including things like DISTINCT, first() and GROUP BY, but don't seem to be able to get anywhere.
I probably just can't see the easy obvious answer, so any help would be very much appreciated.
Use ORDER BY like this, so that rows with a null DateFrom sort last:
ORDER BY IIf([DateFrom] Is Null, 1, 0),
         [DateFrom] DESC
I found the answer from this post. Please make sure to search first.
How about:
SELECT Adr.ID, Adr.EmployeeID, Adr.Postcode, Adr.DateFrom
FROM Adr
WHERE (((Adr.ID) In
    (SELECT Top 1 ID
     FROM adr b
     WHERE b.EmployeeID=Adr.EmployeeID
     ORDER BY DateFrom DESC )));
The above was built using the query design window, so half the parentheses are unnecessary; however, if you are using the query design window, you may as well leave them.
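The "top 1 per employee" subquery can be checked against the sample data with a small SQLite sketch run through Python's sqlite3 module. This is not Access: LIMIT 1 stands in for Top 1, dates are stored as ISO strings so they sort correctly as text, and the sketch relies on SQLite sorting NULL last under ORDER BY ... DESC, which matches the ordering the question shows for Access. The filter for future dates mentioned in the question is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Adr (ID INTEGER, EmployeeID INTEGER, Postcode TEXT, DateFrom TEXT)"
)
conn.executemany(
    "INSERT INTO Adr VALUES (?, ?, ?, ?)",
    [
        (1, 1, "AB12 3CD", None),
        (2, 2, "GH12 5RF", None),
        (3, 1, "CD34 5EF", "2012-03-10"),
        (4, 3, "HA25 3PO", None),
        (5, 3, "HA4 7RT", "2012-06-04"),
        (6, 3, "DB43 5YU", "2011-11-12"),
    ],
)

current_addresses = conn.execute(
    """
    SELECT a.ID, a.EmployeeID, a.Postcode, a.DateFrom
    FROM Adr a
    WHERE a.ID = (
        -- latest dated address for this employee; NULL dates sort last
        SELECT b.ID
        FROM Adr b
        WHERE b.EmployeeID = a.EmployeeID
        ORDER BY b.DateFrom DESC
        LIMIT 1
    )
    ORDER BY a.ID
    """
).fetchall()

for row in current_addresses:
    print(row)
```

Employee 2 has only a null-dated address, so that row is still returned, while employees 1 and 3 get their most recent dated address.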
Could you simply update the null values to be a set date like 1900-01-01?