To count a column based on another column's repeating (same) entry - SQL

I want to create a report of calls based on the number of weeks since the last call and the call group.
The actual data is like below, with call id, date of call, and call grouping:
callid | Date | Group
----------------------------
1 | 6-1-18 | a1
2 | 6-1-18 | a2
3 | 7-1-18 | a3
4 | 8-1-18 | a1
5 | 9-1-18 | a2
6 | 9-1-18 | a4
The expected output displays the number of calls for each call group against the number of weeks from the last call:
week | |
from | |
last |Group|Group
call | a1 | a2
--------------------
1 | 2 | 2 ->number of calls
2 | - | -
3 | 1 | -
4 | 2 | -
5 | - | 3
6 | - | -
Can anyone please suggest a solution for this?

Although the data you provided is a very small set and not rich enough to cover all cases, here is a SQL query that calculates the number of weeks between each call and the last call within its group, and counts the number of calls for each group for that particular week difference.
with your_table as (
select 1 as "callid", to_date('6-1-18','dd-mm-rr') as "date", 'a1' as "group" from dual
union select 2, to_date('6-1-18','dd-mm-rr'), 'a2' from dual
union select 3, to_date('7-1-18','dd-mm-rr'), 'a3' from dual
union select 4, to_date('8-1-18','dd-mm-rr'), 'a1' from dual
union select 5, to_date('9-1-18','dd-mm-rr'), 'a2' from dual
union select 6, to_date('9-1-18','dd-mm-rr'), 'a4' from dual
),
data1 as (
select t.*, max(t."date") over (partition by t."group") last_call_dt from your_table t
),
data2 as (select t.*, round((last_call_dt-t."date")/7,0) as weeks_diff from data1 t)
select * from (
select t.weeks_diff, t."callid", t."group" from data2 t
)
PIVOT
(
COUNT("callid")
FOR "group" IN ('a1', 'a2', 'a3','a4')
)
order by weeks_diff
To try it out with your own table, just make the following change:
with your_table as (select * from my_table), ....
let me know how it goes :)
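The week-bucketing logic in the data1/data2 steps can be sanity-checked outside the database. Here is a rough Python sketch of the same steps, assuming the dates are d-m-yy as in the question (names like last_call are made up for illustration):

```python
from datetime import date
from collections import Counter

# Sample calls as (callid, call_date, group), matching the question's data.
calls = [
    (1, date(2018, 1, 6), "a1"),
    (2, date(2018, 1, 6), "a2"),
    (3, date(2018, 1, 7), "a3"),
    (4, date(2018, 1, 8), "a1"),
    (5, date(2018, 1, 9), "a2"),
    (6, date(2018, 1, 9), "a4"),
]

# Last call per group - the MAX("date") OVER (PARTITION BY "group") step.
last_call = {}
for _, d, g in calls:
    last_call[g] = max(last_call.get(g, d), d)

# Weeks from last call, rounded as in the SQL, counted per (weeks, group).
counts = Counter(
    (round((last_call[g] - d).days / 7), g) for _, d, g in calls
)
print(counts)
```

With this tiny sample every call falls in week bucket 0, which is exactly why richer data would be needed to exercise the other rows of the expected report.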


Database Design Historical Data Model

I am thinking about a good design to capture the history of product changes. Suppose a user can have different products to trade each day.
User Product Day
1 A 1
1 B 1
1 A 2
1 B 2
1 C 3
As we can see above, on day 3 product C is added and products A and B are removed.
I'm considering the 2 designs below:
#1 Capture the product changes and store it as start and end date
User Product Start End
1 A 1 3
1 B 1 3
1 C 3 -
#2 Capture the product changes as 1 record
User Product Action Day
1 A Added 1
1 B Added 1
1 C Added 3
1 A Removed 3
1 B Removed 3
My question is: can these 2 models be converted into each other? For example, we can use LEAD/LAG to convert #2 into #1.
Which design is better? Our system is using #2 to store the historical data.
Update:
The intention is to use the data to show the product change history.
For example, for a given date range, what are the product changes for a particular user?
The second model seems better, at least if your main interest is in queries like "find all changes for all users and products, which occurred between DATE_1 and DATE_2".
With the second model, the query is trivial:
select * from (table) where (date) between DATE_1 and DATE_2;
How would you write the query for the first model?
Moreover, with the second model you could create an index on (user, date) - or even just on (date) - which will make quick work of the query. Even if you had indexes on the table in the first model, they wouldn't be used due to the complicated nature of the query.
While integrity constraints would be relatively difficult in both cases (as they span rows), they would be much easier to implement - either with materialized views or with triggers - with the second model. In the first model you have to make sure there are no overlaps between the intervals. With the second model, if you partition by user and product and order by date, the condition is simply that the action alternates from row to row. Still not trivial to implement, but much simpler than the "non-overlapping intervals" condition of the first model.
To your other question: It is, indeed, trivial to go from either model to the other, using PIVOT and UNPIVOT. You do need an analytic function (ROW_NUMBER) before you PIVOT to go from model #2 to #1. You don't need any preparation to go from #1 to #2.
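As a rough illustration of that #2-to-#1 conversion, the ROW_NUMBER-then-PIVOT idea amounts to pairing the n-th 'Added' event for a (user, product) with its n-th 'Removed' event, if any. A hedged Python sketch of the same pairing, outside the database:

```python
from collections import defaultdict

# Model #2: one (user, product, action, day) row per change event.
events = [
    (1, "A", "Added", 1), (1, "B", "Added", 1), (1, "C", "Added", 3),
    (1, "A", "Removed", 3), (1, "B", "Removed", 3),
]

# Collect 'Added' and 'Removed' days per (user, product), in day order.
starts, ends = defaultdict(list), defaultdict(list)
for user, product, action, day in sorted(events, key=lambda e: e[3]):
    (starts if action == "Added" else ends)[(user, product)].append(day)

# Model #1: pair the n-th start with the n-th end; an unmatched start
# is an open interval (End = None).
intervals = []
for key, s in sorted(starts.items()):
    e = ends.get(key, [])
    for n, start in enumerate(s):
        intervals.append((*key, start, e[n] if n < len(e) else None))

print(intervals)
# [(1, 'A', 1, 3), (1, 'B', 1, 3), (1, 'C', 3, None)]
```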
Personally, I think the first option is better. I'm assuming you have so many rows that the raw structure of a row per user, product and date is too heavy? Because for visualisations I think the raw table would work fine as is.
However, if you have to aggregate due to size, and do not need to know the amounts of the products nor how many users are selling them on any given day, then the first option would be easier to work with, in my opinion, just in terms of SQL. On the other hand, you will have a problem if a product can have several start and end dates, since I am assuming a new entry would just overwrite the previous date stamp.
So, with that in mind, I would personally create a table with a row per day (or per month, if you want to minimise the size of the table and monthly is granular enough for your use case). Then add a column for each product and whether or not it was sold that day. You could even do it with a count of the number of users selling that product, which would give you a little more detail. The only problem this model has is that I would only use it if it is truly static, historical data with no need to add new products.
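As a sketch of that day-by-product layout (purely hypothetical, built here from the raw rows by counting how many users sell each product on each day):

```python
# Raw (user, product, day) rows, as in the question.
rows = [(1, "A", 1), (1, "B", 1), (1, "A", 2), (1, "B", 2), (1, "C", 3)]
products = sorted({p for _, p, _ in rows})
days = sorted({d for _, _, d in rows})

# One row per day, one column per product, cell = number of sellers.
wide = {
    day: {p: sum(1 for _, rp, rd in rows if rp == p and rd == day) for p in products}
    for day in days
}
print(wide)
```

Note the stated drawback: adding a new product later means adding a new column to every existing row.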
You can convert from any one format to the other formats.
Data in the first format:
CREATE TABLE table1 (Usr, Product, Day) AS
SELECT 1, 'A', 1 FROM DUAL UNION ALL
SELECT 1, 'B', 1 FROM DUAL UNION ALL
SELECT 1, 'A', 2 FROM DUAL UNION ALL
SELECT 1, 'B', 2 FROM DUAL UNION ALL
SELECT 1, 'C', 3 FROM DUAL;
Then:
SELECT usr,
product,
day + DECODE( action, 'Removed', 1, 0) AS day,
action
FROM (
SELECT Usr,
Product,
Day,
CASE
WHEN LAG( Day ) OVER ( PARTITION BY Usr, Product ORDER BY Day ) = Day - 1
THEN NULL
ELSE 'Added'
END AS Added,
CASE
WHEN LEAD( Day ) OVER ( PARTITION BY Usr, Product ORDER BY Day ) = Day + 1
THEN NULL
WHEN Day = MAX( Day ) OVER ()
THEN NULL
ELSE 'Removed'
END AS Removed
FROM table1
)
UNPIVOT ( action FOR value IN ( Added, Removed ) )
Outputs that data in the second form:
USR | PRODUCT | DAY | ACTION
--: | :------ | --: | :------
1 | A | 1 | Added
1 | A | 3 | Removed
1 | B | 1 | Added
1 | B | 3 | Removed
1 | C | 3 | Added
and:
SELECT Usr,
Product,
MIN( Day ) AS "Start",
CASE MAX( Day )
WHEN Last_Day
THEN NULL
ELSE MAX( Day ) + 1
END AS "End"
FROM (
SELECT Usr,
Product,
Day,
Day - ROW_NUMBER() OVER ( PARTITION BY Usr, Product ORDER BY Day ) AS grp,
MAX( Day ) OVER () AS last_day
FROM table1
)
GROUP BY Usr, Product, Grp, Last_Day
ORDER BY Usr, Product, "Start"
Outputs the data in the third format:
USR | PRODUCT | Start | End
--: | :------ | :---- | ---:
1 | A | 1 | 3
1 | B | 1 | 3
1 | C | 3 | null
Data in the second format:
CREATE TABLE table2 ( Usr, Product, Day, Action ) AS
SELECT 1, 'A', 1, 'Added' FROM DUAL UNION ALL
SELECT 1, 'A', 3, 'Removed' FROM DUAL UNION ALL
SELECT 1, 'B', 1, 'Added' FROM DUAL UNION ALL
SELECT 1, 'B', 3, 'Removed' FROM DUAL UNION ALL
SELECT 1, 'C', 3, 'Added' FROM DUAL;
Then you can convert it to the third format using:
SELECT Usr,
Product,
"Start",
"End"
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY Usr, Product, Action ORDER BY Day ) AS rn
FROM table2 t
)
PIVOT (
MAX( Day )
FOR Action IN (
'Added' AS "Start",
'Removed' AS "End"
)
)
Which outputs:
USR | PRODUCT | Start | End
--: | :------ | ----: | ---:
1 | A | 1 | 3
1 | B | 1 | 3
1 | C | 3 | null
Data in the third format:
CREATE TABLE table3 ( Usr, Product, "Start", "End" ) AS
SELECT 1, 'A', 1, 3 FROM DUAL UNION ALL
SELECT 1, 'B', 1, 3 FROM DUAL UNION ALL
SELECT 1, 'C', 3, NULL FROM DUAL;
Then to get the data in the first format you can use:
WITH unrolled_data ( Usr, Product, Day, "End" ) AS (
SELECT Usr, Product, "Start", "End"
FROM table3
UNION ALL
SELECT Usr, Product, Day + 1, "End"
FROM unrolled_data
WHERE Day + 1 < COALESCE( "End", 4 /* The last day + 1 */ )
)
SELECT Usr, Product, Day
FROM unrolled_data
ORDER BY Usr, Day, Product
Outputs:
USR | PRODUCT | DAY
--: | :------ | --:
1 | A | 1
1 | B | 1
1 | A | 2
1 | B | 2
1 | C | 3
And can convert to the second format using:
SELECT *
FROM table3
UNPIVOT ( Day FOR Action IN ( "Start" AS 'Added', "End" AS 'Removed' ) )
Which outputs:
USR | PRODUCT | ACTION | DAY
--: | :------ | :------ | --:
1 | A | Added | 1
1 | A | Removed | 3
1 | B | Added | 1
1 | B | Removed | 3
1 | C | Added | 3
(and you can combine queries to convert from 2-to-1.)

Possible to use a column name in a UDF in SQL?

I have a query in which a series of steps is repeated constantly over different columns, for example:
SELECT DISTINCT
MAX (
CASE
WHEN table_2."GRP1_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP1_MINIMUM_DATE",
MAX (
CASE
WHEN table_2."GRP2_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP2_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
I was considering writing a function to accomplish this as doing so would save on space in my query. I have been reading a bit about UDF in SQL but don't yet understand if it is possible to pass a column name in as a parameter (i.e. simply switch out "GRP1_MINIMUM_DATE" for "GRP2_MINIMUM_DATE" etc.). What I would like is a query which looks like this
SELECT DISTINCT
FUNCTION(table_2."GRP1_MINIMUM_DATE") AS "GRP1_MINIMUM_DATE",
FUNCTION(table_2."GRP2_MINIMUM_DATE") AS "GRP2_MINIMUM_DATE",
FUNCTION(table_2."GRP3_MINIMUM_DATE") AS "GRP3_MINIMUM_DATE",
FUNCTION(table_2."GRP4_MINIMUM_DATE") AS "GRP4_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
Can anyone tell me if this is possible/point me to some resource that might help me out here?
Thanks!
There is no direct way to do this, as #Tejash already stated, but it looks like your database model is not ideal - it would be better to have a table with USER_ID and GRP_ID as keys and MINIMUM_DATE as a separate field.
Without changing the table structure, you can use UNPIVOT query to mimic this design:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4))
Result:
| USER_ID | GRP_ID | MINIMUM_DATE |
|---------|--------|--------------|
| 1 | 1 | 09/09/19 |
| 1 | 2 | 09/09/19 |
| 1 | 3 | 09/09/19 |
| 1 | 4 | 09/09/19 |
| 2 | 1 | 09/08/19 |
| 2 | 2 | 09/07/19 |
| 2 | 3 | 09/06/19 |
| 2 | 4 | 09/05/19 |
With this you can write your query without further code duplication and, if needed, use PIVOT syntax to get one line per USER_ID.
The final query could then look like this:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
, INPUT_COHORT(USER_ID, ANCHOR_DATE)
AS (SELECT 1, SYSDATE-1 FROM dual UNION ALL
SELECT 2, SYSDATE-2 FROM dual UNION ALL
SELECT 3, SYSDATE-3 FROM dual)
-- Above is sampledata query starts from here:
, unpiv AS (SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4)))
SELECT qcsj_c000000001000000 user_id, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE
FROM INPUT_COHORT cohort
LEFT JOIN unpiv table_2
ON cohort.USER_ID = table_2.USER_ID
pivot (MAX(CASE WHEN minimum_date <= cohort."ANCHOR_DATE" THEN 1 ELSE 0 END) AS MINIMUM_DATE
FOR grp_id IN (1 AS GRP1,2 AS GRP2,3 AS GRP3,4 AS GRP4))
Result:
| USER_ID | GRP1_MINIMUM_DATE | GRP2_MINIMUM_DATE | GRP3_MINIMUM_DATE | GRP4_MINIMUM_DATE |
|---------|-------------------|-------------------|-------------------|-------------------|
| 3 | | | | |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 1 | 1 |
This way you only have to write your calculation logic once (see line starting with pivot).

Oracle Sql: Obtain a Sum of a Group, if Subgroup condition met

I have a dataset from which I am trying to obtain a summed value for each group, if a subgroup within each group meets a certain condition. I am not sure if this is possible, or if I am approaching this problem incorrectly.
My data is structured as following:
+----+-------------+---------+-------+
| ID | Transaction | Product | Value |
+----+-------------+---------+-------+
| 1 | A | 0 | 10 |
| 1 | A | 1 | 15 |
| 1 | A | 2 | 20 |
| 1 | B | 1 | 5 |
| 1 | B | 2 | 10 |
+----+-------------+---------+-------+
In this example I want to obtain the sum of values by the ID column, if a transaction does not contain any products labeled 0. In the above described scenario, all values related to Transaction A would be excluded because Product 0 was purchased. With the outcome being:
+----+-------------+
| ID | Sum of Value|
+----+-------------+
| 1 | 15 |
+----+-------------+
This process would repeat for multiple IDs with each ID only containing the sum of values if the transaction does not contain product 0.
Hmmm . . . one method is to use not exists for the filtering:
select id, sum(value)
from t
where not exists (select 1
from t t2
where t2.id = t.id and t2.transaction = t.transaction and
t2.product = 0
)
group by id;
You don't need a correlated subquery with not exists.
Just use group by.
with s (id, transaction, product, value) as (
select 1, 'A', 0, 10 from dual union all
select 1, 'A', 1, 15 from dual union all
select 1, 'A', 2, 20 from dual union all
select 1, 'B', 1, 5 from dual union all
select 1, 'B', 2, 10 from dual)
select id, sum(sum_value) as sum_value
from
(select id, transaction,
sum(value) as sum_value
from s
group by id, transaction
having count(decode(product, 0, 1)) = 0
)
group by id;
ID SUM_VALUE
---------- ----------
1 15
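Both answers implement the same filter: drop every transaction that contains product 0, then sum what is left per ID. A quick Python sketch of that logic on the sample data confirms the expected result:

```python
# (id, transaction, product, value) rows from the question.
rows = [
    (1, "A", 0, 10), (1, "A", 1, 15), (1, "A", 2, 20),
    (1, "B", 1, 5), (1, "B", 2, 10),
]

# Transactions containing product 0 (the NOT EXISTS / HAVING condition).
bad = {(i, t) for i, t, p, _ in rows if p == 0}

# Sum the surviving rows per id.
totals = {}
for i, t, _, v in rows:
    if (i, t) not in bad:
        totals[i] = totals.get(i, 0) + v
print(totals)
# {1: 15}
```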

SQL statement that allows me to output multiple columns as one row

I have table in my database that goes like this:
ID | VARIANT | SIFRANT | VALUE
When I call
SELECT * FROM example_table
I get each row on its own. But records in my database can have the same VARIANT, and I'd like to output those records in the same row.
For example, if I have
ID | VARIANT | SIFRANT | VALUE
1 | 3 | 5 | 50
2 | 3 | 6 | 49
3 | 3 | 1 | 68
I'd like the output to be
VARIANT | VALUES_5 | VALUES_6 | VALUES_1
3 | 50 | 49 | 68
EDIT: I found the solution using PIVOT, the code goes like this:
select *
from (
select variant, VALUE, SIFRANT
from example_table
)
pivot
(
max(VALUE)
for SIFRANT
in ('1','2','3','4','5','6','7','8','9','10')
)
It seems that you only need an aggregation on your data:
with test(ID, VARIANT, SIFRANT, VALUE) as
(
select 1, 3, 5, 50 from dual union all
select 2, 3, 6, 49 from dual union all
select 3, 3, 1, 68 from dual
)
select variant, listagg (value, ' ') within group ( order by id)
from test
group by variant
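The PIVOT the asker settled on can be mimicked in a few lines of Python, just to make the reshaping explicit (one row per VARIANT, with one entry per SIFRANT value - names here mirror the question's columns):

```python
# (id, variant, sifrant, value) rows from the question.
rows = [(1, 3, 5, 50), (2, 3, 6, 49), (3, 3, 1, 68)]

# Pivot: group by variant, spread values out by sifrant.
pivoted = {}
for _id, variant, sifrant, value in rows:
    pivoted.setdefault(variant, {})[sifrant] = value
print(pivoted)
# {3: {5: 50, 6: 49, 1: 68}}
```

Note this differs from the listagg answer, which concatenates the values into a single string column rather than spreading them into separate columns.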

SQL query update by grouping

I'm dealing with some legacy data in an Oracle table and have the following
--------------------------------------------
| RefNo | ID |
--------------------------------------------
| FOO/BAR/BAZ/AAAAAAAAAA | 1 |
| FOO/BAR/BAZ/BBBBBBBBBB | 1 |
| FOO/BAR/BAZ/CCCCCCCCCC | 1 |
| FOO/BAR/BAZ/DDDDDDDDDD | 1 |
--------------------------------------------
For each of the /FOO/BAR/BAZ/% records I want to make the ID a unique incrementing number.
Is there a method to do this in SQL?
Thanks in advance
EDIT
Sorry for not being specific. I have several groups of records: /FOO/BAR/BAZ/, /FOO/ZZZ/YYY/, and so on. The same transformation needs to occur for each of these groups. The rownum can't be used; I want ID to start from 1, incrementing, within each group of records I have to change.
Sorry for making a mess of my first post. Output should be
--------------------------------------------
| RefNo | ID |
--------------------------------------------
| FOO/BAR/BAZ/AAAAAAAAAA | 1 |
| FOO/BAR/BAZ/BBBBBBBBBB | 2 |
| FOO/BAR/BAZ/CCCCCCCCCC | 3 |
| FOO/BAR/BAZ/DDDDDDDDDD | 4 |
| FOO/ZZZ/YYY/AAAAAAAAAA | 1 |
| FOO/ZZZ/YYY/BBBBBBBBBB | 2 |
--------------------------------------------
Let's try something like this (Oracle version 10g and higher):
with t1 as(
 select 'FOO/BAR/BAZ/AAAAAAAAAA' as RefNo, 1 as ID from dual union all
 select 'FOO/BAR/BAZ/BBBBBBBBBB', 1 from dual union all
 select 'FOO/BAR/BAZ/CCCCCCCCCC', 1 from dual union all
 select 'FOO/BAR/BAZ/DDDDDDDDDD', 1 from dual union all
 select 'FOO/ZZZ/YYY/AAAAAAAAAA', 1 from dual union all
 select 'FOO/ZZZ/YYY/BBBBBBBBBB', 1 from dual union all
 select 'FOO/ZZZ/YYY/CCCCCCCCCC', 1 from dual union all
 select 'FOO/ZZZ/YYY/DDDDDDDDDD', 1 from dual
)
select row_number() over(partition by ComPart order by DifPart) as id
     , RefNo
from (select regexp_substr(RefNo, '[[:alpha:]]+$') as DifPart
           , regexp_substr(RefNo, '([[:alpha:]]+/)+') as ComPart
           , RefNo
           , Id
      from t1
     ) q;
ID REFNO
---------- -----------------------
1 FOO/BAR/BAZ/AAAAAAAAAA
2 FOO/BAR/BAZ/BBBBBBBBBB
3 FOO/BAR/BAZ/CCCCCCCCCC
4 FOO/BAR/BAZ/DDDDDDDDDD
1 FOO/ZZZ/YYY/AAAAAAAAAA
2 FOO/ZZZ/YYY/BBBBBBBBBB
3 FOO/ZZZ/YYY/CCCCCCCCCC
4 FOO/ZZZ/YYY/DDDDDDDDDD
I think that actually updating the ID column wouldn't be a good idea. Every time you add new groups of data, you would have to run the update statement again. A better way would be to create a view, so you see the desired output every time you query it.
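The regexp-split-plus-ROW_NUMBER idea boils down to: take each RefNo's common prefix (everything up to the last '/') and number the rows 1..n within that prefix. A rough Python sketch of that logic:

```python
refnos = [
    "FOO/BAR/BAZ/AAAAAAAAAA", "FOO/BAR/BAZ/BBBBBBBBBB",
    "FOO/ZZZ/YYY/AAAAAAAAAA", "FOO/ZZZ/YYY/BBBBBBBBBB",
]

# PARTITION BY the common prefix, ORDER BY the differing suffix:
# walk in sorted order and keep a per-prefix counter.
counters, ids = {}, {}
for ref in sorted(refnos):
    prefix = ref.rsplit("/", 1)[0]
    counters[prefix] = counters.get(prefix, 0) + 1
    ids[ref] = counters[prefix]
print(ids)
```

Each prefix group restarts at 1, which is exactly the per-group numbering the question asks for.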
rownum can be used as an incrementing ID?
UPDATE legacy_table
SET id = ROWNUM;
This will assign unique values to all records in the table. See the Oracle documentation on the ROWNUM pseudocolumn.
You can run the following:
update <table_name> set id = rownum where RefNo like 'FOO/BAR/BAZ/%'
This is pretty rough and I'm not sure if your RefNo is a single value column or you just made it like that for simplicity.
select
  sub.RefNo,
  row_number() over (order by sub.RefNo) + (select max(id) from TABLE) as id
from (
  select FOO || '/' || BAR || '/' || BAZ || '/' || OTHER as RefNo
  from TABLE
  group by FOO || '/' || BAR || '/' || BAZ || '/' || OTHER
) sub