SQL required data based on date - sql

I have been working on a report and am stuck on the required output. The scenario: a block manufacturing firm has multiple orders from the same client, delivers those orders on credit on different dates, and the client pays partial amounts that are not tied to specific orders. I am stuck with these two tables:
Orders_master,
do_no Client_id Site_id Order_date Amount
1 1 1 2013-10-27 50000
2 1 1 2013-10-29 47000
3 1 1 2013-10-15 10000
Client_payments,
P_id Client_id Site_id P_date Amount
1 1 1 2013-11-05 30000
2 1 1 2013-11-10 67000
3 1 1 2013-11-20 10000
I need help writing a query that returns all rows from both tables, as in the following output:
Do_no Client_id Site_id Order_date P_date Order_amount Payment_amount
1 1 1 2013-10-27 Null 50000 Null
2 1 1 2013-10-29 Null 47000 Null
Null 1 1 Null 2013-11-05 Null 30000
Null 1 1 null 2013-11-10 Null 67000
3 1 1 2013-11-15 Null 10000 Null
Null 1 1 Null 2013-11-20 Null 10000
The query below returns all the rows of the orders_master table but misses the payment rows of the required output shown above:
select om.*, cp.*
from orders_master om left join
client_payments cp on
om.order_date = cp.p_date and
om.site_id = cp.site_id
where om.site_id = 1
I tried different joins, but they do not return all the rows of both tables; when they do, the values are repeated instead of being NULL.

It looks like you want to use UNION [ALL] to combine the two tables, rather than a JOIN:
SELECT do_no,
client_id,
site_id,
Order_Date,
P_Date = NULL,
Order_Amount = Amount,
Payment_Amount = NULL
FROM Orders_Master
WHERE Site_ID = 1
UNION ALL
SELECT do_no = NULL,
client_id,
site_id,
Order_Date = NULL,
P_Date = P_Date,
Order_Amount = NULL,
Payment_Amount = Amount
FROM Client_Payments
WHERE Site_ID = 1;
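To make the idea concrete, here is a minimal sketch that runs the same kind of UNION ALL against an in-memory SQLite database from Python. It is not the original SQL Server setup: the `NULL AS alias` form replaces the `alias = NULL` syntax, the `sort_date` helper column is an addition used only to interleave orders and payments by date, and order 3 is assumed to be dated 2013-11-15 as in the desired output (the Orders_master listing shows 2013-10-15).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_master (do_no INTEGER, client_id INTEGER, site_id INTEGER,
                            order_date TEXT, amount INTEGER);
CREATE TABLE client_payments (p_id INTEGER, client_id INTEGER, site_id INTEGER,
                              p_date TEXT, amount INTEGER);
-- Assumption: order 3 uses the date from the desired output (2013-11-15).
INSERT INTO orders_master VALUES
  (1, 1, 1, '2013-10-27', 50000),
  (2, 1, 1, '2013-10-29', 47000),
  (3, 1, 1, '2013-11-15', 10000);
INSERT INTO client_payments VALUES
  (1, 1, 1, '2013-11-05', 30000),
  (2, 1, 1, '2013-11-10', 67000),
  (3, 1, 1, '2013-11-20', 10000);
""")

# Each branch pads the columns it does not have with NULL.
# sort_date is a helper column (not in the original answer) used for ordering.
rows = conn.execute("""
SELECT do_no, client_id, site_id, order_date, NULL AS p_date,
       amount AS order_amount, NULL AS payment_amount,
       order_date AS sort_date
FROM orders_master
WHERE site_id = 1
UNION ALL
SELECT NULL, client_id, site_id, NULL, p_date,
       NULL, amount,
       p_date
FROM client_payments
WHERE site_id = 1
ORDER BY sort_date
""").fetchall()

for row in rows:
    print(row)
```

Every order and every payment appears exactly once, because UNION ALL stacks the two result sets instead of matching rows the way a join does.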

Related

Calculate sum and cumul by age group and by date (but people changes age group as time passes)

DBMS : postgreSQL
my problem :
In my database I have a person table with an id and a birth date, an events table that links a person, an event (id_event) and a date, and an age table used for grouping ages. In the real database the person table has about 40 million rows, and the events table is 3 times bigger.
I need to produce a report (sum and cumul of X events) by age (age_group) and date (event_date). Counting the number of events by date is not a problem. The problem lies with the cumul: unlike other variables (sex, for example), a person grows older and changes age group as time passes, so for a given age group the cumul can increase and then decrease. I want the cumul of events, on every date in my report, to use the age of the persons on that date.
Example of my inputs and desired output
The only way I found is to do a Cartesian product on the tables person and the dates v_dates, so it's easy to follow an event and make it change age_group. The code below uses this method.
BUT I can't use a cartesian product on my real data (makes a table way too big) and I need to use another method.
Reproducible example
In this simplified example I want to produce a report at six-month intervals from 2020-07-01 to 2022-07-01 (view v_dates). In reality I need to produce the same report by day, but the logic remains the same.
My inputs
/* create table person*/
DROP TABLE IF EXISTS person;
CREATE TABLE person
(
person_id varchar(1),
person_birth_date date
);
INSERT INTO person
VALUES ('A', '2017-01-01'),
('B', '2016-07-01');
person_id person_birth_date
A 2017-01-01
B 2016-07-01
/* create table events*/
DROP TABLE IF EXISTS events;
CREATE TABLE events
(
person_id varchar(1),
event_id integer,
event_date date
);
INSERT INTO events
VALUES ('A', 1, '2020-07-01'),
('A', 2, '2021-07-01'),
('B', 1, '2021-01-01'),
('B', 2, '2022-01-01');
person_id event_id event_date
A 1 2020-07-01
A 2 2021-07-01
B 1 2021-01-01
B 2 2022-01-01
/* create table age*/
DROP TABLE IF EXISTS age;
CREATE TABLE age
(
age integer,
age_group varchar(8)
);
INSERT INTO age
VALUES (0,'[0-4]'),
(1,'[0-4]'),
(2,'[0-4]'),
(3,'[0-4]'),
(4,'[0-4]'),
(5,'[5-9]'),
(6,'[5-9]'),
(7,'[5-9]'),
(8,'[5-9]'),
(9,'[5-9]');
/* create view dates : contains one date every 6 months from 2020-07-01 to 2022-07-01*/
CREATE or replace view v_dates AS
SELECT GENERATE_SERIES('2020-07-01'::date, '2022-07-01'::date, '6 month')::date as event_date;
age age_group
0 [0-4]
1 [0-4]
5 [5-9]
My current method using a cartesian product
CROSS JOIN person * v_dates
with a LEFT JOIN to get info from table events
with a LEFT JOIN to get age_group from table age
CREATE or replace view v_person_event AS
SELECT
pdev.person_id,
pdev.event_date,
pdev.age,
ag.age_group,
pdev.event1,
pdev.event2
FROM
(
SELECT pd.person_id,
pd.event_date,
date_part('year', age(pd.event_date::TIMESTAMP, pd.person_birth_date::TIMESTAMP)) as age,
CASE WHEN ev.event_id = 1 THEN 1 else 0 END as event1,
CASE WHEN ev.event_id = 2 THEN 1 else 0 END as event2
FROM
(
SELECT *
FROM person
CROSS JOIN v_dates
) pd
LEFT JOIN events ev
on pd.person_id = ev.person_id
and pd.event_date = ev.event_date
) pdev
Left JOIN age as ag on pdev.age = ag.age
ORDER by pdev.person_id, pdev.event_date;
add columns event1_cum and event2_cum
CREATE or replace view v_person_event_cum AS
SELECT *,
SUM(event1) OVER (PARTITION BY person_id ORDER BY event_date) event1_cum,
SUM(event2) OVER (PARTITION BY person_id ORDER BY event_date) event2_cum
FROM v_person_event;
SELECT * FROM v_person_event_cum;
person_id event_date age age_group event1 event2 event1_cum event2_cum
A 2020-07-01 3 [0-4] 1 0 1 0
A 2021-01-01 4 [0-4] 0 0 1 0
A 2021-07-01 4 [0-4] 0 1 1 1
A 2022-01-01 5 [5-9] 0 0 1 1
A 2022-07-01 5 [5-9] 0 0 1 1
B 2020-07-01 4 [0-4] 0 0 0 0
B 2021-01-01 4 [0-4] 1 0 1 0
B 2021-07-01 5 [5-9] 0 0 1 0
B 2022-01-01 5 [5-9] 0 1 1 1
B 2022-07-01 6 [5-9] 0 0 1 1
desired output : create a report grouped by variables age_group and event_date
SELECT
age_group,
event_date,
SUM(event1) as event1,
SUM(event2) as event2,
SUM(event1_cum) as event1_cum,
SUM(event2_cum) as event2_cum
FROM v_person_event_cum
GROUP BY age_group, event_date
ORDER BY age_group, event_date;
age_group event_date event1 event2 event1_cum event2_cum
[0-4] 2020-07-01 1 0 1 0
[0-4] 2021-01-01 1 0 2 0
[0-4] 2021-07-01 0 1 1 1
[5-9] 2021-07-01 0 0 1 0
[5-9] 2022-01-01 0 1 2 2
This is why this is not an ordinary cumul: for the age_group [0-4], event1_cum goes from 2 at '2021-01-01' to 1 at '2021-07-01' because B was in [0-4] at the time of event 1 ('2021-01-01') but is in [5-9] at '2021-07-01'.
When we read the report:
On 2021-01-01, there were 2 persons between 0 and 4 (at that date) who had had event1 and 0 persons who had had event2.
On 2021-07-01, there was 1 person between 0 and 4 who had had event1 and 1 person who had had event2.
I can't find a solution to this problem without using a Cartesian product...
Thanks in advance!
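One possible way to avoid the person x dates product is to treat each event as a set of signed changes: +1 to the person's age group on the event date, then -1 for the old group and +1 for the next group on each birthday that crosses a group boundary. A running sum of these changes per age group gives the cumul. The sketch below illustrates this in plain Python on the sample data; it is not a tested PostgreSQL solution. The function names, the 5-year group width, and the assumption that nobody is born on Feb 29 are all hypothetical choices made for the illustration.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date

GROUP_WIDTH = 5  # assumption: 5-year age groups, as in the age table


def age_on(birth, day):
    """Age in completed years on a given day."""
    return day.year - birth.year - ((day.month, day.day) < (birth.month, birth.day))


def label(group):
    low = group * GROUP_WIDTH
    return f"[{low}-{low + GROUP_WIDTH - 1}]"


def cumul_by_age_group(persons, events, report_dates):
    report_dates = sorted(report_dates)
    deltas = defaultdict(int)  # (age_group, event_id, report_date) -> change

    def bump(group, event_id, day, amount):
        i = bisect_left(report_dates, day)  # first report date >= day
        if i < len(report_dates):
            deltas[(label(group), event_id, report_dates[i])] += amount

    for person_id, event_id, event_date in events:
        birth = persons[person_id]
        group = age_on(birth, event_date) // GROUP_WIDTH
        bump(group, event_id, event_date, +1)
        while True:
            # Birthday on which the person enters the next age group
            # (assumption: nobody is born on Feb 29).
            boundary = birth.replace(year=birth.year + (group + 1) * GROUP_WIDTH)
            if boundary > report_dates[-1]:
                break
            bump(group, event_id, boundary, -1)
            bump(group + 1, event_id, boundary, +1)
            group += 1

    # Running sum of the changes, per age group and event, over the report dates.
    result = {}
    for age_group, event_id in sorted({(g, e) for g, e, _ in deltas}):
        total = 0
        for day in report_dates:
            total += deltas.get((age_group, event_id, day), 0)
            result[(age_group, event_id, day)] = total
    return result


persons = {"A": date(2017, 1, 1), "B": date(2016, 7, 1)}
events = [("A", 1, date(2020, 7, 1)), ("A", 2, date(2021, 7, 1)),
          ("B", 1, date(2021, 1, 1)), ("B", 2, date(2022, 1, 1))]
report_dates = [date(2020, 7, 1), date(2021, 1, 1), date(2021, 7, 1),
                date(2022, 1, 1), date(2022, 7, 1)]
cumul = cumul_by_age_group(persons, events, report_dates)
print(cumul[("[0-4]", 1, date(2021, 1, 1))], cumul[("[0-4]", 1, date(2021, 7, 1))])
```

The number of generated changes grows with the number of events times the number of group boundaries each person crosses, not with persons times dates. In SQL the same idea would presumably become a UNION ALL of the +1 and -1 rows followed by a SUM(...) OVER (PARTITION BY age_group ORDER BY event_date).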

Aggregating columns inside a CASE statement

I have a case such that
~id ~from ~to ~label ~weight
100 A B knows 2
100 A B knows 3
100 A B knows 4
But I want only the weight for the maximum Date.
How can I modify the CASE statement below so that there is only one entry per ID?
Query:
(
select distinct
CASE WHEN *some-condition* as "~id"
,CASE *some-condition* as "~from"
,CASE *some-condition* as "~to"
,CASE *some-condition* as "~label"
,CASE ??? as "weight"
from
(select
dense_rank() over(partition by t.job_id order by start_time desc) rnk,
t.Date,
t.job_id,
t.start_time,
t.end_time,
t.dep_id,
t.table_name
.....
t.region_id
from Table1 t
,Tabel2 J
where t.JOB_ID=J.JOB_ID
)
where rnk=1
order by JOB_ID,table_name
)
where "~id" is NOT NULL and "~label" is NOT NULL and "~from" is NOT NULL and "~to" is NOT NULL;
Table t
job_id Date table_name ....... dep_id weight
100 2020-10-20 abc 1 2
100 2020-10-20 abc 2 3
100 2020-10-20 abc 3 4
100 2020-10-20 abc 4 10
100 2020-10-19 abc 3 2
The weight in the result should be the one that corresponds to the maximum dep_id.
~id ~from ~to ~label ~weight
100 A B knows 10
It's quite hard to come up with a solution since you didn't state how ~id, ~from, ~to, ~label are calculated. You should be able to achieve your desired output with window functions, i.e. FIRST_VALUE():
...
,CASE *some-condition* as "~label"
,FIRST_VALUE(weight) OVER (ORDER BY dep_id DESC) AS "weight"
...
You may need to add a PARTITION BY clause, depending on whether you want the first value overall or the first value within some other grouping.
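As an illustration of FIRST_VALUE() with a PARTITION BY clause, here is a small sketch that runs against an in-memory SQLite database from Python (window functions need SQLite 3.25 or later). The table `t` is reduced to the columns shown in the question, and `job_date` is a hypothetical name standing in for the `Date` column. Ordering by date first and dep_id second is an assumption that combines the two requirements stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- job_date is a hypothetical column name standing in for "Date".
CREATE TABLE t (job_id INTEGER, job_date TEXT, dep_id INTEGER, weight INTEGER);
INSERT INTO t VALUES
  (100, '2020-10-20', 1, 2),
  (100, '2020-10-20', 2, 3),
  (100, '2020-10-20', 3, 4),
  (100, '2020-10-20', 4, 10),
  (100, '2020-10-19', 3, 2);
""")

# For each job, take the weight of the row with the latest date and,
# within that date, the highest dep_id. DISTINCT collapses the repeats.
rows = conn.execute("""
SELECT DISTINCT
       job_id,
       FIRST_VALUE(weight) OVER (
           PARTITION BY job_id
           ORDER BY job_date DESC, dep_id DESC
       ) AS weight
FROM t
""").fetchall()

print(rows)
```

Without the PARTITION BY, the window would span all jobs and every row would receive the weight of the single top-ranked row in the whole table.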

How to get Result 1 for all user_ids that at least one time have source as paid

How can I get Result = 1 for all rows of every user_id that has source = paid at least once? I mean not just for the one row where source = paid, but for all rows of that user_id.
The Result column does not exist in the table; it has to be computed by the query.
Raw table
source session_number user_id
NULL 1 12345
NULL 2 12345
NULL 3 12345
NULL 4 12345
NULL 1 67890
paid 2 67890
NULL 3 67890
Desired Table
source session_number user_id result
NULL 1 12345 0
NULL 2 12345 0
NULL 3 12345 0
NULL 4 12345 0
NULL 1 67890 1
paid 2 67890 1
NULL 3 67890 1
You seem to want a window function. It would seem to be:
select t.*,
max(case when source = 'paid' then 1 else 0 end) over (partition by user_id) as result
from t;
In Postgres, you can return a boolean as:
select t.*,
bool_or(source = 'paid') over (partition by user_id) as result
from t;
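The window-function approach can be checked against the sample rows with a short sketch using an in-memory SQLite database from Python (window functions need SQLite 3.25 or later). The table name `t` follows the answer above; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (source TEXT, session_number INTEGER, user_id INTEGER);
INSERT INTO t VALUES
  (NULL, 1, 12345), (NULL, 2, 12345), (NULL, 3, 12345), (NULL, 4, 12345),
  (NULL, 1, 67890), ('paid', 2, 67890), (NULL, 3, 67890);
""")

# The CASE yields 1 only on 'paid' rows; MAX over the user's partition
# then spreads that 1 to every row of the same user_id.
rows = conn.execute("""
SELECT source, session_number, user_id,
       MAX(CASE WHEN source = 'paid' THEN 1 ELSE 0 END)
           OVER (PARTITION BY user_id) AS result
FROM t
ORDER BY user_id, session_number
""").fetchall()

for row in rows:
    print(row)
```

Rows with a NULL source fall into the ELSE branch, so a user with no paid session gets 0 on every row.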
Use EXISTS
select a.*,
case when exists( select 1 from table_name b where a.user_id=b.user_id
and b.source='paid')
then 1 else 0 end as result
from table_name a
With subquery
SELECT *,
CASE
WHEN user_id IN
(
SELECT user_id
FROM table_name
WHERE source = 'paid'
)
THEN 1
ELSE 0
END AS result
FROM table_name

How to remove duplicate records in SQL table based on certain conditions and multi-criteria?

The following table consists of the columns EmployeeID, JobNum, CompDate.
Basically there are 3 different employees that have certain job ids and their completed date time associated with them. There are some jobNum that have no association to a particular EmployeeID and may have a complete date.
Problem:
1) Remove the records for an EmployeeID when the complete date is not null (that is, populated with a date).
2) Delete the record that has null values for both columns JobNum and CompDate for an Employee WHEN there is a record for that EmployeeID that consists of an open job (when JobNum is NOT NULL and CompDate is NULL). THIS IS FOR DUPLICATES.
I tried using a ranking function with CASE expressions, but it does not rank properly:
SELECT [EmployeeID],
[JobNum],
[CompDate],
RANK ( ) OVER( PARTITION BY [EmployeeID] ORDER BY
CASE WHEN ([JobNum] is null AND [CompDate] is null) THEN 1
WHEN ([JobNum] is not null AND [CompDate] is null) THEN 2
WHEN ([JobNum] is not null AND [CompDate] is not null) THEN 3
END ASC) as Rank
FROM [dbo].test1
WHERE [EmployeeID] IN (SELECT [EmployeeID] FROM dbo.test1
GROUP BY [EmployeeID]
HAVING COUNT(*) > 1)
EmployeeID JobNum CompDate Rank
1 NULL NULL 1
1 401 NULL 2
1 435 NULL 2
1 358 2019-07-15 15:10:57.810 4
2 285 NULL 1
2 299 2019-07-15 15:14:04.603 2
2 305 2019-07-14 15:10:57.810 2
2 330 2019-06-13 10:10:30.710 2
3 NULL NULL 1
3 435 NULL 2
3 402 2019-07-11 13:10:47.610 3
Ex:
EmployeeID JobNum CompDate Rank
Delete this -> 1 NULL NULL 1
when this exists -> 1 401 NULL 2
when this exists -> 1 435 NULL 2
1 358 2019-07-15 15:10:57.810 4
You seem to want only rows where compdate is null and one of the following two conditions holds:
jobnum is not null
jobnum is null and no rows for the employee are open jobs (jobnum not null and compdate null)
I'm not sure what rank() has to do with these filtering conditions:
select t.*
from test1 t
where t.compdate is null and -- condition 1
(t.jobnum is not null or
not exists (select 1
from test1 tt
where tt.employeeid = t.employeeid and
tt.compdate is null and
tt.jobnum is not null
)
);
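To check the two rules from the question against the sample rows, here is a sketch that runs against an in-memory SQLite database from Python. It keeps rows with no completion date, and keeps a NULL/NULL row only when the employee has no open job. Employee 4 is a hypothetical extra row, added to show the case where the NULL/NULL row survives; the TEXT type for CompDate is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test1 (EmployeeID INTEGER, JobNum INTEGER, CompDate TEXT);
INSERT INTO test1 VALUES
  (1, NULL, NULL),
  (1, 401, NULL),
  (1, 435, NULL),
  (1, 358, '2019-07-15 15:10:57.810'),
  (2, 285, NULL),
  (2, 299, '2019-07-15 15:14:04.603'),
  (2, 305, '2019-07-14 15:10:57.810'),
  (2, 330, '2019-06-13 10:10:30.710'),
  (3, NULL, NULL),
  (3, 435, NULL),
  (3, 402, '2019-07-11 13:10:47.610'),
  (4, NULL, NULL);  -- hypothetical: employee with no open job
""")

# Rule 1: drop completed rows (CompDate populated).
# Rule 2: drop the NULL/NULL row when the same employee has an open job.
kept = conn.execute("""
SELECT t.EmployeeID, t.JobNum
FROM test1 t
WHERE t.CompDate IS NULL
  AND (t.JobNum IS NOT NULL
       OR NOT EXISTS (SELECT 1
                      FROM test1 tt
                      WHERE tt.EmployeeID = t.EmployeeID
                        AND tt.JobNum IS NOT NULL
                        AND tt.CompDate IS NULL))
ORDER BY t.EmployeeID, t.JobNum
""").fetchall()

print(kept)
```

Turning this SELECT into a DELETE would mean negating the WHERE clause, which is worth testing on a copy of the table first.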

PostgreSQL backdating query

I am trying to write a query that will return counted records from the time they were created.
The primary key is the house, which is unique. Another variable is the bidder. Each house has exactly one bidder, but a bidder can have multiple records (different houses). Another variable is a count (CASE) of the previous bids that were won.
I want to be able to set the count to return the number of previous bids won at the time each house record was created.
Currently, my query (shown below) logs the overall number of previous bids won regardless of the time the house record was created. Any help would be great!
Example:
SELECT h.house_id,
h.bidder_id,
h.created_date,
w.previous_bids_won
FROM house h
LEFT JOIN bid_transactions b
ON h.house_id = b.house_id
LEFT JOIN (
SELECT bidder_id,
COUNT(CASE WHEN created_date IS NOT NULL AND transaction_kind = 'Successful Bid' THEN 1 END) previous_bids_won
FROM bid_transactions
GROUP BY bidder_id
) w
ON h.bidder_id = w.bidder_id
ORDER BY h.created_date DESC
Example Data:
house_id bidder_id created_date previous_bids_won
1 1 2016-03-21 0
2 2 2016-02-10 1
3 2 2016-01-15 1
4 3 2016-01-01 0
Desired Data:
house_id bidder_id created_date previous_bids_won
1 1 2016-03-21 0
2 2 2016-02-10 1
3 2 2016-01-15 0
4 3 2016-01-01 0
If I understand correctly, you just want a cumulative sum:
SELECT h.house_id, h.bidder_id, h.created_date,
SUM(CASE WHEN b.created_date IS NOT NULL AND b.transaction_kind = 'Successful Bid'
THEN 1
ELSE 0
END) OVER (PARTITION BY h.bidder_id ORDER BY h.created_date
) as previous_bids_won
FROM house h LEFT JOIN
bid_transactions b
ON h.house_id = b.house_id;
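A plain cumulative sum includes the current row, so a house whose own bid was won would count itself. If only strictly earlier wins should count, one option is an explicit window frame that stops one row before the current one. The sketch below tries this on an in-memory SQLite database from Python (window functions need SQLite 3.25 or later). The bid_transactions rows are hypothetical, chosen so that bidder 2 won house 3, and the table is reduced to the two columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE house (house_id INTEGER PRIMARY KEY, bidder_id INTEGER, created_date TEXT);
CREATE TABLE bid_transactions (house_id INTEGER, transaction_kind TEXT);
INSERT INTO house VALUES
  (1, 1, '2016-03-21'),
  (2, 2, '2016-02-10'),
  (3, 2, '2016-01-15'),
  (4, 3, '2016-01-01');
-- Hypothetical data: the only successful bid is on house 3.
INSERT INTO bid_transactions VALUES (3, 'Successful Bid');
""")

# The frame ends at "1 PRECEDING", so the current house is excluded.
# An empty frame sums to NULL, hence the COALESCE to 0.
rows = conn.execute("""
SELECT h.house_id, h.bidder_id, h.created_date,
       COALESCE(
           SUM(CASE WHEN b.transaction_kind = 'Successful Bid' THEN 1 ELSE 0 END)
               OVER (PARTITION BY h.bidder_id
                     ORDER BY h.created_date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
           0) AS previous_bids_won
FROM house h
LEFT JOIN bid_transactions b ON h.house_id = b.house_id
ORDER BY h.created_date DESC
""").fetchall()

for row in rows:
    print(row)
```

This assumes at most one bid_transactions row per house; with several rows per house the join would duplicate houses and the counts would need a pre-aggregation step.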