DB design for Global and Specific - sql

I need some help/inputs on DB design practices to achieve the requirement below. The database will be PostgreSQL.
I have to design a holiday tracker with these requirements:
Global Holiday for the whole country.
Holiday per state.
Ex:
December 25th: holiday for the whole country (All).
Let's say January 25th: holiday for the "MO, IL" states.
So when querying holidays for state MO or IL, it should return both December 25th and January 25th.
But when querying for other states (e.g. TN), it should return only December 25th.
Tried DB design.
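(The attempted schema itself is not reproduced here; judging from the queries below, it is presumably something along these lines, with names and types assumed:)
-- assumed reconstruction of the attempted design, inferred from the queries below
create table holiday_master (
    id           integer primary key,
    holiday_date date not null,
    description  text
);
create table state_holiday (
    holiday_id integer not null references holiday_master (id),
    state_id   text    not null,            -- e.g. 'MO', 'IL'
    primary key (holiday_id, state_id)
);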
With this design, to find global holidays I have to use an OR condition.
Ex:
select * from holiday_master hm
where exists ( select 'x' from state_holiday sh
               where sh.holiday_id = hm.id
               and state_id in ('MO', 'IL') )
   or not exists ( select 'x' from state_holiday sh
                   where sh.holiday_id = hm.id )
One option is to add an entry in state_holiday with the state id 'All' and change the query as below.
select * from holiday_master hm
where exists ( select 'x' from state_holiday sh
               where sh.holiday_id = hm.id
               and state_id in ('MO', 'IL', 'All') )
Please provide your inputs here.
Note: the number of states can grow from 1 to 1,000.

I would normally steer clear of any solution that involved a deliberate cartesian join, but based on the fact that we are dealing with very low data volumes (states < 1,000 records, holidays <= 365 records), you could make use of Postgres' most excellent array capabilities.
create table holiday_master (
    id               integer not null,
    holiday_date     date not null,
    description      text,
    national_holiday boolean not null,
    states           integer[],
    constraint holiday_master_pk primary key (id)
)
And then a sample query that should yield all holidays for a given state would look like:
select
    s.state, h.holiday_date, h.description
from
    state_master s
    cross join holiday_master h
where
    s.state = 'MO'
    and (h.national_holiday or s.id = any (h.states))
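For illustration, a minimal sketch of how data might be loaded under this design; the state_master contents, ids and dates are assumptions, not part of the answer:
-- assumed state_master rows: (1, 'MO'), (2, 'IL'), (3, 'TN')
insert into holiday_master (id, holiday_date, description, national_holiday, states)
values (1, date '2023-12-25', 'Christmas Day', true,  null),          -- national holiday, no state list needed
       (2, date '2023-01-25', 'MO/IL only',    false, array[1, 2]);   -- ids of MO and IL in state_master
-- The query above then returns both rows for s.state = 'MO' or 'IL',
-- but only the Christmas row for s.state = 'TN'.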

Related

Amazon SQL job interview question: customers who made 2+ purchases -- is it doable in DAX?

You have a simple table that has only two fields: CustomerID, DateOfPurchase. List all customers that made at least 2 purchases in any period of six months. You may assume the table has the data for the last 10 years. Also, there is no PK or unique value.
One possible solution for this question is as follows:
SELECT DISTINCT CustomerID
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.CustomerID = t1.CustomerID AND
t2.DateOfPurchase > t1.DateOfPurchase AND
t2.DateOfPurchase <= DATEADD(month, 6, t1.DateOfPurchase));
I was wondering if we can do something similar in DAX. For the sake of simplicity, let's assume everything is in one table and there is no relationship.
Thanks
You got me at
For the sake of simplicity
Maybe this(?)
Table =
SUMMARIZE (
    FILTER (
        yourTable,
        VAR CurrentCustomerID = yourTable[CustomerID]
        VAR CurrentDateOfPurchase = yourTable[DateOfPurchase]
        RETURN
            NOT ISEMPTY (
                CALCULATETABLE (
                    VALUES ( yourTable[CustomerID] ),
                    ALL ( yourTable ),
                    yourTable[CustomerID] = CurrentCustomerID,
                    yourTable[DateOfPurchase] < CurrentDateOfPurchase,
                    yourTable[DateOfPurchase] >= EDATE ( CurrentDateOfPurchase, -6 )
                )
            )
    ),
    yourTable[CustomerID]
)
Update
Since there are no more answers, I am sharing the simulated data so that someone else will be encouraged to respond and we can all learn something from this. Around 50,000 customers, 100,000 transactions, 10 years. Date format: MDY
https://drive.google.com/file/d/1JPS4XHfpGSTXuNWIMdoPlp5Z1BqV9WGO/view?usp=sharing

SQL Looking up prior record information

Here is my situation. I have a series of claims for a person. We occasionally get a duplicate claim, which is given a DUP error code and denied with a zero dollar amount. What I am trying to do is look up the original claim's units and billed amount. If the duplicate and the original claim have the same units and billed amount, I intend to ignore it (or at least label it as NOT a potential re-bill). If the units and/or the billed amount are different, that claim will be labeled as a potential re-bill.
I've got a function that correctly finds the original claim's primary key value, and the base query runs in less than a second. However, when I try to link that dataset back to the tables it crushes my run time to the point of uselessness. What I don't understand is why the function alone runs so quickly but attempting to link it back bogs it down so much; we are talking about a dataset of roughly 140 claims over a year of activity.
If anyone could offer some insight or has a better way to accomplish this I would be obliged.
SELECT pre.*
--, st.unitsofservice as OrigUnits
--I only include the line above if the link to the servicetransaction table on the last line of the query is active
FROM
(
SELECT Sub.*--,
,dbo.cf_DuplicateServiceSTID(sub.servicetransactionid) as PaidSvc
--the line above is the function returning the primary key value for the original claim
FROM
(
SELECT Pre.servicetransactionID,
st.servicedate, st.individualid, st.agencyid, st.servicecode, st.placeofserviceid, st.unitsofservice,
dbo.sortmodifiers(st.modifiercodeid, st.modifiercodeid2, st.modifiercodeid3, st.modifiercodeid4) as Modifiers,
bd.billedamount,
a.upi,
b.name
FROM (select pmt.servicetransactionid
from pmtadjdetail pmt
where substring(pmt.reasoncodes,1,5) = 'DUP') Pre
JOIN servicetransaction st on pre.servicetransactionid = st.servicetransactionid
join billdetail bd on st.servicetransactionid = bd.servicetransactionid
join agency a on st.agencyid = a.agencyid
join business b on a.businessid = b.businessid
where st.servicedate between #StartDate and #EndDate
and st.agencyid = iif(#AgencyID is null, st.agencyid, #AgencyID)
) Sub
join individual i on sub.individualid = i.individualid
join enrollment e on sub.individualid = e.individualid
WHERE e.enrollmenttype <> 'p'
) Pre
--join servicetransaction st on pre.paidsvc = st.servicetransactionid
--If I remove the comment from the line above my run time increases to the point of uselessness

LEFT JOIN include data

I have an application which handles school vacations. Unfortunately there are three different kinds of school vacations: country-wide, federal-state-wide and city-wide. I store all the information in a table days, a table vacation_periods and a connection table slots:
days {
id:integer
date_value:date
}
slots {
id:integer
day_id:integer
vacation_period_id:integer
}
vacation_periods {
id:integer
starts_on:date
ends_on:date
name:string
country_id:integer
federal_state_id:integer
city_id:integer
}
I want to select all days within a specific time frame. Let's say Jan 1st of 2017 to Jan 31st of 2017. I can get those days with:
SELECT * FROM days WHERE date_value >= '2017-01-01' AND
date_value <= '2017-01-31';
But for my vacation calendar I don't just need the days but also the information about which vacation_periods fall within them. Assume I search for all vacation_periods which are in that time frame and which have
country_id == 1 or federal_state_id == 5 or city_id == 30
I've read about JOINs and LEFT JOINs, which seem to be the solution to the problem, but I can't get everything together.
Is it possible to send one SQL request which returns all days within the requested time frame, plus, for each day, the information whether a vacation_period that fits the country_id == 1 or federal_state_id == 5 or city_id == 30 rule is connected to it via slots, including the name of that vacation_period?
If one request is not possible: Which is the quickest way to solve this within the database? How many requests? What kind of requests?
If possible I'd like to get a result in some kind of this form:
- date_value: "2017-01-01"
- date_value: "2017-01-02"
- date_value: "2017-01-03"
* vacation_period.id: 15
* vacation_period.name: "foobar"
- date_value: "2017-01-04"
* vacation_period.id: 15
* vacation_period.name: "foobar"
- date_value: "2017-01-05"
* vacation_period.id: 15
* vacation_period.name: "foobar"
- date_value: "2017-01-06"
- date_value: "2017-01-07"
...
The following query might give you the answer you are looking for:
SELECT * FROM days
INNER JOIN slots ON days.id = slots.day_id
INNER JOIN vacation_periods ON vacation_periods.id = slots.vacation_period_id
WHERE date_value >= '2017-01-01' AND date_value <= '2017-01-31'
I think you can get an unformatted version of what you want (that could be processed into a hierarchical output) with
CREATE TYPE vacation_authority AS ENUM
('COUNTRY', 'FED-STATE', 'CITY');
/* not necessary, but cleans up the vacation_period table */
Then change vacation_periods so that it has only one id column (instead of separate country_id / federal_state_id / city_id columns) plus a new field authority of type vacation_authority. You can now make a primary key out of either the id field or (id, authority), depending on how the vacation data comes into the system.
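A minimal sketch of that restructured table might look like this (column names and types are assumptions, not from the original answer):
-- hypothetical restructuring: one scope id plus the authority enum
CREATE TABLE vacation_periods (
    id           integer NOT NULL,
    authority    vacation_authority NOT NULL,  -- COUNTRY / FED-STATE / CITY
    authority_id integer NOT NULL,             -- id of the country, federal state, or city
    starts_on    date NOT NULL,
    ends_on      date NOT NULL,
    name         text,
    PRIMARY KEY (id, authority)
);
With something along those lines in place, the date-range lookup from this answer would be: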
SELECT date_value, vp.name, vp.id /* is the ID meaningful or arbitrary? */
FROM days LEFT JOIN vacation_periods vp
    ON date_value BETWEEN vp.starts_on AND vp.ends_on; -- inclusive range
Now if there are multiple holidays spanning a given date, this will be multiple records in the output. It's not clear what you want in this case.
None of the other answers was able to solve my problem, but they led me to the solution, so I'm grateful for them. Here's the solution:
SELECT days.date_value, slots.vacation_period_id, vacation_periods.name FROM days
LEFT OUTER JOIN slots ON (days.id = slots.day_id)
LEFT OUTER JOIN vacation_periods ON (slots.vacation_period_id = vacation_periods.id)
WHERE days.date_value >= '2017-01-05'
  AND days.date_value <= '2017-01-15'
  AND (vacation_periods.id IS NULL
       OR vacation_periods.country_id = 1
       OR vacation_periods.federal_state_id = 5)
ORDER BY days.date_value;

Postgresql query for every day sold stock count

I have a CRM project which maintains product sales orders for every organization.
I want to count the stock sold on each day. I have managed to do this by looping over the dates, but that is obviously a poor method that takes more time and memory.
Please help me find it in a single query. Is it possible?
Here is my database structure for your reference.
product : id (PK), name
organization : id (PK), name
sales_order : id (PK), product_id (FK), organization_id (FK), sold_stock, sold_date(epoch time)
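(For reference, a rough DDL sketch of that structure; the column types are assumptions, and sold_date appears to be epoch milliseconds given the division by 1000 in the crosstab answer below:)
-- rough sketch of the schema described above; types assumed
create table product      (id serial primary key, name text);
create table organization (id serial primary key, name text);
create table sales_order (
    id              serial primary key,
    product_id      integer references product (id),
    organization_id integer references organization (id),
    sold_stock      numeric,
    sold_date       bigint      -- epoch time, apparently in milliseconds
);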
Expected Output for selected month :
organization | product | day1_sold_stock | day2_sold_stock | ..... | day30_sold_stock
http://sqlfiddle.com/#!15/e1dc3/3
Create the tablefunc extension:
CREATE EXTENSION IF NOT EXISTS tablefunc;
Query :
select "proId" as ProductId, product_name as ProductName, organizationName as OrganizationName,
    coalesce("1-day",0)  as "1-day",  coalesce("2-day",0)  as "2-day",  coalesce("3-day",0)  as "3-day",
    coalesce("4-day",0)  as "4-day",  coalesce("5-day",0)  as "5-day",  coalesce("6-day",0)  as "6-day",
    coalesce("7-day",0)  as "7-day",  coalesce("8-day",0)  as "8-day",  coalesce("9-day",0)  as "9-day",
    coalesce("10-day",0) as "10-day", coalesce("11-day",0) as "11-day", coalesce("12-day",0) as "12-day",
    coalesce("13-day",0) as "13-day", coalesce("14-day",0) as "14-day", coalesce("15-day",0) as "15-day",
    coalesce("16-day",0) as "16-day", coalesce("17-day",0) as "17-day", coalesce("18-day",0) as "18-day",
    coalesce("19-day",0) as "19-day", coalesce("20-day",0) as "20-day", coalesce("21-day",0) as "21-day",
    coalesce("22-day",0) as "22-day", coalesce("23-day",0) as "23-day", coalesce("24-day",0) as "24-day",
    coalesce("25-day",0) as "25-day", coalesce("26-day",0) as "26-day", coalesce("27-day",0) as "27-day",
    coalesce("28-day",0) as "28-day", coalesce("29-day",0) as "29-day", coalesce("30-day",0) as "30-day",
    coalesce("31-day",0) as "31-day"
from crosstab(
'select hist.product_id,pr.name,o.name,EXTRACT(day FROM TO_TIMESTAMP(hist.sold_date/1000)),sum(sold_stock)
from sales_order hist
left join product pr on pr.id = hist.product_id
left join organization o on o.id = hist.organization_id
where EXTRACT(MONTH FROM TO_TIMESTAMP(hist.sold_date/1000)) =5
and EXTRACT(YEAR FROM TO_TIMESTAMP(hist.sold_date/1000)) = 2017
group by hist.product_id,pr.name,EXTRACT(day FROM TO_TIMESTAMP(hist.sold_date/1000)),o.name
order by o.name,pr.name',
'select d from generate_series(1,31) d')
as ("proId" int ,product_name text,organizationName text,
"1-day" float,"2-day" float,"3-day" float,"4-day" float,"5-day" float,"6-day" float
,"7-day" float,"8-day" float,"9-day" float,"10-day" float,"11-day" float,"12-day" float,"13-day" float,"14-day" float,"15-day" float,"16-day" float,"17-day" float
,"18-day" float,"19-day" float,"20-day" float,"21-day" float,"22-day" float,"23-day" float,"24-day" float,"25-day" float,"26-day" float,"27-day" float,"28-day" float,
"29-day" float,"30-day" float,"31-day" float);
Please note: this uses PostgreSQL's crosstab query (from the tablefunc extension). I have used coalesce to handle null values, so the crosstab shows 0 where there is no data to return.
The following query will help to find the same:
select o.name,
p.name,
sum(case when extract(day from to_timestamp(sold_date)) = 1 then sold_stock else 0 end) as day1_sold_stock,
sum(case when extract(day from to_timestamp(sold_date)) = 2 then sold_stock else 0 end) as day2_sold_stock,
sum(case when extract(day from to_timestamp(sold_date)) = 3 then sold_stock else 0 end) as day3_sold_stock
from sales_order so,
organization o,
product p
where so.organization_id=o.id
and so.product_id=p.id
group by o.name,
p.name;
I have only provided the logic for 3 days; you can implement the same for the rest of the days.
Basically, first do the basic joins on the ids, and then check each date (after converting the epoch to a timestamp and extracting the day).
You have a few options here but it is important to understand the limitations first.
The big limitation is that the planner needs to know the record size before the planning stage, so this has to be explicitly defined, not dynamically defined. There are various ways of getting around this. At the end of the day, you are probably going to have something like Bavesh's answer, but there are some tools that may help.
Secondly, you may want to aggregate by date in a simple query joining the three tables and then pivot.
For the second approach, you could do a simple query and then pull the data into Excel or similar and create a pivot table there (this is probably the easiest solution), or you could use the tablefunc extension to create the crosstab for you.
Then we get to the first problem: if you are always doing a fixed set of day columns, this is easy if tedious. But if you want every day of an arbitrary month, you run into the row-definition problem, because the number of output columns changes. Here, what you can do is create a dynamic query in a function (pl/pgsql) and return a refcursor. In this case the actual planning takes place in the function and the planner doesn't need to worry about it at the outer level. Then you call FETCH on the output.
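A minimal sketch of that refcursor pattern, reusing the crosstab query from the earlier answer (the function name and column layout here are assumptions, not something from the original answers):
-- hypothetical pl/pgsql wrapper: builds one column per day of the requested
-- month, opens a refcursor over the crosstab, and lets the caller FETCH from it
CREATE OR REPLACE FUNCTION monthly_sold_stock(p_year int, p_month int)
RETURNS refcursor LANGUAGE plpgsql AS
$func$
DECLARE
    days_in_month int := extract(day from (make_date(p_year, p_month, 1)
                                           + interval '1 month - 1 day'))::int;
    col_defs text := '';
    cur refcursor := 'sold_stock_cur';
BEGIN
    FOR d IN 1..days_in_month LOOP
        col_defs := col_defs || format(', %I float', d || '-day');
    END LOOP;

    OPEN cur FOR EXECUTE format(
        $q$ SELECT * FROM crosstab(
                $ct$ select hist.product_id, pr.name, o.name,
                            extract(day from to_timestamp(hist.sold_date/1000)), sum(sold_stock)
                     from sales_order hist
                     left join product pr on pr.id = hist.product_id
                     left join organization o on o.id = hist.organization_id
                     where extract(month from to_timestamp(hist.sold_date/1000)) = %s
                       and extract(year from to_timestamp(hist.sold_date/1000)) = %s
                     group by hist.product_id, pr.name, o.name,
                              extract(day from to_timestamp(hist.sold_date/1000))
                     order by o.name, pr.name $ct$,
                'select d from generate_series(1,%s) d')
            AS ct("proId" int, product_name text, organization_name text%s) $q$,
        p_month, p_year, days_in_month, col_defs);
    RETURN cur;
END;
$func$;
-- usage (inside one transaction):
-- BEGIN;
-- SELECT monthly_sold_stock(2017, 5);
-- FETCH ALL FROM sold_stock_cur;
-- COMMIT;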

Comparing SQL Queries

I'm looking at two SQL queries (Oracle), and I have to state the difference between them by showing examples. The queries are as follows:
/* Query 1 */
SELECT DISTINCT countryCode
FROM Member M
WHERE NOT EXISTS(
(SELECT organisation FROM Member
WHERE countryCode = 'US')
MINUS
(SELECT organisation FROM Member
WHERE countryCode = M.countryCode ) )
/* Query 2 */
SELECT DISTINCT M1.countryCode
FROM Member M1, Member M2
WHERE M2.countryCode = 'US' AND M1.organisation = M2.organisation
GROUP BY M1.countryCode
HAVING COUNT(M1.organisation) = (
SELECT COUNT(M3.organisation)
FROM Member M3 WHERE M3.countryCode = 'US' )
As far as I understand it, these queries give back the countries which are members of the same organisations as the United States. The schema of Member is (countryCode, organisation, type), with countryCode and organisation as the primary key. Example: ('US', 'UN', 'member'). The Member table contains only a few tuples and is not complete, so executing (1) and (2) yields the same result for both (e.g. Germany, since here only 'UN' and 'G7' are in the table).
So how can I show that these queries can actually return different results?
That means how can I create an example table instance of Member such that the queries yield different results based on that table instance?
Thanks for your time and effort.
The queries will return all the country codes that are members of at least all the organisations the US is a member of (they could be members of other organisations as well).
I've finally found an example to show that they can actually output different values based on the same Member instance. This is actually the case when Member contains duplicates. For query 1 this is not a problem, but for query 2 it actually affects the result, since here the number of memberships is crucial. So, if you have e.g. ('FR', 'UN', 'member') twice in Member, the HAVING COUNT(M1.organisation) will return a different value than the SELECT COUNT(M3.organisation) subquery, and 'FR' would not be part of the output.
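A small instance along those lines (rows invented purely for illustration, assuming no primary key constraint is enforced so the duplicate row can exist):
insert into Member (countryCode, organisation, type) values ('US', 'UN', 'member');
insert into Member (countryCode, organisation, type) values ('US', 'G7', 'member');
insert into Member (countryCode, organisation, type) values ('FR', 'UN', 'member');
insert into Member (countryCode, organisation, type) values ('FR', 'UN', 'member'); -- duplicate row
insert into Member (countryCode, organisation, type) values ('FR', 'G7', 'member');
-- Query 1 returns both US and FR; Query 2 returns only US, because for FR the
-- HAVING COUNT(M1.organisation) is 3 while the US subquery count is 2.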
Thanks to all for your constructive suggestions, that helped me a lot.
The first query would return countries whose list of memberships is at least as long as that of the US. It does require that they include the same organisations as the US, but there could be more.
The second one requires the two membership lists to be identical.
As for creating an example with real data, start with an empty table and add this row:
insert into Member (countryCode, organisation)
values ('Elbonia', 'League of Fictitious Nations')
By the way a full outer join would let you characterize the difference symmetrically:
select
    max(mo.countryCode) || ' ' ||
    case
        when count(case when mu.organisation is null then 1 else null end) > 0
         and count(case when mo.organisation is null then 1 else null end) > 0
            then 'and US both have individual memberships that they do not have in common.'
        when count(case when mu.organisation is null then 1 else null end) > 0
            then 'is a member of some organisations that US is not a member of.'
        when count(case when mo.organisation is null then 1 else null end) > 0
            then 'is not a member of some organisations that US is a member of.'
        else 'has identical membership as US.'
    end
from
    (select * from Member where countryCode = 'US') mu
    full outer join
    (select * from Member where countryCode = '??') mo
    on mo.organisation = mu.organisation
Please forgive the dangling prepositions.
And a side note: though duplicate rows are not allowed in normalized data, this query has no problem with them.