UNION DISTINCT query returning all Null values in various fields - sql

I am struggling to figure out how to account for the NULLs that are occurring in my data. I'm pretty new to using unions. I know that order matters and that the number of fields in the SELECT statements needs to match.
The results I have:
Client ID|Employee|appointment status|date of service|appointment date
---------|--------|------------------|---------------|----------------
1|HILL,PEGGY|NO SHOW|billing.srvDate|11/01/2022
2|HILL,PEGGY|SCHEDULED|billing.srvDate|11/07/202
5|HILL,PEGGY|SCHDEULED|billing.srvDate|11/10/2022
5|HILL,PEGGY|appointment.appointment_status|11/10/2022|apppointment.appdate
6|HILL,PEGGY|SCHEDULED|billing.srvDate|11/15/2022
2|HILL,PEGGY|appointment.appointment_status|11/07/2022|appoinment.appDate
The results I want:
Client ID|Employee|appointment status|date of service|appointment date
---------|--------|------------------|---------------|----------------
1|HILL,PEGGY|NO SHOW|11/01/2022|11/01/2022
2|HILL,PEGGY|SCHEDULED|11/07/2022|11/07/202
5|HILL,PEGGY|SCHDEULED|11/10/2022|11/10/2022
5|HILL,PEGGY|SCHEDULED|11/13/2022|11/13/2022
6|HILL,PEGGY|SCHEDULED|11/15/2022|11/15/2022
My current query:
select billing.clientID
, billing.client_name
, billing.employee_name
, 'appointment.appointment_value' status
, billing.start_time
, FORMAT_DATETIME("%Y-%m-%d",billing.date_of_service) srvDate
, 'FORMAT_DATETIME("%Y-%m-%d", appointment.appointment_date)' apptDate
from billing_table
left join demographicsTable demographics
on billing.ClientID = demographics.ClientID
and cast(billing.date_of_service as date) between '2022-11-01' and current_date()
union distinct
select clientID
, client
, employee
, appointment_status
, appointment_date
, 'billing.date_of_service' srvDate
, FORMAT_DATETIME("%Y-%m-%d", ad.appointment_date) apptDate
from appointment_table
and cast (appointment_date as date) between '2022-11-01' and current_date()
I do have an idea about why this is happening, but am looking for any input or direction for a solution.

I agree with the commenters about the naming mistakes. To summarize:
If you want to add a literal value written in quotation marks, give it an alias with AS, just as you name the other columns.
e.g.
'appointment.appointment_value' as status
Make sure every column has the same name in both SELECT statements. If you don't, the column names will probably be taken from the first SELECT. It is better to keep them identical so that you stay in control; otherwise you may get unexpected results.
Get rid of the columns you don't need.
I also found something that might be useful to you:
UNION produces distinct rows—however, all column values in the row need to be distinct. If you wish to limit the distinction to a single or a few columns, when other columns are not distinct, you can wrap the UNION in a sub-query and GROUP BY the sub-query by the columns you wish to be unique.
Source: MySQL UNION DISTINCT
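The sub-query approach quoted above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than the asker's database engine, and the tables `billing` and `appointment`, their columns, and the helper column `match_date` are hypothetical. Each branch of the UNION supplies a real NULL (not a quoted column name) for the column it cannot provide, and the outer GROUP BY with MAX folds the two partial rows for each client and date into one.

```python
import sqlite3

# Hypothetical, simplified tables; names and columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE billing (client_id INTEGER, employee TEXT, srv_date TEXT);
    CREATE TABLE appointment (client_id INTEGER, employee TEXT,
                              status TEXT, appt_date TEXT);
    INSERT INTO billing VALUES
        (1, 'HILL,PEGGY', '2022-11-01'),
        (2, 'HILL,PEGGY', '2022-11-07');
    INSERT INTO appointment VALUES
        (1, 'HILL,PEGGY', 'NO SHOW',   '2022-11-01'),
        (2, 'HILL,PEGGY', 'SCHEDULED', '2022-11-07');
""")

query = """
SELECT client_id,
       employee,
       MAX(status)    AS status,     -- MAX skips NULLs, so the filled value wins
       MAX(srv_date)  AS srv_date,
       MAX(appt_date) AS appt_date
FROM (
    -- billing has no status or appointment date: use NULL placeholders
    SELECT client_id, employee, NULL AS status,
           srv_date, NULL AS appt_date, srv_date AS match_date
    FROM billing
    UNION
    -- appointment has no date of service: use a NULL placeholder
    SELECT client_id, employee, status,
           NULL AS srv_date, appt_date, appt_date AS match_date
    FROM appointment
) AS combined
GROUP BY client_id, employee, match_date
ORDER BY client_id, match_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Grouping on `match_date` as well as the client keeps separate visits by the same client on separate rows, which is what the desired output in the question shows for client 5.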

Related

Calculate total working hours of an employee based on swipe in/swipe out using Oracle SQL

I was recently given a task to calculate an employee's total office hours based on his card swipe in/swipe out. I have the following data :
id gate_1 gate_2 gate_3 gate_4
100 null null null 9:00
100 null 13:30 null null
100 null null 16:00 null
100 null null 18:00 null
Here, employee 100 comes in via gate_4 at 9:00, takes a break at 13:30, and goes out using gate_2. He then comes back at 16:00 using gate_3 and leaves the office at 18:00 using gate_3. So, how can I calculate the total in-office time from this data?
Thanks in advance.
As has been pointed out your data model is denormalized to not even satisfy 1st normal form. The first step is to correct that (doing so in a query). Then there is no indication as to swipe in or swipe out, therefore it must be assumed that the first swipe time is always in and the ins/outs always alternate properly. Finally there is no indication of multiple days being covered so the assumption is just 1 period. That is a lot of assumptions.
Since an Oracle data type date contains time as well as the date and summing differences is much easier than with timestamps I convert timestamp to date in the first step of normalizing the data. Given all this we arrive at: (See Demo)
with normal (emp_id, inout_tm) as
( select emp_id, cast(gate1 as date)
    from emp_gate_time
   where gate1 is not null
  union all
  select emp_id, cast(gate2 as date)
    from emp_gate_time
   where gate2 is not null
  union all
  select emp_id, cast(gate3 as date)
    from emp_gate_time
   where gate3 is not null
  union all
  select emp_id, cast(gate4 as date)
    from emp_gate_time
   where gate4 is not null
)
select emp_id, round(24.0*(sum(hours)),1) hours_in_office
  from ( select emp_id, (time_out - time_in) hours
           from ( select emp_id, inout_tm time_in, rn
                       , lead(inout_tm) over(partition by emp_id order by inout_tm) time_out
                    from ( select n.*
                                , row_number() over(partition by emp_id order by inout_tm) rn
                             from normal n
                         )
                )
          where mod(rn,2) = 1
       )
 group by emp_id;
Items of Interest:
Subquery Factoring (CTE)
Date Arithmetic - in Hours ...Difference Between Dates in hours ...
Oracle Analytic Functions - Row_number, lead
Your database schema is denormalized: you have fields such as gate_1, gate_2, and so on, which is the wrong approach. A better way is to have a reference table of gates, for example:
id|gate_name
--|---------
Your table of employee swipe data would then look like this:
id_employee|id_gate|time
Then you can sort the data in this table and compute the time between each pair of consecutive rows.
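As a rough illustration of that idea, here is a small Python sketch over rows in the normalized (id_employee, id_gate, time) shape, using the swipe times from the question. It relies on the same assumptions stated in the first answer: the first swipe of the day is an entry, entries and exits alternate strictly, and all swipes fall on one day. The variable names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Normalized swipe rows: (id_employee, id_gate, time), taken from the question.
swipes = [
    (100, 4, "9:00"),
    (100, 2, "13:30"),
    (100, 3, "16:00"),
    (100, 3, "18:00"),
]

# Collect each employee's swipe times; the gate itself does not matter here.
times_by_employee = defaultdict(list)
for employee_id, _gate_id, swipe_time in swipes:
    times_by_employee[employee_id].append(datetime.strptime(swipe_time, "%H:%M"))

hours_in_office = {}
for employee_id, times in times_by_employee.items():
    times.sort()
    # Assumption: swipes 1, 3, 5, ... are entries and 2, 4, 6, ... are exits.
    seconds = sum(
        (time_out - time_in).total_seconds()
        for time_in, time_out in zip(times[0::2], times[1::2])
    )
    hours_in_office[employee_id] = round(seconds / 3600, 1)

print(hours_in_office)  # {100: 6.5}
```

An unmatched final entry (an odd number of swipes) is silently ignored by `zip`, so real data would need a rule for that case.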

Trouble joining two date tables with consecutive dates starting at customer create date and ending at current date?

I am creating a customer activity by day table, which requires 9 CTEs.
For the first table, I want to cross join all unique customer IDs with the dates of a calendar table, so each unique ID will appear on multiple rows, one for each day.
The problem is making sure the days are consecutive, regardless of the dates in the following CTEs.
This is a shortened example of what it would look like:
GUID DATE CONDITIONS
1 3/13/2015 [NULL]
1 3/14/2015 Y
1 3/15/2015 [NULL]
....
1 9/2/2020 Y
2 4/15/2015 Y
2 4/16/2015 [NULL]
2 4/17/2015 [NULL]
2 4/18/2015 Y
...
2 9/2/2020 [NULL]
And so on, so that each customer has consecutive dates with their GUID, beginning with the creation date of their account (e.g. 3/13/2015) and ending on the current date.
The create date is on Table 1 with the unique ID, and I'm joining it with a date table.
My problem is that I can't get the query to run with a minimum create date per unique ID. Because if I don't create a minimum start date, the query runs forever (it's trying to create every unique ID for every consecutive date, even before the customer account was created.)
This is the code I have now.
Can anyone tell me if I have made the min. create date right? It's still just timing out when I run the query.
with
cte_carrier_guid (carrier_guid, email, date, carrier_id) as
(
SELECT
guid as carrier_guid
,mc.email
,dt2.date as date
,mc.id as carrier_id
FROM ctms_db_public.msd_carrier mc
CROSS JOIN public.dim_calendar dt2
WHERE dt2.date <= CURRENT_DATE
AND mc.created_at >= dt2.date
GROUP BY guid, mc.id, dt2."date", mc.email
ORDER BY guid, dt2.date asc
)
Select top 10 * from cte_carrier_guid
Here:
dt2.date <= CURRENT_DATE AND mc.created_at >= dt2.date
Since you want dates between the creation date of the user and today, you probably want the inequality condition on the creation date the other way around. I find it easier to follow when we put the lower bound first:
dt2.date >= mc.created_at AND dt2.date <= CURRENT_DATE
Other things about the query:
You want an INNER JOIN in essence, so use that instead of CROSS JOIN ... WHERE; it is clearer
ORDER BY in a cte makes no sense to me
Do you really need GROUP BY? The columns in the SELECT clause are the same as in the GROUP BY, so all this does is remove potential duplicates (but why would there be duplicates?)
You could probably phrase the cte as:
SELECT ...
FROM ctms_db_public.msd_carrier mc
INNER JOIN public.dim_calendar dt2 ON dt2.date >= mc.created_at
WHERE dt2.date <= CURRENT_DATE
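To see the effect of putting the bounds the right way around, here is a minimal sketch using Python's built-in sqlite3 module. The tables `carrier` and `calendar`, the column `cal_date`, and the fixed `today` value (standing in for CURRENT_DATE so the result is reproducible) are all assumptions made for the example, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE carrier (guid INTEGER, created_at TEXT);
    CREATE TABLE calendar (cal_date TEXT);
    INSERT INTO carrier VALUES (1, '2020-08-30'), (2, '2020-09-01');
    INSERT INTO calendar VALUES
        ('2020-08-29'), ('2020-08-30'), ('2020-08-31'),
        ('2020-09-01'), ('2020-09-02'), ('2020-09-03');
""")

today = "2020-09-02"  # assumption: fixed stand-in for CURRENT_DATE

# One row per carrier per calendar day, from the creation date through today.
rows = conn.execute(
    """
    SELECT c.guid, d.cal_date
    FROM carrier c
    INNER JOIN calendar d ON d.cal_date >= c.created_at
    WHERE d.cal_date <= ?
    ORDER BY c.guid, d.cal_date
    """,
    (today,),
).fetchall()

for row in rows:
    print(row)
```

Carrier 1 gets four rows (2020-08-30 through 2020-09-02) and carrier 2 gets two, with nothing generated before either creation date or after `today`.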

Create SUM of field values from 2 Tables but choose most recent date (from either table)

I have two tables with the same types of columns: host, frequency, and date.
The host is the primary key.
I want it so that I can combine a table from say March with April and sum up their frequencies.
Here is what I should expect in my final output.
Let X be some primary key value host.
If X only exists in one table, use that row in the final result.
If X exists in both tables, sum up both rows' freq and select the most recent date. So if we had to compare 4/20/19 with 4/2/19, then 4/20/19 should be the date selected.
Suppose I had the following tables:
Table: Report_4.1.19
host freq date
A 15 4/1/2019
C 30 4/1/2019
Table: Report_3.1.19
host freq date
A 10 3/1/2019
B 20 3/1/2019
My ideal output should be the following:
Table: Result
host sum(freq) date
A 25 4/1/2019
B 20 3/1/2019
C 30 4/1/2019
Here's what I tried so far:
SELECT host,sum(freq),date
from
(
select
host,
freq,
date
from Report_4.1.19
union
select
host,
freq,
date
from Report_3.1.19
)
group by host
While my code does appear to achieve the intended result, I'm not sure if I've properly accounted for the date selection. What can I do to modify my code (if needed)?
Step one: use a date format supported by sqlite date and time functions, as those can be meaningfully ordered and compared, unlike what you have. YYYY-MM-DD works.
Step two: use one table for all reports (you already track the date as a column to tell them apart!) to keep things simple and play to the strengths of relational databases. Make the PK host, date instead of just host.
Step three: it then becomes a simple query:
SELECT host, sum(freq), max(date) AS date
FROM reports
GROUP BY host
ORDER BY host;
Your query is almost correct. Just use the max of the date column from your union.
SELECT host,sum(freq),max(date)
from
(
select
host,
freq,
date
from Report_4.1.19
union
select
host,
freq,
date
from Report_3.1.19
)
group by host
Note: If you wanted daily sums, you would use date directly and then add date to your group by clause.
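Both answers can be checked against the sample data with a short sqlite3 sketch. The table names `report_april` and `report_march` and the column name `report_date` are hypothetical, and the dates are stored as YYYY-MM-DD so that MAX compares them correctly as text. Note one deliberate swap: the sketch uses UNION ALL rather than UNION, because plain UNION would drop a row if the two reports ever contained an identical (host, freq, date) row, which would make the sum too small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report_april (host TEXT PRIMARY KEY, freq INTEGER, report_date TEXT);
    CREATE TABLE report_march (host TEXT PRIMARY KEY, freq INTEGER, report_date TEXT);
    INSERT INTO report_april VALUES ('A', 15, '2019-04-01'), ('C', 30, '2019-04-01');
    INSERT INTO report_march VALUES ('A', 10, '2019-03-01'), ('B', 20, '2019-03-01');
""")

rows = conn.execute("""
    SELECT host,
           SUM(freq)        AS total_freq,
           MAX(report_date) AS latest_date   -- ISO dates sort correctly as text
    FROM (
        SELECT host, freq, report_date FROM report_april
        UNION ALL                             -- keep every row, even duplicates
        SELECT host, freq, report_date FROM report_march
    ) AS combined
    GROUP BY host
    ORDER BY host
""").fetchall()

for row in rows:
    print(row)
# ('A', 25, '2019-04-01')
# ('B', 20, '2019-03-01')
# ('C', 30, '2019-04-01')
```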

Database Table Design for Group Values with Changing Status over Time

I have the following groups that have a particular designation depending on the date:
Group 1: 3/30/2017 to present: status 'on'
Group 2: 3/30/2017 to present: status 'on'
Group 3: 3/30/2017 to present: status 'on'
Group 4: 3/30/2017 to 6/1/2017: status 'off'; 6/2/2017 to present: status: 'on'
Group 5: 3/30/2017 to present: status 'off'
Group 6: 3/30/2017 to 7/10/2017: status 'off'; 7/11/2017 to present: status 'on'
I'm trying to translate this information into an effective database table so I can designate a change in status on a particular date.
I have a process that runs daily in near real time that checks the status of each group and undertakes various processes based on the status.
I have come up with the following though I think it is not sufficient:
Group Effective Date Termination Date Status
Group 1 '2017-03-30' NULL On
Group 2 '2017-03-30' NULL On
Group 3 '2017-03-30' NULL On
Group 4 '2017-03-30' '2017-06-01' On
Group 4 '2017-06-02' NULL Off
Group 5 '2017-03-30' NULL Off
Group 6 '2017-03-30' '2017-07-10' Off
Group 6 '2017-07-11' NULL On
So if I run my daily process historically, I want it to be able to consult the table and determine the status of the group. If I am running my process in real time, I want to be able to consult the table and determine the status. If I want to change the status at a particular point in time, I enter a termination date for the Group and status and start a new line.
I can't imagine this is a good way to do this.
Looking for insights.
Thanks in advance.
Here is one method using what I call Version Normal Form (vnf). It works for entities that have a smooth, unbroken chain of state changes. That is, there are no gaps (one state ends only upon another state taking effect) or overlaps (only one state is in effect at any time).
Design the Group table with all the group info except for status.
create table Group(
ID int auto generated primary key,
... ... -- all other Group data
);
Now create a Status table with a State field and one date field -- the date the status takes effect.
create table GroupStatus(
ID int references Group( ID ),
EffDate date not null default Now(),
State char( 1 ) check (State in ('Y', 'N')),
constraint PK_GroupStatus primary key( ID, EffDate )
);
There are two important points about the GroupStatus table to consider:
the PK definition means no two entries for the same Group can be defined for the same time. Thus, it is not possible to have overlapping status values.
there is no "end" date. A status takes effect at the designated date and time and continues in effect until replaced by another state change. Thus, it is not possible to have gaps between the status changes of any Group.
I used a single character 'Y' (for On) and 'N' (for Off) but you can define the status state any way you want. This is for illustration only.
The EffDate field may have to be Date, Datetime or Timestamp type, depending on your specific DBMS. Now() just means "current date and time" using any method available in your DBMS.
The GroupStatus data would look like this:
ID EffDate State
1 '2017-03-30' Y
2 '2017-03-30' Y
3 '2017-03-30' Y
4 '2017-03-30' Y
4 '2017-06-02' N
5 '2017-03-30' N
6 '2017-03-30' N
6 '2017-07-11' Y
For the level of data integrity enforced, the design is very simple. The queries will be a little more complicated.
To see the current status of Group 1:
select g.ID as 'Group', s.EffDate as 'Effective Date',
case s.State when 'Y' then 'On' else 'Off' end as Status
from Group g
join GroupStatus s
on s.ID = g.ID
and s.EffDate =(
select Max( s1.EffDate )
from GroupStatus s1
where s1.ID = g.ID
and s1.EffDate <= Now()
)
where g.ID = 1;
To see the current status of all groups, just omit the where clause. To see the status of group 1 that was in effect on a certain date, just change the Now() in the subquery to a variable loaded with the date and time of interest.
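The "status as of a given date" lookup described above can be tried out with a small sqlite3 sketch. It is an illustration only: the table is named `group_status` here, dates are stored as YYYY-MM-DD text, and the helper `status_as_of` is hypothetical. The rows are the GroupStatus sample data shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE group_status (
        id       INTEGER NOT NULL,
        eff_date TEXT    NOT NULL,
        state    TEXT    CHECK (state IN ('Y', 'N')),
        PRIMARY KEY (id, eff_date)   -- one state change per group per date
    );
    INSERT INTO group_status VALUES
        (1, '2017-03-30', 'Y'), (2, '2017-03-30', 'Y'), (3, '2017-03-30', 'Y'),
        (4, '2017-03-30', 'Y'), (4, '2017-06-02', 'N'),
        (5, '2017-03-30', 'N'),
        (6, '2017-03-30', 'N'), (6, '2017-07-11', 'Y');
""")

def status_as_of(connection, as_of):
    """Return {group id: state} in effect on the date `as_of` (YYYY-MM-DD)."""
    rows = connection.execute(
        """
        SELECT s.id, s.state
        FROM group_status s
        WHERE s.eff_date = (
            -- latest state change at or before the date of interest
            SELECT MAX(s1.eff_date)
            FROM group_status s1
            WHERE s1.id = s.id
              AND s1.eff_date <= ?
        )
        ORDER BY s.id
        """,
        (as_of,),
    ).fetchall()
    return dict(rows)

before = status_as_of(conn, "2017-06-01")
after = status_as_of(conn, "2017-07-11")
print(before)
print(after)
```

Because each group has exactly one latest row at or before any date after its first entry, the lookup returns one state per group, which is the no-gaps, no-overlaps property the design relies on.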
In fact, set the query for current status of all groups as a view. Then your daily process can simply query:
select ID, Status from CurrentGroupStatus;
Since you know there can be no gaps or overlaps, you know there will be one and only one row for each group.
Suppose upon inserting the group 6 entry on March 30, you already know the date it will be turned on. You can go ahead and insert the GroupStatus entry with the future date (July 11) and the "current" queries and view will continue to show the correct status (Off) until that date arrives, at which point the ON status will start appearing.
Create "instead of" triggers on the view(s) to correctly work with the underlying tables and your apps don't even have to know the details of how the data is stored.
This gives you rock solid data integrity and a lot of flexibility in how you view and manipulate the data.

SQL finding no activity between dates

Using Postgres, I am trying to find how many days the company had no activity, meaning no employee joining or leaving, based on the EmployeeActivity table. A NULL in DateLeave means the employee still works at the company, while a DateLeave value is the date on which the employee left.
DateJoined DateLeave Name
................................
2012-06-20 NULL Terrence
2012-06-21 2013-06-23 Mady
2010-06-20 2012-06-24 Greg
2013-06-20 NULL Matt
My attempt was:
select EXTRACT(DAY FROM MAX(EmployeeActivity.DateJoined) - MIN(EmployeeActivity.DateLeave)
From EmployeeActivity
WHERE EmployeeActivity.DateLeave IS 'NULL'
However, it shows the wrong value, especially for longer tables.
Output Expectation:
I expect the query to return the longest period, in days, during which the company had no activity of hiring or firing an employee.
If I've understood correctly, the following should meet your needs:
SELECT
ActivityDate,
lag(ActivityDate) over (ORDER BY ActivityDate) as PreviousActivityDate,
Date_Part('day',ActivityDate - lag(ActivityDate) over (ORDER BY ActivityDate)) as Difference
FROM
(
select DateJoined as ActivityDate from EmployeeActivity
union
select coalesce(DateLeave,now()) from EmployeeActivity
) AllActivityDates
ORDER BY Difference DESC
LIMIT 1 OFFSET 1
The reason for the OFFSET 1 is that the earliest DateJoined has no previous row, so its Difference is NULL. In Postgres, NULLs sort first in descending order, so that row comes out on top and we simply skip it.
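The same logic can be followed step by step in a short Python sketch using the rows from the question. It mirrors the query's approach rather than replacing it: all join and leave dates are pooled and de-duplicated (like the UNION), a missing leave date is replaced by a reference date (like coalesce(DateLeave, now())), and the largest difference between neighbouring dates is returned. The function name and the fixed `as_of` date are assumptions chosen so the result is reproducible.

```python
from datetime import date

# (DateJoined, DateLeave) pairs from the question; None means still employed.
activity = [
    (date(2012, 6, 20), None),
    (date(2012, 6, 21), date(2013, 6, 23)),
    (date(2010, 6, 20), date(2012, 6, 24)),
    (date(2013, 6, 20), None),
]

def longest_quiet_period(rows, as_of):
    """Longest gap in days between consecutive activity dates."""
    # Pool every join/leave date; a set removes duplicates like UNION does.
    days = sorted({d for joined, left in rows for d in (joined, left or as_of)})
    # Difference between each date and the one before it (the lag step).
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    return max(gaps) if gaps else 0

result = longest_quiet_period(activity, as_of=date(2013, 7, 1))
print(result)  # 731 (2010-06-20 to 2012-06-20)
```

As in the query, treating the reference date as an activity date means an ongoing quiet stretch up to "now" is counted as well.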