After a left join I am getting duplicate entries. How do I remove them? - qlikview

LOAD EmployeeID,
SickLeaveHours,
(8760-SickLeaveHours-VacationHours)as QualityTimeHours,
(8760-SickLeaveHours-VacationHours)/24 as QualityDays,
((8760-SickLeaveHours-VacationHours)/24)/30 as QualityMonths,
VacationHours;
SQL SELECT EmployeeID,
SickLeaveHours,
VacationHours
FROM Database;
Join
LOAD * INLINE [
F1, F2
ShiftID, Shift
1, DAY
2, EVENING
3, NIGHT
];
left join(fact)
b:
LOAD
AddressID,
EmployeeID;
SQL SELECT
AddressID,
EmployeeID
FROM Database;
left join(fact)
c:
LOAD
DepartmentID,
EmployeeID;
SQL SELECT
DepartmentID,
EmployeeID
FROM Database;
left join(fact)
LOAD ShiftID;
SQL SELECT ShiftID
FROM Database;
left Join (fact)
d:
LOAD EmployeeID,
Rate;
SQL SELECT
EmployeeID,
Rate
FROM Database ;
empDetails:
LOAD BirthDate,
EmployeeID,
Gender,
Title;
SQL SELECT BirthDate,
EmployeeID,
Gender,
Title
FROM Database ;
Department:
LOAD DepartmentID,
GroupName,
Name;
SQL SELECT
DepartmentID,
GroupName,
Name
FROM Database;
Address:
LOAD AddressID,
ModifiedDate,
rowguid;
SQL SELECT
AddressID,
ModifiedDate,
rowguid
FROM Database;
shift:
LOAD EndTime,
Name as name,
ShiftID,
StartTime;
SQL SELECT
EndTime,
Name,
ShiftID,
StartTime
FROM Database;
Expected result: a table with no duplicate entries.

You are getting duplicates because almost all of the tables are joined to the fact table, and some of them have no field in common with it, which results in a cross join. For example:
left join(fact)
LOAD ShiftID;
SQL SELECT ShiftID
FROM Database;
The script above loads only one field, ShiftID, which does not exist in the fact table, so the join effectively performs a cross join (all-to-all).
Qlik joins tables on common field names. If you need to join two tables, they must have at least one field in common. In your example, the fact and EmpAddress tables will be joined/linked on the EmployeeID field.
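To make the all-to-all effect concrete, here is a minimal Python sketch that imitates a left join on whatever field names two tables share. It is only an emulation written for illustration, not Qlik's implementation; the helper name left_join_on_common_fields and the sample rows are made up.

```python
def left_join_on_common_fields(left, right):
    """Left-join two lists of dict rows on the field names they share."""
    common = set(left[0]) & set(right[0])
    right_only = [f for f in right[0] if f not in common]
    joined = []
    for l_row in left:
        # With no common fields, all() over an empty set is True,
        # so every right row "matches" every left row.
        matches = [r for r in right if all(l_row[f] == r[f] for f in common)]
        if not matches:
            joined.append({**l_row, **{f: None for f in right_only}})
        for r_row in matches:
            joined.append({**l_row, **r_row})
    return joined


# Hypothetical sample data
fact = [
    {"EmployeeID": 1, "SickLeaveHours": 30},
    {"EmployeeID": 2, "SickLeaveHours": 20},
]
shifts = [{"ShiftID": 1}, {"ShiftID": 2}, {"ShiftID": 3}]
address = [
    {"EmployeeID": 1, "AddressID": 10},
    {"EmployeeID": 2, "AddressID": 20},
]

cross = left_join_on_common_fields(fact, shifts)    # no shared field
keyed = left_join_on_common_fields(fact, address)   # shared EmployeeID

print(len(cross))  # 2 fact rows x 3 shift rows
print(len(keyed))  # one row per employee
```

With a shared EmployeeID the row count stays at two, while the ShiftID-only table multiplies every fact row by three.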
Another point: don't always try to join the tables into one. Sometimes it is better to just link them; otherwise you can get wrong/duplicated results.
For example, the fact table can have multiple rows per EmployeeID. If you join (not link) the EmpDetails table to fact and then count the Gender field, you will get a wrong/duplicated answer. In a one-to-many case, just link the tables: give both tables the common field(s) but do not join them, and Qlik will link them automatically.
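The inflated count in the one-to-many case can be reproduced outside Qlik. The sketch below uses Python's built-in sqlite3 module with made-up rows (employee 1 has three fact rows); it only illustrates the counting effect and says nothing about how Qlik stores linked tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact (EmployeeID INTEGER, SickLeaveHours INTEGER);
CREATE TABLE EmpDetails (EmployeeID INTEGER, Gender TEXT);
-- Hypothetical data: employee 1 appears three times in fact
INSERT INTO fact VALUES (1, 10), (1, 12), (1, 8), (2, 5);
INSERT INTO EmpDetails VALUES (1, 'M'), (2, 'F');
""")

# Counting Gender after joining EmpDetails onto fact repeats it per fact row
joined_count = conn.execute("""
    SELECT COUNT(d.Gender)
    FROM fact f
    LEFT JOIN EmpDetails d ON d.EmployeeID = f.EmployeeID
""").fetchone()[0]

# Counting Gender in its own table gives one value per employee
separate_count = conn.execute(
    "SELECT COUNT(Gender) FROM EmpDetails"
).fetchone()[0]

print(joined_count, separate_count)
```

Here the joined count is 4 while the table on its own holds 2 genders, which is the kind of wrong/duplicated answer described above.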
The script below is a version of yours without the "hard" joins to fact (I don't think you need these joins).
Also, the shift data is not going to be linked to anything, since it doesn't have any field in common with the employee tables.
fact:
LOAD
EmployeeID,
SickLeaveHours,
(8760-SickLeaveHours-VacationHours)as QualityTimeHours,
(8760-SickLeaveHours-VacationHours)/24 as QualityDays,
((8760-SickLeaveHours-VacationHours)/24)/30 as QualityMonths,
VacationHours;
SQL SELECT EmployeeID,
SickLeaveHours,
VacationHours
FROM Database;
EmpAddress:
LOAD
EmployeeID,
AddressID;
SQL SELECT
AddressID,
EmployeeID
FROM Database;
EmpDepartment:
LOAD
EmployeeID,
DepartmentID;
SQL SELECT
DepartmentID,
EmployeeID
FROM Database;
EmpRate:
LOAD EmployeeID,
Rate;
SQL SELECT
EmployeeID,
Rate
FROM Database;
EmpDetails:
LOAD
EmployeeID,
BirthDate,
Gender,
Title;
SQL SELECT BirthDate,
EmployeeID,
Gender,
Title
FROM Database ;
Department:
LOAD
DepartmentID,
GroupName,
Name as DepartmentName;
SQL SELECT
DepartmentID,
GroupName,
Name
FROM Database;
Address:
LOAD AddressID,
ModifiedDate as Address_ModifiedDate,
rowguid;
SQL SELECT
AddressID,
ModifiedDate,
rowguid
FROM Database;
// This part of the script will be isolated from the rest
// because there is no key to join on
shift:
LOAD
Name as ShiftName,
ShiftID,
EndTime,
StartTime;
SQL SELECT
EndTime,
Name,
ShiftID,
StartTime
FROM Database;
Join
LOAD * INLINE [
ShiftID, Shift
1, DAY
2, EVENING
3, NIGHT
];
// Is this table needed at all?
LOAD ShiftID;
SQL SELECT ShiftID
FROM Database;

Related

CONNECT BY Function not working with column of DATE datatype?

I am trying to build one table from another using CONNECT BY PRIOR and CONNECT_BY_ROOT (Oracle), so that I have a table that shows every employee with their manager, and also the manager's manager, all the way up to the CEO (so if an employee has 5 managers above them in the hierarchy, there are 5 rows for that employee). As part of a larger CTE, it works fine. Now, however, I want to include a date column from the base table.
The base table kinda looks like this:
employeeID | employeeName | managerID | managerName | dateColumn
12345      | Miller       | 45454     | Hawkins     | 21/02/2021
Now I am creating a new table out of this base table:
SELECT distinct employeeID, employeeName, managerName, CONNECT_BY_ROOT managerID as managerID
FROM basetable
CONNECT BY PRIOR employeeID = managerID
Now this works perfectly fine and I get the results I expected (load is < 1 second).
However, when I include dateColumn (datatype: DATE) in the select, it never finishes loading (I waited 40 minutes). Why is this the case?
Edit:
As requested by MT(), a few more details:
This is the CTE I am trying to use. Without dateColumn, it is working fine.
insert into targettable(EMP_ID, EMP_FORENAME, EMP_SURNAME, MGR_SURNAME, MGR_ID, date_Column)
with employees as (
select employeeID,
employeeForename,
employeeSurname,
managerName,
managerID,
trunc(dateColumn) as dateColumn
from basetable
where employeeSurname is not null
),
hierarchy as (
SELECT distinct employeeID,
employeeForename,
employeeSurname,
managerName,
CONNECT_BY_ROOT managerID as managerID,
trunc(dateColumn) as dateColumn
FROM employees e1
CONNECT BY PRIOR employeeID = managerID
),
base as (
select distinct e1.employeeID, e1.employeeForename, e1.employeeSurname, e2.employeeForename || ' ' || e2.employeeSurname managerName, e1.managerID, trunc(e1.dateColumn) as dateColumn
from hierarchy e1
left join employees e2 on e1.managerID = e2.employeeID)
select *
from base
where managerID is not null;

One reason may be that, in Oracle, a DATE is a binary data type that consists of 7 bytes representing century, year-of-century, month, day, hour, minute and second. It ALWAYS has those 7 components and it is NEVER stored in any particular human-readable format.
When you are displaying the results, your client application appears to be defaulting to only display the century through day components and is not displaying the hour through second components; however those components still exist.
Therefore, when you do:
SELECT distinct
employeeID,
employeeName,
managerName,
CONNECT_BY_ROOT managerID as managerID,
dateColumn
FROM basetable
CONNECT BY PRIOR employeeID = managerID
You are getting the DISTINCT values down to the precision of a second in dateColumn but are only displaying the values to the precision of the day. This means that you are likely going to be returning a much larger data-set than you intend and the performance issues are possibly because rather than loading 100 rows for unique employees and days, instead, you are loading 100,000,000 rows for unique employees and seconds and that is going to take much more time.
You can try TRUNCating the date back to midnight:
SELECT distinct
employeeID,
employeeName,
managerName,
CONNECT_BY_ROOT managerID as managerID,
TRUNC(dateColumn) AS dateColumn
FROM basetable
CONNECT BY PRIOR employeeID = managerID
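The effect of the hidden time components on DISTINCT can be sketched in plain Python. This is only an analogy with made-up timestamps: a set stands in for DISTINCT, and resetting the time to midnight stands in for TRUNC; it does not run against Oracle.

```python
from datetime import datetime

# Hypothetical rows: same employee, same day, different times of day
rows = [
    (12345, datetime(2021, 2, 21, 9, 15, 0)),
    (12345, datetime(2021, 2, 21, 9, 15, 1)),
    (12345, datetime(2021, 2, 21, 17, 40, 30)),
]

# DISTINCT on the full value keeps every row, because the seconds differ
distinct_by_second = set(rows)

# Truncating to midnight first (like TRUNC(dateColumn)) collapses them
distinct_by_day = {
    (emp, ts.replace(hour=0, minute=0, second=0, microsecond=0))
    for emp, ts in rows
}

print(len(distinct_by_second), len(distinct_by_day))
```

Three rows that look identical when displayed to the day remain three distinct rows until the time part is removed.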

Creating table to count number of transaction done by each position

I have an employee table with columns
employee_ID, employee_name, employee_DOB, emp_Email, Emp_Phone, Emp_Position
and a transaction table with columns
transaction_ID, employee_ID, distribution_ID, Invoice_Number, transaction_Date
I want to show these columns using a SQL query:
Emp_Position, Transaction_done
Transaction_done is the number of transactions that the employees in each position have done.
I've tried to use select count(transaction_ID) but it didn't show a correct count.

Try this:
SELECT e.Emp_Position
,count(t.transaction_ID) AS Transaction_done
FROM employee e
INNER JOIN [transaction] t
ON e.[employee_ID] = t.[employee_ID]
GROUP BY e.Emp_Position;
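A quick way to check the grouping logic is to run the same shape of query against throwaway data. The sketch below uses Python's built-in sqlite3 module, so the table name is quoted with double quotes rather than square brackets, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE employee (employee_ID INTEGER, Emp_Position TEXT);
CREATE TABLE "transaction" (transaction_ID INTEGER, employee_ID INTEGER);
-- Hypothetical data: two cashiers and one manager
INSERT INTO employee VALUES (1, 'Cashier'), (2, 'Cashier'), (3, 'Manager');
INSERT INTO "transaction" VALUES (100, 1), (101, 1), (102, 2), (103, 3);
''')

# One output row per position, counting the transactions of its employees
counts = dict(conn.execute('''
    SELECT e.Emp_Position, COUNT(t.transaction_ID) AS Transaction_done
    FROM employee e
    INNER JOIN "transaction" t ON e.employee_ID = t.employee_ID
    GROUP BY e.Emp_Position
''').fetchall())

print(counts)
```

Note that with an INNER JOIN a position whose employees have no transactions does not appear at all; a LEFT JOIN from employee would list it with a count of 0.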

to create a table from multiple tables without affecting datawarehouse

I want to create a table in QlikView from multiple tables without affecting the data warehouse. Synthetic keys are also being created; how do I remove them?
empLeave:
LOAD SickLeaveHours ,
VacationHours ;
SQL SELECT SickLeaveHours,
VacationHours
FROM databaseConnection;
empDetails:
LOAD BirthDate,
EmployeeID ,
Gender,
Title;
SQL SELECT BirthDate,
EmployeeID,
Gender,
Title
FROM databaseConnection;
empAddress:
LOAD AddressID,
ModifiedDate;
SQL SELECT AddressID,
ModifiedDate
FROM databaseConnection;
empDepartment:
LOAD DepartmentID,
GroupName,
Name;
SQL SELECT DepartmentID,
GroupName,
Name
FROM databaseConnection;
empRate:
LOAD PayFrequency,
Rate;
SQL SELECT PayFrequency,
Rate
FROM databaseConnection;
empshift:
LOAD EndTime,
Name as name,
ShiftID,
StartTime;
SQL SELECT EndTime,
Name,
ShiftID,
StartTime
FROM databaseConnection;
//need to make newtable with EmployeeID, ShiftID, VacationHours, SickLeaveHours, DepartmentID
The required table, named newtable, should contain these fields:
EmployeeID
ShiftID
Rate
SickLeaveHours
VacationHours
DepartmentID

Insert record into table using default values and values selected from another table using where clause

I know it is possible to insert records into a table by using a select statement on a different table, but I need to use a where clause to select which record. For example,
INSERT INTO Employee_Archive(EmployeeID, Name, ArchiveReason)
SELECT EmployeeID FROM Employees, Name from Employees, 'Retired'
WHERE EmployeeID = '001'
I hope that example makes sense. I wish to get the EmployeeID and the Name from the Employees table, and add my own ArchiveReason value, but I need to specify by which EmployeeID. Cheers

You can simply add a WHERE clause to the SELECT statement that feeds the INSERT:
INSERT INTO Employee_Archive (EmployeeID, Name, ArchiveReason)
SELECT
EmployeeID,
Name,
'Retired'
FROM Employees
WHERE EmployeeID = '001'
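For a runnable check of the INSERT ... SELECT ... WHERE pattern, here is a small sketch using Python's built-in sqlite3 module. The table layouts follow the question, but the column types and sample names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID TEXT, Name TEXT);
CREATE TABLE Employee_Archive (EmployeeID TEXT, Name TEXT, ArchiveReason TEXT);
-- Hypothetical sample rows
INSERT INTO Employees VALUES ('001', 'Alice'), ('002', 'Bob');
""")

# Copy one employee, supplying the constant 'Retired' as the third column
conn.execute(
    """
    INSERT INTO Employee_Archive (EmployeeID, Name, ArchiveReason)
    SELECT EmployeeID, Name, 'Retired'
    FROM Employees
    WHERE EmployeeID = ?
    """,
    ("001",),
)

archived = conn.execute(
    "SELECT EmployeeID, Name, ArchiveReason FROM Employee_Archive"
).fetchall()
print(archived)
```

Only the row matching the WHERE clause is copied, and the literal fills the column that has no source in Employees.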

Sql Server In statement with Null values

I have the following:
select Firstname, LastName, Department
from Employee
where
Department is null
or Department in (
select Department
from Employee
group by Department
having COUNT(1) < 5)
This takes a /long/ time (over 10 minutes) and I'm assuming it will eventually time out.
I can change it to:
select Firstname, LastName, Department
from Employee
where
coalesce(Department,'') in (
select coalesce(Department,'')
from Employee
group by Department
having COUNT(1) < 5)
which, in this case, gives me what I need. It returns in 2 seconds.
If I run the first query with just one part of the where clause, either part returns quickly. I also unioned them and that returned quickly as well. Any insight on why it's freaking out when I combine them?
