SQL deleting records with group by multiple tables - sql

I am trying to delete duplicate records in a table, but only when they are duplicates for the same record in another table.
The following query gets me the number of duplicate records per 'bodyshop'.
I'm trying to delete the duplicate invoices for each bodyshop.
SELECT
inv.InvoiceNo, job.BodyshopId, COUNT(*)
FROM
[Test].[dbo].[Invoices] as inv
join [Test].[dbo].Repairs as rep on rep.Id = inv.RepairId
join [Test].[dbo].Jobs as job on job.Id = rep.JobsId
GROUP BY
inv.InvoiceNo, job.BodyshopId
HAVING
COUNT(*) > 1
I want the duplicate invoice numbers per bodyshop to be deleted, but I do want the original one to remain.
InvoiceNo BodyshopId (No column name)
29737 16 2
29987 16 3
30059 16 2
23491 139 2
23608 139 3
23867 139 4
23952 139 3
I only want invoice number 29737 to appear once against BodyshopId 16, and so on.
Hope that makes sense
Thanks

Perhaps this :
with cte as (
SELECT
inv.ID, inv.InvoiceNo, job.BodyshopId, rn = row_number() over (partition by inv.InvoiceNo, job.BodyshopId order by inv.ID) -- ordering by ID makes rn = 1 the earliest row in each group, which is the one kept
FROM
[Test].[dbo].[Invoices] as inv
join [Test].[dbo].Repairs as rep on rep.Id = inv.RepairId
join [Test].[dbo].Jobs as job on job.Id = rep.JobsId
)
delete t1
from [Test].[dbo].[Invoices] t1 inner join cte t2 on t1.ID = t2.ID
where t2.rn > 1
Edit 1 - Your comments are correct. One solution is to add an identity column to the Invoices table. I've adapted my query accordingly.
To add / remove an identity column:
alter table [Test].[dbo].[Invoices] add id int identity(1,1)
alter table [Test].[dbo].[Invoices] drop column id

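The duplicates share the same InvoiceNo and BodyshopId, so grouping by those two columns returns one row per duplicated invoice. Keep the lowest id in each group and delete the rest:
DELETE FROM [Test].[dbo].[Invoices]
WHERE id IN (
SELECT inv.id
FROM [Test].[dbo].[Invoices] as inv
join [Test].[dbo].Repairs as rep on rep.Id = inv.RepairId
join [Test].[dbo].Jobs as job on job.Id = rep.JobsId
join (
-- one row per duplicated (InvoiceNo, BodyshopId) pair, with the id to keep
SELECT inv2.InvoiceNo, job2.BodyshopId, MIN(inv2.id) as KeepId
FROM [Test].[dbo].[Invoices] as inv2
join [Test].[dbo].Repairs as rep2 on rep2.Id = inv2.RepairId
join [Test].[dbo].Jobs as job2 on job2.Id = rep2.JobsId
GROUP BY inv2.InvoiceNo, job2.BodyshopId
HAVING COUNT(*) > 1
) dup on dup.InvoiceNo = inv.InvoiceNo and dup.BodyshopId = job.BodyshopId
WHERE inv.id <> dup.KeepId
)
This assumes id is the primary key of Invoices.
General SQL. Modify if needed for sql-server.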

Related

Optimize a complex PostgreSQL Query

I am attempting to make a complex SQL join on several tables, as shown below. I have also included an image of the DB schema.
Consider table_1 -
e_id name
1 a
2 b
3 c
4 d
and table_2 -
e_id date
1 1/1/2019
1 1/1/2020
2 2/1/2019
4 2/1/2019
The issue here is performance. From the tables 2 - 4 we only want the most recent entry for a given e_id but because these tables contain historical data (~ >3.5M rows) it's quite slow. I've attached an example of how we're currently trying to achieve this but it only includes one join of 'table_1' with 'table_x'. We group by e_id and get the max date for it. The other way we've thought about doing this is creating a Materialized View and pulling data from that and refreshing it after some period of time. Any improvements welcome.
from fds.region as rg
inner join (
select e_id, name, p_id
from fds.table_1
where sec_type = 'S' AND active_flag = 1
) as table_1 on table_1.e_id = rg.e_id
inner join fds.table_2 table_2 on table_2.e_id = rg.e_id
inner join fds.sec sec on sec.p_id = table_1.p_id
inner join fds.entity ent on ent.int_entity_id = sec.int_entity_id
inner join (
SELECT int_1.e_id, int_1.date, int_1.int_price
FROM fds.table_4 int_1
INNER JOIN (
SELECT e_id, MAX(date) date
FROM fds.table_2
GROUP BY e_id
) int_2 ON int_1.e_id = int_2.e_id AND int_1.date = int_2.date
) as table_4 on table_4.e_id = rg.e_id
where rg.region_str like '%US' and ent.sec_type = 'P'
order by table_2.int_price
limit 500;
You can simplify this logic:
(
SELECT int_1.e_id, int_1.date, int_1.int_price
FROM fds.table_4 int_1
INNER JOIN (
SELECT e_id, MAX(date) date
FROM fds.table_2
GROUP BY e_id
) int_2 ON int_1.e_id = int_2.e_id AND int_1.date = int_2.date
) as table_4
To:
(SELECT DISTINCT ON (int_1.e_id) int_1.*
FROM fds.table_4 int_1
ORDER BY int_1.e_id, int_1.date DESC
) table_4
This can take advantage of an index on fds.table_4(e_id, date desc) -- and might be wicked fast with such an index.
You also want appropriate indexes for the joins and filtering. However, it is hard to be more specific without an execution plan.
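As a rough sketch of the two suggestions above: the index name below is made up for illustration, and the column list is the one mentioned in the answer. Running the subquery under EXPLAIN would show whether the planner actually uses the index on your data; that depends on the table statistics, so treat this as a starting point rather than a guaranteed speed-up.

```sql
-- Hypothetical index name; columns taken from the answer above.
CREATE INDEX IF NOT EXISTS table_4_e_id_date_idx
    ON fds.table_4 (e_id, date DESC);

-- Inspect the plan and timings for the latest-row-per-e_id subquery.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (int_1.e_id) int_1.*
FROM fds.table_4 int_1
ORDER BY int_1.e_id, int_1.date DESC;
```

Note that EXPLAIN ANALYZE really executes the query, so on a 3.5M-row table it takes as long as the query itself.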

Combine rows from Multiple tables into single table

I have one parent table, Products, with multiple child tables: Hoses, Steeltubes, ElectricCables, FiberOptics.
ProductId - primary key field in the Products table
ProductId - foreign key field in Hoses, Steeltubes, ElectricCables, FiberOptics.
The Products table has a one-to-many relationship with each child table.
I want to combine the results of all the tables.
For example, product P1 has the PK field ProductId, which is used in all child tables as the FK.
If the Hoses table has 4 records with ProductId 50 and the Steeltubes table has 2 records with ProductId 50, then a left join produces the cartesian product of those records and shows 8 records in the result, but it should be 4 records.
;with HOSESTEELCTE
as
(
select '' as ModeType, '' as FiberOpticQty , '' as NumberFibers, '' as FiberLength, '' as CableType , '' as Conductorsize , '' as Voltage,'' as ElecticCableLength , s.TubeMaterial , s.TubeQty, s.TubeID , s.WallThickness , s.DWP ,s.Length as SteelLength , h.HoseSeries, h.HoseLength ,h.ProductId
from Hoses h
left join
(
--'' as HoseSeries,'' as HoseLength ,
select TubeMaterial , TubeQty, TubeID , WallThickness , DWP , Length,ProductId from SteelTubes
) s on (s.ProductId = h.ProductId)
) select * from HOSESTEELCTE
Assuming there are no relationships between the child tables and you simply want a list of all child entities that make up a product, you could generate a CTE with as many rows per product as the largest number of entries in any child table. In the example below I have used a dates table (dimdate) as a convenient source of rows.
So for this data:
create table products(pid int);
insert into products values
(1),(2);
create table hoses (pid int,descr varchar(2));
insert into hoses values (1,'h1'),(1,'h2'),(1,'h3'),(1,'h4');
create table steeltubes (pid int,descr varchar(2));
insert into steeltubes values (1,'t1'),(1,'t2');
create table electriccables(pid int,descr varchar(2));
truncate table electriccables
insert into electriccables values (1,'e1'),(1,'e2'),(1,'e3'),(2,'e1');
this cte
;with cte as
(select row_number() over(partition by p.pid order by datekey) rn, p.pid
from dimdate, products p
where datekey < 20050105)
select * from cte
creates a cartesian join (one of the rare occasions where an implicit join helps) of pid to rn
result
rn pid
-------------------- -----------
1 1
2 1
3 1
4 1
1 2
2 2
3 2
4 2
And if we add the child tables
;with cte as
(select row_number() over(partition by p.pid order by datekey) rn, p.pid
from dimdate, products p
where datekey < 20050106)
select c.pid,h.descr hoses,s.descr steeltubes,e.descr electriccables from cte c
left join (select h.*, row_number() over(partition by h.pid order by h.descr) rn from hoses h) h on h.rn = c.rn and h.pid = c.pid
left join (select s.*, row_number() over(partition by s.pid order by s.descr) rn from steeltubes s) s on s.rn = c.rn and s.pid = c.pid
left join (select e.*, row_number() over(partition by e.pid order by e.descr) rn from electriccables e) e on e.rn = c.rn and e.pid = c.pid
where h.rn is not null or s.rn is not null or e.rn is not null
order by c.pid,c.rn
we get this
pid hoses steeltubes electriccables
----------- ----- ---------- --------------
1 h1 t1 e1
1 h2 t2 e2
1 h3 NULL e3
1 h4 NULL NULL
2 NULL NULL e1
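If no dates or numbers table is available, the row slots can also be derived from the child tables themselves. The following is only a sketch against the sample tables above: the CTE names h, s, e and slots are made up for illustration, and ordering by descr within each product is an assumption about which child rows should line up with each other.

```sql
with h as (select pid, descr, row_number() over(partition by pid order by descr) rn from hoses),
     s as (select pid, descr, row_number() over(partition by pid order by descr) rn from steeltubes),
     e as (select pid, descr, row_number() over(partition by pid order by descr) rn from electriccables),
     -- every (pid, rn) slot that is used by at least one child table
     slots as (select pid, rn from h
               union
               select pid, rn from s
               union
               select pid, rn from e)
select sl.pid, h.descr as hoses, s.descr as steeltubes, e.descr as electriccables
from slots sl
left join h on h.pid = sl.pid and h.rn = sl.rn
left join s on s.pid = sl.pid and s.rn = sl.rn
left join e on e.pid = sl.pid and e.rn = sl.rn
order by sl.pid, sl.rn;
```

Because the slots come from the child rows, there is no need for the "is not null" filter, and the number of rows per product is never capped by the size of the helper table.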
In fact, the result having 8 rows can be expected to be the result, since your four records are joined with the first record in the other table and then your four records are joined with the second record of the other table, making it 4 + 4 = 8.
The very fact that you expect 4 records to be in the result instead of 8 shows that you want to use some kind of grouping. You can group your inner query issued for SteelTubes by ProductId, but then you will need to use aggregate functions for the other columns. Since you have only explained the structure of the desired output, but not the semantics, I am not able with my current knowledge about your problem to determine what aggregations you need.
Once you find out the answer for the first table, you will be able to easily add the other tables into the selection as well, but in case of large data you might get some scaling problems, so you might want to have a table where you store these groups, maintain it when something changes and use it for these selections.

Row_Number() returning duplicate rows

This is my query,
SELECT top 100
UPPER(COALESCE(A.DESCR,C.FULL_NAME_ND)) AS DESCR,
COALESCE(A.STATE, (SELECT TOP 1 STATENAME
FROM M_STATEMASTER
WHERE COUNTRYCODE = B.CODE)) AS STATENAME,
COALESCE(A.STATECD, (SELECT TOP 1 CODE
FROM M_STATEMASTER
WHERE COUNTRYCODE = B.CODE)) AS STATECD,
COALESCE(A.COUNTRYCD, B.CODE) AS COUNTRYCODE
FROM
M_CITY A
JOIN
M_COUNTRYMASTER B ON A.COUNTRYCD = B.CODE
JOIN
[GEODATASOURCE-CITIES-FREE] C ON B.ALPHA2CODE = C.CC_FIPS
WHERE
EXISTS (SELECT 1
FROM [GEODATASOURCE-CITIES-FREE] Z
WHERE B.ALPHA2CODE=Z.CC_FIPS)
ORDER BY
A.CODE
This works perfectly fine, but when I try to add Row_number() over(order by a.code), I get the same row repeated multiple times.
e.g
SELECT top 100
UPPER(COALESCE(A.DESCR,C.FULL_NAME_ND)) AS DESCR,
COALESCE(A.STATE, (SELECT TOP 1 STATENAME
FROM M_STATEMASTER
WHERE COUNTRYCODE = B.CODE)) AS STATENAME,
COALESCE(A.STATECD, (SELECT TOP 1 CODE
FROM M_STATEMASTER
WHERE COUNTRYCODE = B.CODE)) AS STATECD,
COALESCE(A.COUNTRYCD, B.CODE) AS COUNTRYCODE,
ROW_NUMBER() OVER(ORDER BY A.CODE) AS RN -- i made a change here
FROM
M_CITY A
JOIN
M_COUNTRYMASTER B ON A.COUNTRYCD = B.CODE
JOIN
[GEODATASOURCE-CITIES-FREE] C ON B.ALPHA2CODE = C.CC_FIPS
WHERE
EXISTS (SELECT 1
FROM [GEODATASOURCE-CITIES-FREE] Z
WHERE B.ALPHA2CODE=Z.CC_FIPS)
ORDER BY
A.CODE
Another try: when I use ROW_NUMBER() OVER(ORDER BY newid()) AS RN, it takes a long time to execute.
Remember: CODE is the PK of table M_CITY and there is no key in the [GEODATASOURCE-CITIES-FREE] table.
Another thing, about JOIN (inner join): a join returns the matched rows, right?
e.g.:
table 1 with 20 rows,
table 2 with 30 rows,
table 3 with 30 rows
If I join these 3 tables on a certain key, then the maximum number of rows I can get is 20, am I right?
Your first query doesn't work fine. It just appears to. The reason is that you are using TOP without an ORDER BY, so an arbitrary set of 100 rows is returned.
When you add ROW_NUMBER(), the query plan changes . . . and the ordering of the result set changes as well. I would suggest that you fix the original query to use a stable sort.
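On the side question about row counts: an inner join returns one row for every matching combination, so the result is not capped by the size of the smallest table. If a key value appears several times on one side, each occurrence is paired with every match on the other side. A tiny sketch with made-up inline tables a and b (not your real tables):

```sql
WITH a(k) AS (SELECT 1 UNION ALL SELECT 2),
     b(k) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1)
SELECT COUNT(*) AS joined_rows
FROM a
JOIN b ON b.k = a.k;
-- a has 2 rows and b has 3 rows; all three b rows match a.k = 1,
-- so the join returns 3 rows, more than a contains
```

This may also be what is happening in your query: since [GEODATASOURCE-CITIES-FREE] has no key and is joined only on the country code, each M_CITY row would be repeated once for every row of that table with the same country, assuming it holds many cities per country.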

Lookup value in second table

I have two tables.
Table Data
ID Item Kvartal
1 Payment 1
2 Salary 2
Table Kvartal
ID Kvartal_text Kvartal_nummer
1 Q1 1
2 Q2 2
I would like to map Kvartal in table Data to Kvartal_text in table Kvartal by matching Kvartal in table Data with ID in table Kvartal, to get a result like Payment Q1; Salary Q2.
I have tried
SELECT * FROM Data
WHERE Data.Kvartal IN (SELECT Kvartal.Kvartal_text
FROM Kvartal
WHERE Kvartal.Kvartal_nummer = Data.Kvartal);
Simply JOIN the two tables and select the fields you want:
SELECT d.Item, k.Kvartal_text
FROM Data d
JOIN Kvartal k
ON k.ID = d.Kvartal
You can use MySQL Join operations for such tasks.
SELECT d.ID, d.Item, d.Kvartal, k.Kvartal_text FROM `Data` d
LEFT JOIN(
SELECT Kvartal_text, Kvartal_nummer FROM `Kvartal`
) AS k
ON k.Kvartal_nummer = d.Kvartal

Select count from different tables with a field in common

So here is my query:
SELECT COUNT( tab1.id_z ) AS Count, tab_tot.name
FROM tab1
INNER JOIN tab_id ON (tab1.id_key = tab_id.id_key)
INNER JOIN tab_tot ON (tab_id.id_z = tab_tot.id_z)
WHERE tab1.id_c = 10888 GROUP BY tab_id.id_z
In table tab_id there are 6 records with the same id_z, but I receive 1 in Count.
What can I do?
Edit - this is my schema:
tab1
id_key | id_c
tab_id
id_key |id_z
tab_tot
id_z | description
Though I have more records in tab_id, the count is always 1.