I have a problem when trying to add rows from a temp table to a table. The rows from the TempDealer table are added even if they are already in the Dealership table, despite the WHERE clause (notice that I'm specifying WHERE td.supplier_ref NOT IN (SELECT supplier_ref FROM #dealerStatus)). Every time I run the stored procedure it adds all the rows from TempDealer to Dealership again, when it should only add them once. Any ideas? Thanks in advance.
INSERT INTO #dealerStatus (dealerId, supplier_ref, [add], [timestamp])
SELECT NULL, td.supplier_ref, 1, GETDATE()
FROM TempDealer td
WHERE td.supplier_ref NOT IN (SELECT supplier_ref FROM #dealerStatus)
INSERT INTO Dealership(
dealership_name,
telephone,
fax,
sales_email,
support_email,
service_mask,
address1,
address2,
town,
county,
postcode,
website,
date_modified,
supplier_ref,
dealer_type,
county_id,
town_id,
area_id,
district_id,
longitude,
latitude
)
SELECT DISTINCT
[updateSource].leasing_broker_name,
[updateSource].telephone,
[updateSource].fax_number,
[updateSource].email,
[updateSource].support_email,
[updateSource].service_mask,
[updateSource].address1,
[updateSource].address2,
[updateSource].town,
[updateSource].county,
[updateSource].post_code,
[updateSource].web_address,
GETDATE(),
[updateSource].supplier_ref,
1,
[updateSource].county_id,
[updateSource].town_id,
[updateSource].region,
[updateSource].district,
[updateSource].longitude,
[updateSource].latitude
FROM
#dealerStatus dealerUpdateStatus INNER JOIN
TempDealer [updateSource] ON dealerUpdateStatus.supplier_ref = updateSource.supplier_ref
WHERE
dealerUpdateStatus.[add] = 1
I sorted it out this way:
INSERT INTO #dealerStatus (dealerId, supplier_ref, [add], [timestamp])
SELECT NULL, td.supplier_ref, 1, GETDATE()
FROM TempDealer td
WHERE td.supplier_ref NOT IN (SELECT supplier_ref FROM Dealership WHERE dealership.supplier_ref IS NOT NULL and dealership.dealer_type = 1)
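As an aside, NOT IN can behave unexpectedly when the subquery returns NULLs, so a NOT EXISTS form of the same check is often safer. A minimal sketch of the corrected check rewritten that way, assuming the same table and column names as above:

INSERT INTO #dealerStatus (dealerId, supplier_ref, [add], [timestamp])
SELECT NULL, td.supplier_ref, 1, GETDATE()
FROM TempDealer td
WHERE NOT EXISTS (
    SELECT 1
    FROM Dealership d
    WHERE d.supplier_ref = td.supplier_ref
      AND d.dealer_type = 1
)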
I have a problem like this: I need to optimize an application with a PostgreSQL database. The table looks like this:
CREATE TABLE voter_count(
id SERIAL,
name VARCHAR NOT NULL,
birthDate DATE NOT NULL,
count INT NOT NULL,
PRIMARY KEY(id),
UNIQUE (name, birthDate))
I have more than a thousand such voters and I need to put all of them into the database, but among them there are several duplicates who may have voted several times (from 2 up). When I meet such a duplicate, I need to increase the count field of the existing row (the voter with the same name and birthdate).
Previously I just checked whether such a voter was already in the table, and if so, found it and increased the count.
But the program ran for too long, so I tried a multi-row INSERT with ON CONFLICT DO UPDATE to increase the count, but I get an error.
I then asked a question on Stack Overflow and was advised to do many single INSERTs in a loop, but in PostgreSQL.
INSERT INTO voter_count(name, birthdate, count)
VALUES
('Ivan', '1998-08-05', 1),
('Sergey', '1998-08-29', 1),
('Ivan', '1998-08-05', 1)
ON CONFLICT (name, birthdate) DO UPDATE SET count = (voter_count.count + 1)
Question: how do I do the INSERT in a loop in PostgreSQL?
Probably the best option is to first insert all the data into a table without a primary key, for instance:
CREATE TABLE voter_count_with_duplicates(
name VARCHAR NOT NULL,
birthDate DATE NOT NULL)
and then insert the data with a single statement:
INSERT INTO voter_count (name, birthDate, count)
SELECT name, birthDate, COUNT(*)
FROM voter_count_with_duplicates
GROUP BY name, birthDate
Note that if you have the data in a structured text file (for instance a CSV file), you can insert all the data into voter_count_with_duplicates with a single COPY statement.
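For example, a sketch of such a COPY (the file path and options here are assumptions, not part of the original question):

-- Server-side load of name/birthDate pairs from a CSV file (path is hypothetical).
COPY voter_count_with_duplicates (name, birthDate)
FROM '/path/to/votes.csv'
WITH (FORMAT csv, HEADER true);

From psql, the client-side \copy variant takes the same options but reads the file from the client machine.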
If you have to insert (a lot of) new data with the table already populated, there are several possibilities. One is to use the solution in the comment. Another one is to perform an update followed by an insert:
WITH present_tuples AS
(SELECT d.name, d.birthDate, COUNT(*) AS num_of_new_votes
FROM voter_count_with_duplicates d JOIN voter_count c ON
c.name = d.name AND c.birthDate = d.birthDate
GROUP BY d.name, d.birthDate)
UPDATE voter_count SET count = count + num_of_new_votes
FROM present_tuples
WHERE present_tuples.name = voter_count.name
AND present_tuples.birthDate = voter_count.birthDate;
WITH new_tuples AS
(SELECT name, birthDate, COUNT(*) AS votes
FROM voter_count_with_duplicates d
WHERE NOT EXISTS (SELECT *
FROM voter_count c
WHERE c.name = d.name AND c.birthDate = d.birthDate)
GROUP BY name, birthDate)
INSERT INTO voter_count (name, birthDate, count)
SELECT name, birthDate, votes
FROM new_tuples;
What you want to achieve is colloquially called an upsert: insert the row if it doesn't exist, otherwise update it. The operation to use for this is MERGE.
The data set you want to merge into the existing table is the aggregate of your values, grouped by name and birthdate, with the total sum you want to insert or add.
MERGE INTO voter_count vc
USING
(
SELECT name, birthdate, SUM(cnt) as total
FROM
(
VALUES
('Ivan', DATE '1998-08-05', 1),
('Sergey', DATE '1998-08-29', 1),
('Ivan', DATE '1998-08-05', 1)
) input_data (name, birthdate, cnt)
GROUP BY name, birthdate
) data ON (data.name = vc.name and data.birthdate = vc.birthdate)
when not matched then
insert (name, birthdate, count) values (data.name, data.birthdate, data.total)
when matched then
update set count = count + data.total;
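Note that MERGE was only added to PostgreSQL in version 15. On older versions, a sketch of the same idea using INSERT ... ON CONFLICT should also work, because aggregating the values first removes the duplicate keys that made the original attempt fail:

INSERT INTO voter_count (name, birthdate, count)
SELECT name, birthdate, SUM(cnt)
FROM
(
VALUES
('Ivan', DATE '1998-08-05', 1),
('Sergey', DATE '1998-08-29', 1),
('Ivan', DATE '1998-08-05', 1)
) input_data (name, birthdate, cnt)
GROUP BY name, birthdate
ON CONFLICT (name, birthdate) DO UPDATE
SET count = voter_count.count + EXCLUDED.count;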
I'm practising subqueries in Snowflake (it supports common SQL queries). I'm stuck and cannot get the result I want.
Data
INSERT INTO members (memid, surname, firstname, address, zipcode, telephone, recommendedby, joindate) VALUES
(12, 'Baker', 'Anne', '55 Powdery Street, Boston', 80743, '844-076-5141', 9, '2012-08-10 14:23:22'),
(21, 'Mackenzie', 'Anna', '64 Perkington Lane, Reading', 64577, '(822) 661-2898', 1, '2012-08-26 09:32:05'),
(6, 'Tracy', 'Burton', '3 Tunisia Drive, Boston', 45678, '(822) 354-9973', NULL, '2012-07-15 08:52:55');
I want to get each member's name, member id, recommender's name and recommender's id.
My code
with recommender as (
select distinct concat(t1.firstname, ' ', t1.surname) recommender
, memid recommender_id
from "EXERCISES"."CD"."MEMBERS" t1
where exists (select surname from "EXERCISES"."CD"."MEMBERS" t2
where t1.memid = t2.recommendedby)
)
, member as (
select
distinct concat(firstname, ' ', surname) as member,
memid,
recommender,
recommender_id
from recommender
left join "EXERCISES"."CD"."MEMBERS" t3 on recommender.recommender_id = t3.recommendedby
) select * from member
order by member;
In my output, Burton Tracy is missing because she doesn't have any recommender. I want to keep her data in the output. How should I rewrite my code?
Thank you
I'm not quite sure why you are using CTEs for this...? Or subqueries, for that matter.
Getting the person who recommended a member is nothing more than a LEFT JOIN:
select
concat(m.firstname, ' ', m.surname) as member,
m.memid member_id,
concat(r.firstname, ' ', r.surname) as recommender,
r.memid recommender_id
from
members m
left join members r on r.memid = m.recommendedby
I have a query where I'm trying to select some customer information: name, address, city, state, and zip. I'd like to pull all that information, but only one of the records if there is a dupe.
Example of data:
Invoice_Date First Last Addr City State Zip
11/11/14 Jim Jones 12 Cedar alkdjf TN 29430
11/11/15 Ralph Jones 12 Cedar alkdjf TN 29430
11/11/14 Robert Smith 15 block slkjdd TX 10932
What I want to return:
Invoice_Date First Last Addr City State Zip
11/11/15 Ralph Jones 12 Cedar alkdjf TN 29430 (newest Record)
11/11/14 Robert Smith 15 block slkjdd TX 10932
This is my query that is able to pull ALL customers for the specified dates:
SELECT
Invoice_Tb.Invoice_Date, Invoice_Tb.Customer_First_Name,
Invoice_Tb.Customer_Last_Name,
Invoice_Tb.Customer_Address, Invoice_Tb.City,
Invoice_Tb.Customer_State, Invoice_Tb.ZIP_Code
FROM
Invoice_Tb
LEFT OUTER JOIN
Invoice_Detail_Tb ON Invoice_Tb.Store_Number = Invoice_Detail_Tb.Store_Number
AND Invoice_Tb.Invoice_Number = Invoice_Detail_Tb.Invoice_Number
AND Invoice_Tb.Invoice_Date = Invoice_Detail_Tb.Invoice_Date
WHERE
(Invoice_Tb.Invoice_Date IN ('11/11/14', '11/11/15'))
AND (Invoice_Detail_Tb.Invoice_Detail_Code = 'FSV')
AND (LEN(Invoice_Tb.Customer_Address) > 4)
ORDER BY
Invoice_Tb.Customer_Address
Now, obviously I can't use Row_Number here, because it's not an option in SQL Server 2000, so that's out.
I've tried SELECT DISTINCT, but I need the other information (first name, last name, etc.), and with DISTINCT the differing first and last names keep otherwise-duplicate rows distinct. I only want one record per address.
How can I return one row for each distinct address, including the first and last name from the MOST recent visit, in this case 11/11/15?
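(For readers on SQL Server 2005 or later, where ROW_NUMBER is available, the usual pattern would be a sketch like the one below; as noted above, it is not an option on SQL Server 2000.)

SELECT Invoice_Date, Customer_First_Name, Customer_Last_Name,
       Customer_Address, City, Customer_State, ZIP_Code
FROM (
    SELECT i.*,
           ROW_NUMBER() OVER (PARTITION BY Customer_Address, City, Customer_State, ZIP_Code
                              ORDER BY Invoice_Date DESC) AS rn
    FROM Invoice_Tb i
) x
WHERE rn = 1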
create table #SomeTest (Invoice_Number Int, InvoiceDt DateTime, FName Varchar(24), LName Varchar(24), Addr Varchar(24), City Varchar(24), St Varchar(2), Zip Varchar(12) )
insert into #SomeTest (Invoice_Number, InvoiceDt, FName, LName, Addr, City, St, Zip) values (1, '11/11/14', 'Jim','Jones', '12 Cedar', 'alkdjf', 'TN', '29430')
insert into #SomeTest (Invoice_Number, InvoiceDt, FName, LName, Addr, City, St, Zip) values (2, '11/11/15', 'Ralph','Jones', '12 Cedar', 'alkdjf', 'TN', '29430')
insert into #SomeTest (Invoice_Number, InvoiceDt, FName, LName, Addr, City, St, Zip) values (3, '11/11/14', 'Robert','Smith', '15 block', 'slkjdd', 'TX', '10932')
select * from #SomeTest
where Invoice_Number in
(
select Invoice_Number from
(select Invoice_Number = max(Invoice_Number), SupperAddy = Addr + '#' + City + '#' + St + '#' + Zip from #SomeTest
group by Addr + '#' + City + '#' + St + '#' + Zip) X
)
If you don't want to use any analytic function (like row_number) then you could do something like this:
--This gives you the most recent date for an address
Select max(invoice_Date) as Invoice_date
, Invoice_Tb.Customer_Address
, Invoice_Tb.City
, Invoice_Tb.Customer_State
, Invoice_Tb.ZIP_Code
into #tmp1
from Invoice_Tb
group by Invoice_Tb.Customer_Address
, Invoice_Tb.City
, Invoice_Tb.Customer_State
, Invoice_Tb.ZIP_Code
GO
--link back to the name for the most recent address
Select a.Invoice_date
,b.Customer_First_Name as [First]
,b.Customer_Last_Name as [Last]
, a.Customer_Address as [Addr]
, a.City
, a.Customer_State as [State]
, a.ZIP_Code as [Zip]
from #tmp1 a
left join Invoice_Tb b on
a.Invoice_date = b.Invoice_Date
and a.Customer_Address = b.Customer_Address
and a.City = b.City
and a.Customer_State = b.Customer_State
and a.ZIP_Code = b.ZIP_Code
GO
Here's the "standard" way I always did this in my MsSql 2000 days. In your case, a subquery in the WHERE clause would also work, but I will use the INNER JOIN version of this solution. I employed a little bit of pseudo-code to cut down on my typing. You should be able to figure it out:
SELECT
t1.Invoice_Date, t1.Customer_First_Name,
t1.Customer_Last_Name,
t1.Customer_Address, t1.City,
t1.Customer_State, t1.ZIP_Code
FROM
Invoice_Tb t1
INNER JOIN Invoice_Tb t2
ON t1.Invoice_Number = t2.Invoice_Number
AND t2.Invoice_Number = (
SELECT TOP 1 t3.Invoice_Number
FROM Invoice_Tb t3
WHERE t3.Customer_Address = t2.Customer_Address
AND t3.City = t2.City
AND t3.Customer_State = t2.Customer_State
AND t3.ZIP_Code = t2.ZIP_Code
ORDER BY t3.Invoice_Date DESC
)
LEFT OUTER JOIN
Invoice_Detail_Tb ON ...
WHERE
...
ORDER BY
t1.Customer_Address
Also note that if any of the "address" fields can be NULL, you will have to handle that possibility when comparing them in the subquery.
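For example, a minimal sketch of a NULL-tolerant comparison for two of those fields, written out longhand since SQL Server 2000 has no IS NOT DISTINCT FROM:

AND (t3.City = t2.City OR (t3.City IS NULL AND t2.City IS NULL))
AND (t3.Customer_State = t2.Customer_State OR (t3.Customer_State IS NULL AND t2.Customer_State IS NULL))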
You can do something like this query:
SELECT
i.last,
MAX(invoice_date) AS InvoiceDate,
MAX(first) AS First,
MAX(addr) AS addr,
MAX(zip) AS zip
FROM invoice i
INNER JOIN (SELECT
last,
MAX(invoice_date) AS MaxInvoiceDate
FROM invoice
GROUP BY last) a
ON a.last = i.last
AND a.MaxInvoiceDate = i.Invoice_date
GROUP BY i.last
Here MAX(invoice_date) can match multiple rows; in that case you would still need to pick a single one (for example with TOP 1).
I have a recursive query that I have working for the most part. Here is what I have so far:
DECLARE @table TABLE(mgrQID VARCHAR(64), QID VARCHAR(64), NTID VARCHAR(64), FullName VARCHAR(64), lvl int, dt DATETIME, countOfDirects INT);
WITH empList(mgrQID, QID, NTID, FullName, lvl, metadate)
AS
(
SELECT TOP 1 mgrQID, QID, NTID, FirstName+' '+LastName, 0, Meta_LogDate
FROM dbo.EmployeeTable_Historical
WHERE QID IN (SELECT director FROM dbo.attritionDirectors) AND Meta_LogDate <= @pit
ORDER BY Meta_LogDate DESC
UNION ALL
SELECT b.mgrQID, b.QID, b.NTID, b.FirstName+' '+b.LastName, lvl+1, b.Meta_LogDate
FROM empList a
CROSS APPLY dbo.Fetch_DirectsHistorical_by_qid(a.QID, @pit) b
)
INSERT INTO @table(mgrQID, QID, NTID, FullName, lvl, dt)
SELECT empList.mgrQID ,
empList.QID ,
empList.NTID ,
empList.FullName ,
empList.lvl ,
empList.metadate
FROM empList
ORDER BY lvl
OPTION(MAXRECURSION 10)
Now, @table has a list of QIDs in it. I then need to join my employee table and find out how many people report to each of those QIDs.
So, there will need to be an UPDATE against @table which provides the count of employees that report to each of those QIDs.
Here is the catch: the employee table is a historical table that can contain multiple records for the same people. Any time a piece of their information is updated, a new record is created with those changes.
If I wanted to pull the most recent record for someone right now, I would use this:
SELECT TOP 1 E.*
FROM employeeTable_historical AS E
WHERE E.qid = A.[subQID]
AND CONVERT (DATE, GETDATE()) > CONVERT (DATE, E.[Meta_LogDate])
ORDER BY meta_logDate DESC
The question:
I need to be able to get the count of employees in the historical table that report directly to each QID in @table. The historical table has a column called mgrQID. Is there a way I can get this count in the original recursive query?
I would recommend first that you look at the approach you're taking. The historical table you're dealing with will certainly need to select the greatest Meta_LogDate for any given employee, but in the structure you've set up here, you'll never select more than one record from matching attritionDirectors, thanks to the TOP 1 in your anchor query. As such, I'd recommend a lightweight function on which you base your query:
create function dbo.EmployeesAsOf(@date datetime)
returns table
as return
select mgrQID, QID, NTID, FirstName, LastName, Meta_LogDate
from dbo.EmployeeTable_Historical A
where Meta_LogDate = (select max(Meta_LogDate) from dbo.EmployeeTable_Historical B where A.QID = B.QID and B.Meta_LogDate <= @date)
This will allow you to get the most recent record for anyone, and as long as EmployeeTable_Historical has an index on (QID, Meta_LogDate), this function will perform well.
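A sketch of such an index (the index name is just illustrative):

CREATE INDEX IX_EmployeeTable_Historical_QID_MetaLogDate
ON dbo.EmployeeTable_Historical (QID, Meta_LogDate);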
Having said that, looking at your recursive query, you'll likely want to tweak the recursive query somewhat:
create function dbo.empList(@thisDate datetime)
returns @emptbl table (
mgrQID varchar(10)
, QID varchar(10)
, NTID varchar(10)
, Name varchar(21)
, Meta_LogDate datetime
, DirectsThisMany int
)
as
begin
;with empList AS (
select E.mgrQID, E.QID, E.NTID, E.FirstName + ' ' + E.LastName AS Name, E.Meta_LogDate
from dbo.EmployeesAsOf(@thisDate) E
inner join dbo.attritionDirectors D on E.QID = D.QID
union all
select E.mgrQID, E.QID, E.NTID, E.FirstName + ' ' + E.LastName AS Name, E.Meta_LogDate
from dbo.EmployeesAsOf(@thisDate) E
inner join empList D on E.mgrQID = D.QID
)
insert into @emptbl
select A.mgrQID, A.QID, A.NTID, A.Name, A.Meta_LogDate, count(b.QID) AS DirectsThisMany
from empList A
left join empList B on A.QID = B.mgrQID
group by A.mgrQID, A.QID, A.NTID, A.Name, A.Meta_LogDate
return
end
In this way, you'll be able to feed in any date and get a read of the tables, including counts from history as of that date. The self-join of the CTE is what enables us to get the current count of directs, as one can't use aggregates in the CTE. This function is easy to use, and the indexing strategy should become apparent by looking at the query plan in SSMS. A simple SELECT * FROM EmpList(GETDATE()) will give the current situation.
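For example, a point-in-time call (the date is just an illustration):

SELECT * FROM dbo.empList('2015-01-01') ORDER BY mgrQID;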
For each POSTAL_CODE, I want to know how many NULL TIME_VISITEDs there are and how many NOT NULL TIME_VISITEDs
CREATE TABLE VISITS
(
ID INTEGER NOT NULL,
POSTAL_CODE VARCHAR(5) NOT NULL,
TIME_VISITED TIMESTAMP,
CONSTRAINT PK_VISITS PRIMARY KEY (ID)
);
Sample data:
INSERT INTO VISITS (ID, POSTAL_CODE, TIME_VISITED) VALUES ('234', '01910', '21.04.2014, 10:13:33.000');
INSERT INTO VISITS (ID, POSTAL_CODE, TIME_VISITED) VALUES ('334', '01910', '28.04.2014, 13:13:33.000');
INSERT INTO VISITS (ID, POSTAL_CODE, TIME_VISITED) VALUES ('433', '01910', '29.04.2014, 13:03:19.000');
INSERT INTO VISITS (ID, POSTAL_CODE, TIME_VISITED) VALUES ('533', '01910', NULL);
INSERT INTO VISITS (ID, POSTAL_CODE, TIME_VISITED) VALUES ('833', '01910', NULL);
This is the output I want for the data above:
POSTAL_CODE=01910, NUM_TIME_VISITED_NULL=2, NUM_TIME_VISITED_NOT_NULL=3
I am using the following SQL
SELECT distinct r.POSTAL_CODE,
(select count(*) from VISITS p where p.POSTAL_CODE=r.POSTAL_CODE and p.TIME_VISITED is null) as NUM_TIME_VISITED_NULL,
(select count(*) from VISITS p where p.POSTAL_CODE=r.POSTAL_CODE and p.TIME_VISITED is not null) as NUM_TIME_VISITED_NOT_NULL
FROM VISITS r
ORDER BY r.POSTAL_CODE
The query takes a very long time if there are lots of rows in the table
What changes do I need to make to be able to get this information more quickly?
Use conditional aggregation instead:
select v.postal_code,
sum(case when v.time_visited is null then 1 else 0
end) as NumTimeVisitedNull,
count(v.time_visited) as NumTimeVisitedNotNull
from visits v
group by v.postal_code;
Note: you can also write this as:
select v.postal_code,
(count(*) - count(v.time_visited) ) as NumTimeVisitedNull,
count(v.time_visited) as NumTimeVisitedNotNull
from visits v
group by v.postal_code;
The count() function specifically counts the number of non-NULL values.
You can do this all in one pass. COUNT counts how many non-NULLs there are. Then use SUM of a CASE statement to count up all the NULLs.
SELECT POSTAL_CODE
,COUNT(TIME_VISITED) AS NUM_TIME_VISITED_NOT_NULL
,SUM(CASE WHEN TIME_VISITED IS NULL THEN 1 ELSE 0 END) AS NUM_TIME_VISITED_NULL
FROM VISITS
GROUP BY POSTAL_CODE
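If the table is very large and this is still slow, an index on POSTAL_CODE that also includes TIME_VISITED may help the grouped scan. A sketch (the index name is just illustrative):

CREATE INDEX IDX_VISITS_POSTAL_TIME ON VISITS (POSTAL_CODE, TIME_VISITED);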