I have the following problem: I need to optimize an application that uses a PostgreSQL database. The table looks like this:
CREATE TABLE voter_count(
id SERIAL,
name VARCHAR NOT NULL,
birthDate DATE NOT NULL,
count INT NOT NULL,
PRIMARY KEY(id),
UNIQUE (name, birthDate))
I have more than a thousand voters to put into the database, but some of them are duplicates who could vote several times (from 2 to any number of times). When I meet such a duplicate, I need to increase the count field of the existing row (the voter with the same name and birthDate) instead of inserting a new one. Previously, I checked whether the voter was already in the table and, if so, found the row and increased its count.
But the program took too long, so I tried a multi-row INSERT with ON CONFLICT DO UPDATE to increase count (the statement below),
but I get an error. I then asked on Stack Overflow and was advised to do many INSERTs in a loop, but inside PostgreSQL.
INSERT INTO voter_count(name, birthdate, count)
VALUES
('Ivan', '1998-08-05', 1),
('Sergey', '1998-08-29', 1),
('Ivan', '1998-08-05', 1)
ON CONFLICT (name, birthdate) DO UPDATE SET count = (voter_count.count + 1)
Question: how do I run these INSERTs in a loop inside PostgreSQL?
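PostgreSQL most likely rejects the multi-row statement above because the key ('Ivan', '1998-08-05') appears twice in the same INSERT, which raises "ON CONFLICT DO UPDATE command cannot affect row a second time". Sending one upsert per voter avoids that, because each row becomes its own command. The minimal sketch below uses Python's standard sqlite3 module as a stand-in for PostgreSQL (SQLite 3.24+ accepts the same ON CONFLICT ... DO UPDATE form); the helper names make_db and add_votes_row_by_row are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the voter_count table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE voter_count(
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               birthDate TEXT NOT NULL,
               count INTEGER NOT NULL,
               UNIQUE (name, birthDate))"""
    )
    return conn


def add_votes_row_by_row(conn, voters):
    """Upsert one voter per statement; a repeated (name, birthDate) bumps count."""
    with conn:  # one transaction for the whole batch
        conn.executemany(
            """INSERT INTO voter_count(name, birthDate, count) VALUES (?, ?, 1)
               ON CONFLICT (name, birthDate)
               DO UPDATE SET count = voter_count.count + 1""",
            voters,
        )


conn = make_db()
add_votes_row_by_row(
    conn,
    [("Ivan", "1998-08-05"), ("Sergey", "1998-08-29"), ("Ivan", "1998-08-05")],
)
rows = conn.execute(
    "SELECT name, birthDate, count FROM voter_count ORDER BY name"
).fetchall()
print(rows)  # [('Ivan', '1998-08-05', 2), ('Sergey', '1998-08-29', 1)]
```

This is the loop approach: simple, but it executes one statement per voter, which is why aggregating the input first is usually faster for large batches.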
Probably the best option is to first insert all the data into a table without a primary key, for instance:
CREATE TABLE voter_count_with_duplicates(
name VARCHAR NOT NULL,
birthDate DATE NOT NULL)
and then insert the data with a single statement:
INSERT INTO voter_count (name, birthDate, count)
SELECT name, birthDate, COUNT(*)
FROM voter_count_with_duplicates
GROUP BY name, birthDate
Note that if you have the data in a structured text file (for instance a CSV file), you can insert all the data into voter_count_with_duplicates with a single COPY statement.
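The staging-table approach can be sketched with Python's built-in sqlite3 module standing in for PostgreSQL. As an assumption for the sketch, executemany plays the role of the bulk load that COPY would do in PostgreSQL; the GROUP BY insert is the same statement as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE voter_count(
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        birthDate TEXT NOT NULL,
        count INTEGER NOT NULL,
        UNIQUE (name, birthDate));
    -- staging table: no key, so duplicates are accepted as-is
    CREATE TABLE voter_count_with_duplicates(
        name TEXT NOT NULL,
        birthDate TEXT NOT NULL);
    """
)

raw = [("Ivan", "1998-08-05"), ("Sergey", "1998-08-29"), ("Ivan", "1998-08-05")]
with conn:
    # bulk-load the raw rows (COPY would do this step in PostgreSQL)
    conn.executemany("INSERT INTO voter_count_with_duplicates VALUES (?, ?)", raw)
    # one set-based statement collapses duplicates into a vote count
    conn.execute(
        """INSERT INTO voter_count (name, birthDate, count)
           SELECT name, birthDate, COUNT(*)
           FROM voter_count_with_duplicates
           GROUP BY name, birthDate"""
    )

result = conn.execute("SELECT name, count FROM voter_count ORDER BY name").fetchall()
print(result)  # [('Ivan', 2), ('Sergey', 1)]
```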
If you have to insert (a lot of) new data into a table that is already populated, there are several possibilities. One is to use the solution in the comment. Another is to perform an update followed by an insert:
WITH present_tuples AS
(SELECT d.name, d.birthDate, COUNT(*) AS num_of_new_votes
FROM voter_count_with_duplicates d JOIN voter_count c ON
c.name = d.name AND c.birthDate = d.birthDate
GROUP BY d.name, d.birthDate)
UPDATE voter_count SET count = count + num_of_new_votes
FROM present_tuples
WHERE present_tuples.name = voter_count.name
AND present_tuples.birthDate = voter_count.birthDate;
WITH new_tuples AS
(SELECT d.name, d.birthDate, COUNT(*) AS votes
FROM voter_count_with_duplicates d
WHERE NOT EXISTS (SELECT *
FROM voter_count c
WHERE c.name = d.name AND c.birthDate = d.birthDate)
GROUP BY d.name, d.birthDate)
INSERT INTO voter_count (name, birthDate, count)
SELECT name, birthDate, votes
FROM new_tuples;
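The update-then-insert approach can be sketched with Python's built-in sqlite3 module. This is a stand-in, not the PostgreSQL statements themselves: older SQLite versions lack UPDATE ... FROM, so step 1 here uses correlated subqueries instead of the present_tuples CTE. Order matters: if the insert ran first, the newly inserted voters would then receive their votes a second time from the update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE voter_count(
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        birthDate TEXT NOT NULL,
        count INTEGER NOT NULL,
        UNIQUE (name, birthDate));
    CREATE TABLE voter_count_with_duplicates(
        name TEXT NOT NULL,
        birthDate TEXT NOT NULL);
    -- table already populated: Ivan has 3 votes
    INSERT INTO voter_count (name, birthDate, count) VALUES ('Ivan', '1998-08-05', 3);
    -- new batch: two more votes for Ivan, one for a new voter
    INSERT INTO voter_count_with_duplicates VALUES
        ('Ivan', '1998-08-05'), ('Ivan', '1998-08-05'), ('Sergey', '1998-08-29');
    """
)

with conn:
    # Step 1: add the new votes to voters that already exist.
    conn.execute(
        """UPDATE voter_count
           SET count = count + (
               SELECT COUNT(*) FROM voter_count_with_duplicates d
               WHERE d.name = voter_count.name
                 AND d.birthDate = voter_count.birthDate)
           WHERE EXISTS (
               SELECT 1 FROM voter_count_with_duplicates d
               WHERE d.name = voter_count.name
                 AND d.birthDate = voter_count.birthDate)"""
    )
    # Step 2: insert the voters that are not in the table yet.
    conn.execute(
        """INSERT INTO voter_count (name, birthDate, count)
           SELECT d.name, d.birthDate, COUNT(*)
           FROM voter_count_with_duplicates d
           WHERE NOT EXISTS (
               SELECT 1 FROM voter_count c
               WHERE c.name = d.name AND c.birthDate = d.birthDate)
           GROUP BY d.name, d.birthDate"""
    )

result = conn.execute("SELECT name, count FROM voter_count ORDER BY name").fetchall()
print(result)  # [('Ivan', 5), ('Sergey', 1)]
```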
What you want to achieve is colloquially called an upsert: insert the row if it doesn't exist, otherwise update it. The standard SQL statement for this is MERGE (available in PostgreSQL 15 and later).
The data set to merge into the existing table is your input aggregated by name and birthdate, with the total number of votes to insert or add for each voter.
MERGE INTO voter_count vc
USING
(
SELECT name, birthdate, SUM(cnt) as total
FROM
(
VALUES
('Ivan', DATE '1998-08-05', 1),
('Sergey', DATE '1998-08-29', 1),
('Ivan', DATE '1998-08-05', 1)
) input_data (name, birthdate, cnt)
GROUP BY name, birthdate
) data ON (data.name = vc.name and data.birthdate = vc.birthdate)
when not matched then
insert (name, birthdate, count) values (data.name, data.birthdate, data.total)
when matched then
update set count = vc.count + data.total;
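SQLite has no MERGE statement, so this sketch (Python's built-in sqlite3 module) swaps in INSERT ... ON CONFLICT ... DO UPDATE run over the already aggregated input. Because GROUP BY leaves each (name, birthdate) at most once, the statement never touches the same row twice; the same aggregated SELECT should therefore also work with ON CONFLICT in PostgreSQL versions that lack MERGE. merge_votes is a hypothetical helper, and the batch is assumed small enough to fit SQLite's limit on bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE voter_count(
           id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           birthDate TEXT NOT NULL,
           count INTEGER NOT NULL,
           UNIQUE (name, birthDate))"""
)


def merge_votes(conn, votes):
    """votes: (name, birthDate, cnt) tuples, possibly with repeated keys."""
    if not votes:
        return  # nothing to merge
    rows = ", ".join("(?, ?, ?)" for _ in votes)
    params = [value for vote in votes for value in vote]
    with conn:
        conn.execute(
            f"""WITH input_data(name, birthDate, cnt) AS (VALUES {rows})
                INSERT INTO voter_count (name, birthDate, count)
                SELECT name, birthDate, SUM(cnt)
                FROM input_data
                WHERE 1  -- SQLite needs a WHERE before ON CONFLICT in INSERT ... SELECT
                GROUP BY name, birthDate
                ON CONFLICT (name, birthDate)
                DO UPDATE SET count = voter_count.count + excluded.count""",
            params,
        )


batch = [("Ivan", "1998-08-05", 1), ("Sergey", "1998-08-29", 1), ("Ivan", "1998-08-05", 1)]
merge_votes(conn, batch)  # inserts Ivan=2, Sergey=1
merge_votes(conn, batch)  # adds to the existing rows
print(conn.execute("SELECT name, count FROM voter_count ORDER BY name").fetchall())
# [('Ivan', 4), ('Sergey', 2)]
```

In the DO UPDATE clause, excluded.count is the value the aggregated row would have inserted, so it plays the role of data.total in the MERGE version.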
Say I have two temporary tables, #tempRole and #tempRoleApp, defined like this:
create table #tempRole(roleId int);
insert into #tempRole (roleId) values (1)
insert into #tempRole (roleId) values (2)
create table #tempRoleApp(roleId int, appId int);
insert into #tempRoleApp (roleId, appId) values (1, 26)
insert into #tempRoleApp (roleId, appId) values (2, 26)
insert into #tempRoleApp (roleId, appId) values (1, 27)
So, from the #tempRoleApp table I want only the rows that match all the values of the #tempRole table (1 and 2). In this case the output should be 26, because it is paired with both 1 and 2, but not 27, because the table has no (2, 27) row.
#tempRole table is actually the output from another query so it can have arbitrary number of values.
I tried a few things, such as:
select *
from #tempRoleApp
where roleId = ALL(select roleId FROM #tempRole)
This returns nothing. I tried a few other things but still don't get what I want.
I believe this gives what you were looking for.
select tra.appId
from #tempRoleApp as tra
join #tempRole as tr on tra.roleId = tr.roleId
group by tra.appId
having count(distinct tra.roleId) = (select count(distinct roleId) from #tempRole)
It uses COUNT(DISTINCT ...) to get the number of unique roleIds in #tempRole and compares it with the number of unique matching roleIds per appId, after the join has confirmed that the roleIds match between the tables. (Your roleId = ALL(...) attempt returns nothing because a single row's roleId cannot equal 1 and 2 at the same time.)
As you clarified in the comment, once you add another roleId to #tempRole, no appId has all of the Ids, so no rows are returned.
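The join-plus-HAVING technique can be checked with a small sketch using Python's built-in sqlite3 module instead of SQL Server; the tables are renamed without the # prefix, which is a T-SQL temp-table convention SQLite does not use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tempRole(roleId INTEGER);
    CREATE TABLE tempRoleApp(roleId INTEGER, appId INTEGER);
    INSERT INTO tempRole VALUES (1), (2);
    INSERT INTO tempRoleApp VALUES (1, 26), (2, 26), (1, 27);
    """
)

# appIds whose matching roleIds cover every roleId in tempRole
QUERY = """
    SELECT tra.appId
    FROM tempRoleApp AS tra
    JOIN tempRole AS tr ON tra.roleId = tr.roleId
    GROUP BY tra.appId
    HAVING COUNT(DISTINCT tra.roleId) = (SELECT COUNT(DISTINCT roleId) FROM tempRole)
"""

before = conn.execute(QUERY).fetchall()
print(before)  # [(26,)]

conn.execute("INSERT INTO tempRole VALUES (3)")
after = conn.execute(QUERY).fetchall()
print(after)  # []: no appId has roles 1, 2 and 3
```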
What I'm trying to do is filter for the clients that registered at least twice in the DB, because I need to know which of them came back. That is why I'm working with a table that records every time a client registered in the system, as follows:
order #   client   date
One       Andrew   XX
Two       Andrew   XX+1
Three     Andrew   XX+2
One       David    YY
One       Marc     ZZ
Two       Marc     ZZ+1
In this case I want to delete David's record, as I only want people who have an order number other than "One".
I tried this SQL:
select *
from table
where order_number > 1
However, this removes all the first-order rows, including those of clients who came back.
Does anybody know an easy way to compare client names and filter on that, or how to delete the rows of clients who have only one entry?
You need something like this, which keeps every row of a client who has an order beyond the first:
select * from yourtable t
where exists (select 1 from yourtable t2 where t2.client = t.client and t2.order_number > 1)
or:
select client
from tablename
group by client
having count(*) > 1
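Both filters can be tried out with Python's built-in sqlite3 module. The table name registrations is a hypothetical stand-in for the asker's table (TABLE itself is a reserved word), order_number is assumed to be numeric as in the asker's own query, and the date column is dropped for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE registrations(order_number INTEGER, client TEXT);
    INSERT INTO registrations VALUES
        (1, 'Andrew'), (2, 'Andrew'), (3, 'Andrew'),
        (1, 'David'),
        (1, 'Marc'), (2, 'Marc');
    """
)

# All rows (including first orders) of clients who have some later order.
returning_rows = conn.execute(
    """SELECT * FROM registrations t
       WHERE EXISTS (SELECT 1 FROM registrations t2
                     WHERE t2.client = t.client AND t2.order_number > 1)
       ORDER BY client, order_number"""
).fetchall()

# Just the names of clients with more than one row.
returning_clients = conn.execute(
    """SELECT client FROM registrations
       GROUP BY client HAVING COUNT(*) > 1
       ORDER BY client"""
).fetchall()

print(returning_rows)     # David's single row is excluded
print(returning_clients)  # [('Andrew',), ('Marc',)]
```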
CREATE TABLE records (
ID INTEGER PRIMARY KEY,
order_number TEXT NOT NULL,
client TEXT NOT NULL,
date DateTime NOT NULL
);
INSERT INTO records VALUES (1,'ONE', 'Andrew', '01.01.1999');
INSERT INTO records VALUES (2, 'TWO','Andrew', '02.02.1999');
INSERT INTO records VALUES (3, 'THREE','Andrew', '03.03.1999');
INSERT INTO records VALUES (4, 'ONE', 'David', '01.01.1999');
INSERT INTO records VALUES (5, 'ONE','Marc', '01.01.1999');
INSERT INTO records VALUES (6, 'TWO','Marc', '01.03.1999');
-- delete every row of a client who appears only once
DELETE FROM records WHERE client IN
(
SELECT client FROM records
GROUP BY client HAVING COUNT(*) = 1
);
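A quick way to check a DELETE of single-entry clients is an in-memory run with Python's built-in sqlite3 module; the records schema follows the answer, with the date column dropped for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE records (
        ID INTEGER PRIMARY KEY,
        order_number TEXT NOT NULL,
        client TEXT NOT NULL);
    INSERT INTO records VALUES
        (1, 'ONE', 'Andrew'), (2, 'TWO', 'Andrew'), (3, 'THREE', 'Andrew'),
        (4, 'ONE', 'David'),
        (5, 'ONE', 'Marc'), (6, 'TWO', 'Marc');
    """
)

with conn:
    # Remove every row whose client appears exactly once in the table.
    conn.execute(
        """DELETE FROM records WHERE client IN (
               SELECT client FROM records GROUP BY client HAVING COUNT(*) = 1)"""
    )

remaining = [row[0] for row in conn.execute("SELECT ID FROM records ORDER BY ID")]
print(remaining)  # [1, 2, 3, 5, 6]: only David's row (ID 4) is gone
```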
For my task I need to add a main entry and then add some additional entries that use the id of the added record.
Currently I use a query with "returning id":
with rows as (
insert into "Contact"(name, gender, city, birthdate)
values
('Name', 1, 'City', '2000-02-03')
returning id
)
insert into "Education"(user_id, place, degree, endyear)
select id , 'some_place', 'some_state', 1990 from rows
This way I can add one additional entry, but I need several. If I add a second insert query, PostgreSQL loses the relation "rows":
with rows as (
insert into "Contact"(name, gender, city, birthdate)
values
('Name', 1, 'City', '2000-02-03')
returning id
)
insert into "Education"(user_id, place, degree, endyear)
select id , 'some_place', 'some_state', 1990 from rows
insert into "Status"(user_id, status)
select id , 'val' from rows
ERROR: relation "rows" does not exist
LINE 11: select id , 'val' from rows
^
SQL state: 42P01
Character: 373
Is there any way to fix this?
I would suggest putting all the statements in an anonymous code block (DO). You can capture the id with RETURNING ... INTO a variable and then use that variable in the following inserts, e.g.:
DO $$
DECLARE returned_id int;
BEGIN
INSERT INTO "Contact"
(name, gender, city, birthdate) VALUES
('Name', 1, 'City', '2000-02-03')
RETURNING id INTO returned_id;
INSERT INTO "Education"
(user_id, place, degree, endyear) VALUES
(returned_id, 'some_place', 'some_state', 1990);
INSERT INTO "Status"
(user_id, status) VALUES
(returned_id, 'val');
END;
$$;
Demo: db<>fiddle
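The same "capture the id once, reuse it for every child insert" idea can be sketched outside PostgreSQL with Python's built-in sqlite3 module, where cursor.lastrowid stands in for RETURNING id INTO returned_id. The lowercase table names and the add_contact helper are assumptions for the sketch. In PostgreSQL itself, data-modifying statements in WITH can also be chained, giving each extra insert its own CTE name (with rows as (...), edu as (insert into "Education" ... select id ... from rows) insert into "Status" ... select id, 'val' from rows); SQLite does not support that form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contact(id INTEGER PRIMARY KEY, name TEXT, gender INTEGER,
                         city TEXT, birthdate TEXT);
    CREATE TABLE education(user_id INTEGER REFERENCES contact(id), place TEXT,
                           degree TEXT, endyear INTEGER);
    CREATE TABLE status(user_id INTEGER REFERENCES contact(id), status TEXT);
    """
)


def add_contact(conn, name, gender, city, birthdate):
    """Insert a contact plus its dependent rows in a single transaction."""
    with conn:  # commits on success, rolls back all three inserts on error
        cur = conn.execute(
            "INSERT INTO contact(name, gender, city, birthdate) VALUES (?, ?, ?, ?)",
            (name, gender, city, birthdate),
        )
        new_id = cur.lastrowid  # plays the role of RETURNING id INTO returned_id
        conn.execute(
            "INSERT INTO education(user_id, place, degree, endyear) VALUES (?, ?, ?, ?)",
            (new_id, "some_place", "some_state", 1990),
        )
        conn.execute(
            "INSERT INTO status(user_id, status) VALUES (?, ?)", (new_id, "val")
        )
    return new_id


first = add_contact(conn, "Name", 1, "City", "2000-02-03")
print(first)  # 1
```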
For each POSTAL_CODE, I want to know how many rows have a NULL TIME_VISITED and how many have a non-NULL TIME_VISITED.
CREATE TABLE VISITS
(
ID INTEGER NOT NULL,
POSTAL_CODE VARCHAR(5) NOT NULL,
TIME_VISITED TIMESTAMP,
CONSTRAINT PK_VISITS PRIMARY KEY (ID)
);
Sample data:
INSERT INTO VISITS (ID, POSTAL_CODE, TIME_VISITED) VALUES ('234', '01910', '21.04.2014, 10:13:33.000');
INSERT INTO VISITS (ID, POSTAL_CODE, TIME_VISITED) VALUES ('334', '01910', '28.04.2014, 13:13:33.000');
INSERT INTO VISITS (ID, POSTAL_CODE, TIME_VISITED) VALUES ('433', '01910', '29.04.2014, 13:03:19.000');
INSERT INTO VISITS (ID, POSTAL_CODE, TIME_VISITED) VALUES ('533', '01910', NULL);
INSERT INTO VISITS (ID, POSTAL_CODE, TIME_VISITED) VALUES ('833', '01910', NULL);
This is the output I want for the data above:
POSTAL_CODE=01910, NUM_TIME_VISITED_NULL=2, NUM_TIME_VISITED_NOT_NULL=3
I am using the following SQL
SELECT distinct r.POSTAL_CODE,
(select count(*) from VISITS p where p.POSTAL_CODE=r.POSTAL_CODE and p.TIME_VISITED is null) as NUM_TIME_VISITED_NULL,
(select count(*) from VISITS p where p.POSTAL_CODE=r.POSTAL_CODE and p.TIME_VISITED is not null) as NUM_TIME_VISITED_NOT_NULL
FROM VISITS r
ORDER BY r.POSTAL_CODE
The query takes a very long time if there are lots of rows in the table
What changes do I need to make to be able to get this information more quickly?
Use conditional aggregation instead:
select v.postal_code,
sum(case when v.time_visited is null then 1 else 0
end) as NumTimeVisitedNull,
count(v.time_visited) as NumTimeVisitedNotNull
from visits v
group by v.postal_code;
Note: you can also write this as:
select v.postal_code,
(count(*) - count(v.time_visited) ) as NumTimeVisitedNull,
count(v.time_visited) as NumTimeVisitedNotNull
from visits v
group by v.postal_code;
COUNT(expression) counts only the non-NULL values of the expression, while COUNT(*) counts all rows; their difference is the number of NULLs.
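Conditional aggregation can be checked against the sample data with Python's built-in sqlite3 module. As an assumption for the sketch, the timestamps are stored as ISO text because SQLite has no TIMESTAMP type; both NULL-counting forms are computed side by side in a single pass over the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE visits(id INTEGER PRIMARY KEY, postal_code TEXT NOT NULL,
                        time_visited TEXT);
    INSERT INTO visits VALUES
        (234, '01910', '2014-04-21 10:13:33'),
        (334, '01910', '2014-04-28 13:13:33'),
        (433, '01910', '2014-04-29 13:03:19'),
        (533, '01910', NULL),
        (833, '01910', NULL);
    """
)

rows = conn.execute(
    """SELECT postal_code,
              COUNT(*) - COUNT(time_visited) AS num_null,  -- all rows minus non-NULL
              COUNT(time_visited) AS num_not_null,
              SUM(CASE WHEN time_visited IS NULL THEN 1 ELSE 0 END) AS num_null_case
       FROM visits
       GROUP BY postal_code"""
).fetchall()
print(rows)  # [('01910', 2, 3, 2)]
```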
You can do this all in one pass. COUNT(column) counts the non-NULL values; then use SUM over a CASE expression to count the NULLs.
SELECT POSTAL_CODE
,COUNT(TIME_VISITED) AS NUM_TIME_VISITED_NOT_NULL
,SUM(CASE WHEN TIME_VISITED IS NULL THEN 1 ELSE 0 END) AS NUM_TIME_VISITED_NULL
FROM VISITS
GROUP BY POSTAL_CODE
I have one table with gender as one of the columns.
In the gender column only M or F is allowed.
Now I want to sort the table so that, when it is displayed, M and F alternate in the gender field.
What I have tried:
I created a new table with the same structure as my existing table.
Using a high-level insert, I want to insert M into the odd rows and F into the even rows.
After that I want to combine those two statements with the UNION operator.
I am able to insert only male or only female rows into the new table, but not into the even or odd rows.
Can anybody help me with this?
Thanks in advance.
Don't think of a table as "sorted". The database server may return the rows in any order, depending on the execution plan, indexes, joins, etc. If you need a strict order, you need a column to order by, such as an identity column. Usually it is better to apply the desired sorting when selecting the data.
However, interleaving M and F is a little tricky: you need the ROW_NUMBER function to number the rows within each gender and then sort by that number.
Valid SQL Server code:
CREATE TABLE #GenderTable(
[Name] [nchar](10) NOT NULL,
[Gender] [char](1) NOT NULL
)
-- Create sample data
insert into #GenderTable (Name, Gender) values
('Adam', 'M'),
('Ben', 'M'),
('Casesar', 'M'),
('Alice', 'F'),
('Beatrice', 'F'),
('Cecilia', 'F')
SELECT * FROM #GenderTable
SELECT * FROM #GenderTable
order by ROW_NUMBER() over (partition by gender order by name), Gender
DROP TABLE #GenderTable
This gives the output
Name Gender
Adam M
Ben M
Casesar M
Alice F
Beatrice F
Cecilia F
and
Name Gender
Alice F
Adam M
Beatrice F
Ben M
Cecilia F
Casesar M
If you use another DBMS the syntax may differ.
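The ROW_NUMBER interleaving can be reproduced with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). As a small variation on the answer, the row number is computed in a subquery and then sorted on, which works in databases that do not allow window functions directly in ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gender_table(name TEXT NOT NULL, gender TEXT NOT NULL);
    INSERT INTO gender_table VALUES
        ('Adam', 'M'), ('Ben', 'M'), ('Caesar', 'M'),
        ('Alice', 'F'), ('Beatrice', 'F'), ('Cecilia', 'F');
    """
)

interleaved = conn.execute(
    """SELECT name, gender FROM (
           -- number the rows 1, 2, 3, ... separately within each gender
           SELECT name, gender,
                  ROW_NUMBER() OVER (PARTITION BY gender ORDER BY name) AS rn
           FROM gender_table)
       ORDER BY rn, gender  -- within each rn, 'F' sorts before 'M'"""
).fetchall()

for name, gender in interleaved:
    print(name, gender)
```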
I think the best way to do it would be to have two queries (one for M, one for F) and then combine them with UNION. The catch is that you have to calculate the "rank" of each row within its query and then sort accordingly.
Something like the following should do what you need:
-- MySQL user variables; one counter per gender
select * from
(select
@m_rownum:=@m_rownum+1 gender_rank,
t.*
from people_table t,
(SELECT @m_rownum:=0) r
where t.gender = 'M'
union all
select
@f_rownum:=@f_rownum+1 gender_rank,
t.*
from people_table t,
(SELECT @f_rownum:=0) r
where t.gender = 'F') joined
order by joined.gender_rank, joined.gender;
If you are using SQL Server, you can give your two tables an IDENTITY column as follows, seeded so that one table gets odd Ids and the other even Ids, then UNION them and sort by this column.
Note that you can only truly alternate if there are the same number of male and female records. If there are more of one than the other, you will end up with non-alternating rows at the end.
CREATE TABLE MaleTable(Id INT IDENTITY(1,2) NOT NULL, Gender CHAR(1) NOT NULL)
INSERT INTO MaleTable(Gender) SELECT 'M'
INSERT INTO MaleTable(Gender) SELECT 'M'
INSERT INTO MaleTable(Gender) SELECT 'M'
CREATE TABLE FemaleTable(Id INT IDENTITY(2,2) NOT NULL, Gender CHAR(1) NOT NULL)
INSERT INTO FemaleTable(Gender) SELECT 'F'
INSERT INTO FemaleTable(Gender) SELECT 'F'
INSERT INTO FemaleTable(Gender) SELECT 'F'
SELECT u.Id
,u.Gender
FROM (
SELECT Id, Gender
FROM FemaleTable
UNION
SELECT Id, Gender
FROM MaleTable
) u
ORDER BY u.Id ASC
See here for a working example
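The odd/even idea can also be sketched without separate IDENTITY tables. This is a swapped-in variant, not the answer's SQL Server code: using Python's built-in sqlite3 module, the ids 2n-1 (men) and 2n (women) are computed with ROW_NUMBER and the two halves are combined with UNION ALL. The sample deliberately has more men than women to show the non-alternating tail mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people(name TEXT NOT NULL, gender TEXT NOT NULL);
    INSERT INTO people VALUES
        ('Adam', 'M'), ('Ben', 'M'), ('Caesar', 'M'), ('Alice', 'F');
    """
)

ordered = conn.execute(
    """SELECT id, name, gender FROM (
           -- odd ids for men: 1, 3, 5, ...
           SELECT 2 * ROW_NUMBER() OVER (ORDER BY name) - 1 AS id, name, gender
           FROM people WHERE gender = 'M'
           UNION ALL
           -- even ids for women: 2, 4, 6, ...
           SELECT 2 * ROW_NUMBER() OVER (ORDER BY name) AS id, name, gender
           FROM people WHERE gender = 'F'
       )
       ORDER BY id"""
).fetchall()
print(ordered)
# [(1, 'Adam', 'M'), (2, 'Alice', 'F'), (3, 'Ben', 'M'), (5, 'Caesar', 'M')]
```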