Counting rows for a particular name - sql

I have a table temp(name int, count int), with the name column shown as a below. It stores:
a|count
1|10
1|8
1|4
1|2
2|10
2|6
2|1
I want its rows to be numbered within each name (note that count has to be in decreasing order), i.e.:
a|count|row
1|10 |1
1|8 |2
1|4 |3
1|2 |4
2|10 |1
2|6 |2
2|1 |3
I tried the post "How to show row numbers in PostgreSQL query?", but it just numbers the rows from 1 to 7 rather than restarting for each name. Can someone please help me with this? Thanks!

Use the row_number() window function, partitioned by the name column:
select a, count, row_number() over (partition by a order by count desc) as rn
from temp
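To see the partitioning at work, here is a minimal self-contained sketch. The table name name_counts and the column name cnt are assumptions made for illustration (cnt is used instead of count only to avoid confusion with the aggregate function of the same name); the data is taken from the question.

```sql
-- Hypothetical table mirroring the question's data; cnt stands in for count.
CREATE TABLE name_counts (a int, cnt int);

INSERT INTO name_counts (a, cnt) VALUES
  (1, 10), (1, 8), (1, 4), (1, 2),
  (2, 10), (2, 6), (2, 1);

-- PARTITION BY a restarts the numbering for each value of a;
-- ORDER BY cnt DESC gives the largest count number 1 within each partition.
SELECT a, cnt,
       row_number() OVER (PARTITION BY a ORDER BY cnt DESC) AS rn
FROM name_counts
ORDER BY a, rn;
```

With this data, the rows for a = 1 are numbered 1 to 4 and the rows for a = 2 are numbered 1 to 3, which matches the desired output in the question.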

Related

Looking to find duplicates using DIFFERENCE() among 2+ columns

I'm trying to write a SQL Select query that uses the DIFFERENCE() function to find similar names in a database to identify duplicates.
The short version of the code I'm using is:
SELECT *
FROM (SELECT *, DIFFERENCE(FirstName, LEAD(FirstName) OVER (ORDER BY SOUNDEX(FirstName))) AS d
      FROM Clients) AS t
WHERE d >= 3
The problem is my database has additional columns that include middle names and nicknames. So if I have a customer who has multiple names they go by, they might be in the database multiple times, and I need to compare a variety of columns against each other.
Sample Data:
+----+----------+--------+----------+----------+
| ID | First    | Middle | AKA1     | AKA2     |
+----+----------+--------+----------+----------+
| 1  | Sally    | Ann    | NULL     | NULL     |
| 2  | Ann      | NULL   | NULL     | NULL     |
| 3  | Sue      | NULL   | NULL     | NULL     |
| 4  | Suzy     | NULL   | NULL     | NULL     |
| 5  | Patricia | NULL   | Trish    | Patty    |
| 6  | Patty    | NULL   | Patricia | Trish    |
| 7  | Trish    | NULL   | Patty    | Patricia |
+----+----------+--------+----------+----------+
In the above, rows 1+2 are duplicates of each other, as are 3+4, and 5+6+7.
So I'm not sure the best way to get what I want. Here's the longer version of the code I'm actually using:
WITH A AS (SELECT *,
SOUNDEX(FirstName) AS "FirstSoundex",
SOUNDEX(LastName) AS "LastSoundex",
LAG (SOUNDEX(FirstName)) OVER (ORDER BY SOUNDEX(FirstName)) AS "PreviousFirstSoundex",
LAG (SOUNDEX(LastName)) OVER (ORDER BY SOUNDEX(LastName)) AS "PreviousLastSoundex"
FROM Clients),
B AS (
SELECT *,
ISNULL(DIFFERENCE(FirstName, LEAD(FirstName) OVER (ORDER BY FirstSoundex)),0) AS "FirstScore",
ISNULL(DIFFERENCE(LastName, LEAD(LastName) OVER (ORDER BY LastSoundex)),0) AS "LastScore"
FROM A),
C AS (
SELECT *,
ISNULL(LAG (FirstScore) OVER (ORDER BY FirstSoundex),0) AS "PreviousFirstScore",
ISNULL(LAG (LastScore) OVER (ORDER BY LastSoundex),0) AS "PreviousLastScore"
FROM B
),
D AS (
SELECT *,
(CASE WHEN (PreviousFirstScore >=3 AND PreviousLastScore >=3) THEN (PreviousFirstSoundex + PreviousLastSoundex)
WHEN (FirstScore >= 3 AND LastScore >=3) THEN (FirstSoundex + LastSoundex)
END) AS "GroupName"
FROM C
WHERE ((PreviousFirstScore >=3 AND PreviousLastScore >=3) OR (FirstScore >= 3 AND LastScore >=3))
),
E AS (
SELECT *,
LAG(GroupName) OVER (ORDER BY GroupName) AS "PreviousGroup",
LEAD(GroupName) OVER (ORDER BY GroupName) AS "NextGroup"
FROM D
)
SELECT *
FROM E
WHERE (E.GroupName = E.PreviousGroup OR E.GroupName = E.NextGroup)
This lets me group together bundles of potential duplicates and it works well for me. However, I now want to add in a way to check against multiple columns, and I don't know how to do that.
I was thinking about creating a union, something like:
SELECT ClientID,
LastName,
FirstName AS "TempName"
FROM Clients
UNION
SELECT ClientID,
LastName,
MiddleName AS "TempName"
FROM Clients
WHERE MiddleName IS NOT NULL
...etc
But then my LAG() and LEAD() wouldn't work because I'd have multiple rows with the same ClientID. I don't want to identify a single Client as a duplicate of itself.
Anyways, any suggestions? Thanks in advance.
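One way the UNION idea could be kept without a client matching itself is to replace LAG()/LEAD() with a self-join that only pairs rows whose ClientID differs. The following is a rough sketch for SQL Server, not a tested solution: the temp table #ClientsDemo and its LastName values are hypothetical, and only the FirstName and MiddleName columns are unpivoted.

```sql
-- SQL Server sketch; #ClientsDemo and its LastName values are hypothetical.
CREATE TABLE #ClientsDemo (
    ClientID   int,
    FirstName  varchar(50),
    MiddleName varchar(50),
    LastName   varchar(50)
);

INSERT INTO #ClientsDemo (ClientID, FirstName, MiddleName, LastName) VALUES
    (1, 'Sally', 'Ann', 'Smith'),
    (2, 'Ann',   NULL,  'Smith'),
    (3, 'Sue',   NULL,  'Jones'),
    (4, 'Suzy',  NULL,  'Jones');

WITH Names AS (
    -- One row per (client, candidate name); more UNION branches
    -- could add the AKA columns in the same way.
    SELECT ClientID, LastName, FirstName AS TempName
    FROM #ClientsDemo
    UNION
    SELECT ClientID, LastName, MiddleName
    FROM #ClientsDemo
    WHERE MiddleName IS NOT NULL
)
SELECT DISTINCT n1.ClientID AS ClientID,
                n2.ClientID AS PossibleDuplicateID
FROM Names AS n1
JOIN Names AS n2
  ON n1.ClientID < n2.ClientID                    -- never pairs a client with itself
 AND DIFFERENCE(n1.TempName, n2.TempName) >= 3    -- similar given name or alias
 AND DIFFERENCE(n1.LastName, n2.LastName) >= 3;   -- similar last name
```

The condition n1.ClientID < n2.ClientID also reports each pair only once. The trade-off is that a self-join compares every pair of names, so it may be noticeably slower than the LAG()/LEAD() approach on a large table.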

Select rows that are duplicates on two columns

I have data in a table. There are 3 columns (ID, Interval, ContactInfo). This table lists all phone contacts. I'm attempting to get a count of phone numbers that called twice on the same day and have no idea how to go about this. I can get duplicate entries for the same number but it does not match on date. The code I have so far is below.
SELECT ContactInfo, COUNT(Interval) AS NumCalls
FROM AllCalls
GROUP BY ContactInfo
HAVING COUNT(AllCalls.ContactInfo) > 1
I'd like to have it return the date, the number of calls on that date if more than 1, and the phone number.
Sample data:
| ID | Interval | ContactInfo |
|----|----------|-------------|
| 1  | 3/1/2017 | 8009999999  |
| 2  | 3/1/2017 | 8009999999  |
| 3  | 3/2/2017 | 8001234567  |
| 4  | 3/2/2017 | 8009999999  |
| 5  | 3/3/2017 | 8007771111  |
| 6  | 3/3/2017 | 8007771111  |
Expected result:
| Interval | ContactInfo | NumCalls |
|----------|-------------|----------|
| 3/1/2017 | 8009999999  | 2        |
| 3/3/2017 | 8007771111  | 2        |
As juergen d suggested, add Interval to your GROUP BY, like so:
SELECT AC.ContactInfo
, AC.Interval
, COUNT(*) AS qnty
FROM AllCalls AS AC
GROUP BY AC.ContactInfo
, AC.Interval
HAVING COUNT(*) > 1
The code should look like this:
select Interval, ContactInfo, count(ID) as NumCalls
from AllCalls
group by Interval, ContactInfo
having count(ID) > 1;
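Both answers group on the raw Interval value, which works when it holds a date only. If the column also carried a time of day, two calls on the same day would land in different groups. A minimal sketch of that case, assuming SQL Server and a hypothetical table AllCallsDemo with a hypothetical CallTime column:

```sql
-- Hypothetical variant of AllCalls where the call time includes a time of day.
CREATE TABLE AllCallsDemo (
    ID          int,
    CallTime    datetime,
    ContactInfo varchar(20)
);

INSERT INTO AllCallsDemo (ID, CallTime, ContactInfo) VALUES
    (1, '2017-03-01 09:15:00', '8009999999'),
    (2, '2017-03-01 16:40:00', '8009999999'),
    (3, '2017-03-02 10:00:00', '8001234567'),
    (4, '2017-03-02 11:30:00', '8009999999'),
    (5, '2017-03-03 08:05:00', '8007771111'),
    (6, '2017-03-03 13:20:00', '8007771111');

-- Casting to date drops the time of day, so calls are grouped per calendar day.
SELECT CAST(CallTime AS date) AS CallDate,
       ContactInfo,
       COUNT(*) AS NumCalls
FROM AllCallsDemo
GROUP BY CAST(CallTime AS date), ContactInfo
HAVING COUNT(*) > 1;
```

With this data the query returns two rows: 8009999999 with 2 calls on 2017-03-01 and 8007771111 with 2 calls on 2017-03-03, the same pairs as the expected result in the question.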

vertica sql delta

I want to calculate the delta between consecutive records. My table has two columns, id and timestamp, and I want to calculate the time delta between each record and the previous one:
id |timestamp |delta
----------------------------------
1 |100 |0
2 |101 |1 (101-100)
3 |106 |5 (106-101)
4 |107 |1 (107-106)
I work with a Vertica database and I want to create a view/projection of this table in my DB.
Is it possible to do this calculation without using a UDF?
You can use lag() for this purpose:
select id, timestamp,
coalesce(timestamp - lag(timestamp) over (order by id), 0) as delta
from t;
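Since the question asks for a view, the same expression can be wrapped in one. This is only a sketch: the names readings_demo, readings_with_delta and ts are hypothetical (ts replaces timestamp to avoid clashing with the type name), and the inserts are written one row per statement to stay conservative about dialect support.

```sql
-- Hypothetical table holding the question's sample data.
CREATE TABLE readings_demo (id int, ts int);

INSERT INTO readings_demo (id, ts) VALUES (1, 100);
INSERT INTO readings_demo (id, ts) VALUES (2, 101);
INSERT INTO readings_demo (id, ts) VALUES (3, 106);
INSERT INTO readings_demo (id, ts) VALUES (4, 107);

-- LAG(ts) returns the previous row's ts in id order; it is NULL for the
-- first row, so COALESCE turns that first delta into 0.
CREATE VIEW readings_with_delta AS
SELECT id,
       ts,
       COALESCE(ts - LAG(ts) OVER (ORDER BY id), 0) AS delta
FROM readings_demo;

SELECT id, ts, delta
FROM readings_with_delta
ORDER BY id;
```

For the sample data the delta column comes out as 0, 1, 5, 1, matching the table in the question.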

Fetch data from multiple tables in postgresql

I am working on an application where I want to fetch records from multiple tables that are connected through a foreign key. The query I am using is
select ue.institute, ue.marks, uf.relation, uf.name
from user_education ue, user_family uf where ue.user_id=12 and uf.user_id=12
The result of the query is:
[Image: query result]
You can see the data is repeated in it. I want each record only once, with no repetition. I want something like this:
T1               T2
id|name|fid      id|descrip|fid
1 |A   |1        1 |DA     |1
2 |B   |1        2 |DB     |1
2 |B   |1
Result which I want:
id|name|fid|id|descrip|fid
1 |A   |1  |1 |DA     |1
2 |B   |1  |2 |DB     |1
2 |B   |1  |  |       |
The results fetched through your query:
[Image: query results]
The total number of rows is 5.
More Information
I want the rows with the same user_id from both tables. As you can see, T1 has 3 rows and T2 has 2 rows. I do not want repetitions, but I do want to fetch all the data for that user_id.
Table schemas:
[Images: schemas of T1 and T2]
I can't see why you would want that, but the solution could be to use the window function row_number():
SELECT ue.institute, ue.marks, uf.relation, uf.name
FROM (SELECT institute, marks, row_number() OVER ()
FROM user_education
WHERE user_id=12) ue
FULL OUTER JOIN
(SELECT relation, name, row_number() OVER ()
FROM user_family
WHERE user_id=12) uf
USING (row_number);
The result would be pretty meaningless though, as there is no ordering defined in the individual result sets.
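If some pairing order is acceptable, giving each row_number() an explicit ORDER BY at least makes the output repeatable. A minimal PostgreSQL sketch follows; the column types, the sample rows, and the choice to order by institute and by name are all assumptions made for illustration.

```sql
-- Hypothetical schemas and data; only the column names come from the question.
CREATE TABLE user_education (user_id int, institute text, marks int);
CREATE TABLE user_family (user_id int, relation text, name text);

INSERT INTO user_education (user_id, institute, marks) VALUES
  (12, 'Institute A', 70),
  (12, 'Institute B', 80),
  (12, 'Institute C', 90);

INSERT INTO user_family (user_id, relation, name) VALUES
  (12, 'brother', 'Ali'),
  (12, 'sister', 'Sara');

-- Each side is numbered in a defined order, then the two sides are
-- paired by position; the longer side is padded with NULLs.
SELECT ue.institute, ue.marks, uf.relation, uf.name
FROM (SELECT institute, marks,
             row_number() OVER (ORDER BY institute) AS rn
      FROM user_education
      WHERE user_id = 12) ue
FULL OUTER JOIN
     (SELECT relation, name,
             row_number() OVER (ORDER BY name) AS rn
      FROM user_family
      WHERE user_id = 12) uf
USING (rn)
ORDER BY rn;
```

With three education rows and two family rows, this returns three rows, and the third has NULL in relation and name. The pairing is still arbitrary in meaning, since row n of one table has no real relationship to row n of the other.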

sum rows from one table and move it to another table

How can I sum rows from one table (based on selected criteria) and move the outcome to another table?
I have a table related to costs within a project:
Table "costs":
id | CostName | ID_CostCategory | PlanValue | DoneValue
-------------------------------------------------------
1  | books    | 1               | 100       | 120
2  | flowers  | 1               | 90        | 90
3  | car      | 2               | 150       | 130
4  | gas      | 2               | 50        | 45
and I want to put the sum of "DoneValue" of each ID_CostCategory into table "CostCategories"
Table "CostCategories":
id | name  | planned | done
---------------------------------------------------------
1  | other | 190     | takes the sum from above table
2  | car   | 200     | takes the sum from above table
Many thanks
I would not store this, because as soon as anything changes in Costs, CostCategories will be out of date. Instead, I would create a view, e.g.:
CREATE VIEW CostCategoriesSum
AS
SELECT CostCategories.ID,
CostCategories.Name,
SUM(COALESCE(Costs.PlanValue, 0)) AS Planned,
SUM(COALESCE(Costs.DoneValue, 0)) AS Done
FROM CostCategories
LEFT JOIN Costs
ON Costs.ID_CostCategory = CostCategories.ID
GROUP BY CostCategories.ID, CostCategories.Name;
Now instead of referring to the table, you can refer to the view and the Planned and Done totals will always be up to date.
INSERT INTO CostCategories(id,name,planned,done)
SELECT ID_CostCategory, CostName, SUM(PlanValue), SUM(DoneValue)
FROM costs GROUP BY ID_CostCategory, CostName
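Because the CostCategories rows in the question already exist, storing the sums there would more naturally be an UPDATE than an INSERT, and the sum should be grouped by category only. A minimal sketch using a correlated subquery, with the table definitions and column types assumed from the sample data:

```sql
-- Hypothetical definitions; column names follow the question's sample tables.
CREATE TABLE costs (
    id              int,
    CostName        varchar(50),
    ID_CostCategory int,
    PlanValue       int,
    DoneValue       int
);

CREATE TABLE CostCategories (
    id      int,
    name    varchar(50),
    planned int,
    done    int
);

INSERT INTO costs (id, CostName, ID_CostCategory, PlanValue, DoneValue) VALUES
    (1, 'books',   1, 100, 120),
    (2, 'flowers', 1,  90,  90),
    (3, 'car',     2, 150, 130),
    (4, 'gas',     2,  50,  45);

INSERT INTO CostCategories (id, name, planned, done) VALUES
    (1, 'other', 190, NULL),
    (2, 'car',   200, NULL);

-- For each category row, sum DoneValue over the costs that reference it;
-- COALESCE gives 0 for a category that has no costs yet.
UPDATE CostCategories
SET done = (SELECT COALESCE(SUM(c.DoneValue), 0)
            FROM costs c
            WHERE c.ID_CostCategory = CostCategories.id);

SELECT id, name, planned, done
FROM CostCategories
ORDER BY id;
```

After the update, done is 210 for category 1 (120 + 90) and 175 for category 2 (130 + 45). The stored values go stale as soon as costs changes, so the UPDATE has to be rerun, which is the drawback the view avoids.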