Count Duplicate Values SQL Server 2014 - sql

Say I have the below dataset
ID| Last | FavColor
-------------------------
1 | Johnson | BLUE
1 | Johnson | RED
2 | Thomas | YELLOW
3 | Anderson| BLUE
3 | Anderson| RED
3 | Anderson| BLUE
4 | Phillips| ORANGE
4 | Phillips| ORANGE
How do I create a query that still keeps the ID, Last and FavColor, but shows an occurrence count of each color?
ID| Last | FavColor | Color Count |
-----------------------------------------
1 | Johnson | BLUE | 1 |
1 | Johnson | RED | 1 |
2 | Thomas | YELLOW | 1 |
3 | Anderson| BLUE | 2 |
3 | Anderson| RED | 1 |
3 | Anderson| BLUE | 2 |
4 | Phillips| ORANGE | 2 |
4 | Phillips| ORANGE | 2 |
I attempted to do a COUNT(FavColor) PARTITION BY(ID), but am not sure how to count duplicates.

You want to partition by id as well as favcolor:
SELECT ID,
Last ,
FavColor,
COUNT(*) over (partition BY id, favcolor) color_count
FROM your_table t;
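To check the result against the question's sample data, here is a minimal sketch that runs the same window function through Python's built-in sqlite3 module. SQLite stands in for SQL Server here (an assumption; it needs SQLite 3.25 or later for window functions), and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory stand-in for the question's table; the name your_table comes from the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (ID INTEGER, Last TEXT, FavColor TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, "Johnson", "BLUE"), (1, "Johnson", "RED"),
        (2, "Thomas", "YELLOW"),
        (3, "Anderson", "BLUE"), (3, "Anderson", "RED"), (3, "Anderson", "BLUE"),
        (4, "Phillips", "ORANGE"), (4, "Phillips", "ORANGE"),
    ],
)

# Every row is kept; the count is computed per (ID, FavColor) pair.
color_counts = conn.execute(
    """
    SELECT ID, Last, FavColor,
           COUNT(*) OVER (PARTITION BY ID, FavColor) AS color_count
    FROM your_table
    ORDER BY ID, FavColor
    """
).fetchall()

for row in color_counts:
    print(row)
```

Partitioning by ID alone would count all rows for a person (3 for Anderson), which is why FavColor has to be part of the partition.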

Related

Percentage to total in BigQuery Legacy SQL (Subqueries?)

I can't work out how to calculate a percentage of the total in BigQuery Legacy SQL.
So, I have a table:
ID | Name | Group | Mark
1 | John | A | 10
2 | Lucy | A | 5
3 | Jane | A | 7
4 | Lily | B | 9
5 | Steve | B | 14
6 | Rita | B | 11
I want to calculate percentage like this:
ID | Name | Group | Mark | Percent
1 | John | A | 10 | 10/(10+5+7)=45%
2 | Lucy | A | 5 | 5/(10+5+7)=23%
3 | Jane | A | 7 | 7/(10+5+7)=32%
4 | Lily | B | 9 | 9/(9+14+11)=26%
5 | Steve | B | 14 | 14/(9+14+11)=41%
6 | Rita | B | 11 | 11/(9+14+11)=32%
My table is quite large (3 million rows).
I thought I could do it with subqueries, but I can't use subqueries in the SELECT list.
Does anyone know a way to do it?
SELECT
ID, Name, [Group], Mark,
RATIO_TO_REPORT(Mark) OVER(PARTITION BY [Group]) AS percent
FROM YourTable
Check more about RATIO_TO_REPORT
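RATIO_TO_REPORT returns a fraction between 0 and 1, so multiply by 100 if you want a percentage. As a rough illustration of what it computes, the sketch below uses Python's built-in sqlite3 module. SQLite has no RATIO_TO_REPORT, so the ratio is written out as Mark divided by a windowed SUM; that substitution, and the rounding to one decimal, are assumptions made for the example (SQLite 3.25 or later is needed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE YourTable (ID INTEGER, Name TEXT, "Group" TEXT, Mark INTEGER)')
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (1, "John", "A", 10), (2, "Lucy", "A", 5), (3, "Jane", "A", 7),
        (4, "Lily", "B", 9), (5, "Steve", "B", 14), (6, "Rita", "B", 11),
    ],
)

# Each Mark divided by the total Mark of its group, expressed as a percentage.
percents = conn.execute(
    '''
    SELECT ID, Name, "Group", Mark,
           ROUND(100.0 * Mark / SUM(Mark) OVER (PARTITION BY "Group"), 1) AS percent
    FROM YourTable
    ORDER BY ID
    '''
).fetchall()

for row in percents:
    print(row)
```

The windowed SUM avoids a subquery in the SELECT list, which is the same reason RATIO_TO_REPORT works in the answer above.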

Set multiple column values from previous row using Postgres SQL

Assume you have a series of responses from people on what their favorite colors are. This information is stored in a SQL table:
| id | favorite_color | friend_recommendation_id |
|----|----------------|--------------------------|
| 1 | green | |
| 2 | blue | |
| 3 | yellow | |
| 4 | green | |
| 5 | yellow | |
| 6 | green | |
My goal is to write a Postgres SQL query that fills the friend_recommendation_id column with the id of the most recent earlier person who responded with the same color as the given individual. This would result in the following table:
| id | favorite_color | friend_recommendation_id |
|----|----------------|--------------------------|
| 1 | green | |
| 2 | blue | |
| 3 | yellow | |
| 4 | green | 1 |
| 5 | yellow | 3 |
| 6 | green | 4 |
Note that id 6 is filled with 4 and not 1
I've tried using variables and subselects, but am struggling with how to apply the select for each result from the parent query.
Use a subquery to calculate the field
SELECT "id", "favorite_color", (SELECT MAX("id")
FROM colors c2
WHERE c2."favorite_color" = c1."favorite_color"
AND c2."id" < c1."id"
) as friend_recommendation_id
FROM colors c1
OUTPUT
| id | favorite_color | friend_recommendation_id |
|----|----------------|--------------------------|
| 1 | green | (null) |
| 2 | blue | (null) |
| 3 | yellow | (null) |
| 4 | green | 1 |
| 5 | yellow | 3 |
| 6 | green | 4 |
It can also be written like this:
SELECT c1."id", c1."favorite_color", MAX(c2."id") as friend_recommendation_id
FROM colors c1
LEFT JOIN colors c2
ON c2."favorite_color" = c1."favorite_color"
AND c2."id" < c1."id"
GROUP BY c1."id", c1."favorite_color"
ORDER BY c1."id";
To store the result in the table, use an UPDATE:
UPDATE colors target
SET "friend_recommendation_id" = ( SELECT MAX(c2."id")
FROM colors c2
WHERE c2."favorite_color" = target."favorite_color"
AND c2."id" < target."id");
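The correlated-subquery update can be tried out on the sample data with a short sketch using Python's built-in sqlite3 module. SQLite is used here in place of Postgres (an assumption made so the example is self-contained), and the outer table is referenced by its name rather than an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE colors (id INTEGER PRIMARY KEY, favorite_color TEXT, "
    "friend_recommendation_id INTEGER)"
)
conn.executemany(
    "INSERT INTO colors (id, favorite_color) VALUES (?, ?)",
    [(1, "green"), (2, "blue"), (3, "yellow"), (4, "green"), (5, "yellow"), (6, "green")],
)

# For each row, pick the largest earlier id with the same color (NULL if there is none).
conn.execute(
    """
    UPDATE colors
    SET friend_recommendation_id = (
        SELECT MAX(c2.id)
        FROM colors AS c2
        WHERE c2.favorite_color = colors.favorite_color
          AND c2.id < colors.id
    )
    """
)

updated = conn.execute(
    "SELECT id, favorite_color, friend_recommendation_id FROM colors ORDER BY id"
).fetchall()
for row in updated:
    print(row)
```

Because the subquery only reads id and favorite_color, which the update does not change, the result does not depend on the order in which rows are updated.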

MODE in Teradata SQL - excluding a value from the range, and using multiple tables

I am comparing 3 tables of data for what should be the same demands, and want to create a table that shows the MODE from two of the tables (in order to make a suggestion of what the correct value could be). I may need to concatenate as I am looking for the mode of all rows with the same ID and Name.
Table 1:
+----------+----------+------+
| DemandNo | Forename | Size |
+----------+----------+------+
| 1 | Richard | 42 |
| 2 | Richard | 42 |
| 3 | Richard | 42 |
| 4 | Richard | 36 |
| 5 | Richard | 36 |
| 6 | Richard | 36 |
| 7 | Richard | 36 |
| 8 | Luke | 14 |
| 9 | Luke | 14 |
| 10 | Luke | 14 |
| 11 | Luke | 14 |
| 12 | Luke | 14 |
| 13 | Luke | 25 |
| 14 | Luke | 25 |
| 15 | Luke | 25 |
+----------+----------+------+
Table 2:
+----------------+-----------------+
| List1_DemandNo | List1_PenColour |
+----------------+-----------------+
| 1 | White |
| 2 | Black |
| 3 | Black |
| 4 | Red |
| 5 | ? |
| 6 | Red |
| 7 | Red |
| 8 | Yellow |
| 9 | Yellow |
| 10 | Yellow |
| 11 | Green |
| 12 | Yellow |
| 13 | Green |
| 14 | ? |
| 15 | ? |
+----------------+-----------------+
Table 3:
+----------------+-----------------+
| List2_DemandNo | List2_PenColour |
+----------------+-----------------+
| 1 | White |
| 2 | Green |
| 3 | Green |
| 4 | Red |
| 5 | ? |
| 6 | Red |
| 7 | Red |
| 8 | Pink |
| 9 | Pink |
| 10 | Yellow |
| 11 | Green |
| 12 | Pink |
| 13 | Orange |
| 14 | Orange |
| 15 | Orange |
+----------------+-----------------+
So I need to generate a recommendation for each person with the same name and size. The recommendation should be the MODE of all rows in table1 where the person has the same Forename and Size (do I need to concatenate Forename and size?)
The other requirement is that all question marks "?" should be excluded from the MODE/recommendation.
So my results table should look something like this:
+----------+----------+-----+------------+------------+
| DemandNo | Forename | Size | List1_MODE | List2_MODE |
+----------+----------+-----+------------+------------+
| 1 | Richard | 42 | Black | Green |
| 2 | Richard | 42 | Black | Green |
| 3 | Richard | 42 | Black | Green |
| 4 | Richard | 36 | Red | Red |
| 5 | Richard | 36 | Red | Red |
| 6 | Richard | 36 | Red | Red |
| 7 | Richard | 36 | Red | Red |
| 8 | Luke | 14 | Yellow | Pink |
| 9 | Luke | 14 | Yellow | Pink |
| 10 | Luke | 14 | Yellow | Pink |
| 11 | Luke | 14 | Yellow | Pink |
| 12 | Luke | 14 | Yellow | Pink |
| 13 | Luke | 25 | Green | Orange |
| 14 | Luke | 25 | Green | Orange |
| 15 | Luke | 25 | Green | Orange |
+----------+----------+-----+------------+------------+
I understand that the MODE function does not work in Teradata, and that I would need to perform a count, but the complexity of calculating the mode using 3 tables and excluding the ? is sadly beyond my SQL skills. Any help would be truly appreciated!
Many thanks
Richard
You need to write a separate select for each mode and join to it:
select t1.*, List1_PenColour, List2_PenColour
from table1 as t1
left join
(
select Forename, Size, List1_PenColour
from table1 as t1
join table2 as t2
on t1.DemandNo = t2.List1_DemandNo
and List1_PenColour <> '?'
group by Forename, Size, List1_PenColour
-- return only the row with the highest count
-- random row if multiple rows with the same count exist
qualify row_number()
over (partition by Forename, Size
order by count(*) desc) = 1
) as list1
on t1.Forename = list1.Forename
and t1.Size = list1.Size
left join
(
select Forename, Size, List2_PenColour
from table1 as t1
join table3 as t3
on t1.DemandNo = t3.List2_DemandNo
and List2_PenColour <> '?'
group by Forename, Size, List2_PenColour
qualify row_number()
over (partition by Forename, Size
order by count(*) desc) = 1
) as list2
on t1.Forename = list2.Forename
and t1.Size = list2.Size
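The count-then-rank idea can be checked on the sample data with a small sketch using Python's built-in sqlite3 module. SQLite has no QUALIFY clause, so the ranking is done in a CTE and filtered with rn = 1 in the join; that rewrite is an assumption made for the example, as are the CTE names counts and ranked. Only the List1 mode is shown; List2 would follow the same pattern against table3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (DemandNo INTEGER, Forename TEXT, Size INTEGER)")
conn.execute("CREATE TABLE table2 (List1_DemandNo INTEGER, List1_PenColour TEXT)")

people = [("Richard", 42)] * 3 + [("Richard", 36)] * 4 + [("Luke", 14)] * 5 + [("Luke", 25)] * 3
colours = ["White", "Black", "Black", "Red", "?", "Red", "Red", "Yellow",
           "Yellow", "Yellow", "Green", "Yellow", "Green", "?", "?"]
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(n, name, size) for n, (name, size) in enumerate(people, start=1)],
)
conn.executemany("INSERT INTO table2 VALUES (?, ?)", list(enumerate(colours, start=1)))

modes = conn.execute(
    """
    WITH counts AS (
        -- how often each colour occurs per Forename/Size, ignoring '?'
        SELECT t1.Forename, t1.Size, t2.List1_PenColour AS colour, COUNT(*) AS n
        FROM table1 AS t1
        JOIN table2 AS t2 ON t1.DemandNo = t2.List1_DemandNo
        WHERE t2.List1_PenColour <> '?'
        GROUP BY t1.Forename, t1.Size, t2.List1_PenColour
    ),
    ranked AS (
        -- rank colours within each Forename/Size by count; ties are broken arbitrarily
        SELECT Forename, Size, colour,
               ROW_NUMBER() OVER (PARTITION BY Forename, Size ORDER BY n DESC) AS rn
        FROM counts
    )
    SELECT t1.DemandNo, t1.Forename, t1.Size, r.colour AS List1_MODE
    FROM table1 AS t1
    LEFT JOIN ranked AS r
      ON r.Forename = t1.Forename AND r.Size = t1.Size AND r.rn = 1
    ORDER BY t1.DemandNo
    """
).fetchall()

for row in modes:
    print(row)
```

Note that the ranking partitions by Forename and Size only. If the colour were part of the partition, every colour would rank first in its own group and the join would return one row per colour instead of one per demand.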

Query to subtract all numbers in a column

I don't think code should be necessary here but let me know if you'd like it anyways.
I had to delete an entire set of entries in one of my tables. These entries were ordered by an integer value that increased at regular intervals. Is there a way to write a query so that all the values in a particular column are reduced by 1?
So for example, let's say I had this table
| Red | 1 |
| Orange | 2 |
| Yellow | 3 |
| Green | 4 |
| Cyan | 6 |
| Blue | 7 |
| Purple | 8 |
| Violet | 9 |
Could I write a single query so that the numbers for Cyan through Violet are all reduced by one, rather than writing a separate update for every entry?
| Red | 1 |
| Orange | 2 |
| Yellow | 3 |
| Green | 4 |
| Cyan | 5 |
| Blue | 6 |
| Purple | 7 |
| Violet | 8 |
Use a CTE to update and ROW_NUMBER() to generate sequential numbers (id here stands for your integer column):
;WITH cte AS (
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, *
    FROM yourtable
)
UPDATE cte
SET id = rn;
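Updating through a CTE like this is SQL Server syntax. As a portable illustration of the same renumbering idea, the sketch below uses Python's built-in sqlite3 module: it reads the new sequence numbers with ROW_NUMBER() first and then applies them in a second step. The column names name and id, and the assumption that name is unique, are made up for the example (SQLite 3.25 or later is needed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: name is assumed unique so it can identify the row to update.
conn.execute("CREATE TABLE yourtable (name TEXT PRIMARY KEY, id INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [("Red", 1), ("Orange", 2), ("Yellow", 3), ("Green", 4),
     ("Cyan", 6), ("Blue", 7), ("Purple", 8), ("Violet", 9)],
)

# Step 1: compute the gap-free numbering from the current ordering.
new_ids = conn.execute(
    "SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, name FROM yourtable"
).fetchall()

# Step 2: write the new numbers back, one parameterised UPDATE per row.
conn.executemany("UPDATE yourtable SET id = ? WHERE name = ?", new_ids)

renumbered = conn.execute("SELECT name, id FROM yourtable ORDER BY id").fetchall()
for row in renumbered:
    print(row)
```

Renumbering the whole column closes any number of gaps at once, whereas subtracting 1 from every value above the deleted one only works for a single gap.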

How do I select the results that were inserted?

I have two tables in a SQLite database.
First table:
id | name | number
1 | Paul | 1
2 | John | 2
3 | Jessica | 3
Second table:
id | name | number
1 | Aaron | 1
2 | Barbara | 2
3 | Erik | 3
I do a LEFT JOIN and insert the result into the second table, so:
First table:
id | name | number
1 | Paul | 1
2 | John | 2
3 | Jessica | 3
Second table:
id | name | number
1 | Aaron | 1
2 | Barbara | 2
3 | Erik | 3
4 | Paul | 1
5 | John | 2
6 | Jessica | 3
Afterwards, in another SQL statement, I select the rows that were inserted into the second table (4 | Paul | 1; 5 | John | 2 and 6 | Jessica | 3).
Can I do this in one SQL statement to get better performance?
Thanks