I have the following entries in the database:
| group | account | description | balance | balance1 |
+----------+-------------+-----------------+-------------+--------------+
| 123123 | 0 | Name 1 | 1000.00 | 0 |
| 123123 | 777 | Name 2 | 250.00 | 0 |
| 123123 | 999 | Name 3 | 0 | 350.00 |
| 123000 | 0 | Name 4 | 500.00 | 0 |
| 123000 | 567 | Name 5 | 0 | 500.00 |
select * from table;
Gives exactly the same result as the example above.
I would like to get the result without duplicates in the "group" column, like this:
| group | account | description | balance | balance1 |
+----------+-------------+-----------------+-------------+--------------+
| 123123 | 0 | Name 1 | 1000.00 | 0 |
| | 777 | Name 2 | 250.00 | 0 |
| | 999 | Name 3 | 0 | 350.00 |
| 123000 | 0 | Name 4 | 500.00 | 0 |
| | 567 | Name 5 | 0 | 500.00 |
That is, as you can see from the example, I want to remove only duplicate values from the first column, without affecting the rest.
Also "group by", "order by" I can't use, as it will break the sequence of information output.
Something like this might work for you:
with cte as
(
    SELECT [group], account, description, balance, balance1,
           row_number() OVER (ORDER BY (SELECT NULL)) as rn
    FROM yourtable
)
SELECT case when LAG([group]) OVER (ORDER BY rn) = [group]
            THEN NULL ELSE [group] END AS [group],
       account, description, balance, balance1
FROM cte;
ORDER BY (SELECT NULL) is a fairly horrible hack. It is there because row_number() requires an ORDER BY, but you specifically stated that you can't use one. The row_number() is, however, needed in order to use LAG, which itself requires an OVER (ORDER BY ...).
Very much a case of caveat emptor, but it might give you what you are looking for.
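If the table happens to have an identity or primary-key column that reflects insertion order (an assumption; the question doesn't say one exists), the hack can be avoided by ordering on that column directly:
SELECT case when LAG([group]) OVER (ORDER BY id) = [group]   -- `id` is a hypothetical ordering column
            THEN NULL ELSE [group] END AS [group],
       account, description, balance, balance1
FROM yourtable;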
I have two CTEs. The following is the output of my first CTE.
| ORDER_NUMBER | ORDER_FLAG | EMPLOYEE | PRODUCT_CATEGORY | SALES |
|--------------|------------|----------|------------------|--------|
| 3158132      | 1          | Don      | Newspaper Ad     | 16.00  |
| 3158132 | 1 | Don | Magazine Ad | 15.00 |
| 3158132 | 0 | Don | TV Ad | 0.00 |
| 3158132 | 1 | Don | Billboard Ad | 56.00 |
| 3006152 | 1 | Roger | TV Ad | 20.00 |
| 3006152 | 0 | Roger | Magazine Ad | 0.00 |
| 3006152 | 1 | Roger | Newspaper Ad | 214.00 |
| 3012681 | 1 | Ken | TV Ad | 130.00 |
| 3012681 | 0 | Ken | Magazine Ad | 0.00 |
| 9818123 | 1 | Pete | Billboard Ad | 200.00 |
I'm attempting to count the distinct order numbers and the sales amount by employee. The order flag will be either a 1 or a 0: if sales are greater than 0.00, the order flag is set to 1.
My desired output:
| Employee | Sales | Orders |
|----------|--------|--------|
| Don | 87.00 | 1 |
| Ken | 130.00 | 1 |
| Pete | 200.00 | 1 |
| Roger | 234.00 | 1 |
I was attempting to do a combination of distinct, case, and concat statements without any luck. Any thoughts?
You can use this:
with cteTotalSales (...) as (...)
select employee,
       sum(sales) as Sales,
       case when sum(sales) > 0 then 1 else 0 end as Orders
from cteTotalSales
group by employee
This should be as simple as:
with cte as (...)
select
    employee,
    sum(sales) as sales,
    count(distinct order_number) as orders
from cte
group by employee
This query would work for you:
SELECT
EMPLOYEE,
SUM(SALES) SALES,
1 AS ORDERS
FROM
YOUR_TABLE
GROUP BY
EMPLOYEE
You can replace YOUR_TABLE with your subquery; note that the derived table needs an alias:
SELECT
    EMPLOYEE,
    SUM(SALES) SALES,
    1 AS ORDERS
FROM
(
    SELECT * FROM ...
) T
GROUP BY
    EMPLOYEE
I am trying to gather some basic statistics from a table "Data_Table" that gets updated on a daily basis. Each row represents a case, which can be opened/closed/cancelled by an operator with a unique ID. I want to be able to show the count of the actions each operator performed the previous day, i.e. get from Data_Table to the Ideal Table below.
Data_Table
| LOCATION | DATE     | REFERENCE | OPENED_ID | CLOSED_ID | CANCELLED_ID |
|----------|----------|-----------|-----------|-----------|--------------|
| NYC      | 20180102 | 123451    | 123       | 234       | 0            |
| TEX      | 20180102 | 123452    | 345       | 123       | 0            |
| NYC      | 20180102 | 123453    | 345       | 0         | 123          |
| TEX      | 20180102 | 123453    | 234       | 0         | 123          |
Ideal Table
| LOCATION | DATE     | USER_ID | OPEN | CLOSED | CANCELLED |
|----------|----------|---------|------|--------|-----------|
| NYC      | 20180102 | 123     | 1    | 0      | 1         |
| NYC      | 20180102 | 234     | 0    | 1      | 0         |
| NYC      | 20180102 | 345     | 1    | 0      | 0         |
| TEX      | 20180102 | 123     | 0    | 1      | 1         |
| TEX      | 20180102 | 234     | 1    | 0      | 0         |
| TEX      | 20180102 | 345     | 1    | 0      | 0         |
User 123 opened 1 case and cancelled 1 case in location NYC on date 20180102...etc.
I have made a few small queries, one for each action at each site, that look like this:
SELECT LOCATION, DATE, OPENED_ID, COUNT(DISTINCT [DATA_TABLE].REFERENCE)
FROM [DATA_TABLE]
WHERE DATE = CONVERT(DATE, GETDATE() - 1)
  AND LOCATION = 'NYC'
  AND OPENED_ID IN (SELECT NYC FROM [OP_ID_TABLE] WHERE [DATE FINISH] > GETDATE())
GROUP BY OPENED_ID, LOCATION, DATE
ORDER BY LOCATION
And then I repeat this query for each location and each operator action, after which I do some messy VLOOKUPs in Excel to organise it into the Ideal Table format, which on a daily basis is... not ideal.
I've tried to make some sum functions but haven't had any luck.
Any help would be much appreciated.
You need to unpivot and re-aggregate. One method uses union all and group by:
select location, date, user_id,
sum(opened) as opens, sum(closed) as closes, sum(cancelled) as cancels
from ((select location, date, opened_id as user_id, 1 as opened, 0 as closed, 0 as cancelled
from t
) union all
(select location, date, closed_id as user_id, 0 as opened, 1 as closed, 0 as cancelled
from t
) union all
(select location, date, cancelled_id as user_id, 0 as opened, 0 as closed, 1 as cancelled
from t
)
) t
group by location, date, user_id;
There are other methods for doing these operations, depending on the database. However, this is ANSI-standard syntax.
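For example, the question's query uses GETDATE() and CONVERT, which suggests SQL Server; there the same unpivot can be written in a single table scan with CROSS APPLY (VALUES ...). A sketch, with table and column names assumed from the question:
SELECT location, date, user_id,
       SUM(opened) AS opened, SUM(closed) AS closed, SUM(cancelled) AS cancelled
FROM Data_Table
CROSS APPLY (VALUES
    (opened_id,    1, 0, 0),
    (closed_id,    0, 1, 0),
    (cancelled_id, 0, 0, 1)
) v (user_id, opened, closed, cancelled)
WHERE v.user_id <> 0  -- assumes 0 is a "no operator" placeholder, as in the sample data
GROUP BY location, date, user_id;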
I have a table as follows:
+-------------+-----------+------+
| GroupNumber | TeamName | Goal |
+-------------+-----------+------+
| 1 | Sales | ABC |
| 1 | Sales | ABC |
| 1 | Sales | ABC |
| 1 | Design | XYZ |
| 2 | Design | XYZ |
| 2 | Sales | XYZ |
| 2 | technical | XYZ |
| 2 | Support | XYZ |
| 3 | Sales | XYZ |
| 3 | Sales | XYZ |
| 3 | Sales | XYZ |
+-------------+-----------+------+
I want to output only the groups that have more than 3 unique teams.
Only group 2 meets this condition, so the output is:
Expected Output:
+-------------+-----------+------+
| GroupNumber | TeamName | Goal |
+-------------+-----------+------+
| 2 | Design | XYZ |
| 2 | Sales | XYZ |
| 2 | technical | XYZ |
| 2 | Support | XYZ |
+-------------+-----------+------+
I'm not sure how to use this in a subquery:
SELECT count(Distinct(TeamName))
FROM mytable
group by [GroupNumber]
HAVING COUNT(Distinct[TeamName])>3
Simply put it in a subquery:
select *
from mytable
where [GroupNumber] in
(
SELECT [GroupNumber]
FROM mytable
group by [GroupNumber]
HAVING COUNT(Distinct[TeamName])>3
)
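If you would rather not scan the table twice, here is a single-pass sketch with window functions; since the brackets suggest SQL Server, where COUNT(DISTINCT ...) OVER is not supported, it counts distinct teams with the two-DENSE_RANK trick:
select GroupNumber, TeamName, Goal
from (
    select t.*,
           dense_rank() over (partition by [GroupNumber] order by TeamName)
         + dense_rank() over (partition by [GroupNumber] order by TeamName desc)
         - 1 as distinct_teams   -- ranking from both ends sums to distinct count + 1
    from mytable t
) x
where distinct_teams > 3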
Please try
SELECT *
FROM mytable
WHERE GroupNumber in (SELECT GroupNumber
                      FROM mytable
                      GROUP BY GroupNumber
                      HAVING COUNT(DISTINCT TeamName) > 3)
I've posted several topics and every query had some problems. :( I've changed the table and examples for better understanding.
I have a table called PROD_COST with 5 fields
(ID, Duration, Cost, Cost_next, Cost_change).
I need an extra field called "Groups" for aggregation.
- Duration = number of days the price is valid (1 day = 1 row).
- Cost = product price on that day.
- Cost_next = lead(cost, 1, 0).
- Cost_change = Cost_next - Cost.
Example:
+----+----------+------+-------------+--------+
| ID | Duration | Cost | Cost_change | Groups |
+----+----------+------+-------------+--------+
| 1  | 1        | 10   | -1.5        | 1      |
| 2  | 1        | 8.5  | 3.7         | 2      |
| 3  | 1        | 12.2 | 0           | 2      |
| 4  | 1        | 12.2 | -2.2        | 3      |
| 5  | 1        | 10   | 0           | 3      |
| 6  | 1        | 10   | 3.2         | 4      |
| 7  | 1        | 13.2 | -2.7        | 5      |
| 8  | 1        | 10.5 | -1.5        | 5      |
| 9  | 1        | 9    | 0           | 5      |
| 10 | 1        | 9    | 0           | 5      |
| 11 | 1        | 9    | -1          | 5      |
| 12 | 1        | 8    | 1.5         | 6      |
+----+----------+------+-------------+--------+
Now I need to populate the "Groups" field by grouping on Cost_change, which can hold positive, negative, or 0 values.
Some kind guy advised me this query:
select id, COST_CHANGE, sum(GRP) over (order by id asc) + 1 as Groups
from
(
    select *,
           case when sign(COST_CHANGE) != sign(isnull(lag(COST_CHANGE) over (order by id asc), COST_CHANGE))
                 and COST_CHANGE != 0
                then 1 else 0 end as GRP
    from PROD_COST
) X
But there is a problem: if there are 0 values between two positive or two negative values, it groups them separately. For example:
+-------------+--------+
| Cost_change | Groups |
+-------------+--------+
| 9.262 | 5777 |
| -9.262 | 5778 |
| 9.262 | 5779 |
| 0.000 | 5779 |
| 9.608 | 5780 |
| -11.231 | 5781 |
| 10.000 | 5782 |
+-------------+--------+
I need to have:
+-------------+--------+
| Cost_change | Groups |
+-------------+--------+
| 9.262 | 5777 |
| -9.262 | 5778 |
| 9.262 | 5779 |
| 0.000 | 5779 |
| 9.608 | 5779 | -- Here
| -11.231 | 5780 |
| 10.000 | 5781 |
+-------------+--------+
In other words, if there are 0 values between two positive or two negative values, they should be in one group, because the sequence MINUS-0-0-MINUS contains no sign change. But if I had MINUS-0-0-PLUS, then Groups should be 1-1-1-2, because a positive value alternates with a negative one.
Thank you for your attention!
I'm using SQL Server 2012.
I think the best approach is to remove the zeros, do the calculation, and then re-insert them. So:
with pcg as (
      select pc.*, min(id) over (partition by sign(cost_change), grp) as grpid
      from (select pc.*,
                   (row_number() over (order by id) -
                    row_number() over (partition by sign(cost_change)
                                       order by id)
                   ) as grp
            from prod_cost pc
            where cost_change <> 0
           ) pc
     )
select pc.*, max(g.groups) over (order by pc.id) as groups
from prod_cost pc left join
     (select pcg.*, dense_rank() over (order by grpid) as groups
      from pcg
     ) g
     on pc.id = g.id;
The CTE assigns a group identifier based on the lowest id in the group, where the groups are bounded by actual sign changes; note that the partition is on both the sign and the row-number difference, because two islands of different signs can end up with the same difference. The subquery turns that identifier into a sequential number with dense_rank(). The outer query then takes a running maximum, which gives a value to the 0 records.
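For intuition, a minimal trace on a hypothetical sign sequence (zeros already filtered out):
+------+----+------------+-----------------------+
| sign | rn | rn_by_sign | grp = rn - rn_by_sign |
+------+----+------------+-----------------------+
| +    | 1  | 1          | 0                     |
| -    | 2  | 1          | 1                     |
| +    | 3  | 2          | 1                     |
| +    | 4  | 3          | 1                     |
| -    | 5  | 2          | 3                     |
+------+----+------------+-----------------------+
The lone '-' row and the second '+' island both end up with grp = 1, which is why the partition must be on the pair (sign(cost_change), grp) rather than on grp alone.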
I am looking to create a group indicator for a query using SQL (Oracle, specifically). Basically, I am looking for duplicate entries on certain columns, and while I can find those, what I also want is some kind of indicator to say which rows the duplicates are from.
Below is an example of what I am looking to do (looking for duplicates on Name, Zip, and Phone). The rows with Name = aaa are all in the same group, the bb rows are not, and the c rows are.
Is there even a way to do this? I was thinking of something with OVER (PARTITION BY ...), but I can't think of a way to increment only for each group.
+----------+---------+-----------+------------+-----------+-----------+
| Name | Zip | Phone | Amount | Duplicate | Group |
+----------+---------+-----------+------------+-----------+-----------+
| aaa | 1234 | 5555555 | 500 | X | 1 |
| aaa | 1234 | 5555555 | 285 | X | 1 |
| bb | 545 | 6666666 | 358 | | 2 |
| bb | 686 | 7777777 | 898 | | 3 |
| aaa | 1234 | 5555555 | 550 | X | 1 |
| c | 5555 | 8888888 | 234 | X | 4 |
| c | 5555 | 8888888 | 999 | X | 4 |
| c | 5555 | 8888888 | 230 | X | 4 |
+----------+---------+-----------+------------+-----------+-----------+
It looks like you can just use:
SELECT name, zip, phone, amount,
       (CASE WHEN COUNT(*) OVER (PARTITION BY name, zip, phone) > 1
             THEN 'X'
             ELSE NULL
        END) AS duplicate,
       DENSE_RANK() OVER (ORDER BY name, zip, phone) AS group_rank
FROM your_table;
Rows that have the same name, zip, and phone will have the same group_rank.
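If the group numbers instead had to follow the order of first appearance in the table (in the sample data this happens to coincide with alphabetical order), here is a hypothetical variant, assuming a column id that records insertion order:
SELECT t.*,
       DENSE_RANK() OVER (ORDER BY first_id) AS group_rank  -- number groups by first appearance
FROM (SELECT t.*,
             MIN(id) OVER (PARTITION BY name, zip, phone) AS first_id  -- id is an assumed column
      FROM your_table t) t;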