SQL to show 1 for unique and 0 for repeat - sql

Looking for a quick solution in SQL...
I used to have a clunky formula in Excel: =IF(COUNTIF($C$2:C2,C2)>1,0,COUNTIF($C$2:C2,C2)) to print 1 for a unique item and 0 for a repeat.
Then I moved to =1-(C1-C2), and that kind of did the job, though not an accurate one. Now I'm looking for SQL that could do a similar job. The example below shows the result needed:
NUMBER UNIQUE
6573455300000 1
6573455300000 0
6573455300000 0
6573455300000 0
6573411981080 1
6573411981080 0
6573411981080 0
6573411981080 0
Does anyone know any kind of code to achieve this?

using row_number():
select
col
, [first] = case when row_number() over (partition by col order by (select 1)) > 1 then 0 else 1 end
from t
rextester demo: http://rextester.com/FWA89661
returns:
+---------------+-------+
| col           | first |
+---------------+-------+
| 6573411981080 |     1 |
| 6573411981080 |     0 |
| 6573411981080 |     0 |
| 6573411981080 |     0 |
| 6573455300000 |     1 |
| 6573455300000 |     0 |
| 6573455300000 |     0 |
| 6573455300000 |     0 |
+---------------+-------+

Use window functions. In your case, you seem to want to mark the first row, so row_number() looks like the solution:
select t.*,
       (case when row_number() over (partition by number order by ?) = 1
             then 1 else 0
        end) as flag
from t;
The ? is for the column that specifies the ordering (that is, which row counts as first). If you want just one row per number but don't care which, then you can use order by number or order by (select null).
UNIQUE is a SQL keyword (think "unique index"), so it is a bad name for a column. That is why I changed it to the generic flag, although you might prefer first_row_flag or something like that.
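If there is no meaningful ordering column at all, the order by (select null) variant mentioned above would look like this (a sketch using the same t and number names as the answer; which row gets the 1 is arbitrary):

select t.*,
       (case when row_number() over (partition by number order by (select null)) = 1
             then 1 else 0
        end) as flag
from t;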

SELECT
[number],
case when rown = 1 then 1 else 0 end as [unique]
FROM
(
SELECT
[number], row_number() OVER(partition by [number] order by [number]) as rown
FROM
t
) a
This doesn't strictly have to be done with a subquery, but it's unlikely to make any difference to overall performance, and arranging it like this helps you see what is going on. If you run just the inner subquery in isolation, you'll see that the most important work is done by row_number(): essentially the data is partitioned into buckets based on the value of [number], something like a GROUP BY, except that repeated values are not suppressed. Within each partition, every occurrence of [number] is numbered with an incrementing counter, and when a different value of [number] is encountered the numbering restarts from 1. The ORDER BY clause is only there because SQL Server demands one, and we don't know anything else about your table. However, if there is something about your data that makes one particular occurrence the ideal row to label with [unique] = 1, try to find a way to sort that row into position 1; a typical use of this pattern is "latest record", in which case the ORDER BY part would be [datecolumn] DESC.
Once you have a per-number counter that resets itself, all that remains is a standard CASE expression that returns 1 when the counter is 1 and 0 otherwise, matching your desired result.
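As a sketch of that "latest record" variant: assuming a hypothetical [datecolumn] on the table (not part of the original question), ordering each partition newest-first flags the most recent row per [number] instead of an arbitrary one:

SELECT
    [number],
    case when rown = 1 then 1 else 0 end as [unique]
FROM
(
    SELECT
        [number],
        row_number() OVER (partition by [number] order by [datecolumn] DESC) as rown
    FROM
        t
) a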

select t.Number, case when t.num = 1 then t.num else 0 end as [Unique]
from (
    select Number, row_number() over (partition by Number order by Number) as num
    from MyTbl
) t
order by t.Number
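To try any of the queries above, a minimal setup built from the sample numbers in the question might look like this (note that the answers use varying table and column names: t, MyTbl, col, number):

CREATE TABLE t (col bigint);
INSERT INTO t (col) VALUES
    (6573455300000), (6573455300000), (6573455300000), (6573455300000),
    (6573411981080), (6573411981080), (6573411981080), (6573411981080);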


How to create a table to count with a conditional

I have a database with a lot of columns containing pass, fail, and blank indicators.
I want to create a function to count each type of value and build a table from the counts. The structure I am thinking of is something like:
| Value | x                | y                | z                |
|-------|------------------|------------------|------------------|
| pass  | count if x=pass  | count if y=pass  | count if z=pass  |
| fail  | count if x=fail  | count if y=fail  | count if z=fail  |
| blank | count if x=blank | count if y=blank | count if z=blank |
| total | count(x)         | count(y)         | count(z)         |
where x, y, z are columns from another table.
I don't know which would be the best approach for this.
Thank you all in advance.
I tried this structure, but it shows a syntax error:
CREATE FUNCTION Countif (columnx nvarchar(20),value_compare nvarchar(10))
RETURNS Count_column_x AS
BEGIN
IF columnx=value_compare
count(columnx)
END
RETURN
END
Also, I don't know how to add each count to the actual table I am trying to create
Conditional counting (or any conditional aggregation) can often be done inline by placing a CASE expression inside the aggregate function that conditionally returns the value to be aggregated or a NULL to skip.
An example would be COUNT(CASE WHEN SelectMe = 1 THEN 1 END). Here the aggregated value is 1 (which could be any non-null value for COUNT(); for other aggregate functions, a more meaningful value would be provided). The implicit ELSE returns a NULL, which is not counted.
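As a minimal, self-contained sketch of that pattern (the table and column names here are invented for illustration only):

SELECT
    COUNT(CASE WHEN SelectMe = 1 THEN 1 END) AS selected_rows,
    COUNT(*) AS total_rows
FROM SomeTable;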
For your problem, I believe the first thing to do is to UNPIVOT your data, placing the column name and value side by side. You can then group by value and use conditional aggregation as described above to calculate your results. A few more details finish the job: (1) a totals row using WITH ROLLUP, (2) a CASE expression to adjust the labels for the blank and total rows, and (3) some ORDER BY tricks to get the rows in the right order.
The resulting query may be something like:
SELECT
    CASE
        WHEN GROUPING(U.Value) = 1 THEN 'Total'
        WHEN U.Value = '' THEN 'Blank'
        ELSE U.Value
    END AS Value,
    COUNT(CASE WHEN U.Col = 'x' THEN 1 END) AS x,
    COUNT(CASE WHEN U.Col = 'y' THEN 1 END) AS y
FROM #Data D
UNPIVOT (
    Value
    FOR Col IN (x, y)
) AS U
GROUP BY U.Value WITH ROLLUP
ORDER BY
    GROUPING(U.Value),
    CASE U.Value WHEN 'Pass' THEN 1 WHEN 'Fail' THEN 2 WHEN '' THEN 3 ELSE 4 END,
    U.Value
Sample data:
| x    | y    |
|------|------|
| Pass | Pass |
| Pass | Fail |
| Pass |      |
| Fail |      |
Sample results:
| Value | x | y |
|-------|---|---|
| Pass  | 3 | 1 |
| Fail  | 1 | 1 |
| Blank | 0 | 2 |
| Total | 4 | 4 |
See this db<>fiddle for a working example.
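If you prefer to reproduce it locally rather than in the fiddle, a hypothetical setup for the #Data temp table that matches the sample data above would be:

CREATE TABLE #Data (x nvarchar(20), y nvarchar(20));
INSERT INTO #Data (x, y) VALUES
    ('Pass', 'Pass'),
    ('Pass', 'Fail'),
    ('Pass', ''),
    ('Fail', '');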
I think you don't need a generic solution like a function that takes the value as a parameter.
Perhaps you could create a view grouping your data, and then query that view filtering by your value.
Your view body would be something like this:
select value, count(*) as Total
from table_name
group by value
Feel free to explain your situation in more detail so I can help further.
You can do this by grouping by the status column.
select status, count(*) as total
from some_table
group by status
Rather than making a whole new table, consider using a view. This is a query that looks like a table.
create view status_counts as
select status, count(*) as total
from some_table
group by status
You can then select total from status_counts where status = 'pass' or the like and it will run the query.
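Written out, that query against the view is simply:

select total
from status_counts
where status = 'pass';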
You can also create a "materialized view". This is like a view, but the results are written to a real table. SQL Server is special in that it will keep this table up to date for you.
create materialized view status_counts with (distribution = hash(status)) as
select status, count(*) as total
from some_table
group by status
You'd do this for performance reasons on a large table which does not update very often.

SQL/Postgres - collapse every N rows into 1 based on row position in group

I have a set of ordered results from a Postgres table, where every group of 4 rows represents a set of related data. I want to process this set of results further, so that every group of 4 rows is collapsed into 1 row with aliased column names, where the value for each column is based on that row's position in the group. I'm close, but I can't quite get the query right (nor am I confident that I'm approaching this in the optimal manner). Here's the scenario:
I am collecting survey results - each survey has 4 questions, but each answer is stored in a separate row in the database. However, they are associated with each other by a submission event_id, and the results are guaranteed to be returned in a fixed order. A set of survey_results will look something like:
event_id | answer
---------+-------
a        | 10
a        | foo
a        | 9
a        | bar
b        | 2
b        | baz
b        | 4
b        | zip
What I would like to be able to do is query this result so that the final output comes out with each set of 4 results on their own line, with aliased column names.
event_id | score_1 | reason_1 | score_2 | reason_2
---------+---------+----------+---------+---------
a        | 10      | foo      | 9       | bar
b        | 2       | baz      | 4       | zip
The closest that I've been able to get is
SELECT survey_answers.event_id,
(SELECT survey_answers.answer FROM survey_answers FETCH NEXT 1 ROWS ONLY) AS score_1,
(SELECT survey_answers.answer FROM survey_answers OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY) AS reason_1,
(SELECT survey_answers.answer FROM survey_answers OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY) AS score_2,
(SELECT survey_answers.answer FROM survey_answers OFFSET 3 ROWS FETCH NEXT 1 ROWS ONLY) AS reason_2
FROM survey_answers
GROUP BY survey_answers.event_id
But this, understandably, returns the correct number of rows, but with the same values (other than event_id):
event_id | score_1 | reason_1 | score_2 | reason_2
---------+---------+----------+---------+---------
a        | 10      | foo      | 9       | bar
b        | 10      | foo      | 9       | bar
How can I structure my query so that it applies the OFFSET/FETCH behaviors every batch of 4 rows, or, maybe more accurately, within every unique set of event_ids?
demo: db<>fiddle
First of all, this looks like a very bad design:
There is no guaranteed order! Databases store and return rows in arbitrary order, so you really need an order column. In this small case it might work by accident.
You should use two columns, one for score and one for reason. Mixing up the types is not a good idea.
Nevertheless, for this simple and short example, this could be a solution (remember this is not recommended for production tables):
WITH data AS (
    SELECT
        *,
        row_number() OVER (PARTITION BY event_id)                  -- 1
    FROM
        survey_results
)
SELECT
    event_id,
    MAX(CASE WHEN row_number = 1 THEN answer END) AS score_1,      -- 2
    MAX(CASE WHEN row_number = 2 THEN answer END) AS reason_1,
    MAX(CASE WHEN row_number = 3 THEN answer END) AS score_2,
    MAX(CASE WHEN row_number = 4 THEN answer END) AS reason_2
FROM
    data
GROUP BY event_id
The row_number() window function adds a row count for each event_id, in this case from 1 to 4. This can be used to identify the type of answer (see the intermediate step in the fiddle). In production code you should use some order column to ensure the order; the window function would then look like PARTITION BY event_id ORDER BY order_column.
This is a simple pivot on event_id and the type id (row_number), which does exactly what you expect.
You need a column that specifies the ordering. In your case, that should probably be a serial column, which is guaranteed to be increasing for each insert. I would call such a column survey_result_id.
With such a column, you can do:
select event_id,
max(case when seqnum = 1 then answer end) as score_1,
max(case when seqnum = 2 then answer end) as reason_1,
max(case when seqnum = 3 then answer end) as score_2,
max(case when seqnum = 4 then answer end) as reason_2
from (select sr.*,
row_number() over (partition by event_id order by survey_result_id) as seqnum
from survey_results sr
) sr
group by event_id;
Without such a column, you cannot reliably do what you want, because SQL tables represent unordered sets.
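As a sketch of that, the table could be defined with such a column from the start (names follow the answer; column types are assumptions):

CREATE TABLE survey_results (
    survey_result_id bigserial PRIMARY KEY,
    event_id         text NOT NULL,
    answer           text NOT NULL
);

For an existing table, ALTER TABLE survey_results ADD COLUMN survey_result_id bigserial; would add such a column, but the values assigned to already-existing rows are arbitrary and will not reflect the original insertion order.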

Group by query based on other column value

Suppose I have a table named process_states:
+---------+-------+
| Process | State |
+---------+-------+
| A       |     0 |
| A       |     0 |
| B       |     0 |
| B       |    -1 |
| C       |   -99 |
+---------+-------+
Note: State can have many more negative and positive state values
I want to find all processes having all rows with state 0. In the above case I want to get A.
I am trying to do it using group by. Is there a way to do something like this:
select process from process_states
group by process
having <all state for that process is 0>
Is it possible to do it using group by?
You could use a group by clause and filter the processes with a having clause:
SELECT process
FROM process_states
GROUP BY process
HAVING COUNT(CASE state WHEN -1 THEN 1 END) = 0
EDIT:
Given the clarifying comments on other answers, if the requirement is to find only processes that only have 0 states, you could count the total number of rows and the number of rows with a 0 state and compare them:
SELECT process
FROM process_states
GROUP BY process
HAVING COUNT(CASE state WHEN 0 THEN 1 END) = COUNT(*)
You can group by the process column and check that both the max and the min of the state column are 0.
select process
from process_states
group by process
having max(state) = min(state) and min(state) = 0
You can do it by using the having clause. If the count of the rows for a given process is equal to the count of rows with zeroes, then it means that the process only has rows with zeroes.
select process
from process_states
group by process
having count(*) = count(case when state = 0 then 'X' end)
I would be tempted to use min for this:
select process
from process_status
group by process
having min(status) > -1;
This does assume that there are no other negative statuses.
Try this
select process
from process_status
group by process
having sum(abs(status)) = 0
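To experiment with these variants, the sample table from the question can be created like this (column types are an assumption; note that some answers write process_status/status instead of process_states/state):

CREATE TABLE process_states (process varchar(10), state int);
INSERT INTO process_states (process, state) VALUES
    ('A', 0), ('A', 0), ('B', 0), ('B', -1), ('C', -99);

With this data, the all-zero checks return only process A; the first query, which only excludes state -1, would also return C.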

How to get max of multiple columns in oracle

Here is a sample table:
| customer_token                       | created_date                 | orders | views |
+--------------------------------------+------------------------------+--------+-------+
| 93a03e36-83a0-494b-bd68-495f54f406ca | 10-NOV-14 14.41.09.000000000 |      1 |     0 |
| 93a03e36-83a0-494b-bd68-495f54f406ca | 20-NOV-14 14.41.47.000000000 |      0 |     1 |
| 93a03e36-83a0-494b-bd68-495f54f406ca | 26-OCT-14 16.14.30.000000000 |      2 |     0 |
| 93a03e36-83a0-494b-bd68-495f54f406ca | 11-OCT-14 16.31.11.000000000 |      0 |     2 |
In this customer data table I store all of the dates when a given customer has placed an order, or viewed a product. Now, for a report, I want to write a query where for each customer (auth_token), I want to generate the last_order_date (row where orders > 0) and last_view_date (row where product_views > 0).
I am looking for an efficient query as I have millions of records.
select customer_token,
max(case when orders > 0 then created_date else NULL end),
max(case when views > 0 then created_date else NULL end)
from Customer
group by customer_token;
Update: This query is quite efficient because Oracle is likely to scan the table only once. There is also an interesting point about grouping: when you use GROUP BY, the select list can only contain columns that are in the GROUP BY or inside aggregate functions. In this query MAX is calculated over created_date, but you don't need to put orders and views in the GROUP BY because they only appear in the expression inside the MAX function. That's not very common.
When you want to get the largest value in a group of rows, you use the MAX() aggregate function, and when you use aggregate functions it is best practice to group by the non-aggregated columns.
In this case, you want to group by customer_token. That way, you'll receive one row per group, and the aggregate functions will give you the values for that group.
However, you only want to consider the dates on rows where the value is greater than 0, so I recommend you put a CASE expression inside your MAX() function, like this:
SELECT customer_token,
MAX(CASE WHEN orders > 0 THEN created_date ELSE NULL END) AS latestOrderDate,
MAX(CASE WHEN views > 0 THEN created_date ELSE NULL END) AS latestViewDate
FROM customer
GROUP BY customer_token;
This will give you the latest date among rows where orders is positive, and separately among rows where views is positive. Without the CASE expression, the MAX would be taken over all rows and you would likely get incorrect results.
Here is an Oracle reference for aggregate functions.
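For completeness, a hypothetical Oracle setup matching the sample rows (column types and the TO_TIMESTAMP format mask are assumptions, not from the question):

CREATE TABLE customer (
    customer_token VARCHAR2(36),
    created_date   TIMESTAMP,
    orders         NUMBER,
    views          NUMBER
);

INSERT INTO customer VALUES ('93a03e36-83a0-494b-bd68-495f54f406ca',
    TO_TIMESTAMP('10-NOV-14 14.41.09.000000000', 'DD-MON-RR HH24.MI.SS.FF'), 1, 0);
INSERT INTO customer VALUES ('93a03e36-83a0-494b-bd68-495f54f406ca',
    TO_TIMESTAMP('20-NOV-14 14.41.47.000000000', 'DD-MON-RR HH24.MI.SS.FF'), 0, 1);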

Setting rank to NULL using RANK() OVER in SQL

In a SQL Server DB, I have a table of values that I am interested in ranking.
When I perform a RANK() OVER (ORDER BY VALUE DESC) as RANK, I get the following results (in a hypothetical table):
RANK | USER_ID | VALUE
-----+---------+------
   1 |      33 | 30000
   2 |      10 | 20000
   3 |      45 | 10000
   4 |      12 |  5000
   5 |      43 |  2000
   6 |      32 |  NULL
   6 |      13 |  NULL
   6 |      19 |  NULL
   6 |      28 |  NULL
The problem is, I do not want the rows which have NULL for a VALUE to get a rank - I need some way to set the rank for these to NULL. So far, searching the web has brought me no answers on how I might be able to do this.
Thanks for any help you can provide.
You can try a CASE statement:
SELECT
CASE WHEN Value IS NULL THEN NULL
ELSE RANK() OVER (ORDER BY VALUE DESC)
END AS RANK,
USER_ID,
VALUE
FROM yourtable
The CASE expression provided earlier would count the NULL records in the rank if the sort order were ascending rather than descending; the ranking would then start at 5 rather than 1, which is probably not what is desired.
To ensure that the nulls do not get counted in the rank, you can force them to the bottom by adding an initial sort criteria on whether the value IS NULL or not, like so:
SELECT
CASE WHEN Value IS NULL THEN NULL
ELSE RANK() OVER
(ORDER BY CASE WHEN Value IS NULL THEN 1 ELSE 0 END, VALUE DESC)
END AS RANK,
USER_ID,
VALUE
FROM yourtable
*** credit to Hugo Kornelis: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/deb8a0aa-aaab-442b-a667-11220333a4e0/rank-without-counting-null-values?forum=transactsql
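To try both variants, the hypothetical table from the question can be created like this (column types are assumptions):

CREATE TABLE yourtable (USER_ID int, VALUE int);
INSERT INTO yourtable (USER_ID, VALUE) VALUES
    (33, 30000), (10, 20000), (45, 10000), (12, 5000), (43, 2000),
    (32, NULL), (13, NULL), (19, NULL), (28, NULL);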