SQL Server Query grouping Optimization - sql

My I/P table is T1:
+----+------+----+------+
| Id | v1 |v2 |v3 |
+----+------+----+------+
| 1 | a |b |c |
| 2 | null |b |null |
| 3 | d |null|null |
| 4 | null |e |null |
| 5 | e |f |null |
+----+------+----+------+
My Requirement : I have to compare one row with the others on the basis of id's.If they have all the values same or null/empty then I have to club the values of id separated by commas.
Required output:
+----+---------------------+
| Id |v1 |v2 |v3 |
+----+---------------------+
| 1,2| a |b |c |
| 3,4| d |e |null |
| 5 | e |f |null |
+----+---------------------+
Please assist.I am trying to use while loop but it is taking me very long.
I want optimize solution as I have run the statement on large record set.

Related

How can I get row index in sql

Table Contents are:
|Emp_ID | Name |
|-------|-------|
|1 | xyz |
|23 | pqq |
|22 | wdd |
|12 | fdv |
Here I want the row index where Emp_ID is greater than 15.
Result should return 2 and 3

How to give an alias to an unnested array of tuples?

I'm trying to set up a set of data using this query :
WITH CTE AS (
SELECT *
FROM UNNEST([("a",NULL),("b",NULL),("c",1),("d",1),("e",2),("f",3),("g",4),("h",4),("i",4),("j",4),("k",5),("l",5),("m",6),("n",7),("o",7)])
)
SELECT *
FROM CTE
The yielded result is :
|Row|f0_|f1_ |
|---|---|----|
| 1 | a |null|
| 2 | b |null|
| 3 | c |1 |
| 4 | d |1 |
| 5 | e |2 |
| 6 | f |3 |
| 7 | g |4 |
| 8 | h |4 |
| 9 | i |4 |
| 10| j |4 |
| 11| k |5 |
| 12| l |5 |
| 13| m |6 |
| 14| n |7 |
| 15| o |7 |
What I want is :
|Row| x | y |
|---|---|----|
| 1 | a |null|
| 2 | b |null|
| 3 | c |1 |
| 4 | d |1 |
| 5 | e |2 |
| 6 | f |3 |
| 7 | g |4 |
| 8 | h |4 |
| 9 | i |4 |
| 10| j |4 |
| 11| k |5 |
| 12| l |5 |
| 13| m |6 |
| 14| n |7 |
| 15| o |7 |
Use STRUCT:
WITH CTE AS (
SELECT *
FROM UNNEST([STRUCT("a" as x,NULL as y),("b",NULL),("c",1),("d",1),("e",2),("f",3),("g",4),("h",4),("i",4),("j",4),("k",5),("l",5),("m",6),("n",7),("o",7)])
)
SELECT *
FROM CTE
You can use this trick with almost any SQL dialect:
WITH CTE AS (
SELECT NULL AS A, NULL AS B
FROM (SELECT 1) T
WHERE FALSE
UNION ALL
SELECT *
FROM UNNEST([
("a",NULL),("b",NULL),("c",1),("d",1),("e",2),("f",3),
("g",4),("h",4),("i",4),("j",4),("k",5),("l",5),("m",6),("n",7),("o",7)
])
)
SELECT *
FROM CTE;

PySpark or SQL: consuming coalesce

I'm trying to coalesce multiple input columns into multiple output columns in either a pyspark dataframe or sql table.
Each output column would contain the "first available" input value, and then "consume" it so the input value is unavailable for following output columns.
+----+-----+-----+-----+-----+-----+---+------+------+------+
| ID | in1 | in2 | in3 | in4 | in5 | / | out1 | out2 | out3 |
+----+-----+-----+-----+-----+-----+---+------+------+------+
| 1 | | | C | | | / | C | | |
| 2 | A | | C | | E | / | A | C | E |
| 3 | A | B | C | | | / | A | B | C |
| 4 | A | B | C | D | E | / | A | B | C |
| 5 | | | | | | / | | | |
| 6 | | B | | | E | / | B | E | |
| 7 | | B | | D | E | / | B | D | E |
+----+-----+-----+-----+-----+-----+---+------+------+------+
What's the best way to do this?
edit: clarification - in1, in2, in3, etc.. can be any value
Here is the way.
import pyspark.sql.functions as f
df = spark.read.option("header","true").option("inferSchema","true").csv("test.csv")
cols = df.columns
cols.remove('ID')
df2 = df.withColumn('ins', f.array_except(f.array(*cols), f.array(f.lit(None))))
for i in range(0, 3):
df2 = df2.withColumn('out' + str(i+1), f.col('ins')[i])
df2.show(10, False)
+---+----+----+----+----+----+---------------+----+----+----+
|ID |in1 |in2 |in3 |in4 |in5 |ins |out1|out2|out3|
+---+----+----+----+----+----+---------------+----+----+----+
|1 |null|null|C |null|null|[C] |C |null|null|
|2 |A |null|C |null|E |[A, C, E] |A |C |E |
|3 |A |B |C |null|null|[A, B, C] |A |B |C |
|4 |A |B |C |D |E |[A, B, C, D, E]|A |B |C |
|5 |null|null|null|null|null|[] |null|null|null|
|6 |null|B |null|null|E |[B, E] |B |E |null|
|7 |null|B |null|D |E |[B, D, E] |B |D |E |
+---+----+----+----+----+----+---------------+----+----+----+

SQL Server 2008:: Efficient way to do the following query

I have the following data:
Input:
----------------------------
| Id | Value|
----------------------------
| 1 |A |
| 1 |B |
| 2 |C |
| 2 |D |
| 2 |E |
| 3 |F |
----------------------------
I need to convert the results to the following:
Output (Count is based on Id)
----------------------------
| Id | Value| Count|
----------------------------
| 1 |A | 2 |
| 1 |B | 2 |
| 2 |C | 3 |
| 2 |D | 3 |
| 2 |E | 3 |
| 3 |F | 1 |
----------------------------
I am using SQL server 2008. Is it possible to write a query to do this?
If yes could anyone help me provide a SQL to obtain the above output from the input data I gave.
You are looking for COUNT OVER:
select id, value, count(*) over (partition by id)
from mytable
order by id, value;

Group by records by date

I am using SQL Server 2008 R2. I am having a database table like below :
+--+-----+---+---------+--------+----------+-----------------------+
|Id|Total|New|Completed|Assigned|Unassigned|CreatedDtUTC |
+--+-----+---+---------+--------+----------+-----------------------+
|1 |29 |1 |5 |6 |5 |2014-01-07 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|2 |29 |1 |5 |6 |5 |2014-01-07 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|3 |29 |1 |5 |6 |5 |2014-01-07 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|4 |30 |1 |3 |2 |3 |2014-01-08 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|5 |30 |0 |3 |4 |3 |2014-01-09 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|6 |30 |0 |0 |0 |0 |2014-01-10 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|7 |30 |0 |0 |0 |0 |2014-01-11 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
Now, I am facing a strange problem while grouping the records by CreatedDtUTC column.
I want the distinct records from this table. Here you can observe that the first three records are duplicates created at the same date time. I want the distinct records so I had ran the query given below :
SELECT Id, Total, New, Completed, Assigned, Unassigned, MAX(CreatedDtUTC)
FROM TblUsage
GROUP BY CreatedDtUTC
But it gives me error :
Column 'TblUsage.Id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I also have tried DISTINCT for CreatedDtUTC column, but had given the same error. Can anyone let me know how to get rid of this?
P.S. I want the CreatedDtUTC coumn in CONVERT(VARCHAR(10), CreatedDtUTC,101) format.
Try this............
SELECT min(Id) Id, Total, New, Completed, Assigned, Unassigned, CreatedDtUTC
FROM TblUsage
GROUP BY Total, New, Completed, Assigned, Unassigned, CreatedDtUTC
The error message itself is very explicit. You can't put a column without applying an aggregate function to it into SELECT clause if it's not a part of GROUP BY. And the reason behind is very simple SQL Server doesn't know which value for that column within a group you want to select. It's not deterministic and therefore prohibited.
You can either put all the columns besides Id in GROUP BY and use MIN() or MAX() on Id or you can leverage windowing function ROW_NUMBER() in the following way
SELECT Id, Total, New, Completed, Assigned, Unassigned, CONVERT(VARCHAR(10), CreatedDtUTC,101) CreatedDtUTC
FROM
(
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY Total, New, Completed, Assigned, Unassigned, CreatedDtUTC
ORDER BY id DESC) rnum
FROM TblUsage t
) q
WHERE rnum = 1
Output:
| ID | TOTAL | NEW | COMPLETED | ASSIGNED | UNASSIGNED | CREATEDDTUTC |
|----|-------|-----|-----------|----------|------------|--------------|
| 3 | 29 | 1 | 5 | 6 | 5 | 01/07/2014 |
| 6 | 30 | 0 | 0 | 0 | 0 | 01/10/2014 |
| 7 | 30 | 0 | 0 | 0 | 0 | 01/11/2014 |
| 5 | 30 | 0 | 3 | 4 | 3 | 01/09/2014 |
| 4 | 30 | 1 | 3 | 2 | 3 | 01/08/2014 |
Here is SQLFiddle demo
Try this:
SELECT MIN(Id) AS Id, Total, New, Completed, Assigned, Unassigned,
CONVERT(VARCHAR(10), CreatedDtUTC, 101) AS CreatedDtUTC
FROM TblUsage
GROUP BY Total, New, Completed, Assigned, Unassigned, CreatedDtUTC
Check the SQL FIDDLE DEMO
OUTPUT
| ID | TOTAL | NEW | COMPLETED | ASSIGNED | UNASSIGNED | CREATEDDTUTC |
|----|-------|-----|-----------|----------|------------|--------------|
| 1 | 29 | 1 | 5 | 6 | 5 | 01/07/2014 |
| 4 | 30 | 1 | 3 | 2 | 3 | 01/08/2014 |
| 5 | 30 | 0 | 3 | 4 | 3 | 01/09/2014 |
| 6 | 30 | 0 | 0 | 0 | 0 | 01/10/2014 |
| 7 | 30 | 0 | 0 | 0 | 0 | 01/11/2014 |