In my SQL Server database, I have a table with many duplicate rows. I need to fetch one row per distinct (EID, YEAR) pair, choosing the row that contains the fewest NULL values (or, alternatively, order the table and keep a single row for each distinct EID and YEAR).
For example, in the table below the pair EID = E138442 and YEAR = 2019 occurs 21 times; among these duplicates, the row containing the fewest NULL values should be fetched.
+---------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| EID | YEAR | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC |
+---------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| E050339 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 |
| E050339 | 2020 | NULL | 6 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| E050339 | 2020 | 13 | 6 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| E138348 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 | NULL |
| E138348 | 2019 | NULL | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| e138372 | 2019 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| E138440 | 2019 | NULL | NULL | 2 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 4 | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 7 | 4 | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 7 | 4 | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | 7 | 7 | 4 | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | NULL | 7 | 7 | 4 | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | NULL | 2 | 7 | 7 | 4 | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | NULL | 7 | 2 | 7 | 7 | 4 | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | NULL | 7 | 7 | 2 | 7 | 7 | 4 | 4 | 9 | 5 |
| E138442 | 2019 | NULL | NULL | 1 | 7 | 7 | 2 | 7 | 7 | 4 | 4 | 9 | 5 |
| E138442 | 2019 | NULL | 1 | NULL | 7 | 7 | 2 | 7 | 7 | 4 | 4 | 9 | 5 |
| E138442 | 2020 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 |
| E138442 | 2020 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 |
| E138442 | 2020 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 |
| E138442 | 2020 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 |
+---------+------+------+------+------+------+------+------+------+------+------+------+------+------+
I need a SQL query to fetch values as shown here:
+---------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| EID | YEAR | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC |
+---------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| E050339 | 2020 | 13 | 6 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| E138348 | 2019 | NULL | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| e138372 | 2019 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| E138440 | 2019 | NULL | NULL | 2 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| E138442 | 2019 | NULL | 1 | NULL | 7 | 7 | 2 | 7 | 7 | 4 | 4 | 9 | 5 |
| E138442 | 2020 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 |
+---------+------+------+------+------+------+------+------+------+------+------+------+------+------+
The result should contain exactly one row for each distinct combination of EID and YEAR.
SELECT *
FROM TABLE_NAME C1
WHERE EXISTS (SELECT 1
FROM TABLE_NAME C2
WHERE C1.EID = C2.EID AND C1.YEAR = C2.YEAR
HAVING COUNT(*) = 1)
ORDER BY
c1.EID, c1.YEAR, c1.JAN, c1.FEB, c1.MAR, c1.APR,
c1.MAY, c1.JUN, c1.JUL, c1.AUG, c1.SEP, c1.OCT, c1.NOV, c1.DEC ASC;
I tried the query above, but it returned irrelevant results.
Since there is no other way to distinguish the members of a group, and going by "select rows containing fewer NULL values", here is one way to do it using CTEs. It's not clean, but it's probably the only way:
with cte as (
SELECT *,
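-- count the NULL month columns in each row (1 per NULL, 0 otherwise)
CASE WHEN JAN IS NULL THEN 1 ELSE 0 END + CASE WHEN FEB IS NULL THEN 1 ELSE 0 END + ... + CASE WHEN DEC IS NULL THEN 1 ELSE 0 END AS NullCount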
FROM
tablename
)
, cte2 as (
select EID , YEAR , min(NullCount) min_nullcount
from cte
group by EID , YEAR
)
select t.*
from
cte t
join cte2 tt
on t.EID = tt.EID
and t.YEAR = tt.YEAR
and t.NULLCount = tt.min_nullcount
If several rows in a group tie for the minimum NULL count, you can use the query below, which keeps exactly one row per group:
select * from (
SELECT *,
ROW_NUMBER() OVER (partition by EID , YEAR order by CASE WHEN JAN IS NULL THEN 1 ELSE 0 END + ... + CASE WHEN DEC IS NULL THEN 1 ELSE 0 END) AS rnk
FROM
tablename
) xx
WHERE rnk = 1
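The ROW_NUMBER() idea above can be tried out on a small sample. The following is only a sketch: it uses Python's built-in sqlite3 module instead of SQL Server (window functions need SQLite 3.25 or newer), the table name demo is hypothetical, and only three month columns are included to keep it short.

```python
import sqlite3

# In-memory database with a hypothetical, shortened version of the table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo (EID TEXT, YEAR INTEGER, JAN INTEGER, FEB INTEGER, MAR INTEGER)"
)
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?, ?, ?)",
    [
        ("E050339", 2020, None, 6, None),
        ("E050339", 2020, 13, 6, None),
        ("E138348", 2019, None, 1, None),
        ("E138442", 2020, None, None, 1),
        ("E138442", 2020, None, None, 1),
    ],
)

# Rank the rows of each (EID, YEAR) group by their number of NULL columns
# and keep only the best-ranked row; ties are broken arbitrarily.
query = """
SELECT EID, YEAR, JAN, FEB, MAR
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY EID, YEAR
               ORDER BY CASE WHEN JAN IS NULL THEN 1 ELSE 0 END
                      + CASE WHEN FEB IS NULL THEN 1 ELSE 0 END
                      + CASE WHEN MAR IS NULL THEN 1 ELSE 0 END
           ) AS rnk
    FROM demo
) AS ranked
WHERE rnk = 1
ORDER BY EID, YEAR
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each group contributes exactly one row, even when several rows share the same lowest NULL count.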
I have the following basic three-table structure in MariaDB/MySQL.
MariaDB [aix_registry]> describe nodes;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(256) | NO | | NULL | |
+-------+--------------+------+-----+---------+----------------+
2 rows in set (0.036 sec)
MariaDB [aix_registry]> describe attribs;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(256) | NO | | NULL | |
| persistent | int(11) | YES | | 0 | |
| parent | varchar(256) | YES | | NODE | |
+------------+--------------+------+-----+---------+----------------+
4 rows in set (0.042 sec)
MariaDB [aix_registry]> describe entries;
+-----------+--------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| node_id | int(11) | NO | MUL | NULL | |
| attrib_id | int(11) | NO | MUL | NULL | |
| value | varchar(256) | NO | | NULL | |
| ts | timestamp | NO | | current_timestamp() | |
+-----------+--------------+------+-----+---------------------+----------------+
5 rows in set (0.052 sec)
This simple SELECT returns incomplete records. I reduced the output of all following examples to a single dataset to avoid unnecessary clutter.
SELECT nodes.id AS NODE_ID, nodes.name AS NODE ,
MAX(CASE WHEN attribs.name = 'IP_LONG' THEN value END) AS IP_LONG,
MAX(CASE WHEN attribs.name = 'IP' THEN value END) AS IP,
MAX(CASE WHEN attribs.name = 'LOCATION' THEN value END) AS LOCATION
from entries left join nodes on nodes.id = node_id left join attribs on attribs.id = attrib_id WHERE entries.ts > DATE_SUB(NOW(), INTERVAL 1 DAY) GROUP BY nodes.name ORDER BY nodes.id ;
+---------+-------------+-------------------------------------------+--------------+------------+
| NODE_ID | NODE | IP_LONG | IP | LOCATION |
+---------+-------------+-------------------------------------------+--------------+------------+
| 31 | AIXDX4-TEST | 172.17.9.196/255.255.248.0/172.17.15.255/ | 172.17.9.196 | Wienerberg |
+---------+-------------+-------------------------------------------+--------------+------------+
The IP_LONG column is missing the following value, for example:
172.16.84.74/255.255.192.0/172.16.127.255/aixdx4-test.domain.org
My guess is that it has something to do with the MAX() function having trouble with mixed content in the value column. When I leave out MAX() and GROUP BY, the missing values are shown, but the output is rather chaotic.
SELECT nodes.id AS NODE_ID, nodes.name AS NODE,
CASE WHEN attribs.name = 'IP_LONG' THEN value END AS IP_LONG,
CASE WHEN attribs.name = 'IP' THEN value END AS IP,
CASE WHEN attribs.name = 'LOCATION' THEN value END AS LOCATION
from entries left join nodes on nodes.id = node_id left join attribs on attribs.id = attrib_id
WHERE entries.ts > DATE_SUB(NOW(), INTERVAL 1 DAY) ORDER BY nodes.id;
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | 172.17.9.196 | NULL |
| 31 | AIXDX4-TEST | 172.17.9.196/255.255.248.0/172.17.15.255/ | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | 172.16.84.74 | NULL |
| 31 | AIXDX4-TEST | 172.16.84.74/255.255.192.0/172.16.127.255/aixdx4-test.domain.org | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | 172.16.13.196 | NULL |
| 31 | AIXDX4-TEST | 172.16.13.196/255.255.254.0/172.16.13.255/ | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | Wienerberg |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL | NULL | NULL |
| 31 | AIXDX4-TEST | NULL
This query gives the right output, but I am unclear how to integrate it into the one above; that is a topic for another question.
SELECT nodes.id AS NODE_ID, nodes.name AS NODE, entries.value
AS IP_NETMASK_BROADCAST_DNS
FROM (entries LEFT JOIN nodes ON(nodes.id = entries.node_id))
WHERE entries.attrib_id = (SELECT attribs.id FROM attribs WHERE attribs.name = 'IP_LONG') AND CAST(entries.ts AS date) = curdate() AND nodes.id = '31' ORDER BY nodes.name;
+---------+-------------+------------------------------------------------------------------+
| NODE_ID | NODE | IP_NETMASK_BROADCAST_DNS |
+---------+-------------+------------------------------------------------------------------+
| 31 | AIXDX4-TEST | 172.17.9.196/255.255.248.0/172.17.15.255/ |
| 31 | AIXDX4-TEST | 172.16.84.74/255.255.192.0/172.16.127.255/aixdx4-test.domain.org |
| 31 | AIXDX4-TEST | 172.16.13.196/255.255.254.0/172.16.13.255/ |
+---------+-------------+------------------------------------------------------------------+
These are the 3 values you are taking the MAX() of:
172.17.9.196/255.255.248.0/172.17.15.255/
172.16.84.74/255.255.192.0/172.16.127.255/aixdx4-test.domain.org
172.16.13.196/255.255.254.0/172.16.13.255/
These are strings, so MAX() picks the "alphabetical" (lexicographic) maximum. The strings are compared character by character, and the first difference is in the second octet: "172.17" sorts after "172.16", so the first value, 172.17.9.196/255.255.248.0/172.17.15.255/, is picked. These values are all different. Which one do you want, and why? Do you want the longest one? That would require different code.
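To see the string comparison in action, here is a small sketch using Python's built-in sqlite3 module rather than MariaDB. The table name vals is hypothetical, and the "longest value" query is just one possible reading of what might be wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (value TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO vals VALUES (?)",
    [
        ("172.17.9.196/255.255.248.0/172.17.15.255/",),
        ("172.16.84.74/255.255.192.0/172.16.127.255/aixdx4-test.domain.org",),
        ("172.16.13.196/255.255.254.0/172.16.13.255/",),
    ],
)

# MAX() on text compares character by character, so "172.17..." wins
# over both "172.16..." values.
max_value = conn.execute("SELECT MAX(value) FROM vals").fetchone()[0]

# One possible alternative: pick the longest string instead.
longest_value = conn.execute(
    "SELECT value FROM vals ORDER BY LENGTH(value) DESC LIMIT 1"
).fetchone()[0]

print(max_value)
print(longest_value)
```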
The title may sound a little confusing, but essentially I am trying to pull data from the table below. The query I used to create that table was:
select d.FOLDER_ID, d.PKG_ID, p.file_id, d.PKG_START_TIME, d.PKG_END_TIME, d.ISVALID, d.VALIDFILE_COUNT, d.INVALIDFILE_COUNT, d.ISLOADED, d.LOADEDFILE_COUNT, d.REJECTEDFILE_COUNT
from DATA_EXCHANGE_PACKAGE d full outer join PACKAGE_FILE p on d.PKG_ID = p.PKG_ID
where d.pkg_id = p.pkg_id
order by PKG_START_TIME asc
This result combines data from two different tables, as you can see in the query, and it is ordered by package start time.
What I want is a query that lets me choose how many pkg_id's to return, while still returning every file_id that belongs to the chosen pkg_id's. For example, my database may have 100 packages, but I only want every file_id for the first 10 packages. How do I do this? So far I have only been able to select the first 5 records using TOP, or just 5 distinct pkg_id rows, but not every file_id for those 5 distinct pkg_id's. Any help would be appreciated. I understand that GROUP BY and PARTITION BY may achieve what I want, but I haven't been successful. I'm not the greatest at SQL, which is why I'm struggling; I thought this query was going to be easier to write. I'm also fairly certain the WHERE clause is pointless, but I kept it regardless.
Also, let's assume the folder_id is always 1.
+-----------+--------+---------+-------------------------+-------------------------+---------+-----------------+-------------------+----------+------------------+--------------------+
| FOLDER_ID | PKG_ID | file_id | PKG_START_TIME | PKG_END_TIME | ISVALID | VALIDFILE_COUNT | INVALIDFILE_COUNT | ISLOADED | LOADEDFILE_COUNT | REJECTEDFILE_COUNT |
+-----------+--------+---------+-------------------------+-------------------------+---------+-----------------+-------------------+----------+------------------+--------------------+
| 1 | 1 | 1 | 2019-11-19 14:59:24.343 | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 1 | 2 | 2 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 2 | 3 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 2 | 4 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 2 | 5 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 2 | 6 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 2 | 7 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 2 | 8 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 2 | 9 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 2 | 10 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 2 | 11 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 3 | 12 | 2019-11-19 15:58:26.733 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 3 | 13 | 2019-11-19 15:58:26.733 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 3 | 14 | 2019-11-19 15:58:26.733 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 3 | 15 | 2019-11-19 15:58:26.733 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 3 | 16 | 2019-11-19 15:58:26.733 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 3 | 17 | 2019-11-19 15:58:26.733 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 3 | 18 | 2019-11-19 15:58:26.733 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 3 | 19 | 2019-11-19 15:58:26.733 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 3 | 20 | 2019-11-19 15:58:26.733 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
| 1 | 3 | 21 | 2019-11-19 15:58:26.733 | NULL | 1 | 10 | 0 | NULL | NULL | NULL |
As an example of what I want to achieve with the data above: I want to choose only the first two distinct pkg_id's, based on pkg_start_time in ascending order, but for those two pkg_id's I want every file_id returned. The table below is what I want my query to select from the table above.
+-----------+--------+---------+-------------------------+--------------+---------+-----------------+-------------------+----------+------------------+--------------------+--------+
| FOLDER_ID | PKG_ID | file_id | PKG_START_TIME | PKG_END_TIME | ISVALID | VALIDFILE_COUNT | INVALIDFILE_COUNT | ISLOADED | LOADEDFILE_COUNT | REJECTEDFILE_COUNT | seqnum |
+-----------+--------+---------+-------------------------+--------------+---------+-----------------+-------------------+----------+------------------+--------------------+--------+
| 1 | 1 | 1 | 2019-11-19 14:59:24.343 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 |
| 1 | 2 | 2 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL | 1 |
| 1 | 2 | 3 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL | 2 |
| 1 | 2 | 4 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL | 3 |
| 1 | 2 | 5 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL | 4 |
| 1 | 2 | 6 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL | 5 |
| 1 | 2 | 7 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL | 6 |
| 1 | 2 | 8 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL | 7 |
| 1 | 2 | 9 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL | 8 |
| 1 | 2 | 10 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL | 9 |
| 1 | 2 | 11 | 2019-11-19 15:10:20.157 | NULL | 1 | 10 | 0 | NULL | NULL | NULL | 10 |
+-----------+--------+---------+-------------------------+--------------+---------+-----------------+-------------------+----------+------------------+--------------------+--------+
Edit: I have solved my question
I have no idea why you are using a full join, so I'm replacing it with an inner join. You want row_number():
select dp.*
from (select d.FOLDER_ID, d.PKG_ID, p.file_id, d.PKG_START_TIME,
d.PKG_END_TIME, d.ISVALID, d.VALIDFILE_COUNT,
d.INVALIDFILE_COUNT, d.ISLOADED, d.LOADEDFILE_COUNT,
d.REJECTEDFILE_COUNT,
row_number() over (partition by d.pkg_id order by p.file_id) as seqnum
from DATA_EXCHANGE_PACKAGE d inner join
PACKAGE_FILE p
on d.PKG_ID = p.PKG_ID
where d.pkg_id = p.pkg_id
) dp
where seqnum <= 10
order by PKG_START_TIME asc
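Note that row_number() partitioned by pkg_id limits the number of files per package. If the goal is instead to limit the number of packages while keeping all of their files, one alternative is DENSE_RANK() ordered by the package start time. The sketch below illustrates this with Python's built-in sqlite3 module (SQLite 3.25 or newer) instead of SQL Server; the tables are cut down to the columns that matter, and the function name first_n_packages is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, illustrative versions of the two tables.
conn.execute("CREATE TABLE data_exchange_package (pkg_id INTEGER, pkg_start_time TEXT)")
conn.execute("CREATE TABLE package_file (file_id INTEGER, pkg_id INTEGER)")
conn.executemany(
    "INSERT INTO data_exchange_package VALUES (?, ?)",
    [
        (1, "2019-11-19 14:59:24.343"),
        (2, "2019-11-19 15:10:20.157"),
        (3, "2019-11-19 15:58:26.733"),
    ],
)
conn.executemany(
    "INSERT INTO package_file VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 2), (4, 2), (12, 3), (13, 3)],
)


def first_n_packages(connection, n):
    """Return (pkg_id, file_id) for every file of the first n packages."""
    query = """
    SELECT pkg_id, file_id
    FROM (
        SELECT d.pkg_id, p.file_id,
               -- all rows of the same package share one rank
               DENSE_RANK() OVER (ORDER BY d.pkg_start_time, d.pkg_id) AS pkg_rank
        FROM data_exchange_package d
        JOIN package_file p ON p.pkg_id = d.pkg_id
    ) AS ranked
    WHERE pkg_rank <= ?
    ORDER BY pkg_id, file_id
    """
    return connection.execute(query, (n,)).fetchall()


result = first_n_packages(conn, 2)
print(result)
```

Because the package count is passed as a bound parameter, the same query can be reused for any number of packages.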
I have solved the question I was asking. The query that solved it is below.
select d.FOLDER_ID, d.PKG_ID, p.file_id, d.PKG_START_TIME, d.PKG_END_TIME, d.ISVALID, d.VALIDFILE_COUNT, d.INVALIDFILE_COUNT, d.ISLOADED, d.LOADEDFILE_COUNT, d.REJECTEDFILE_COUNT
from DATA_EXCHANGE_PACKAGE d full outer join PACKAGE_FILE p on d.PKG_ID = p.PKG_ID
where d.PKG_ID = p.PKG_ID and d.PKG_ID > (select max(d.PKG_ID) - 5 from DATA_EXCHANGE_PACKAGE d ) and d.FOLDER_ID = 1
order by PKG_START_TIME desc
This query returns every record for the packages with the 5 highest pkg_id values (it assumes the IDs have no gaps). I'm going to use this query in Python and turn that 5 into a parameter so users can choose how many packages they want returned. In this table every new package ID is higher than the previous one, so instead of max(d.PKG_ID) I could probably also use the start datetime with max, but this query is good enough for now. The folder_id value will also be a parameter.
I want to join the rows whose date and time (down to the minute) are the same but whose seconds differ. I used the MAX function, but it did not return the proper result.
| DateandTime | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|-------------------------|------|------|------|------|------|------|------|
| 2019-08-06 15:44:04.000 | NULL | 0 | 29 | 20 | 150 | 20 | 20 |
| 2019-08-06 15:44:03.000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL |
| 2019-08-06 15:43:04.000 | NULL | NULL | 29 | 20 | 150 | 20 | 20 |
| 2019-08-06 15:43:03.000 | 0 | 0 | NULL | NULL | NULL | NULL | NULL |
| 2019-08-06 15:42:04.000 | NULL | NULL | 29 | 20 | 150 | 20 | 20 |
| 2019-08-06 15:42:03.000 | 0 | 0 | NULL | NULL | NULL | NULL | NULL |
| 2019-08-06 15:41:04.000 | NULL | NULL | 29 | 20 | 150 | 20 | 20 |
| 2019-08-06 15:41:03.000 | 0 | 0 | NULL | NULL | NULL | NULL | NULL |
| 2019-08-06 15:40:04.000 | NULL | 0 | 29 | 20 | 150 | 20 | 20 |
| 2019-08-06 15:40:03.000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL |
| 2019-08-06 15:39:04.000 | NULL | 0 | 29 | 20 | 150 | 20 | 20 |
| 2019-08-06 15:39:03.000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL |
| 2019-08-06 15:38:04.000 | NULL | NULL | 29 | 20 | 150 | 20 | 20 |
| 2019-08-06 15:38:03.000 | 0 | 0 | NULL | NULL | NULL | NULL | NULL |
If you are using MSSQL, the following script will work. You can use the same logic in other databases by applying the appropriate DATETIME conversion for that database.
SELECT
FORMAT(CAST(DateandTime AS DATETIME), 'dd-MM.yyyy HH:mm'),
-- Depending on your requirements, you can also use SUM/AVG for the
-- columns if both rows of a group can contain data.
MAX(t1) T1,
MAX(t2) T2,
MAX(t3) T3,
MAX(t4) T4,
MAX(t5) T5,
MAX(t6) T6,
MAX(t7) T7
FROM your_table
GROUP BY FORMAT(CAST(DateandTime AS DATETIME), 'dd-MM.yyyy HH:mm')
-- CAST(DateandTime AS DATETIME) is not required if your
-- DateandTime column type is already DATETIME
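The same grouping idea can be sketched outside SQL Server. The example below uses Python's built-in sqlite3 module, where strftime() plays the role of FORMAT() by truncating each timestamp to the minute. The table name readings and the reduced set of columns are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, t1 INTEGER, t2 INTEGER, t3 INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("2019-08-06 15:44:04.000", None, 0, 29),
        ("2019-08-06 15:44:03.000", 0, None, None),
        ("2019-08-06 15:43:04.000", None, None, 29),
        ("2019-08-06 15:43:03.000", 0, 0, None),
    ],
)

# Truncate each timestamp to the minute and collapse the rows of that
# minute; MAX() ignores NULLs, so the non-NULL value of each column survives.
query = """
SELECT strftime('%Y-%m-%d %H:%M', ts) AS minute,
       MAX(t1), MAX(t2), MAX(t3)
FROM readings
GROUP BY strftime('%Y-%m-%d %H:%M', ts)
ORDER BY minute
"""
merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```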
I have written a query to convert rows into columns using multiple pivot functions with respect to months 4, 5 & 6. I did succeed in converting the rows into columns. Below is the query:
(SELECT [team],
Count_Of_OrderId,
Count_Of_OId,
Avg_a,
[Count_of_u] ,
convert(varchar(max),[month_from_Date])+'_COID' as
month_from_Date_COAID,
convert(varchar(max),[month_from_Date]) + '_COID' as
month_from_Date_CODID,
convert(varchar(max),[month_from_Date])+'_Avg_a' as
month_from_Date_Avg_a,
convert(varchar(max),[month_from_Date])+'_Count_of_u' as
month_from_Date_Count_of_u
FROM [MyTable]) AS S
PIVOT
(
MAX(Count_Of_OrderId)
FOR [month_from_Date_COAID] IN ([4_COID], [5_COID], [6_COID])
) AS PivotTable1
PIVOT
(
MAX(Count_Of_OId)
FOR [month_from_Date_CODID] IN ([4_COID], [5_COID], [6_COID])
) AS PivotTable2
PIVOT
(
MAX(Avg_a)
FOR [month_from_Date_Avg_a] IN ([4_Avg_a], [5_Avg_a], [6_Avg_a])
) AS PivotTable3
PIVOT
(
MAX(Count_of_users)
FOR [month_from_Date_Count_of_u] IN ([4_Count_of_u], [5_Count_of_u],
[6_Count_of_u])
) AS PivotTable4
So the output was:
+--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--------------+--------------+--------------+
| Team | COAID_4 | COAID_5 | COAID_6 | CODID_4 | CODID_5 | CODID_6 | Avg_a_4 | Avg_a_5 | Avg_a_6 | Count_of_u_4 | Count_of_u_5 | Count_of_u_6 |
+--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--------------+--------------+--------------+
| Team A | NULL | NULL | 17 | NULL | NULL | 15 | NULL | NULL | 1.13 | NULL | NULL | 7 |
| Team A | NULL | 14 | NULL | NULL | 14 | NULL | NULL | 1 | NULL | NULL | 6 | NULL |
| Team A | 9 | NULL | NULL | 7 | NULL | NULL | 1.29 | NULL | NULL | 5 | NULL | NULL |
| Team B | NULL | NULL | 12159 | NULL | NULL | 6482 | NULL | NULL | 1.88 | NULL | NULL | 40 |
| Team B | NULL | 14287 | NULL | NULL | 6525 | NULL | NULL | 2.19 | NULL | NULL | 39 | NULL |
| Team B | 15822 | NULL | NULL | 7117 | NULL | NULL | 2.22 | NULL | NULL | 40 | NULL | NULL |
| Team C | NULL | NULL | 293 | NULL | NULL | 174 | NULL | NULL | 1.68 | NULL | NULL | 6 |
| Team C | NULL | 318 | NULL | NULL | 221 | NULL | NULL | 1.44 | NULL | NULL | 6 | NULL |
| Team C | 312 | NULL | NULL | 183 | NULL | NULL | 1.7 | NULL | NULL | 6 | NULL | NULL |
+--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--------------+--------------+--------------+
Here each team has been split into 3 rows, one each for the 4th, 5th and 6th month. I would like to get the output as:
+--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--------------+--------------+--------------+
| Team | COAID_4 | COAID_5 | COAID_6 | CODID_4 | CODID_5 | CODID_6 | Avg_a_4 | Avg_a_5 | Avg_a_6 | Count_of_u_4 | Count_of_u_5 | Count_of_u_6 |
+--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--------------+--------------+--------------+
| Team A | 9 | 14 | 17 | 7 | 14 | 15 | 1.29 | 1 | 1.13 | 5 | 6 | 7 |
| Team B | 15822 | 14287 | 12159 | 7117 | 6525 | 6482 | 2.22 | 2.19 | 1.88 | 40 | 39 | 40 |
| Team C | 312 | 318 | 293 | 183 | 221 | 174 | 1.7 | 1.44 | 1.68 | 6 | 6 | 6 |
+--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--------------+--------------+--------------+
I am not sure what the mistake in my code is.
A simple way is to use MAX:
SELECT Team,
MAX(COAID_4),
MAX(COAID_5),
MAX(COAID_6),
....
FROM T
GROUP BY Team
Here T is your current query (the one producing the result above).
But I think what you are really looking for is conditional aggregation to do the pivot:
SELECT
[team],
MAX(CASE WHEN month_from_Date = 4 THEN Count_Of_OrderId END) '4_COAID',
MAX(CASE WHEN month_from_Date = 5 THEN Count_Of_OrderId END) '5_COAID',
MAX(CASE WHEN month_from_Date = 6 THEN Count_Of_OrderId END) '6_COAID',
MAX(CASE WHEN month_from_Date = 4 THEN Count_Of_OId END) '4_CODID',
MAX(CASE WHEN month_from_Date = 5 THEN Count_Of_OId END) '5_CODID',
MAX(CASE WHEN month_from_Date = 6 THEN Count_Of_OId END) '6_CODID',
MAX(CASE WHEN month_from_Date = 4 THEN Avg_a END) '4_Avg_a',
MAX(CASE WHEN month_from_Date = 5 THEN Avg_a END) '5_Avg_a',
MAX(CASE WHEN month_from_Date = 6 THEN Avg_a END) '6_Avg_a',
MAX(CASE WHEN month_from_Date = 4 THEN Count_of_users END) '4_Count_of_u',
MAX(CASE WHEN month_from_Date = 5 THEN Count_of_users END) '5_Count_of_u',
MAX(CASE WHEN month_from_Date = 6 THEN Count_of_users END) '6_Count_of_u'
FROM [MyTable]
GROUP BY [team]
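Conditional aggregation is easy to check on a tiny data set. The following sketch uses Python's built-in sqlite3 module instead of SQL Server, a hypothetical table my_table, and only the Count_Of_OrderId measure, with values taken from the output shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (team TEXT, month_from_date INTEGER, count_of_orderid INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        ("Team A", 4, 9), ("Team A", 5, 14), ("Team A", 6, 17),
        ("Team B", 4, 15822), ("Team B", 5, 14287), ("Team B", 6, 12159),
    ],
)

# One output row per team: each CASE keeps the value of a single month,
# and MAX() collapses the three source rows of a team into one.
query = """
SELECT team,
       MAX(CASE WHEN month_from_date = 4 THEN count_of_orderid END) AS COAID_4,
       MAX(CASE WHEN month_from_date = 5 THEN count_of_orderid END) AS COAID_5,
       MAX(CASE WHEN month_from_date = 6 THEN count_of_orderid END) AS COAID_6
FROM my_table
GROUP BY team
ORDER BY team
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Further measures can be added the same way, with one MAX(CASE ...) expression per measure and month.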
Scenario:
I am trying to build a query that takes a start and an end date and returns every day in between, together with the day name. I then want to JOIN the result to another table that holds expected pay dates and amounts. The joined table may contain days outside the range between the start and end date, which I want to exclude.
Progress:
I sort of have what I want, but not in the correct output format. This is what I have created so far:
DECLARE
@startDate DATETIME,
@endDate DATETIME
SET @startDate = CONVERT(VARCHAR(4), DATEPART(YEAR, DATEADD(MONTH, -1, GETDATE())))+'-'+CONVERT(VARCHAR(2), DATEPART(MONTH, DATEADD(MONTH, -1, GETDATE())))+'-21'
SET @endDate = CONVERT(VARCHAR(4), DATEPART(YEAR, DATEADD(MONTH, -1, GETDATE())))+'-'+CONVERT(VARCHAR(2), DATEPART(MONTH, DATEADD(MONTH, -0, GETDATE())))+'-20'
;WITH dates AS
(
SELECT @startdate as Date,DATENAME(Dw,@startdate) As DayName
UNION ALL
SELECT DATEADD(d,1,[Date]),DATENAME(Dw,DATEADD(d,1,[Date])) as DayName
FROM dates
WHERE DATE < @enddate
)
SELECT LEFT(CONVERT(VARCHAR(30),Date, 106), 2) + '-' + LEFT(CONVERT(VARCHAR(30),Date, 10), 2) Date,DayName, SUM(ExpectedAmount), ExpectedDate FROM dates
FULL JOIN Commissions.dbo.ThreeMonthPayment on CONVERT(VARCHAR(30),Date) = Commissions.dbo.ThreeMonthPayment.ExpectedDate
GROUP BY Date, DayName, ExpectedDate
Order by ExpectedDate
Which results in this table (Sorry so long):
+-------+-----------+------------------+--------------+
| Date | DayName | (No column name) | ExpectedDate |
+-------+-----------+------------------+--------------+
| NULL | NULL | 0 | NULL |
| 21-03 | Friday | NULL | NULL |
| 22-03 | Saturday | NULL | NULL |
| 23-03 | Sunday | NULL | NULL |
| 24-03 | Monday | NULL | NULL |
| 25-03 | Tuesday | NULL | NULL |
| 26-03 | Wednesday | NULL | NULL |
| 27-03 | Thursday | NULL | NULL |
| 28-03 | Friday | NULL | NULL |
| 29-03 | Saturday | NULL | NULL |
| 30-03 | Sunday | NULL | NULL |
| 31-03 | Monday | NULL | NULL |
| 01-04 | Tuesday | NULL | NULL |
| 02-04 | Wednesday | NULL | NULL |
| 03-04 | Thursday | NULL | NULL |
| 04-04 | Friday | NULL | NULL |
| 05-04 | Saturday | NULL | NULL |
| 06-04 | Sunday | NULL | NULL |
| 07-04 | Monday | NULL | NULL |
| 08-04 | Tuesday | NULL | NULL |
| 09-04 | Wednesday | NULL | NULL |
| 10-04 | Thursday | NULL | NULL |
| 11-04 | Friday | NULL | NULL |
| 12-04 | Saturday | NULL | NULL |
| 13-04 | Sunday | NULL | NULL |
| 14-04 | Monday | NULL | NULL |
| 15-04 | Tuesday | NULL | NULL |
| 16-04 | Wednesday | NULL | NULL |
| 17-04 | Thursday | NULL | NULL |
| 18-04 | Friday | NULL | NULL |
| 19-04 | Saturday | NULL | NULL |
| 20-04 | Sunday | NULL | NULL |
| NULL | NULL | 89466 | 01-03 |
| NULL | NULL | 86058 | 01-04 |
| NULL | NULL | 23356 | 01-05 |
| NULL | NULL | 1858 | 01-06 |
| NULL | NULL | 13597 | 02-03 |
| NULL | NULL | 55587 | 02-04 |
| NULL | NULL | 7857 | 02-05 |
| NULL | NULL | 1377 | 02-06 |
| NULL | NULL | 6947 | 03-03 |
| NULL | NULL | 49626 | 03-04 |
| NULL | NULL | 0 | 03-05 |
| NULL | NULL | 0 | 03-06 |
| NULL | NULL | 6054 | 04-03 |
| NULL | NULL | 31639 | 04-04 |
| NULL | NULL | 0 | 04-05 |
| NULL | NULL | 0 | 04-06 |
| NULL | NULL | 26421 | 05-03 |
| NULL | NULL | 28154 | 05-04 |
| NULL | NULL | 15036 | 05-05 |
| NULL | NULL | 634 | 05-06 |
| NULL | NULL | 0 | 05-07 |
| NULL | NULL | 20832 | 06-03 |
| NULL | NULL | 0 | 06-04 |
| NULL | NULL | 0 | 06-05 |
| NULL | NULL | 0 | 06-06 |
| NULL | NULL | 5406 | 07-03 |
| NULL | NULL | 12864 | 07-04 |
| NULL | NULL | 4257 | 07-05 |
| NULL | NULL | 537 | 07-06 |
| NULL | NULL | 0 | 08-03 |
| NULL | NULL | 363 | 08-04 |
| NULL | NULL | 426 | 08-05 |
| NULL | NULL | 0 | 08-06 |
| NULL | NULL | 0 | 09-03 |
| NULL | NULL | 23240 | 09-04 |
| NULL | NULL | 0 | 09-05 |
| NULL | NULL | 0 | 09-06 |
| NULL | NULL | 12670 | 10-03 |
| NULL | NULL | 6790 | 10-04 |
| NULL | NULL | 0 | 10-05 |
| NULL | NULL | 0 | 10-06 |
| NULL | NULL | 2914 | 11-03 |
| NULL | NULL | 19053 | 11-04 |
| NULL | NULL | 0 | 11-05 |
| NULL | NULL | 0 | 11-06 |
| NULL | NULL | 6402 | 12-03 |
| NULL | NULL | 0 | 12-04 |
| NULL | NULL | 0 | 12-05 |
| NULL | NULL | 0 | 12-06 |
| NULL | NULL | 4166 | 13-03 |
| NULL | NULL | 0 | 13-04 |
| NULL | NULL | 0 | 13-05 |
| NULL | NULL | 0 | 13-06 |
| NULL | NULL | 50534 | 14-03 |
| NULL | NULL | 23854 | 14-04 |
| NULL | NULL | 15435 | 14-05 |
| NULL | NULL | 4003 | 14-06 |
| NULL | NULL | 475330 | 15-03 |
| NULL | NULL | 451014 | 15-04 |
| NULL | NULL | 103210 | 15-05 |
| NULL | NULL | 19947 | 15-06 |
| NULL | NULL | 12084 | 16-03 |
| NULL | NULL | 22203 | 16-04 |
| NULL | NULL | 517 | 16-05 |
| NULL | NULL | 0 | 16-06 |
| NULL | NULL | 31423 | 17-03 |
| NULL | NULL | 32150 | 17-04 |
| NULL | NULL | 0 | 17-05 |
| NULL | NULL | 0 | 17-06 |
| NULL | NULL | 33402 | 18-03 |
| NULL | NULL | 900 | 18-04 |
| NULL | NULL | 289 | 18-05 |
| NULL | NULL | 0 | 18-06 |
| NULL | NULL | 33929 | 19-03 |
| NULL | NULL | 6942 | 19-04 |
| NULL | NULL | 0 | 19-05 |
| NULL | NULL | 0 | 19-06 |
| NULL | NULL | 161806 | 20-03 |
| NULL | NULL | 141319 | 20-04 |
| NULL | NULL | 26659 | 20-05 |
| NULL | NULL | 4695 | 20-06 |
| NULL | NULL | 21074 | 21-03 |
| NULL | NULL | 15579 | 21-04 |
| NULL | NULL | 2693 | 21-05 |
| NULL | NULL | 0 | 21-06 |
| NULL | NULL | 28401 | 22-03 |
| NULL | NULL | 46258 | 22-04 |
| NULL | NULL | 11409 | 22-05 |
| NULL | NULL | 1672 | 22-06 |
| NULL | NULL | 76562 | 23-03 |
| NULL | NULL | 66804 | 23-04 |
| NULL | NULL | 32853 | 23-05 |
| NULL | NULL | 3168 | 23-06 |
| NULL | NULL | 47008 | 24-03 |
| NULL | NULL | 35888 | 24-04 |
| NULL | NULL | 4528 | 24-05 |
| NULL | NULL | 459 | 24-06 |
| NULL | NULL | 1108747 | 25-03 |
| NULL | NULL | 543351 | 25-04 |
| NULL | NULL | 152852 | 25-05 |
| NULL | NULL | 15712 | 25-06 |
| NULL | NULL | 343379 | 26-03 |
| NULL | NULL | 117657 | 26-04 |
| NULL | NULL | 41793 | 26-05 |
| NULL | NULL | 5645 | 26-06 |
| NULL | NULL | 0 | 27-02 |
| NULL | NULL | 401110 | 27-03 |
| NULL | NULL | 87571 | 27-04 |
| NULL | NULL | 39192 | 27-05 |
| NULL | NULL | 2801 | 27-06 |
| NULL | NULL | 313274 | 28-03 |
| NULL | NULL | 92607 | 28-04 |
| NULL | NULL | 21901 | 28-05 |
| NULL | NULL | 1852 | 28-06 |
| NULL | NULL | 77999 | 29-03 |
| NULL | NULL | 27693 | 29-04 |
| NULL | NULL | 3341 | 29-05 |
| NULL | NULL | 0 | 29-06 |
| NULL | NULL | 229556 | 30-03 |
| NULL | NULL | 261036 | 30-04 |
| NULL | NULL | 9109 | 30-05 |
| NULL | NULL | 545 | 30-06 |
| NULL | NULL | 460871 | 31-03 |
| NULL | NULL | 28710 | 31-05 |
+-------+-----------+------------------+--------------+
From the results above, I am trying to match ExpectedDate to the Date column, so that instead of the output above I get something that looks like this (to keep it short, I haven't included all of the days between my start and end date):
+-------+-----------+------------------+--------------+
| Date | DayName | (No column name) | ExpectedDate |
+-------+-----------+------------------+--------------+
| NULL | NULL | 0 | NULL |
| 21-03 | Friday | 21074 | 21-03 |
| 22-03 | Saturday | 28401 | 22-03 |
| 23-03 | Sunday | 76562 | 23-03 |
| 24-03 | Monday | 47008 | 24-03 |
+-------+-----------+------------------+--------------+
As you can see above, the ExpectedDate and Date columns are now matched up nicely, and the expected dates that fall outside the date range are not displayed.
I have been struggling with this the entire morning :( Is this possible?
Any help or links to threads that I may have missed would be great!
I am using SQL Server 2008.
Thanks so much.
First of all, a FULL JOIN includes everything from both tables. If you only want the dates from the dates CTE, use a LEFT JOIN.
Secondly, the condition CONVERT(VARCHAR(30),Date) = Commissions.dbo.ThreeMonthPayment.ExpectedDate does not seem to work. Are you sure you need the conversion?
I suggest you try this:
DECLARE
@startDate DATETIME,
@endDate DATETIME
SET @startDate = CONVERT(VARCHAR(4), DATEPART(YEAR, DATEADD(MONTH, -1, GETDATE())))+'-'+CONVERT(VARCHAR(2), DATEPART(MONTH, DATEADD(MONTH, -1, GETDATE())))+'-21'
SET @endDate = CONVERT(VARCHAR(4), DATEPART(YEAR, DATEADD(MONTH, -1, GETDATE())))+'-'+CONVERT(VARCHAR(2), DATEPART(MONTH, DATEADD(MONTH, -0, GETDATE())))+'-20'
;WITH dates AS
(
SELECT @startdate as Date,DATENAME(Dw,@startdate) As DayName
UNION ALL
SELECT DATEADD(d,1,[Date]),DATENAME(Dw,DATEADD(d,1,[Date])) as DayName
FROM dates
WHERE DATE < @enddate
)
SELECT LEFT(CONVERT(VARCHAR(30),Date, 106), 2) + '-' + LEFT(CONVERT(VARCHAR(30),Date, 10), 2) Date
, DayName, SUM(ExpectedAmount), ExpectedDate
FROM dates
LEFT JOIN Commissions.dbo.ThreeMonthPayment
on Date = Commissions.dbo.ThreeMonthPayment.ExpectedDate
GROUP BY
Date
, DayName
, ExpectedDate
Order by
ExpectedDate
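The effect of the LEFT JOIN against a generated calendar can be sketched as follows. This uses Python's built-in sqlite3 module rather than SQL Server 2008, a hypothetical payments table that stores full ISO dates, and an assumed year of 2014; the amounts are taken from the question's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the payment table, with ISO date strings.
conn.execute("CREATE TABLE payments (expected_date TEXT, expected_amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [
        ("2014-03-20", 161806),  # outside the requested range
        ("2014-03-21", 21074),
        ("2014-03-22", 28401),
    ],
)

# The recursive CTE generates one row per day in the range; the LEFT JOIN
# keeps every generated day and drops payments outside the range.
query = """
WITH RECURSIVE dates(d) AS (
    SELECT :start
    UNION ALL
    SELECT date(d, '+1 day') FROM dates WHERE d < :end
)
SELECT dates.d, SUM(p.expected_amount)
FROM dates
LEFT JOIN payments p ON p.expected_date = dates.d
GROUP BY dates.d
ORDER BY dates.d
"""
calendar = conn.execute(
    query, {"start": "2014-03-21", "end": "2014-03-23"}
).fetchall()
for row in calendar:
    print(row)
```

Days without a matching payment still appear, with a NULL (None) amount, which is the behaviour a LEFT JOIN from the calendar side gives.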