Grouping / Ordering confusion - sql

Hopefully what I have here is a simple question and explained to you in the correct manner.
I have the following Query:
--DECLARE DATES
DECLARE #Date datetime
DECLARE #DaysInMonth INT
DECLARE #i INT
--GIVE VALUES
SET #Date = Getdate()
SELECT #DaysInMonth = datepart(dd,dateadd(dd,-1,dateadd(mm,1,cast(cast(year(#Date) as varchar)+'-'+cast(month(#Date) as varchar)+'-01' as datetime))))
SET #i = 1
--MAKE TEMP TABLE
CREATE TABLE #TempDays
(
[days] VARCHAR(50)
)
WHILE #i <= #DaysInMonth
BEGIN
INSERT INTO #TempDays
VALUES(#i)
SET #i = #i + 1
END
SELECT #TempDays.days, DATEPART(dd, a.ActualDate) ActualDate, a.ActualAmount, (SELECT SUM(b.ActualAmount)
FROM UnpaidManagement..Actual b
WHERE b.ID <= a.ID) RunningTotal
FROM UnpaidManagement..Actual a
RIGHT JOIN #TempDays on a.ID = #TempDays.days
DROP TABLE #TempDays
Which produces the following output:
+------+------------+--------------+--------------+
| days | ActualDate | ActualAmount | RunningTotal |
+------+------------+--------------+--------------+
| 1 | 1 | 438706 | R 438 706 |
| 2 | 2 | 16239 | R 454 945 |
| 3 | 3 | 1611264 | R 2 066 209 |
| 4 | 4 | 1157777 | R 3 223 986 |
| 5 | 5 | 470662 | R 3 694 648 |
| 6 | 6 | 288628 | 3983276 |
| 7 | 7 | 245897 | 4229173 |
| 8 | 8 | 5235 | 4234408 |
| 9 | 10 | 375630 | 4610038 |
| 10 | 11 | 95610 | 4705648 |
| 11 | 12 | 87285 | 4792933 |
| 12 | 13 | 73399 | 4866332 |
| 13 | 14 | 59516 | 4925848 |
| 14 | 15 | 918915 | 5844763 |
| 15 | 17 | 1957285 | 7802048 |
| 16 | 18 | 489964 | 8292012 |
| 17 | 19 | 272304 | 8564316 |
| 18 | 20 | 378601 | 8942917 |
| 19 | 22 | 92374 | 9035291 |
| 20 | 23 | 198 | 9035489 |
| 21 | 24 | 1500820 | 10536309 |
| 22 | 25 | 2631057 | 13167366 |
| 23 | 26 | 6466505 | 19633871 |
| 24 | 27 | 3757350 | 23391221 |
| 25 | 28 | 3487466 | 26878687 |
| 26 | 29 | 160197 | 27038884 |
| 27 | 30 | 14000 | 27052884 |
| 28 | NULL | NULL | NULL |
| 29 | NULL | NULL | NULL |
| 30 | NULL | NULL | NULL |
| 31 | NULL | NULL | NULL |
+------+------------+--------------+--------------+
If you look closely at the table above, the "ActualDate" column is missing a few values, EG: 9, 16, etc.
And because of this, the rows are being pushed up instead of being grouped with their correct number? How would I accomplish a group by / anything to keep them in their correct row?
DESIRED OUTPUT:
+------+------------+--------------+--------------+
| days | ActualDate | ActualAmount | RunningTotal |
+------+------------+--------------+--------------+
| 1 | 1 | 438706 | R 438 706 |
| 2 | 2 | 16239 | R 454 945 |
| 3 | 3 | 1611264 | R 2 066 209 |
| 4 | 4 | 1157777 | R 3 223 986 |
| 5 | 5 | 470662 | R 3 694 648 |
| 6 | 6 | 288628 | 3983276 |
| 7 | 7 | 245897 | 4229173 |
| 8 | 8 | 5235 | 4234408 |
| 9 | NULL | NULL | NULL |
| 10 | 10 | 375630 | 4610038 |
| 11 | 11 | 95610 | 4705648 |
| 12 | 12 | 87285 | 4792933 |
| 13 | 13 | 73399 | 4866332 |
| 14 | 14 | 59516 | 4925848 |
| 15 | 15 | 918915 | 5844763 |
| 16 | NULL | NULL | NULL |
| 17 | 17 | 1957285 | 7802048 |
| 18 | 18 | 489964 | 8292012 |
| 19 | 19 | 272304 | 8564316 |
| 20 | 20 | 378601 | 8942917 |
| 21 | NULL | NULL | NULL |
| 22 | 22 | 92374 | 9035291 |
| 23 | 23 | 198 | 9035489 |
| 24 | 24 | 1500820 | 10536309 |
| 25 | 25 | 2631057 | 13167366 |
| 26 | 26 | 6466505 | 19633871 |
| 27 | 27 | 3757350 | 23391221 |
| 28 | 28 | 3487466 | 26878687 |
| 29 | 29 | 160197 | 27038884 |
| 30 | 30 | 14000 | 27052884 |
| 31 | NULL | NULL | NULL |
+------+------------+--------------+--------------+
I know this is a long one to read, but please let me know if I have explained this clearly enough. I have been trying to group by this whole morning, but I keep getting errors.

SELECT #TempDays.days, DATEPART(dd, a.ActualDate) ActualDate, a.ActualAmount, (SELECT SUM(b.ActualAmount)
FROM UnpaidManagement..Actual b
WHERE b.ID <= a.ID) RunningTotal
FROM UnpaidManagement..Actual a
RIGHT JOIN #TempDays on DATEPART(dd, a.ActualDate) = #TempDays.days

If you select the temp table as first table in the select and join to UnpaidManagement..Actual you have the days in correct row and order:
SELECT t.days
,DATEPART(dd, a.ActualDate) ActualDate
,a.ActualAmount
,(
SELECT SUM(b.ActualAmount)
FROM UnpaidManagement..Actual b
WHERE b.ID <= a.ID
) RunningTotal
FROM #TempDays AS t
INNER JOIN UnpaidManagement..Actual AS a ON a.IDENTITYCOL = t.days
ORDER BY t.days
After doing that, cou can add CASE WHEN to generate content for the NULL cells.

Related

How to select data order by sort value and sort NULL By name

I wrote following SQL query to select data from #tmp table variable.
SELECT #rowCount AS [row-count],
t.[row-no] AS [row-no],
t.[ServiceID] AS ServiceID,
t.ServiceName AS ServiceName,
t.[BranchServiceSortValue] AS SortValue,
(CASE WHEN t.OptIn = 1 THEN 'Yes' ELSE 'No' END) AS OptIn
FROM #tmp t
INNER JOIN dbo.Category
ON Category.CategoryId = t.FkCategoryId
INNER JOIN dbo.ServiceType
ON ServiceType.ServiceTypeId = t.FkServiceTypeId
WHERE t.[row-no] >= #startRow
AND t.[row-no] <= #endRow
ORDER BY t.BranchServiceSortValue,t.serviceName
According to the data in #tmp table,my above query return following output.
| row-count | row-no | ServiceID | ServiceName | SortValue | OptIn |
|-----------|--------|-----------|-------------|-----------|-------|
| 24 | 4 | 1088 | AAB | NULL | No |
| 24 | 5 | 1089 | AAC | NULL | No |
| 24 | 6 | 1090 | AAD | NULL | No |
| 24 | 1 | 1093 | GDGD | 0 | Yes |
| 24 | 7 | 1091 | EETETE | 1 | Yes |
| 24 | 8 | 1092 | CSCDF | 2 | Yes |
| 24 | 3 | 1086 | CXCX | 3 | Yes |
| 24 | 9 | 16 | ASA | 4 | Yes |
| 24 | 2 | 1087 | BFB | 5 | Yes |
| 24 | 10 | 7 | Mortgage | 6 | Yes |
| 24 | 11 | 17 | DDWW | 7 | Yes |
| 24 | 12 | 11 | IL | 8 | Yes |
| 24 | 13 | 5 | SAA | 9 | Yes |
| 24 | 14 | 9 | CD | 10 | Yes |
You can see according to my above query data rows are sorted by SortValue and when SortValue = NULL, those 3 rows sorted by its ServiceName,
But I need to displaySortValue = NULLrows at the bottom of the other rows.Its mean I need to display Null rows after the SortValue Not NULL data and SortValue = NULL should be display order by its ServiceName.
My Expected Output is:
| row-count | row-no | ServiceID | ServiceName | SortValue | OptIn |
|-----------|--------|-----------|-------------|-----------|-------|
| 14 | 1 | 1093 | GDGD | 0 | Yes |
| 14 | 7 | 1091 | EETETE | 1 | Yes |
| 14 | 8 | 1092 | CSCDF | 2 | Yes |
| 14 | 3 | 1086 | CXCX | 3 | Yes |
| 14 | 9 | 16 | ASA | 4 | Yes |
| 14 | 2 | 1087 | BFB | 5 | Yes |
| 14 | 10 | 7 | Mortgage | 6 | Yes |
| 14 | 11 | 17 | DDWW | 7 | Yes |
| 14 | 12 | 11 | IL | 8 | Yes |
| 14 | 13 | 5 | SAA | 9 | Yes |
| 14 | 14 | 9 | CD | 10 | Yes |
| 14 | 4 | 1088 | AAB | NULL | No |
| 14 | 5 | 1089 | AAC | NULL | No |
| 14 | 6 | 1090 | AAD | NULL | No |
How should I need to change my query to get above output? please help me
NULL has the lowest value, so you'll need to use a CASE to put NULL at the end, and then sort by SortValue:
ORDER BY CASE WHEN t.BranchServiceSortValue IS NULL THEN 1 ELSE 0 END,
t.BranchServiceSortValue,
t.serviceName;
Just add a key to the ORDER BY:
ORDER BY (CASE WHEN t.BranchServiceSortValue IS NOT NULL THEN 1 ELSE 2 END),
t.BranchServiceSortValue, t.serviceName
The SQL standard provides the options NULLS FIRST and NULLS LAST for ORDER BY clauses. SQL Server does not (yet) implement these.

Incremental/Update in hive

I have a hive external table with data say, (version less than 0.14)
+--------+------+------+------+
| id | A | B | C |
+--------+------+------+------+
| 10011 | 10 | 3 | 0 |
| 10012 | 9 | 0 | 40 |
| 10015 | 10 | 3 | 0 |
| 10017 | 9 | 0 | 40 |
+--------+------+------+------+
And I have a delta file having data given below.
+--------+------+------+------+
| id | A | B | C |
+--------+------+------+------+
| 10012 | 50 | 3 | 10 | --> update
| 10013 | 29 | 0 | 40 | --> insert
| 10014 | 10 | 3 | 0 | --> update
| 10013 | 19 | 0 | 40 | --> update
| 10015 | 70 | 3 | 0 | --> update
| 10016 | 17 | 0 | 40 | --> insert
+--------+------+------+------+
How can I update my hive table with the delta file, without using sqoop. Any help on how to proceed will be great! Thanks.
This is because there is duplicates in the file. How do you know which you should keep? The last one?
In that case you can use, for example, the row_number and then get the maximum value. Something like that.
SELECT coalesce(tmp.id,initial.id) as id,
coalesce(tmp.A, initial.A) as A,
coalesce(tmp.B,initial.B) as B,
coalesce(tmp.C, initial.C) as C
FROM
table_a initial
FULL OUTER JOIN
( SELECT *, row_number() over( partition by id ) as row_num
,COUNT(*) OVER (PARTITION BY id) AS cnt
FROM temp_table
) tmp
ON initial.id=tmp.id
WHERE row_num=cnt
OR row_num IS NULL;
Output:
+--------+-----+----+-----+--+
| id | a | b | c |
+--------+-----+----+-----+--+
| 10011 | 10 | 3 | 0 |
| 10012 | 50 | 3 | 10 |
| 10013 | 19 | 0 | 40 |
| 10014 | 10 | 3 | 0 |
| 10015 | 70 | 3 | 0 |
| 10016 | 17 | 0 | 40 |
| 10017 | 9 | 0 | 40 |
+--------+-----+----+-----+--+
You can load the file to a temporary table in hive and then execute a FULL OUTER JOIN between the two tables.
Query Example:
SELECT coalesce(tmp.id,initial.id) as id,
coalesce(tmp.A, initial.A) as A,
coalesce(tmp.B,initial.B) as B,
coalesce(tmp.C, initial.C) as C
FROM
table_a initial
FULL OUTER JOIN
temp_table tmp on initial.id=tmp.id;
Output
+--------+-----+----+-----+--+
| id | a | b | c |
+--------+-----+----+-----+--+
| 10011 | 10 | 3 | 0 |
| 10012 | 50 | 3 | 10 |
| 10013 | 29 | 0 | 40 |
| 10013 | 19 | 0 | 40 |
| 10014 | 10 | 3 | 0 |
| 10015 | 70 | 3 | 0 |
| 10016 | 17 | 0 | 40 |
| 10017 | 9 | 0 | 40 |
+--------+-----+----+-----+--+

how to update from the same table

I have this table:
I need to update the "ivalue" where mfieldid =33
based on the ivalue where mfieldid =44 for each respid
exmple:
for respid = 1 should updated 'gfgd' instead '12345'
+--------+----------+------------+
| respid | mfieldid | ivalue |
+--------+----------+------------+
| 1 | 33 | 12345 |
| 1 | 44 | gfgd |
| 2 | 33 | 54353453 |
| 2 | 44 | treterttre |
| 3 | 33 | 5454 |
| 3 | 44 | tgbg |
| 4 | 33 | 5443333 |
| 4 | 44 | bcvbcv |
+--------+----------+------------+
UPDATE t33
SET iValue=t44.iValue
--SELECT *
FROM Table t33
INNER JOIN Table t44 ON t33.respid = t44.respid
AND t33.mfieldid=33
AND t44.mfieldid=44

Segregate rows according to their HEAD (parent) - sql

I have the following SQL table.
+----+--------+----------+--------+
| ID | TestNo | TestName | HeadID |
+----+--------+----------+--------+
| 1 | 21 | Comp-1 | null |
| 2 | 22 | C1 | 21 |
| 3 | 23 | C2 | 21 |
| 4 | 24 | C3 | 21 |
| 5 | 47 | Comp-2 | null |
| 6 | 25 | C4 | 47 |
| 7 | 26 | C1+ | 21 |
+----+--------+----------+--------+
I want to get all the child rows (according to their HeadID) below their head test.
select * from ranges order by HeadID
The ACTUAL OUPUT I get from the above query:
+----+--------+----------+--------+
| ID | TestNo | TestName | HeadID |
+----+--------+----------+--------+
| 1 | 21 | Comp-1 | null |
| 5 | 47 | Comp-2 | null |
| 2 | 22 | C1 | 21 |
| 3 | 23 | C2 | 21 |
| 4 | 24 | C3 | 21 |
| 7 | 26 | C1+ | 21 |
| 6 | 25 | C4 | 47 |
+----+--------+----------+--------+
but my DESIRED OUTPUT is:
+----+--------+----------+--------+
| ID | TestNo | TestName | HeadID |
+----+--------+----------+--------+
| 1 | 21 | Comp-1 | null |
| 2 | 22 | C1 | 21 |
| 3 | 23 | C2 | 21 |
| 4 | 24 | C3 | 21 |
| 7 | 26 | C1+ | 21 |
| 5 | 47 | Comp-2 | null |
| 6 | 25 | C4 | 47 |
+----+--------+----------+--------+
How can I achieve this?
If you have only one level of children, then you can achieve this ordering like this:
SELECT *
FROM Ranges
ORDER BY
CASE WHEN HeadID IS NULL THEN TestNo ELSE HeadID END
,HeadID
,ID
;

Running total SQL server - AGAIN

i'm aware that this question has been asked multiple times and I have read those threads to get to where I am now, but those solutions don't seem to be working. I need to have a running total of my ExpectedAmount...
I have the following table:
+--------------+----------------+
| ExpectedDate | ExpectedAmount |
+--------------+----------------+
| 1 | 2485513 |
| 2 | 526032 |
| 3 | 342041 |
| 4 | 195807 |
| 5 | 380477 |
| 6 | 102233 |
| 7 | 539951 |
| 8 | 107145 |
| 10 | 165110 |
| 11 | 18795 |
| 12 | 27177 |
| 13 | 28232 |
| 14 | 154631 |
| 15 | 5566585 |
| 16 | 250814 |
| 17 | 90444 |
| 18 | 105424 |
| 19 | 62132 |
| 20 | 1799349 |
| 21 | 303131 |
| 22 | 459464 |
| 23 | 723488 |
| 24 | 676514 |
| 25 | 17311911 |
| 26 | 4876062 |
| 27 | 4844434 |
| 28 | 4039687 |
| 29 | 1418648 |
| 30 | 4366189 |
| 31 | 9028836 |
+--------------+----------------+
I have the following SQL:
SELECT a.ExpectedDate, a.ExpectedAmount, (SELECT SUM(b.ExpectedAmount)
FROM UnpaidManagement..Expected b
WHERE b.ExpectedDate <= a.ExpectedDate)
FROM UnpaidManagement..Expected a
The result of the above SQL is this:
+--------------+----------------+--------------+
| ExpectedDate | ExpectedAmount | RunningTotal |
+--------------+----------------+--------------+
| 1 | 2485513 | 2485513 |
| 2 | 526032 | 9480889 |
| 3 | 342041 | 46275618 |
| 4 | 195807 | 59866450 |
| 5 | 380477 | 60246927 |
| 6 | 102233 | 60349160 |
| 7 | 539951 | 60889111 |
| 8 | 107145 | 60996256 |
| 10 | 165110 | 2650623 |
| 11 | 18795 | 2669418 |
| 12 | 27177 | 2696595 |
| 13 | 28232 | 2724827 |
| 14 | 154631 | 2879458 |
| 15 | 5566585 | 8446043 |
| 16 | 250814 | 8696857 |
| 17 | 90444 | 8787301 |
| 18 | 105424 | 8892725 |
| 19 | 62132 | 8954857 |
| 20 | 1799349 | 11280238 |
| 21 | 303131 | 11583369 |
| 22 | 459464 | 12042833 |
| 23 | 723488 | 12766321 |
| 24 | 676514 | 13442835 |
| 25 | 17311911 | 30754746 |
| 26 | 4876062 | 35630808 |
| 27 | 4844434 | 40475242 |
| 28 | 4039687 | 44514929 |
| 29 | 1418648 | 45933577 |
| 30 | 4366189 | 50641807 |
| 31 | 9028836 | 59670643 |
+--------------+----------------+--------------+
You can tell from the first few values already that the math is all off, but then at some points the math adds up?! I'm Too confused !! Could someone please point me to another solution or to where I have gone wrong with this?
I am using SQL Server 2008.
It is because your ExpectedDate column of type varchar. Try this:
SELECT a.ExpectedDate, a.ExpectedAmount, (SELECT SUM(b.ExpectedAmount)
FROM UnpaidManagement..Expected b
WHERE CAST(b.ExpectedDate as int) <= CAST(a.ExpectedDate as int))
FROM UnpaidManagement..Expected a
Note that it could be inefficient query.
I think this gonna work:
SELECT a.ExpectedDate,
a.ExpectedAmount,
SUM(b.ExpectedAmount) RuningTotal
FROM UnpaidManagement..Expected a
LEFT JOIN UnpaidManagement..Expected b
ON CAST(b.ExpectedDate as int) <= CAST(a.ExpectedDate as int)
GROUP BY a.ExpectedDate, a.ExpectedAmount
AND:
SELECT a.ExpectedDate,
a.ExpectedAmount,
c.RuningTotal
FROM UnpaidManagement..Expected a
CROSS APPLY (
SELECT SUM(b.ExpectedAmount) AS RuningTotal
FROM UnpaidManagement..Expected b
WHERE CAST(b.ExpectedDate as int) <= CAST(a.ExpectedDate as int)
) c