MySQL Sum one field based on a second 'common' field - sql

Not a SQL guy, so i'm in a bind, so I'll keep this simple
+--------------+------------------------+
| RepaymentDay | MonthlyRepaymentAmount |
+--------------+------------------------+
| 3 | 0.214847 |
| 26 | 0.219357 |
| 24 | 0.224337 |
| 5 | 0.224337 |
| 18 | 0.224337 |
| 28 | 0.214847 |
| 1 | 0.224337 |
| 28 | 0.327079 |
+--------------+------------------------+
I am looking for a query that would then give something like;
+--------------+------------------------+
| RepaymentDay | MonthlyRepaymentAmount |
+--------------+------------------------+
| 1 | 0.224337 |
| 3 | 0.214847 |
| 5 | 0.224337 |
| 18 | 0.224337 |
| 24 | 0.224337 |
| 26 | 0.219357 |
| 28 | 0.541926 |
+--------------+------------------------+
I.e Where there are multiple records with the same value of 'RepaymentDay' , to sum those values of 'MonthlyRepaymentAmount'.
I could do it externally with perl or python, no issue, but I want to try and become more familiar with SQL.
Any ideas?

SELECT RepaymentDay, SUM(MonthlyRepaymentAmount) AS MonthlyRepaymentAmount
FROM YourTable
GROUP BY RepaymentDay
ORDER BY RepaymentDay

select RepaymentDay, sum(MonthlyRepaymentAmount) from table_name group by RepaymentDay;

Related

SQL Server : sort ranking list

I have a table with 3 columns ordernum, username, and amount. I want to select its rows and show an additional column expected.
The rule for calculating the expected column is as follows:
These rows same OrderNum value will be ranking again based on amount column (Desc order). I don't know how I can describe, but the expected result is shown below :(
I tried with RANK() and ROW_NUMBER(), but have not been able to properly apply above algorithm.
This is my table declaration:
CREATE TABLE data
(
ordernum INT,
username NVARCHAR(30),
amount MONEY
);
This is my table content:
+----------+----------+------------+
| ORDERNUM | USERNAME | AMOUNT |
+----------+----------+------------+
| 1 | test01 | 18382.5079 |
| 1 | test02 | 10476.0000 |
| 1 | test03 | 8128.0000 |
| 1 | test04 | 6680.0000 |
| 1 | test05 | 5388.9673 |
| 1 | test06 | 5356.0000 |
| 12 | test07 | 2806.0000 |
| 12 | test08 | 2806.0000 |
| 12 | test09 | 2806.0000 |
| 14 | test10 | 2530.0000 |
| 15 | test11 | 2330.0000 |
| 16 | test12 | 2183.0000 |
| 16 | test13 | 2182.0000 |
| 17 | test14 | 2000.0000 |
| 18 | test15 | 1621.0000 |
+----------+----------+------------+
And this is my expected result:
+----------+----------+------------+----------+
| ORDERNUM | USERNAME | AMOUNT | EXPECTED |
+----------+----------+------------+----------+
| 1 | test01 | 18382.5079 | 1 |
| 1 | test02 | 10476.0000 | 2 |
| 1 | test03 | 8128.0000 | 3 |
| 1 | test04 | 6680.0000 | 4 |
| 1 | test05 | 5388.9673 | 5 |
| 1 | test06 | 5356.0000 | 6 |
| 12 | test07 | 2806.0000 | 12 |
| 12 | test08 | 2806.0000 | 12 |
| 12 | test09 | 2806.0000 | 12 |
| 14 | test10 | 2530.0000 | 15 |
| 15 | test11 | 2330.0000 | 16 |
| 16 | test12 | 2183.0000 | 17 |
| 16 | test13 | 2182.0000 | 18 |
| 17 | test14 | 2000.0000 | 19 |
| 18 | test15 | 1621.0000 | 20 |
+----------+----------+------------+----------+
Here is a fiddle for the problem: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=4014cb469a9ec8f57ded5a5e0e60adaf
This should work.
select
OrderNum,
Username,
Amount,
RANK() over (order by OrderNum) as Expected
from yourTable

PostgreSQL: Count number of rows in table 1 for distinct rows in table 2

I am working with really big data that at the moment I become confused, looking like I'm just repeating one thing.
I want to count the number of trips per user from two tables, trips and session.
psql=> SELECT * FROM trips limit 10;
trip_id | session_ids | daily_user_id | seconds_start | seconds_end
---------+-----------------+---------------+---------------+-------------
400543 | {172079} | 17118 | 1575550944 | 1575551181
400542 | {172078} | 17118 | 1575541533 | 1575542171
400540 | {172077} | 17118 | 1575539001 | 1575539340
400538 | {172076} | 17117 | 1575540499 | 1575541999
400534 | {172074,172075} | 17117 | 1575537161 | 1575539711
400530 | {172073} | 17116 | 1575447043 | 1575447682
400529 | {172071} | 17115 | 1575496394 | 1575497803
400527 | {172070} | 17113 | 1575495241 | 1575496034
400525 | {172068} | 17115 | 1575485658 | 1575489378
400524 | {172067} | 17113 | 1575488721 | 1575490491
(10 rows)
psql=> SELECT * FROM session limit 10;
session_id | user_id | key | start_time | daily_user_id
------------+---------+--------------------------+------------+---------------
172079 | 43 | hLB8S7aSfp4gAFp7TykwYQ==+| 1575550921 | 17118
| | | |
172078 | 43 | YATMrL/AQ7Nu5q2dQTMT1A==+| 1575541530 | 17118
| | | |
172077 | 43 | fOLX4tqvsyFOP3DCyBZf1A==+| 1575538997 | 17118
| | | |
172076 | 7 | 88hwGj4Mqa58juy0PG/R4A==+| 1575540515 | 17117
| | | |
172075 | 7 | 1O+8X49+YbtmoEa9BlY5OQ==+| 1575538384 | 17117
| | | |
172074 | 7 | XOR7hsFCNk+soM75ZhDJyA==+| 1575537405 | 17117
| | | |
172073 | 42 | rAQWwYgqg3UMTpsBYSpIpA==+| 1575447109 | 17116
| | | |
172072 | 276 | 0xOsxRRN3Sq20VsXWjlrzQ==+| 1575511120 | 17114
| | | |
172071 | 7 | P4beN3W/ZrD+TCpZGYh23g==+| 1575496642 | 17115
| | | |
172070 | 43 | OFi30Zv9e5gmLZS5Vb+I7Q==+| 1575495238 | 17113
| | | |
(10 rows)
Goal: get the distribution of trips per user
Attempt:
psql=> SELECT COUNT(distinct trip_id) as trips
, count(distinct user_id) as users
, extract(year from to_timestamp(seconds_start)) as year_date
, extract(month from to_timestamp(seconds_start)) as month_date
FROM trips
INNER JOIN session
ON session_id = ANY(session_ids)
GROUP BY year_date, month_date
ORDER BY year_date, month_date;
+-------+-------+-----------+------------+
| trips | users | year_date | month_date |
+-------+-------+-----------+------------+
| 371 | 44 | 2016 | 3 |
| 12207 | 185 | 2016 | 4 |
| 3859 | 88 | 2016 | 5 |
| 1547 | 28 | 2016 | 6 |
| 831 | 17 | 2016 | 7 |
| 427 | 4 | 2016 | 8 |
| 512 | 13 | 2016 | 9 |
| 431 | 11 | 2016 | 10 |
| 1011 | 26 | 2016 | 11 |
| 791 | 15 | 2016 | 12 |
| 217 | 8 | 2017 | 1 |
| 490 | 17 | 2017 | 2 |
| 851 | 18 | 2017 | 3 |
| 1890 | 66 | 2017 | 4 |
| 2143 | 43 | 2017 | 5 |
| . | | | |
| . | | | |
| . | | | |
+-------+-------+-----------+------------+
This resultset count number of users and trips, my intention is actually to get an analysis of trips per user, like so:
+------+-------------+
| user | no_of_trips |
+------+-------------+
| 1 | 489 |
| 2 | 400 |
| 3 | 12 |
| 4 | 102 |
| . | |
| . | |
| . | |
+------+-------------+
How do I do this, please?
You seem to just want aggregation by user_id:
SELECT s.user_id, COUNT(distinct t.trip_id) as trips
FROM trips t INNER JOIN
session s
ON s.session_id = ANY(t.session_ids)
GROUP BY s.user_id ;
I'm pretty sure that the COUNT(DISTINCT) is unnecessary, so I would advise removing it:
SELECT s.user_id, COUNT(*) as trips
FROM trips t INNER JOIN
session s
ON s.session_id = ANY(t.session_ids)
GROUP BY s.user_id ;

Why do you need to include a field in GROUP BY when using OVER (PARTITION BY x)?

I have a table for which I want to do a simple sum of a field, grouped by two columns. I then want the total for all values for each year_num.
See example: http://rextester.com/QSLRS68794
This query is throwing: "42803: column "foo.num_cust" must appear in the GROUP BY clause or be used in an aggregate function", and I cannot figure out why. Why would an aggregate function using the OVER (PARTITION BY x) require the summed field to be in GROUP BY??
select
year_num
,age_bucket
,sum(num_cust)
--,sum(num_cust) over (partition by year_num) --THROWS ERROR!!
from
foo
group by
year_num
,age_bucket
order by 1,2
TABLE:
| loc_id | year_num | gen | cust_category | cust_age | num_cust | age_bucket |
|--------|-----------|------|----------------|-----------|-----------|-------------|
| 1 | 2016 | M | cash | 41 | 2 | 04_<45 |
| 1 | 2016 | F | Prepaid | 41 | 1 | 03_<35 |
| 1 | 2016 | F | cc | 61 | 1 | 05_45+ |
| 1 | 2016 | F | cc | 19 | 2 | 02_<25 |
| 1 | 2016 | M | cc | 64 | 1 | 05_45+ |
| 1 | 2016 | F | cash | 46 | 1 | 05_45+ |
| 1 | 2016 | F | cash | 27 | 3 | 03_<35 |
| 1 | 2016 | M | cash | 42 | 1 | 04_<45 |
| 1 | 2017 | F | cc | 35 | 1 | 04_<45 |
| 1 | 2017 | F | cc | 37 | 1 | 04_<45 |
| 1 | 2017 | F | cash | 46 | 1 | 05_45+ |
| 1 | 2016 | F | cash | 19 | 4 | 02_<25 |
| 1 | 2017 | M | cash | 43 | 1 | 04_<45 |
| 1 | 2017 | M | cash | 29 | 1 | 03_<35 |
| 1 | 2016 | F | cc | 13 | 1 | 01_<18 |
| 1 | 2017 | F | cash | 16 | 2 | 01_<18 |
| 1 | 2016 | F | cc | 17 | 2 | 01_<18 |
| 1 | 2016 | M | cc | 17 | 2 | 01_<18 |
| 1 | 2017 | F | cash | 18 | 9 | 02_<25 |
DESIRED OUTPUT:
| year_num | age_bucket | sum | sum over (year_num) |
|----------|------------|-----|---------------------|
| 2016 | 01_<18 | 5 | 21 |
| 2016 | 02_<25 | 6 | 21 |
| 2016 | 03_<35 | 4 | 21 |
| 2016 | 04_<45 | 3 | 21 |
| 2016 | 05_45+ | 3 | 21 |
| 2017 | 01_<18 | 2 | 16 |
| 2017 | 02_<25 | 9 | 16 |
| 2017 | 03_<35 | 1 | 16 |
| 2017 | 04_<45 | 3 | 16 |
| 2017 | 05_45+ | 1 | 16 |
You need to nest the sum()s:
select year_num, age_bucket, sum(num_cust),
sum(sum(num_cust)) over (partition by year_num) --WORKS!!
from foo
group by year_num, age_bucket
order by 1, 2;
Why? Well, the window function is not doing aggregation. The argument needs to be an expression that can be evaluated after the group by (because this is an aggregation query). Because num_cust is not a group by key, it needs an aggregation function.
Perhaps this is clearer if you used a subquery:
select year_num, age_bucket, sum_num_cust,
sum(sum_num_cust) over (partition by year_num)
from (select year_num, age_bucket, sum(num_cust) as sum_num_cust
from foo
group by year_num, age_bucket
) ya
order by 1, 2;
These two queries do exactly the same thing. But with the subquery it should be more obvious why you need the extra aggregation.

Running total SQL server - AGAIN

i'm aware that this question has been asked multiple times and I have read those threads to get to where I am now, but those solutions don't seem to be working. I need to have a running total of my ExpectedAmount...
I have the following table:
+--------------+----------------+
| ExpectedDate | ExpectedAmount |
+--------------+----------------+
| 1 | 2485513 |
| 2 | 526032 |
| 3 | 342041 |
| 4 | 195807 |
| 5 | 380477 |
| 6 | 102233 |
| 7 | 539951 |
| 8 | 107145 |
| 10 | 165110 |
| 11 | 18795 |
| 12 | 27177 |
| 13 | 28232 |
| 14 | 154631 |
| 15 | 5566585 |
| 16 | 250814 |
| 17 | 90444 |
| 18 | 105424 |
| 19 | 62132 |
| 20 | 1799349 |
| 21 | 303131 |
| 22 | 459464 |
| 23 | 723488 |
| 24 | 676514 |
| 25 | 17311911 |
| 26 | 4876062 |
| 27 | 4844434 |
| 28 | 4039687 |
| 29 | 1418648 |
| 30 | 4366189 |
| 31 | 9028836 |
+--------------+----------------+
I have the following SQL:
SELECT a.ExpectedDate, a.ExpectedAmount, (SELECT SUM(b.ExpectedAmount)
FROM UnpaidManagement..Expected b
WHERE b.ExpectedDate <= a.ExpectedDate)
FROM UnpaidManagement..Expected a
The result of the above SQL is this:
+--------------+----------------+--------------+
| ExpectedDate | ExpectedAmount | RunningTotal |
+--------------+----------------+--------------+
| 1 | 2485513 | 2485513 |
| 2 | 526032 | 9480889 |
| 3 | 342041 | 46275618 |
| 4 | 195807 | 59866450 |
| 5 | 380477 | 60246927 |
| 6 | 102233 | 60349160 |
| 7 | 539951 | 60889111 |
| 8 | 107145 | 60996256 |
| 10 | 165110 | 2650623 |
| 11 | 18795 | 2669418 |
| 12 | 27177 | 2696595 |
| 13 | 28232 | 2724827 |
| 14 | 154631 | 2879458 |
| 15 | 5566585 | 8446043 |
| 16 | 250814 | 8696857 |
| 17 | 90444 | 8787301 |
| 18 | 105424 | 8892725 |
| 19 | 62132 | 8954857 |
| 20 | 1799349 | 11280238 |
| 21 | 303131 | 11583369 |
| 22 | 459464 | 12042833 |
| 23 | 723488 | 12766321 |
| 24 | 676514 | 13442835 |
| 25 | 17311911 | 30754746 |
| 26 | 4876062 | 35630808 |
| 27 | 4844434 | 40475242 |
| 28 | 4039687 | 44514929 |
| 29 | 1418648 | 45933577 |
| 30 | 4366189 | 50641807 |
| 31 | 9028836 | 59670643 |
+--------------+----------------+--------------+
You can tell from the first few values already that the math is all off, but then at some points the math adds up?! I'm Too confused !! Could someone please point me to another solution or to where I have gone wrong with this?
I am using SQL Server 2008.
It is because your ExpectedDate column of type varchar. Try this:
SELECT a.ExpectedDate, a.ExpectedAmount, (SELECT SUM(b.ExpectedAmount)
FROM UnpaidManagement..Expected b
WHERE CAST(b.ExpectedDate as int) <= CAST(a.ExpectedDate as int))
FROM UnpaidManagement..Expected a
Note that it could be inefficient query.
I think this gonna work:
SELECT a.ExpectedDate,
a.ExpectedAmount,
SUM(b.ExpectedAmount) RuningTotal
FROM UnpaidManagement..Expected a
LEFT JOIN UnpaidManagement..Expected b
ON CAST(b.ExpectedDate as int) <= CAST(a.ExpectedDate as int)
GROUP BY a.ExpectedDate, a.ExpectedAmount
AND:
SELECT a.ExpectedDate,
a.ExpectedAmount,
c.RuningTotal
FROM UnpaidManagement..Expected a
CROSS APPLY (
SELECT SUM(b.ExpectedAmount) AS RuningTotal
FROM UnpaidManagement..Expected b
WHERE CAST(b.ExpectedDate as int) <= CAST(a.ExpectedDate as int)
) c

Access/SQL - split multiple columns to multiple rows based on column ID

I've tried a few things but can't get this to work efficiently. Help!
Customer dataset example (basically a dump of over 100 sensor board readings with 42 sensors per board)
| BoardName | ReadingTime | Sensor1 | Sensor2 | Sensor3 | Sensor4 ... Sensor42 |
-------------------------------------------------------------------------------------
| BoardA | 1224201301 | 18 | 24 | 7 | etc etc for each column |
| BoardB | 1224201301 | 18 | 23 | 8 | etc etc for each column |
| BoardC | 1224201301 | 17 | 24 | 7 | etc etc for each column |
| BoardD | 1224201301 | 16 | 23 | 6 | etc etc for each column |
| BoardA | 1224201302 | 18 | 22 | 5 | etc etc for each column |
| BoardB | 1224201302 | 18 | 23 | 5 | etc etc for each column |
-------------------------------------------------------------------------------------
This seems like a pretty inefficient table design. I'd like to get it into SQL more like the following example, which makes the data a little more accessible.
| SensorID | ReadingTime | SensorValue |
----------------------------------------
| BrdASen1 | 1224201301 | 18 |
| BrdASen2 | 1224201301 | 24 |
| BrdASen3 | 1224201301 | 7 |
| BrdBSen1 | 1224201301 | 18 |
| BrdBSen1 | 1224201301 | 23 |
| BrdBSen1 | 1224201301 | 8 |
| etc etc |
----------------------------------------
So basically, I want to iterate through each row of imported data, split off the 42 columns into individual rows with a 3-column SensorID/Date/Value format.
I would use this table structure:
|BoardName|SensorId| ReadingTime | SensorValue |
------------------------------------------------
|BoardA | 1 | 1224201301 | 18 |
|BoardA | 2 | 1224201301 | 24 |
SensorId values would be from 1 - 42.
To populate the table, you'll need to use the UNPIVOT operator.