Understanding the display output in SCIP and the branch-and-cut mechanism

I am trying to understand the meaning of the display output when using SCIP's branch-and-cut code to solve an MILP. I use the TSP example as a reference.
time | node | left |LP iter|LP it/n|mem/heur|mdpt |vars |cons |rows |cuts |sepa|confs|strbr| dualbound | primalbound | gap | compl.
p 1.9s| 1 | 0 | 0 | - | vbounds| 0 |1861 | 130 | 126 | 0 | 0 | 3 | 0 | 0.000000e+00 | 5.523000e+03 | Inf | unknown
2.2s| 1 | 0 | 1320 | - | 15M | 0 |1861 | 130 | 126 | 0 | 0 | 3 | 0 | 9.795000e+02 | 5.523000e+03 | 463.86%| unknown
2.2s| 1 | 0 | 1321 | - | 15M | 0 |1861 | 130 | 126 | 0 | 0 | 3 | 0 | 9.800000e+02 | 5.523000e+03 | 463.57%| unknown
4.6s| 1 | 0 | 1341 | - | 15M | 0 |1861 | 130 | 128 | 2 | 1 | 3 | 0 | 9.800000e+02 | 5.523000e+03 | 463.57%| unknown
4.6s| 1 | 0 | 1393 | - | 15M | 0 |1861 | 130 | 129 | 3 | 2 | 3 | 0 | 9.800000e+02 | 5.523000e+03 | 463.57%| unknown
4.7s| 1 | 0 | 1422 | - | 15M | 0 |1861 | 130 | 136 | 10 | 3 | 3 | 0 | 9.800000e+02 | 5.523000e+03 | 463.57%| unknown
4.8s| 1 | 0 | 1472 | - | 16M | 0 |1861 | 130 | 139 | 13 | 4 | 3 | 0 | 9.860000e+02 | 5.523000e+03 | 460.14%| unknown
4.8s| 1 | 0 | 1472 | - | 16M | 0 |1861 | 130 | 139 | 13 | 4 | 3 | 0 | 9.860000e+02 | 5.523000e+03 | 460.14%| unknown
4.9s| 1 | 0 | 1479 | - | 17M | 0 |1861 | 130 | 144 | 18 | 5 | 3 | 0 | 9.925000e+02 | 5.523000e+03 | 456.47%| unknown
4.9s| 1 | 0 | 1480 | - | 17M | 0 |1861 | 130 | 144 | 18 | 5 | 3 | 0 | 9.930000e+02 | 5.523000e+03 | 456.19%| unknown
5.0s| 1 | 0 | 1489 | - | 17M | 0 |1861 | 130 | 148 | 22 | 6 | 3 | 0 | 9.930000e+02 | 5.523000e+03 | 456.19%| unknown
5.0s| 1 | 0 | 1530 | - | 17M | 0 |1861 | 130 | 151 | 25 | 7 | 3 | 0 | 9.930000e+02 | 5.523000e+03 | 456.19%| unknown
5.1s| 1 | 0 | 1558 | - | 17M | 0 |1861 | 130 | 153 | 27 | 8 | 3 | 0 | 9.957500e+02 | 5.523000e+03 | 454.66%| unknown
5.1s| 1 | 0 | 1559 | - | 17M | 0 |1861 | 130 | 153 | 27 | 8 | 3 | 0 | 9.960000e+02 | 5.523000e+03 | 454.52%| unknown
5.2s| 1 | 0 | 1680 | - | 17M | 0 |1861 | 130 | 160 | 34 | 9 | 3 | 0 | 1.019750e+03 | 5.523000e+03 | 441.60%| unknown
time | node | left |LP iter|LP it/n|mem/heur|mdpt |vars |cons |rows |cuts |sepa|confs|strbr| dualbound | primalbound | gap | compl.
5.2s| 1 | 0 | 1681 | - | 17M | 0 |1861 | 130 | 160 | 34 | 9 | 3 | 0 | 1.020000e+03 | 5.523000e+03 | 441.47%| unknown
5.4s| 1 | 0 | 1795 | - | 17M | 0 |1861 | 130 | 165 | 39 | 10 | 3 | 0 | 1.040500e+03 | 5.523000e+03 | 430.80%| unknown
5.4s| 1 | 0 | 1796 | - | 17M | 0 |1861 | 130 | 165 | 39 | 10 | 3 | 0 | 1.041000e+03 | 5.523000e+03 | 430.55%| unknown
5.5s| 1 | 0 | 1822 | - | 17M | 0 |1861 | 130 | 170 | 44 | 11 | 3 | 0 | 1.041000e+03 | 5.523000e+03 | 430.55%| unknown
5.6s| 1 | 0 | 1859 | - | 18M | 0 |1861 | 130 | 162 | 48 | 12 | 3 | 0 | 1.041000e+03 | 5.523000e+03 | 430.55%| unknown
5.7s| 1 | 0 | 1880 | - | 18M | 0 |1861 | 130 | 172 | 58 | 13 | 3 | 0 | 1.041000e+03 | 5.523000e+03 | 430.55%| unknown
5.8s| 1 | 0 | 1917 | - | 18M | 0 |1861 | 130 | 177 | 63 | 14 | 3 | 0 | 1.044500e+03 | 5.523000e+03 | 428.77%| unknown
5.8s| 1 | 0 | 1918 | - | 18M | 0 |1861 | 130 | 177 | 63 | 14 | 3 | 0 | 1.045000e+03 | 5.523000e+03 | 428.52%| unknown
5.9s| 1 | 0 | 1993 | - | 18M | 0 |1861 | 130 | 182 | 68 | 15 | 3 | 0 | 1.047643e+03 | 5.523000e+03 | 427.18%| unknown
5.9s| 1 | 0 | 1994 | - | 18M | 0 |1861 | 130 | 182 | 68 | 15 | 3 | 0 | 1.048000e+03 | 5.523000e+03 | 427.00%| unknown
6.0s| 1 | 0 | 2022 | - | 18M | 0 |1861 | 130 | 171 | 70 | 16 | 3 | 0 | 1.048750e+03 | 5.523000e+03 | 426.63%| unknown
6.0s| 1 | 0 | 2023 | - | 18M | 0 |1861 | 130 | 171 | 70 | 16 | 3 | 0 | 1.049000e+03 | 5.523000e+03 | 426.50%| unknown
6.1s| 1 | 0 | 2106 | - | 18M | 0 |1861 | 130 | 176 | 75 | 17 | 3 | 0 | 1.052250e+03 | 5.523000e+03 | 424.88%| unknown
6.1s| 1 | 0 | 2107 | - | 18M | 0 |1861 | 130 | 176 | 75 | 17 | 3 | 0 | 1.053000e+03 | 5.523000e+03 | 424.50%| unknown
6.3s| 1 | 0 | 2148 | - | 19M | 0 |1861 | 130 | 178 | 77 | 18 | 3 | 0 | 1.053375e+03 | 5.523000e+03 | 424.31%| unknown
time | node | left |LP iter|LP it/n|mem/heur|mdpt |vars |cons |rows |cuts |sepa|confs|strbr| dualbound | primalbound | gap | compl.
6.3s| 1 | 0 | 2149 | - | 19M | 0 |1861 | 130 | 178 | 77 | 18 | 3 | 0 | 1.054000e+03 | 5.523000e+03 | 424.00%| unknown
6.4s| 1 | 0 | 2210 | - | 20M | 0 |1861 | 130 | 162 | 81 | 19 | 3 | 0 | 1.054167e+03 | 5.523000e+03 | 423.92%| unknown
6.5s| 1 | 0 | 2211 | - | 20M | 0 |1861 | 130 | 162 | 81 | 19 | 3 | 0 | 1.055000e+03 | 5.523000e+03 | 423.51%| unknown
6.6s| 1 | 0 | 2269 | - | 20M | 0 |1861 | 130 | 165 | 84 | 20 | 3 | 0 | 1.058111e+03 | 5.523000e+03 | 421.97%| unknown
6.6s| 1 | 0 | 2270 | - | 20M | 0 |1861 | 130 | 165 | 84 | 20 | 3 | 0 | 1.059000e+03 | 5.523000e+03 | 421.53%| unknown
6.7s| 1 | 0 | 2276 | - | 20M | 0 |1861 | 130 | 166 | 85 | 21 | 3 | 0 | 1.059000e+03 | 5.523000e+03 | 421.53%| unknown
8.3s| 1 | 2 | 2700 | - | 21M | 0 |1861 | 136 | 166 | 85 | 23 | 9 | 22 | 1.066655e+03 | 5.523000e+03 | 417.79%| unknown
*14.4s| 30 | 25 | 4452 | 75.0 | LP | 7 |1861 | 136 | 138 | 125 | 0 | 9 | 167 | 1.067000e+03 | 5.459000e+03 | 411.62%| 0.86%
29.9s| 100 | 91 | 6915 | 46.9 | 23M | 17 |1861 | 147 | 145 | 288 | 2 | 20 | 654 | 1.067000e+03 | 5.459000e+03 | 411.62%| 1.46%
*33.4s| 131 | 100 | 7858 | 42.9 |strongbr| 18 |1861 | 153 | 144 | 349 | 1 | 26 | 788 | 1.067000e+03 | 1.530000e+03 | 43.39%| 1.53%
42.5s| 200 | 155 | 9970 | 38.7 | 24M | 18 |1861 | 170 | 142 | 455 | 2 | 43 |1097 | 1.067000e+03 | 1.530000e+03 | 43.39%| 2.17%
51.6s| 300 | 237 | 13159 | 36.4 | 26M | 18 |1861 | 224 | 139 | 640 | 2 | 97 |1277 | 1.067000e+03 | 1.530000e+03 | 43.39%| 2.33%
57.4s| 400 | 309 | 15846 | 34.0 | 27M | 19 |1861 | 238 | 152 | 820 | 2 | 113 |1426 | 1.067000e+03 | 1.530000e+03 | 43.39%| 3.10%
63.4s| 500 | 391 | 19168 | 33.9 | 29M | 24 |1861 | 319 | 145 | 926 | 2 | 195 |1560 | 1.067000e+03 | 1.530000e+03 | 43.39%| 3.22%
L68.4s| 584 | 111 | 22609 | 34.9 | rins| 26 |1861 | 330 | 149 |1016 | 1 | 256 |1619 | 1.067000e+03 | 1.127000e+03 | 5.62%| 10.59%
I also used the "display display" command to get some idea of what the column headers in the display output mean.
cons is the number of globally valid constraints in the problem
rows is the number of LP rows in the current node
cuts is the total number of cuts applied to the LPs
sepa is the number of separation rounds performed at the current node
So I have a few questions about this. I assumed rows should be 0 when the problem starts, but I am not sure why there are already 126 rows at the beginning. I add LP rows using the following function:
SCIP_ROW *row;
SCIP_CALL(SCIPcreateEmptyRowConshdlr(scip, &row, conshdlr, "subtour_elimination", -SCIPinfinity(scip), tour.size(), FALSE, FALSE, TRUE));
Are cuts added by SCIP automatically?
How does SCIP add globally valid constraints to the constraint pool? Does it take them from the rows, from the cuts, or from both? Is there a way to add globally valid constraints directly from the constraint handler?

The TSP example is one of the most popular examples that the SCIP team uses to highlight SCIP's constraint capabilities. You can find the mathematical model of this example in the slides of this introduction to SCIP, Section Constraint Integer Programming.
The LP relaxation initially consists of the so-called node-degree constraints, which require that each node is adjacent to exactly two edges in a solution; these are the rows you already see before any cuts have been separated. The subtour elimination constraint is present as a single constraint without a row in the LP and only adds actual linear rows to the LP relaxation when necessary.
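Written out (a standard formulation, not quoted verbatim from the slides), the node-degree constraints and the lazily separated subtour elimination constraints are

sum_{e in delta(v)} x_e = 2          for every node v
sum_{e in E(S)} x_e <= |S| - 1       for every node subset S with 2 <= |S| <= |V| - 1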
Other general-purpose separators such as Gomory or CMIR cuts are active throughout the search. You can use the "display statistics" command and browse the "Separators" section to learn which cutting plane methods were enabled.
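To make this concrete, here is a minimal sketch (not the actual TSP example code) of what a constraint handler's separation callback can do once it has found a violated subtour. The helper name addSubtourCut and the vars/nvars/rhs parameters are illustrative placeholders; the row flags match the SCIPcreateEmptyRowConshdlr call from the question. It also shows the global cut pool, which is where globally valid cuts coming from such rows can be stored so SCIP may reuse them at other nodes.

#include <scip/scip.h>

/* Hedged sketch of a separation helper; vars/nvars/rhs stand for the
 * variables and right-hand side of one violated subtour cut. */
static SCIP_RETCODE addSubtourCut(
   SCIP*          scip,
   SCIP_CONSHDLR* conshdlr,
   SCIP_VAR**     vars,
   int            nvars,
   SCIP_Real      rhs
   )
{
   SCIP_ROW* row;
   SCIP_Bool infeasible;
   int i;

   /* globally valid (local = FALSE), non-modifiable, removable row,
    * with the same flags as in the question */
   SCIP_CALL( SCIPcreateEmptyRowConshdlr(scip, &row, conshdlr, "subtour_elimination",
         -SCIPinfinity(scip), rhs, FALSE, FALSE, TRUE) );

   SCIP_CALL( SCIPcacheRowExtensions(scip, row) );
   for( i = 0; i < nvars; ++i )
   {
      SCIP_CALL( SCIPaddVarToRow(scip, row, vars[i], 1.0) );
   }
   SCIP_CALL( SCIPflushRowExtensions(scip, row) );

   /* put the row into the current LP relaxation; this is what the "rows"
    * display column counts (older SCIP versions use SCIPaddCut instead) */
   SCIP_CALL( SCIPaddRow(scip, row, FALSE, &infeasible) );

   /* because the row is globally valid, it can also be handed to the
    * global cut pool, from which SCIP may re-apply it at other nodes */
   SCIP_CALL( SCIPaddPoolCut(scip, row) );

   SCIP_CALL( SCIPreleaseRow(scip, &row) );

   return SCIP_OKAY;
}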
In order to inspect the initial LP relaxation, you may use the various writing possibilities that SCIP has to dump the current model into an LP file.
You can, for example, stop SCIP after the initial root LP relaxation (just press CTRL + C at the right moment) and use
write lp tsp.lp for the LP relaxation
write mip tsp.lp for the MIP relaxation
write transproblem tsp.cip for the current transformed problem
The first two methods are restricted to the LP format, which is nicely readable. Only the last format, SCIP's own CIP format, will print the subtour-elimination constraints, as well. Rows/Cuts that were added to the current LP relaxation may not be part of the transformed problem.
tldr; choose wisely in which format you print the relaxation :)

Related

Solar-Heating: Data analytics for Grafana, advanced query

I need some help with a very specific use case I have for my homelab.
I have some solar panels on my roof and I extract a lot of data points to my server. I am using a specific app for that (ioBroker), which makes it easy to consume and automate things with that data. The data is saved into a Postgres database. (Please, no questions about why not Influx or TimescaleDB; Postgres is what I have to live with...)
Everything runs on Docker right now and works perfectly. While I was able to create numerous dashboards in Grafana and display everything I like there, there is one specific thing I was unable to do, and after months of trying to get it done I am finally asking for help. I have a device that supports my heating by using generated power to warm up the water; it uses energy that we would normally feed back to the grid. The device updates the power it pushes to the heating device pretty much every second, and I am also pulling the data from the device every second. However, I have logging configured so that a data point is only stored when it differs from the previous one.
One example:
Time                | consumption in W
2018-02-21 12:00:00 | 3500
2018-02-21 12:00:01 | 1470
2018-02-21 12:00:02 | 1470
2018-02-21 12:00:03 | 1470
2018-02-21 12:00:00 | 1600
The second and third entries with the value "1470" would not exist!
So the first issue I have is missing data points. What I would like to achieve is a calculation showing the consumption by individual day, month, year, and all-time.
This does not need to happen inside Grafana, and I don't think Grafana can do this at all. There are options to do similar things in Grafana ($__unixEpochGroupAlias(ts,1s,previous)), but they do not provide an accurate result. I have every option needed to create the data and to store it again inside the DB, so there should not be any obstacle to your ideas.
The data is polled/stored every 1000 ms, so every second. The idea is to use Ws (watt-seconds) to calculate with accurate numbers, as well as to display them better in Wh or kWh.
The DB can only be queried with SQL, but as mentioned, if the calculations need to be done in a different language, that is also fine.
I have tried everything I could think of: SQL queries, searching numerous posts, all available SQL-based Grafana options. I guess I need custom code, but that is above my skill set.
Anything more you'd need to know? Let me know. Thanks in advance!
The data structure looks like the following:
id = entry for the application to identify the data point
ts = timestamp
val = value in Ws
id | ts | val | ack | _from | q
----+---------------+------+-----+-------+---
23 | 1661439981910 | 1826 | t | 3 | 0
23 | 1661439982967 | 1830 | t | 3 | 0
23 | 1661439984027 | 1830 | t | 3 | 0
23 | 1661439988263 | 1828 | t | 3 | 0
23 | 1661439985088 | 1829 | t | 3 | 0
23 | 1661439987203 | 1829 | t | 3 | 0
23 | 1661439989322 | 1831 | t | 3 | 0
23 | 1661439990380 | 1830 | t | 3 | 0
23 | 1661439991439 | 1827 | t | 3 | 0
23 | 1661439992498 | 1829 | t | 3 | 0
23 | 1661440021097 | 1911 | t | 3 | 0
23 | 1661439993558 | 1830 | t | 3 | 0
23 | 1661440022156 | 1924 | t | 3 | 0
23 | 1661439994624 | 1830 | t | 3 | 0
23 | 1661440023214 | 1925 | t | 3 | 0
23 | 1661439995683 | 1828 | t | 3 | 0
23 | 1661440024273 | 1924 | t | 3 | 0
23 | 1661439996739 | 1830 | t | 3 | 0
23 | 1661440025332 | 1925 | t | 3 | 0
23 | 1661440052900 | 1694 | t | 3 | 0
23 | 1661439997797 | 1831 | t | 3 | 0
23 | 1661440026391 | 1927 | t | 3 | 0
23 | 1661439998855 | 1831 | t | 3 | 0
23 | 1661440027450 | 1925 | t | 3 | 0
23 | 1661439999913 | 1828 | t | 3 | 0
23 | 1661440028509 | 1925 | t | 3 | 0
23 | 1661440029569 | 1927 | t | 3 | 0
23 | 1661440000971 | 1830 | t | 3 | 0
23 | 1661440030634 | 1926 | t | 3 | 0
23 | 1661440002030 | 1838 | t | 3 | 0
23 | 1661440031694 | 1925 | t | 3 | 0
23 | 1661440053955 | 1692 | t | 3 | 0
23 | 1659399542399 | 0 | t | 3 | 0
23 | 1659399543455 | 1 | t | 3 | 0
23 | 1659399544511 | 0 | t | 3 | 0
23 | 1663581880895 | 2813 | t | 3 | 0
23 | 1663581883017 | 2286 | t | 3 | 0
23 | 1663581881952 | 2646 | t | 3 | 0
23 | 1663581884074 | 1905 | t | 3 | 0
23 | 1661440004144 | 1838 | t | 3 | 0
23 | 1661440032752 | 1926 | t | 3 | 0
23 | 1661440005202 | 1839 | t | 3 | 0
23 | 1661440034870 | 1924 | t | 3 | 0
23 | 1661440006260 | 1840 | t | 3 | 0
23 | 1661440035929 | 1922 | t | 3 | 0
23 | 1661440007318 | 1840 | t | 3 | 0
23 | 1661440036987 | 1918 | t | 3 | 0
23 | 1661440008377 | 1838 | t | 3 | 0
23 | 1661440038045 | 1919 | t | 3 | 0
23 | 1661440009437 | 1839 | t | 3 | 0
23 | 1661440039104 | 1900 | t | 3 | 0
23 | 1661440010495 | 1839 | t | 3 | 0
23 | 1661440040162 | 1877 | t | 3 | 0
23 | 1661440011556 | 1838 | t | 3 | 0
23 | 1661440041220 | 1862 | t | 3 | 0
23 | 1661440012629 | 1840 | t | 3 | 0
23 | 1661440042279 | 1847 | t | 3 | 0
23 | 1661440013687 | 1840 | t | 3 | 0
23 | 1661440043340 | 1829 | t | 3 | 0
23 | 1661440014746 | 1833 | t | 3 | 0
23 | 1661440044435 | 1817 | t | 3 | 0
23 | 1661440015804 | 1833 | t | 3 | 0
23 | 1661440045493 | 1789 | t | 3 | 0
23 | 1661440046551 | 1766 | t | 3 | 0
23 | 1661440016862 | 1846 | t | 3 | 0
23 | 1661440047610 | 1736 | t | 3 | 0
23 | 1661440048670 | 1705 | t | 3 | 0
23 | 1661440017920 | 1863 | t | 3 | 0
23 | 1661440049726 | 1694 | t | 3 | 0
23 | 1661440050783 | 1694 | t | 3 | 0
23 | 1661440018981 | 1876 | t | 3 | 0
23 | 1661440051840 | 1696 | t | 3 | 0
23 | 1661440055015 | 1692 | t | 3 | 0
23 | 1661440056071 | 1693 | t | 3 | 0
23 | 1661440322966 | 1916 | t | 3 | 0
23 | 1661440325082 | 1916 | t | 3 | 0
23 | 1661440326142 | 1926 | t | 3 | 0
23 | 1661440057131 | 1693 | t | 3 | 0
23 | 1661440327199 | 1913 | t | 3 | 0
23 | 1661440058189 | 1692 | t | 3 | 0
23 | 1661440328256 | 1915 | t | 3 | 0
23 | 1661440059247 | 1691 | t | 3 | 0
23 | 1661440329315 | 1923 | t | 3 | 0
23 | 1661440060306 | 1692 | t | 3 | 0
23 | 1661440330376 | 1912 | t | 3 | 0
23 | 1661440061363 | 1676 | t | 3 | 0
23 | 1661440331470 | 1913 | t | 3 | 0
23 | 1661440062437 | 1664 | t | 3 | 0
23 | 1663581885133 | 1678 | t | 3 | 0
23 | 1661440332530 | 1923 | t | 3 | 0
23 | 1661440064552 | 1667 | t | 3 | 0
23 | 1661440334647 | 1915 | t | 3 | 0
23 | 1661440335708 | 1913 | t | 3 | 0
23 | 1661440065608 | 1665 | t | 3 | 0
23 | 1661440066665 | 1668 | t | 3 | 0
23 | 1661440336763 | 1912 | t | 3 | 0
23 | 1661440337822 | 1913 | t | 3 | 0
23 | 1661440338879 | 1911 | t | 3 | 0
23 | 1661440068780 | 1664 | t | 3 | 0
23 | 1661440339939 | 1912 | t | 3 | 0
(100 rows)
iobroker=# \d ts_number
Table "public.ts_number"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | not null |
ts | bigint | | not null |
val | real | | |
ack | boolean | | |
_from | integer | | |
q | integer | | |
Indexes:
"ts_number_pkey" PRIMARY KEY, btree (id, ts)
You can do this with a mix of generate_series() and some window functions.
First we use generate_series() to get all the second timestamps in a desired range. Then we join to our readings to find what consumption values we have. Group nulls with their most recent non-null reading. Then set the consumption the same for the whole group.
So: if we have readings like this:
richardh=> SELECT * FROM readings;
id | ts | consumption
----+------------------------+-------------
1 | 2023-02-16 20:29:13+00 | 900
2 | 2023-02-16 20:29:16+00 | 1000
3 | 2023-02-16 20:29:20+00 | 925
(3 rows)
We can get all of the seconds we might want like this:
richardh=> SELECT generate_series(timestamptz '2023-02-16 20:29:13+00', timestamptz '2023-02-16 20:29:30+00', interval '1 second');
generate_series
------------------------
2023-02-16 20:29:13+00
2023-02-16 20:29:14+00
...etc...
2023-02-16 20:29:29+00
2023-02-16 20:29:30+00
(18 rows)
Then we join our complete set of timestamps to our readings:
WITH wanted_timestamps (ts) AS (
SELECT generate_series(timestamptz '2023-02-16 20:29:13+00', timestamptz '2023-02-16 20:29:30+00', interval '1 second')
)
SELECT
wt.ts
, r.consumption
, sum(CASE WHEN r.consumption IS NOT NULL THEN 1 ELSE 0 END)
OVER (ORDER BY ts) AS group_num
FROM
wanted_timestamps wt
LEFT JOIN readings r USING (ts)
ORDER BY wt.ts;
ts | consumption | group_num
------------------------+-------------+-----------
2023-02-16 20:29:13+00 | 900 | 1
2023-02-16 20:29:14+00 | | 1
2023-02-16 20:29:15+00 | | 1
2023-02-16 20:29:16+00 | 1000 | 2
2023-02-16 20:29:17+00 | | 2
2023-02-16 20:29:18+00 | | 2
2023-02-16 20:29:19+00 | | 2
2023-02-16 20:29:20+00 | 925 | 3
2023-02-16 20:29:21+00 | | 3
2023-02-16 20:29:22+00 | | 3
2023-02-16 20:29:23+00 | | 3
2023-02-16 20:29:24+00 | | 3
2023-02-16 20:29:25+00 | | 3
2023-02-16 20:29:26+00 | | 3
2023-02-16 20:29:27+00 | | 3
2023-02-16 20:29:28+00 | | 3
2023-02-16 20:29:29+00 | | 3
2023-02-16 20:29:30+00 | | 3
(18 rows)
Finally, fill in the missing consumption values:
WITH wanted_timestamps (ts) AS (
SELECT generate_series(timestamptz '2023-02-16 20:29:13+00', timestamptz '2023-02-16 20:29:30+00', interval '1 second')
), grouped_values AS (
SELECT
wt.ts
, r.consumption
, sum(CASE WHEN r.consumption IS NOT NULL THEN 1 ELSE 0 END)
OVER (ORDER BY ts) AS group_num
FROM wanted_timestamps wt
LEFT JOIN readings r USING (ts)
)
SELECT
gv.ts
, first_value(gv.consumption) OVER (PARTITION BY group_num)
AS consumption
FROM
grouped_values gv
ORDER BY ts;
ts | consumption
------------------------+-------------
2023-02-16 20:29:13+00 | 900
2023-02-16 20:29:14+00 | 900
2023-02-16 20:29:15+00 | 900
2023-02-16 20:29:16+00 | 1000
2023-02-16 20:29:17+00 | 1000
2023-02-16 20:29:18+00 | 1000
2023-02-16 20:29:19+00 | 1000
2023-02-16 20:29:20+00 | 925
2023-02-16 20:29:21+00 | 925
2023-02-16 20:29:22+00 | 925
2023-02-16 20:29:23+00 | 925
2023-02-16 20:29:24+00 | 925
2023-02-16 20:29:25+00 | 925
2023-02-16 20:29:26+00 | 925
2023-02-16 20:29:27+00 | 925
2023-02-16 20:29:28+00 | 925
2023-02-16 20:29:29+00 | 925
2023-02-16 20:29:30+00 | 925
(18 rows)
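If you then want the consumption per day (or per month/year), you can wrap the gap-filled result and aggregate it. Below is a hedged sketch that reuses the CTEs above and assumes each filled-in value is a power reading in W that holds for one second, so summing the per-second values gives watt-seconds; the table/column names (readings, consumption) follow the simplified example rather than your ts_number table.

WITH wanted_timestamps (ts) AS (
    SELECT generate_series(timestamptz '2023-02-16 00:00:00+00', timestamptz '2023-02-16 23:59:59+00', interval '1 second')
), grouped_values AS (
    SELECT
        wt.ts
        , r.consumption
        , sum(CASE WHEN r.consumption IS NOT NULL THEN 1 ELSE 0 END)
            OVER (ORDER BY ts) AS group_num
    FROM wanted_timestamps wt
    LEFT JOIN readings r USING (ts)
), filled AS (
    SELECT
        gv.ts
        , first_value(gv.consumption) OVER (PARTITION BY group_num ORDER BY ts) AS watts
    FROM grouped_values gv
)
SELECT
    date_trunc('day', ts)          AS day
    , sum(watts) / 3600.0 / 1000.0 AS kwh   -- Ws -> Wh -> kWh
FROM filled
GROUP BY 1
ORDER BY 1;

Swapping date_trunc('day', ...) for 'month' or 'year' gives the other aggregation levels.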

How to calculate various sum columns based on value of another in SQL?

Question: Write a query which will output the user count today, as well as from 7 (uc7), 14 (uc14), and 30 (uc30) days ago
Table: num_users
+------------+------------+
| dateid | user_count |
+------------+------------+
| 2014-12-31 | 1010 |
| 2014-12-30 | 1000 |
| 2014-12-29 | 990 |
| 2014-12-28 | 980 |
| 2014-12-27 | 970 |
| 2014-12-26 | 960 |
| 2014-12-25 | 950 |
| 2014-12-24 | 940 |
| 2014-12-23 | 930 |
| 2014-12-22 | 920 |
| 2014-12-21 | 910 |
| 2014-12-20 | 900 |
| 2014-12-19 | 890 |
| 2014-12-18 | 880 |
| 2014-12-17 | 870 |
| 2014-12-16 | 860 |
| 2014-12-15 | 850 |
| 2014-12-14 | 840 |
| 2014-12-13 | 830 |
| 2014-12-12 | 820 |
| 2014-12-11 | 810 |
| 2014-12-10 | 800 |
| 2014-12-09 | 790 |
| 2014-12-08 | 780 |
| 2014-12-07 | 770 |
| 2014-12-06 | 760 |
| 2014-12-05 | 750 |
| 2014-12-04 | 740 |
| 2014-12-03 | 730 |
| 2014-12-02 | 720 |
| 2014-12-01 | 710 |
+------------+------------+
Desired Output:
+------------+------+------+------+------+
| dateid | uc | uc7 | uc14 | uc30 |
+------------+------+------+------+------+
| 2014-12-31 | 1010 | 940 | 870 | 710 |
| 2014-12-30 | 1000 | 930 | 860 | 0 |
| 2014-12-29 | 990 | 920 | 850 | 0 |
| 2014-12-28 | 980 | 910 | 840 | 0 |
| 2014-12-27 | 970 | 900 | 830 | 0 |
| 2014-12-26 | 960 | 890 | 820 | 0 |
| 2014-12-25 | 950 | 880 | 810 | 0 |
| 2014-12-24 | 940 | 870 | 800 | 0 |
| 2014-12-23 | 930 | 860 | 790 | 0 |
| 2014-12-22 | 920 | 850 | 780 | 0 |
| 2014-12-21 | 910 | 840 | 770 | 0 |
| 2014-12-20 | 900 | 830 | 760 | 0 |
| 2014-12-19 | 890 | 820 | 750 | 0 |
| 2014-12-18 | 880 | 810 | 740 | 0 |
| 2014-12-17 | 870 | 800 | 730 | 0 |
| 2014-12-16 | 860 | 790 | 720 | 0 |
| 2014-12-15 | 850 | 780 | 710 | 0 |
| 2014-12-14 | 840 | 770 | 0 | 0 |
| 2014-12-13 | 830 | 760 | 0 | 0 |
| 2014-12-12 | 820 | 750 | 0 | 0 |
| 2014-12-11 | 810 | 740 | 0 | 0 |
| 2014-12-10 | 800 | 730 | 0 | 0 |
| 2014-12-09 | 790 | 720 | 0 | 0 |
| 2014-12-08 | 780 | 710 | 0 | 0 |
| 2014-12-07 | 770 | 0 | 0 | 0 |
| 2014-12-06 | 760 | 0 | 0 | 0 |
| 2014-12-05 | 750 | 0 | 0 | 0 |
| 2014-12-04 | 740 | 0 | 0 | 0 |
| 2014-12-03 | 730 | 0 | 0 | 0 |
| 2014-12-02 | 720 | 0 | 0 | 0 |
| 2014-12-01 | 710 | 0 | 0 | 0 |
+------------+------+------+------+------+
How do I properly do this?
I tried my solution below, but it does not produce the right result:
SELECT dateid AS today,
(SELECT SUM(user_count) FROM num_users WHERE dateid = dateid) AS uc,
(SELECT SUM(user_count) FROM num_users WHERE dateid - 7) AS uc7,
(SELECT SUM(user_count) FROM num_users WHERE dateid - 14) AS uc14,
(SELECT SUM(user_count) FROM num_users WHERE dateid - 14) AS uc30
FROM num_users
This produces the presented output:
SELECT num_users.dateid, num_users.user_count AS uc,
(SELECT user_count FROM num_users AS A WHERE A.dateid=num_users.dateid-7) AS uc7,
(SELECT user_count FROM num_users AS A WHERE A.dateid=num_users.dateid-14) AS uc14,
(SELECT user_count FROM num_users AS A WHERE A.dateid=num_users.dateid-30) AS uc30
FROM num_users
ORDER BY num_users.dateid DESC;
But maybe you really want:
SELECT Sum(num_users.user_count) AS uc,
Sum(IIf([dateid]<=#12/31/2014#-7,[user_count],0)) AS uc7,
Sum(IIf([dateid]<=#12/31/2014#-14,[user_count],0)) AS uc14,
Sum(IIf([dateid]<=#12/31/2014#-30,[user_count],0)) AS uc30
FROM num_users;
The above was tested with Access. If the data actually continues through the current date, replace #12/31/2014# with Date(). The date literal format and the function name will most likely be different on another database platform.
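For reference, on other platforms the per-row lookups can also be written with outer self-joins, which makes it easy to show 0 (as in the desired output) when there is no row that many days back. A hedged sketch in PostgreSQL-style syntax, assuming dateid is a DATE column so that date-minus-integer arithmetic works:

SELECT n.dateid,
       n.user_count                AS uc,
       COALESCE(n7.user_count, 0)  AS uc7,
       COALESCE(n14.user_count, 0) AS uc14,
       COALESCE(n30.user_count, 0) AS uc30
FROM num_users n
LEFT JOIN num_users n7  ON n7.dateid  = n.dateid - 7
LEFT JOIN num_users n14 ON n14.dateid = n.dateid - 14
LEFT JOIN num_users n30 ON n30.dateid = n.dateid - 30
ORDER BY n.dateid DESC;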

How do I write a Hive SQL query when I have this input?

input:
| a.user_id | a_stream_length | b_stream_length | subtract_inactive |
-----------------------------------------------------------------------------
| a | 11 | 1686 | 22 |
| a | 1686 | 328 | 12 |
| a | 328 | 732 | 22 |
| a | 732 | 11 | 1699 |
| a | 11 | 2123 | 18 |
| a | 2123 | 160 | 2 |
| a | 160 | 1358 | 0 |
| a | 1358 | 129 | 1 |
| a | 129 | 4042 | 109334 |
output:
| a | (1686+11+328+732) (if subtract_inactive < 1000) |
| a | 732(a_stream_length) if subtract_inactive > 1000) |

Find the highest and lowest value locations within an interval on a column?

Given this pandas DataFrame with two columns, 'Values' and 'Intervals', how do I get a third column 'MinMax' indicating whether the value is a maximum or a minimum within that interval? The challenge for me is that the interval length and the distance between intervals are not fixed, which is why I am posting the question.
import pandas as pd
import numpy as np
data = pd.DataFrame([
[1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
[1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
[1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
[1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
[1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
[1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
[1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
[1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
[1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
[1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
[1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
[1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
[1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
[1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
[1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
[1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
[1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
])
In the third column below you can see where the max and min are for each interval.
+-------+----------+-----------+---------+
| index | Value | Intervals | Min/Max |
+-------+----------+-----------+---------+
| 0 | 1879.289 | np.nan | |
| 1 | 1879.281 | np.nan | |
| 2 | 1879.292 | 1 | |
| 3 | 1879.295 | 1 | |
| 4 | 1879.481 | 1 | |
| 5 | 1879.294 | 1 | |
| 6 | 1879.268 | 1 | min |
| 7 | 1879.293 | 1 | |
| 8 | 1879.277 | 1 | |
| 9 | 1879.285 | 1 | |
| 10 | 1879.464 | 1 | |
| 11 | 1879.475 | 1 | |
| 12 | 1879.971 | 1 | |
| 13 | 1879.779 | 1 | |
| 17 | 1879.986 | 1 | |
| 18 | 1880.791 | 1 | max |
| 19 | 1880.29 | 1 | |
| 55 | 1879.253 | np.nan | |
| 56 | 1878.268 | np.nan | |
| 57 | 1875.73 | 1 | |
| 58 | 1876.792 | 1 | |
| 59 | 1875.977 | 1 | min |
| 60 | 1876.408 | 1 | |
| 61 | 1877.159 | 1 | |
| 62 | 1877.187 | 1 | |
| 63 | 1883.164 | 1 | |
| 64 | 1883.171 | 1 | |
| 65 | 1883.495 | 1 | |
| 66 | 1883.962 | 1 | |
| 67 | 1885.158 | 1 | |
| 68 | 1885.974 | 1 | max |
| 69 | 1886.479 | np.nan | |
| 70 | 1885.969 | np.nan | |
| 71 | 1884.693 | 1 | |
| 72 | 1884.977 | 1 | |
| 73 | 1884.967 | 1 | |
| 74 | 1884.691 | 1 | min |
| 75 | 1886.171 | 1 | max |
| 76 | 1886.166 | np.nan | |
| 77 | 1884.476 | np.nan | |
| 78 | 1884.66 | 1 | max |
| 79 | 1882.962 | 1 | |
| 80 | 1881.496 | 1 | |
| 81 | 1871.163 | 1 | min |
| 82 | 1874.985 | 1 | |
| 83 | 1874.979 | 1 | |
| 84 | 1871.173 | np.nan | |
| 85 | 1871.973 | np.nan | |
| 86 | 1871.682 | np.nan | |
| 87 | 1872.476 | np.nan | |
| 88 | 1882.361 | 1 | max |
| 89 | 1880.869 | 1 | |
| 90 | 1882.165 | 1 | |
| 91 | 1881.857 | 1 | |
| 92 | 1880.375 | 1 | |
| 93 | 1880.66 | 1 | |
| 94 | 1880.891 | 1 | |
| 95 | 1880.377 | 1 | |
| 96 | 1881.663 | 1 | |
| 97 | 1881.66 | 1 | |
| 98 | 1877.888 | 1 | |
| 99 | 1875.69 | 1 | |
| 100 | 1875.161 | 1 | min |
| 101 | 1876.697 | np.nan | |
| 102 | 1876.671 | np.nan | |
| 103 | 1879.666 | np.nan | |
| 111 | 1877.182 | np.nan | |
| 112 | 1878.898 | 1 | |
| 113 | 1878.668 | 1 | |
| 114 | 1878.871 | 1 | |
| 115 | 1878.882 | 1 | |
| 116 | 1879.173 | 1 | max |
| 117 | 1878.887 | 1 | |
| 118 | 1878.68 | 1 | |
| 119 | 1878.872 | 1 | |
| 120 | 1878.677 | 1 | |
| 121 | 1877.877 | 1 | |
| 122 | 1877.669 | 1 | |
| 123 | 1877.69 | 1 | |
| 124 | 1877.684 | 1 | |
| 125 | 1877.68 | 1 | |
| 126 | 1877.885 | 1 | |
| 127 | 1877.863 | 1 | |
| 128 | 1877.674 | 1 | |
| 129 | 1877.676 | 1 | |
| 130 | 1877.687 | 1 | |
| 131 | 1878.367 | 1 | |
| 132 | 1878.179 | 1 | |
| 133 | 1877.696 | 1 | |
| 134 | 1877.665 | 1 | min |
| 135 | 1877.667 | np.nan | |
| 136 | 1878.678 | np.nan | |
| 137 | 1878.661 | 1 | max |
| 138 | 1878.171 | 1 | |
| 139 | 1877.371 | 1 | |
| 140 | 1877.359 | 1 | |
| 141 | 1878.381 | 1 | |
| 142 | 1875.185 | 1 | min |
| 143 | 1875.367 | np.nan | |
| 144 | 1865.492 | np.nan | |
| 145 | 1865.495 | 1 | max |
| 146 | 1866.995 | 1 | |
| 147 | 1866.672 | 1 | |
| 148 | 1867.465 | 1 | |
| 149 | 1867.663 | 1 | |
| 150 | 1867.186 | 1 | |
| 151 | 1867.687 | 1 | |
| 152 | 1867.459 | 1 | |
| 153 | 1867.168 | 1 | |
| 154 | 1869.689 | 1 | |
| 155 | 1869.693 | 1 | |
| 156 | 1871.676 | 1 | |
| 157 | 1873.174 | 1 | min |
| 158 | 1873.691 | np.nan | |
| 159 | 1873.685 | np.nan | |
+-------+----------+-----------+---------+
isnull = data.iloc[:, 1].isnull()
minmax = data.groupby(isnull.cumsum()[~isnull])[0].agg(['idxmax', 'idxmin'])
data.loc[minmax['idxmax'], 'MinMax'] = 'max'
data.loc[minmax['idxmin'], 'MinMax'] = 'min'
data.MinMax = data.MinMax.fillna('')
print(data)
0 1 MinMax
0 1879.289 NaN
1 1879.281 NaN
2 1879.292 1.0
3 1879.295 1.0
4 1879.481 1.0
5 1879.294 1.0
6 1879.268 1.0 min
7 1879.293 1.0
8 1879.277 1.0
9 1879.285 1.0
10 1879.464 1.0
11 1879.475 1.0
12 1879.971 1.0
13 1879.779 1.0
14 1879.986 1.0
15 1880.791 1.0 max
16 1880.290 1.0
17 1879.253 NaN
18 1878.268 NaN
19 1875.730 1.0 min
20 1876.792 1.0
21 1875.977 1.0
22 1876.408 1.0
23 1877.159 1.0
24 1877.187 1.0
25 1883.164 1.0
26 1883.171 1.0
27 1883.495 1.0
28 1883.962 1.0
29 1885.158 1.0
.. ... ... ...
85 1877.687 1.0
86 1878.367 1.0
87 1878.179 1.0
88 1877.696 1.0
89 1877.665 1.0 min
90 1877.667 NaN
91 1878.678 NaN
92 1878.661 1.0 max
93 1878.171 1.0
94 1877.371 1.0
95 1877.359 1.0
96 1878.381 1.0
97 1875.185 1.0 min
98 1875.367 NaN
99 1865.492 NaN
100 1865.495 1.0 min
101 1866.995 1.0
102 1866.672 1.0
103 1867.465 1.0
104 1867.663 1.0
105 1867.186 1.0
106 1867.687 1.0
107 1867.459 1.0
108 1867.168 1.0
109 1869.689 1.0
110 1869.693 1.0
111 1871.676 1.0
112 1873.174 1.0 max
113 1873.691 NaN
114 1873.685 NaN
[115 rows x 3 columns]
data.columns=['Value','Interval']
data['Ingroup'] = (data['Interval'].notnull() + 0)
Use data['Interval'].notnull() to separate the groups.
Use cumsum() to number them with `groupno`.
Use groupby(groupno).
Finally, you want something using apply/idxmax/idxmin to label the max/min; a sketch of those steps follows below.
But of course a for-loop, as you suggested, is the non-Pythonic but possibly simpler hack.
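A minimal sketch of those steps, assuming the columns have been renamed to 'Value' and 'Interval' as above (it mirrors the idxmax/idxmin approach from the earlier answer):

import pandas as pd

# rows that belong to an interval, and a running group number per interval
ingroup = data['Interval'].notnull()
groupno = (~ingroup).cumsum()

# locate the index of the min and max value inside each interval
extremes = (data.loc[ingroup]
                .groupby(groupno[ingroup])['Value']
                .agg(['idxmin', 'idxmax']))

data['MinMax'] = ''
data.loc[extremes['idxmin'], 'MinMax'] = 'min'
data.loc[extremes['idxmax'], 'MinMax'] = 'max'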

How to SUM in MySQL for every n records

I have the following result from a query:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
+---------------+------+------+------+------+------+------+------+-------+
I would like to insert a SUM row before each change of order_main_id, so the result would look like this:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
| | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 |
+---------------+------+------+------+------+------+------+------+-------+
How can I make this possible?
You'll need to write a second query which makes use of GROUP BY order_main_id.
Something like:
SELECT sum(S41+...) FROM yourTable GROUP BY orderMainId
You can actually do this in one query, but with a union all (really two queries, but the result sets are combined to make one awesome result set):
select
order_main_id,
S36,
S37,
S38,
S39,
S40,
S41,
S42,
S36 + S37 + S38 + S39 + S40 + S41 + S42 as total,
'Detail' as rowtype
from
tblA
union all
select
order_main_id,
sum(S36),
sum(S37),
sum(S38),
sum(S39),
sum(S40),
sum(S41),
sum(S42),
sum(S36 + S37 + S38 + S39 + S40 + S41 + S42),
'Summary' as rowtype
from
tblA
group by
order_main_id
order by
order_main_id, RowType
Remember that the order by affects the entirety of the union all, not just the last query. So, your resultset would look like this:
+---------------+------+------+------+------+------+------+------+-------+---------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total | rowtype |
+---------------+------+------+------+------+------+------+------+-------+---------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 | Detail |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 | Detail |
| 26 | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 | Summary |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 | Detail |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 | Detail |
| 35 | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 | Summary |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 | Summary |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 | Detail |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 | Detail |
| 39 | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 | Summary |
+---------------+------+------+------+------+------+------+------+-------+---------+
This way, you know what is and what isn't a detail or summary row, and the order_main_id that it's for. You could always (and probably should) hide this column in your presentation layer.
For things like these I think you should use a reporting library (such as Crystal Reports); it'll save you a lot of trouble. Check JasperReports and similar projects on osalt.