Calculating values from two tables where one has key in header and one has it in column values

I have a simple problem that I don't know how to solve in SQL.
I have two tables,
cost :
a | b | c
-------+-------+---------------
31.99 | 14.12 | 133.1
second table: income
Party | sum
------+--------
A | 90
B | 12
C | 70
Now I want a result that, for each party A, B, C, subtracts the cost from the income to find the net value. I cannot compare a column header to a column value. I am quite new to this, so I am struggling quite a lot. There should be a really easy way of doing this.
I created the 'cost' table by
SELECT sum(A) as A, sum(B) as B, sum(C) as C FROM mytable;
Maybe there is a clever way of creating this table in the same format as the income table, which would make it easier to compare? I would appreciate any suggestion on either of the two fronts. Thanks a lot!

You can compare using case:
select i.party,
       i.sum - (case when i.party = 'A' then c.a
                     when i.party = 'B' then c.b
                     when i.party = 'C' then c.c
                     else 0 end) as net
from cost c cross join
     income i;
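On the second part of the question (building the cost table in the same shape as income), one possible approach is to unpivot the single cost row with UNION ALL and then use a plain join. The sketch below runs against an in-memory SQLite database through Python's sqlite3 module; the cost_long name and the SQLite setup are assumptions for illustration, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cost (a REAL, b REAL, c REAL);
    INSERT INTO cost VALUES (31.99, 14.12, 133.1);
    CREATE TABLE income (party TEXT, "sum" REAL);
    INSERT INTO income VALUES ('A', 90), ('B', 12), ('C', 70);
""")

# Unpivot the one-row cost table into (party, cost) rows, then join on party.
# cost_long is a hypothetical name chosen for this sketch.
query = """
    WITH cost_long AS (
        SELECT 'A' AS party, a AS cost FROM cost
        UNION ALL SELECT 'B', b FROM cost
        UNION ALL SELECT 'C', c FROM cost
    )
    SELECT i.party, ROUND(i."sum" - cl.cost, 2) AS net
    FROM income AS i
    JOIN cost_long AS cl ON cl.party = i.party
    ORDER BY i.party
"""
net_by_party = dict(conn.execute(query).fetchall())
print(net_by_party)
```

With the sample data, A nets about 58.01, B about -2.12 and C about -63.10. The same unpivot could presumably be applied directly to mytable, grouping by the party label, instead of going through the one-row cost table.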

Related

SQL command to remove groups of entries where all are equal (not merely DISTINCT)

I am so green in SQL that I don't even know how to properly phrase my question or look for an existing answer on Stack Overflow or anywhere else. Sorry!
Assume I have 3 columns: an ID and two data columns, A and B. A single ID can have multiple entries. I would like to remove all entries for an ID when A and B are the same in every one of its rows. Here is an example:
ID | A | B
---+---+---
01 | x | y
01 | x | y
01 | x | y
02 | x | y
02 | x | z
02 | x | y
In this table I would like to remove all 3 entries that belong to ID 01, as A and B are always x and y, respectively. For ID 02, however, column B differs between the first and second entry, so I would like to keep ID 02. I hope this illustrates the idea sufficiently :-).
I am looking for a 'scalable' solution, as I am not only looking at two data columns A and B, but actually at 4 different columns.
Does anyone know how to set a proper filter in SQL to remove those entries according to my needs?
Many thanks.
Benjamin

It basically doesn't matter how many columns you actually have; list all of them in the SELECT DISTINCT.
The query below returns the IDs that have more than one distinct combination, i.e. the IDs to keep, and can be used as the joining basis for a DELETE.
WITH CTE AS
(SELECT DISTINCT "ID", "A", "B" FROM tab1),
CTE2 AS (SELECT "ID", COUNT(*) count_ FROM CTE GROUP BY "ID" HAVING COUNT(*) >1)
SELECT "ID" FROM CTE2
| ID |
| -: |
| 2 |
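A minimal sketch of the DELETE itself, using the inverse condition (IDs whose rows collapse to exactly one distinct combination). It runs on an in-memory SQLite database via Python's sqlite3 module; the lowercase column names and the SQLite dialect are assumptions, and the subquery alias d is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (id TEXT, a TEXT, b TEXT);
    INSERT INTO tab1 VALUES
        ('01', 'x', 'y'), ('01', 'x', 'y'), ('01', 'x', 'y'),
        ('02', 'x', 'y'), ('02', 'x', 'z'), ('02', 'x', 'y');
""")

# Delete every ID whose rows collapse to exactly one distinct (a, b) pair.
conn.execute("""
    DELETE FROM tab1
    WHERE id IN (
        SELECT id
        FROM (SELECT DISTINCT id, a, b FROM tab1) AS d
        GROUP BY id
        HAVING COUNT(*) = 1
    )
""")
remaining = conn.execute("SELECT id, a, b FROM tab1 ORDER BY rowid").fetchall()
print(remaining)
```

Only the three rows of ID 02 remain. Note that an ID with a single row would also be deleted by this condition, so an extra check on the raw row count may be needed if such IDs should be kept. For four data columns, extend the SELECT DISTINCT list accordingly.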

Identifying the most purchased combination of items in SQL

First of all, I hope everyone's staying safe out there.
So here's my question.
Currently I'm trying to figure out how I can identify the most purchased combination of items.
Most purchased combination of items must appear at the top (descending order is crucial).
Let's say I have a sales table that looks like this:
Cust_ID Item_ID
100 A
100 A
100 B
100 C
200 A
200 C
200 C
300 B
400 A
400 B
and the expected output looks something like this:
Comb_of_Item Count_of_Cust
A, B 10
A, C 7
B, C 4
A, B, C 2
Note that Customer 100 had purchased item "A" twice, which for the purpose of this exercise will be ignored (dups to be removed).
This means that Customer 100 would be counted as "A, B, C" NOT "A, A, B, C"
Any help/suggestion would be much appreciated.
Many thanks in advance!

I believe this can be modified into a better query, but right now this will get the job done.
select combo_of_item, count(*) as Count_of_Cust
from (
    select Cust_ID,
           string_agg(Item_ID, ',') within group (order by Item_ID) as combo_of_item
    from (select distinct Cust_ID, Item_ID from [table]) a
    group by Cust_ID
) b
group by combo_of_item
order by Count_of_Cust desc;
Btw, since the OP didn't specify the DBMS, string_agg (and the SQL Server style within group clause used to order the items) might have to be altered depending on which database the OP is using.
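Because string aggregation syntax differs between databases, a portable variant is to do only the de-duplication in SQL and build the combinations in application code. The sketch below does that with Python's sqlite3 and collections.Counter; the sales table name is an assumption, and customer 500 is a hypothetical extra row added so that one combination has a count above 1.

```python
import sqlite3
from collections import Counter, defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (cust_id INTEGER, item_id TEXT);
    INSERT INTO sales VALUES
        (100, 'A'), (100, 'A'), (100, 'B'), (100, 'C'),
        (200, 'A'), (200, 'C'), (200, 'C'),
        (300, 'B'),
        (400, 'A'), (400, 'B'),
        (500, 'B'), (500, 'A');  -- hypothetical customer, not in the question
""")

# Step 1 (SQL): drop duplicate purchases per customer.
items_by_customer = defaultdict(set)
for cust_id, item_id in conn.execute("SELECT DISTINCT cust_id, item_id FROM sales"):
    items_by_customer[cust_id].add(item_id)

# Step 2 (Python): sort each customer's items so 'B, A' and 'A, B'
# are counted as the same combination.
combo_counts = Counter(
    ", ".join(sorted(items)) for items in items_by_customer.values()
)

# most_common() lists combinations by descending customer count.
ranking = combo_counts.most_common()
for combo, customers in ranking:
    print(combo, customers)
```

Customers who bought a single item (such as 300) show up as a one-item "combination"; filtering on the set size before counting would exclude them if that is not wanted.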

Excel - How To count(*) and groupby similar to SQL

I'm looking for a way to perform a SQL type command in Excel. I need to get a count of each string in a column without knowing the string's text before hand.
Here's some sample data, I want to get a count of each Name.
Name
----
A
B
C
A
D
B
In SQL I'd write
SELECT Name, count(*)
FROM #table
group by Name
And I'd expect to get
Name | Count
-----|------
A | 2
B | 2
C | 1
D | 1
How can I perform this operation in Excel?

You could go with the pivot tables that give you some options to analyze your data. There is a good example with explanation on this website: http://www.contextures.com/pivottablecountunique.html

Is there a [straightforward] way to order results first, then group by another column, with SQL?

I see that in an SQL query, the GROUP BY has to precede the ORDER BY expression. Does this imply that ordering is done after grouping would have discarded identical rows?
Because I seem to need to order rows by a timestamp first, then discard the rows with identical timestamp. And I don't know how to accomplish this.
I am using MySQL 5.1.41.
Here is the definition of the table expressed with create table:
create table tbl
(
A int,
B timestamp
)
The data could be:
+-----+-----------------------+
| A | B |
+-----+-----------------------+
| 1 | today |
| 1 | yesterday |
| 2 | yesterday |
| 2 | tomorrow |
+-----+-----------------------+
The results of the query on the above table, which I am after, would be:
+-----+-----------------------+
| A | B |
+-----+-----------------------+
| 1 | today |
| 2 | tomorrow |
+-----+-----------------------+
Basically, I want the rows with the latest timestamp in column "B" (hence the mention of ORDER BY), and only one row for each value in column "A" (think DISTINCT or GROUP BY).
The actual problem behind the simplified example above:
In reality, I have two tables - users and payment_receipts:
create table users
(
phone_nr int(10) unsigned not null,
primary key (phone_nr)
)
create table payment_receipts
(
phone_nr int(10) unsigned not null,
payed_ts timestamp default current_timestamp not null,
payed_until_ts timestamp not null,
primary key (phone_nr, payed_ts, payed_until_ts)
)
The tables may include other columns, but I omit these as irrelevant. As part of a payment scheme, I have to send SMS messages to users across the cellular network at periodic intervals, depending on whether a payment is due. The payment is made when the SMS is sent, as the recipient is charged for it. I use the payment_receipts table to keep a record of all payments, i.e. for book-keeping. This is intended to model a real shop where both the buyer and the seller get a copy of the receipt of purchase, for reference. This table stores my (the seller's) copy of each receipt; the customer's receipt is the received SMS itself. Each time an SMS is sent (and thus a payment is made), a receipt record is inserted into the table, stating who paid, when, and "until when". To explain the latter, imagine a subscription service that continues indefinitely until the user explicitly opts out, at which point the corresponding user record is removed. A payment is made a month in advance, so as a rule, the difference between payed_ts and payed_until_ts is 30 days.
I have a batch job that executes every day and needs to select a list of users that are due monthly payment as part of the automatic subscription renewal described above. To link this to the dummy example earlier, the phone number column phone_nr would be the column "A" and payed_until_ts would be column "B", but in reality there are two tables, which has to do with the following behaviour: when a user record is removed, the receipt must remain, for book-keeping. So not only do I need to group payments by date and discard all but the latest payment receipt date, I also need to watch out not to select receipts for which there no longer is a matching user record.
To solve the problem of selecting required records -- those that are due payment -- I need to find receipts with the latest payed_until_ts timestamp for each phone_nr (there may be several, obviously) and out of those records I further need to select only those phone numbers where payed_until_ts is earlier than the time the batch job executes. I then would send an SMS to each of these numbers, inserting a receipt record for each sent SMS, where payed_ts is now() and payed_until_ts is now() + interval 30 days.
But I can't seem to come up with the query required.

-- Relies on MySQL's non-standard GROUP BY: which row is kept per group is not guaranteed.
Select a,b from (select a,b from tbl order by b desc) as c group by a;

Yes, grouping is done first, and it affects a single select whereas ordering affects all the results from all select statements in a union, such as:
select a, 'max', max(b) from tbl group by a
union all select a, 'min', min(b) from tbl group by a
order by 1, 2
(using field numbers in order by since I couldn't be bothered to name my columns). Each group by affects only its select, the order by affects the combined result set.
It seems that what you're after can be achieved with:
select A, max(B) from tbl group by A
This uses the max aggregation function to basically do your pre-group ordering (no decent DBMS actually sorts for this; it simply picks the maximum, from a suitable index if available).

SELECT DISTINCT a,b
FROM tbl t
WHERE b = (SELECT MAX(b) FROM tbl WHERE tbl.a = t.a);

According to your new rules (tested with PostgreSQL)
Query You'd Want:
SELECT pr.phone_nr, pr.payed_ts, pr.payed_until_ts
FROM payment_receipts pr
JOIN users
ON (pr.phone_nr = users.phone_nr)
JOIN (select phone_nr, max(payed_until_ts) as payed_until_ts
from payment_receipts
group by phone_nr
) sub
ON ( pr.phone_nr = sub.phone_nr
AND pr.payed_until_ts = sub.payed_until_ts)
ORDER BY pr.phone_nr, pr.payed_ts, pr.payed_until_ts;
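The query above returns the latest receipt for each existing user; the batch job described in the question additionally needs only those whose latest payed_until_ts lies before the run time. A minimal sketch of that extra filter, assuming an in-memory SQLite database, ISO date strings stored as text, made-up sample receipts, and a hypothetical helper named due_phone_numbers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (phone_nr INTEGER PRIMARY KEY);
    CREATE TABLE payment_receipts (
        phone_nr INTEGER NOT NULL,
        payed_ts TEXT NOT NULL,
        payed_until_ts TEXT NOT NULL,
        PRIMARY KEY (phone_nr, payed_ts, payed_until_ts)
    );
    -- Hypothetical sample data: user 3 has opted out, only receipts remain.
    INSERT INTO users VALUES (1), (2);
    INSERT INTO payment_receipts VALUES
        (1, '2010-06-01', '2010-07-01'),
        (1, '2010-07-01', '2010-07-31'),
        (2, '2010-06-15', '2010-07-15'),
        (2, '2010-07-15', '2010-08-14'),
        (3, '2010-06-01', '2010-07-01');
""")


def due_phone_numbers(conn, now):
    """Return phone numbers of existing users whose latest receipt ended before `now`."""
    rows = conn.execute(
        """
        SELECT pr.phone_nr
        FROM payment_receipts AS pr
        JOIN users AS u ON u.phone_nr = pr.phone_nr  -- skips removed users
        GROUP BY pr.phone_nr
        HAVING MAX(pr.payed_until_ts) < ?            -- latest receipt has expired
        ORDER BY pr.phone_nr
        """,
        (now,),
    ).fetchall()
    return [phone_nr for (phone_nr,) in rows]


print(due_phone_numbers(conn, "2010-08-01"))
```

With a run date of 2010-08-01, only phone number 1 is due: user 2 is paid until 2010-08-14 and user 3 no longer has a user record. In MySQL the parameter would presumably be replaced by now(), with the HAVING clause otherwise unchanged.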
Original Answer (with updates):
CREATE TABLE foo (a NUMERIC, b TEXT, c DATE);
INSERT INTO foo VALUES
(1,'a','2010-07-30'),
(1,'b','2010-07-30'),
(1,'c','2010-07-31'),
(1,'d','2010-07-31'),
(1,'a','2010-07-29'),
(1,'c','2010-07-29'),
(2,'a','2010-07-29'),
(2,'a','2010-08-01');
-- table contents
SELECT * FROM foo ORDER BY c,a,b;
a | b | c
---+---+------------
1 | a | 2010-07-29
1 | c | 2010-07-29
2 | a | 2010-07-29
1 | a | 2010-07-30
1 | b | 2010-07-30
1 | c | 2010-07-31
1 | d | 2010-07-31
2 | a | 2010-08-01
-- The following solutions both retrieve records based on the latest date
-- they both return the same result set, solution 1 is faster, solution 2
-- is easier to read
-- Solution 1:
SELECT foo.a, foo.b, foo.c
FROM foo
JOIN (select a, max(c) as c from foo group by a) bar
ON (foo.a=bar.a and foo.c=bar.c)
ORDER BY foo.a, foo.b, foo.c;
-- Solution 2:
SELECT a, b, MAX(c) AS c
FROM foo main
GROUP BY a, b
HAVING MAX(c) = (select max(c) from foo sub where main.a=sub.a group by a)
ORDER BY a, b;
a | b | c
---+---+------------
1 | c | 2010-07-31
1 | d | 2010-07-31
2 | a | 2010-08-01
(3 rows)
Comment:
a = 1 is returned twice because there are multiple b values for its latest date. This is acceptable (and advised). Your data should never have this problem, because c is based on b's value.

create table user_payments
(
phone_nr int NOT NULL,
payed_until_ts datetime NOT NULL
)
insert into user_payments
(phone_nr, payed_until_ts)
values
(1, '2016-01-28'), -- today
(1, '2016-01-27'), -- yesterday
(2, '2016-01-27'), -- yesterday
(2, '2016-01-29') -- tomorrow
select phone_nr, MAX(payed_until_ts) as latest_payment
from user_payments
group by phone_nr
-- OUTPUT:
-- phone_nr latest_payment
-- 1 2016-01-28 00:00:00.000
-- 2 2016-01-29 00:00:00.000
In the above example I have used a datetime column, but a similar query should work for a timestamp column.
The MAX function effectively does the "ORDER BY" on the payed_until_ts column and picks the latest value for each phone_nr.
Also, you will get only one value for each phone_nr due to "GROUP BY" clause.

Add a summary row to MS Access query

I have a query stored in MS Access which is doing a standard select from an Access table. I would like to add a summary row at the end showing sums for some of the data above.
I have looked at DSum() but it isn't suitable as I would have to include the running total on each row as opposed to just the end.
Also, note that I don't want to sum data in column a - I would like to get an empty field for the summary of column a.
Example:
a | b | c
-------------
0 | 1 | 2
1 | 1 | 9
| 2 | 11 <-- Sums data above
Does anyone know how this problem can be solved in Access? An alternative might be to define a second query which does the aggregation and then merge it with the recordset of the first one, but this doesn't seem particularly elegant to me.
In SQL server it is apparently possible to use "COMPUTE" or "ROLLUP" but these are not supported under MS Access.

You can use a union query:
SELECT "" As Sort, a,b,c FROM Table
UNION ALL
SELECT "Total" As Sort, Sum(a) As A, Sum(b) As b, Sum(c) As C FROM Table
ORDER BY Sort
EDIT:
SELECT "" As Sort, a,b,c FROM Table
UNION ALL
SELECT "Total" As Sort, "" As A, Sum(b) As b, Sum(c) As C FROM Table
ORDER BY Sort
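The same UNION ALL pattern can be tried outside Access. The sketch below uses SQLite through Python's sqlite3 module, so the quoting and NULL handling are SQLite's rather than Access's; the table name t and the column name sort_key are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
    INSERT INTO t VALUES (0, 1, 2), (1, 1, 9);
""")

# sort_key keeps the detail rows ('') ahead of the summary row ('Total');
# column a is left empty (NULL) in the summary row instead of being summed.
rows = conn.execute("""
    SELECT '' AS sort_key, a, b, c FROM t
    UNION ALL
    SELECT 'Total', NULL, SUM(b), SUM(c) FROM t
    ORDER BY sort_key, a
""").fetchall()
for row in rows:
    print(row)
```

The last row carries the sums 2 and 11 for b and c, matching the example in the question, while a stays empty.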
