The IDs change in the column - sql

I am quite a novice in programming and I need your help with SQL and an issue I noticed.
I have a table:
date     ID    secondary ID  expenses
jul2020  258   0004          1000
jul2020  xxx   xxxx          xxx
......   ....  ....          .....
aug2020  258   0008          2000
aug2020  xxx   xxxx          xxx
aug2020  500   0004          1000
The ID and the secondary ID should be unique and should always match each other, but I noticed that they do not. In any given row, either the ID or the secondary ID is correct, but not always both. I want to sum the expenses column (column D) per unique ID.
Thanks for reading. Any ideas would be very helpful.
UPDATE: everything is numeric, even the ID. The table looks like the one above. As you can see, we have different dates (and multiple customers for the same date). I noticed that for customer 258 with secondary ID 0004, either the ID or the secondary ID changes over the years. I want to assign the same ID and the same secondary ID as on the first date (or on any date, just to be consistent). I want to do this because I want to know the total expenses of each customer over the years. There are about 50 million observations.
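One possible way to approach this, sketched with Python's built-in sqlite3 module so it can be run as-is. The table name `expenses_tbl`, the column names, and the ISO-formatted dates are assumptions made for the example, not part of the original table. Each row is given the ID of the earliest row that shares either its ID or its secondary ID, and expenses are then summed per assigned ID. This only follows one link (it would not resolve a longer chain of changes), and on 50 million rows the correlated subquery would likely need indexes or a different strategy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical names; dates stored as ISO text so they sort correctly.
CREATE TABLE expenses_tbl (
    obs_date     TEXT,
    id           INTEGER,
    secondary_id TEXT,     -- kept as text only to preserve the leading zeros
    expenses     INTEGER
);
INSERT INTO expenses_tbl VALUES
    ('2020-07-01', 258, '0004', 1000),
    ('2020-08-01', 258, '0008', 2000),
    ('2020-08-01', 500, '0004', 1000);
""")

# For every row, pick the ID of the earliest row that shares either
# the ID or the secondary ID, then sum expenses per picked ID.
query = """
SELECT canon_id, SUM(expenses) AS total_expenses
FROM (
    SELECT e.expenses,
           (SELECT f.id
            FROM expenses_tbl f
            WHERE f.id = e.id OR f.secondary_id = e.secondary_id
            ORDER BY f.obs_date, f.rowid
            LIMIT 1) AS canon_id
    FROM expenses_tbl e
)
GROUP BY canon_id
ORDER BY canon_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With the three sample rows, all expenses end up under ID 258, because the row with ID 500 shares secondary ID 0004 with the first row.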

Related

SQL-sum over dynamic period

I have 2 tables: Customers and Actions, where each customer has a unique ID (which can be found in each table).
Some of the customers became club members on a specific date (the date differs between customers). I'm trying to sum their purchases up to that date, and to get those who purchased more than (for example) 200 before they became club members.
For example, I can have the following customer:
custID purchDate purchAmount
1 2015-05-12 100
1 2015-07-12 150
1 2015-12-29 320
Now, assume that custID=1 became a club member on 2015-12-25; in that case, I'd like to get SUM(purchAmount)=250 (note that this customer should be returned because 250>200).
I tried the following:
SELECT cust.custID, SUM(purchAmount)totAmount
FROM customers cust
JOIN actions act
ON cust.custID=act.custID
WHERE act.clubMember=1
AND cust.purchDate<act.clubMemberDate
GROUP BY cust.custID
HAVING totAmount>200;
Is this the right way to "attack" this question, or should I use something like a while loop over clubMemberDate (which, to tell the truth, I don't know how to do)?
I'm working with Teradata.
Your help will be appreciated.
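A minimal sketch of a set-based version, with no loop. It assumes that clubMember and clubMemberDate live in the Customers table and that purchDate and purchAmount live in the Actions table, which is the opposite of the aliases used in the attempt above; that placement is a guess based on the sample rows. It runs on SQLite through Python's sqlite3 module rather than on Teradata, and customer 2 is made-up data added only to show the filter working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed column placement; customer 2 is hypothetical test data.
CREATE TABLE customers (custID INTEGER, clubMember INTEGER, clubMemberDate TEXT);
CREATE TABLE actions   (custID INTEGER, purchDate TEXT, purchAmount INTEGER);

INSERT INTO customers VALUES (1, 1, '2015-12-25'), (2, 1, '2015-06-01');
INSERT INTO actions VALUES
    (1, '2015-05-12', 100),
    (1, '2015-07-12', 150),
    (1, '2015-12-29', 320),
    (2, '2015-03-01', 50);
""")

# Join each purchase to its customer's own membership date, keep only
# purchases made before that date, then filter on the per-customer sum.
query = """
SELECT cust.custID, SUM(act.purchAmount) AS totAmount
FROM customers cust
JOIN actions act
  ON cust.custID = act.custID
WHERE cust.clubMember = 1
  AND act.purchDate < cust.clubMemberDate
GROUP BY cust.custID
HAVING SUM(act.purchAmount) > 200
ORDER BY cust.custID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the date comparison is part of the join condition logic, every customer is compared against their own clubMemberDate, so the "dynamic period" needs no iteration.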

Is it possible to match the "next" unmatched record in a SQL query where there is no strictly unique common field between tables?

Using Access 2010 and its version of SQL, I am trying to find a way to relate two tables in a query where I do not have strict, unique values in each table, using concatenated fields that are mostly unique, then matching each unmatched next record (measured by a date field or the record id) in each table.
My business receives checks that we do not cash ourselves, but rather forward to a client for processing. I am trying to build a query that will match the checks that we forward to the client with a static report that we receive from the client indicating when checks were cashed. I have no control over what the client reports back to us.
When we receive a check, we record the name of the payor, the date that we received the check, the client's account number, the amount of the check, and some other details in a table called "Checks". We add a matching field which comes as close as we can get to a unique identifier to match against the client reports (more on that in a minute).
Checks:
ID  Name  Acct  Amt    Our_Date  Match
__  ____  ____  _____  ________  __________
1   Dave  1001  10.51  2/14/14   1001*10.51
2   Joe   1002  12.14  2/28/14   1002*12.14
3   Sam   1003  50.00  3/01/14   1003*50.00
4   Sam   1003  50.00  4/01/14   1003*50.00
5   Sam   1003  50.00  5/01/14   1003*50.00
The client does not report back to us the date that WE received the check, the check number, or anything else useful for making unique matches. They report the name, account number, amount, and the date of deposit. The client's report comes weekly. We take that weekly report and append the records to make a second table out of it.
Return:
ID   Name  Acct  Amt    Their_Date  Unique1
___  ____  ____  _____  __________  __________
355  Dave  1001  10.51  3/25/14     1001*10.51
378  Joe   1002  12.14  4/04/14     1002*12.14
433  Sam   1003  50.00  3/08/14     1003*50.00
599  Sam   1003  50.00  5/11/14     1003*50.00
Instead of giving us back the date we received the check, we get back the date that they processed it. There is no way to make a rule to compare the two dates, because the deposit dates vary wildly. So the closest thing I can get for a unique identifier is a concatenated field of the account number and the amount.
I am trying to match the records on these two tables so that I know when the checks we forward get deposited. If I do a simple join using the two concatenated fields, it works most of the time, but we run into a problem with payors like Sam, above, who is making regular monthly payments of the same amount. In a simple join, if one of Sam's payments appears in the Return table, it matches to all of the records in the Checks table.
To limit that behavior and match the first Sam entry on the Return table to the first Sam entry on the Checks table, I wrote the following query:
SELECT return.*, checks.*
FROM return, checks
WHERE checks.id = (SELECT TOP 1 id
                   FROM checks
                   WHERE match = return.unique1
                   ORDER BY [our_date]);
This works when there is only one of Sam's records in the Return table. The problem comes when the second entry for Sam hits the Return table (Return.ID 599) as the client's weekly reports are added to the table. When that happens, the query appropriately (for my purposes) only lists that two of Sam's checks have been processed, but uses the "Top 1 ID" record to supply the row's details from the Return table:
Checks_Return_query:
Checks.ID  Name  Acct  Amt    Our_Date  Their_Date  Return.ID
_________  ____  ____  _____  ________  __________  _________
1          Dave  1001  10.51  2/14/14   3/25/14     355
2          Joe   1002  12.14  2/28/14   4/04/14     378
3          Sam   1003  50.00  3/01/14   3/08/14     433
4          Sam   1003  50.00  4/01/14   3/08/14     433
In other words, the query repeats the Return table info for record Return.ID 433 instead of matching Return.ID 599, which is I guess what I should expect from the TOP 1 operator.
So I am trying to figure out how I can get the query to take the two concatenated fields in Checks and Return, compare them to find matching sets, then select the next unmatched record in Checks (with "next" being measured either by the ID or Our_Date) with the next unmatched record in Return (again, with "next" being measured either by the ID or Their_Date).
I spent many hours in a dark room turning the query into various joins, and back again, looking at functions like WHERE NOT IN, WHERE NOT EXISTS, FIRST() NEXT() MIN() MAX(). I am afraid I am way over my head.
I am beginning to think that I may have a structural problem, and may need to write the "matched" records in this query to another table of completed transactions, so that I can differentiate between "matched" and "unmatched" records better. But that still wouldn't help me if two of Sam's transactions are on the same weekly report I get from my client.
Are there any suggestions as to query functions I should look into for further research, or confirmation that I am barking up the wrong tree?
Thanks in advance.
I'd say that you really need another table of completed transactions; it could be a temporary table.
Regarding your fear "... if two of Sam's transactions are on the same weekly report", you can use a cursor to write the records one by one instead of using a set-based operation.
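As an alternative to a cursor, the "next unmatched record" idea can also be sketched as a set-based query: number each record within its match key on both sides, then pair the first with the first, the second with the second, and so on. This is not the approach suggested above, just one possible reading of the requirement. The sketch runs on SQLite through Python's sqlite3 module, not on Access; the names `returns_tbl` and `match_key` are renamed stand-ins for the Return table and the Match field, and the position is counted by ID (the question allows either ID or date). The position is computed with correlated COUNT(*) subqueries because Access SQL has no window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Renamed stand-ins for the Checks and Return tables (names are assumptions).
CREATE TABLE checks      (id INTEGER, name TEXT, match_key TEXT);
CREATE TABLE returns_tbl (id INTEGER, name TEXT, unique1 TEXT);

INSERT INTO checks VALUES
    (1, 'Dave', '1001*10.51'),
    (2, 'Joe',  '1002*12.14'),
    (3, 'Sam',  '1003*50.00'),
    (4, 'Sam',  '1003*50.00'),
    (5, 'Sam',  '1003*50.00');
INSERT INTO returns_tbl VALUES
    (355, 'Dave', '1001*10.51'),
    (378, 'Joe',  '1002*12.14'),
    (433, 'Sam',  '1003*50.00'),
    (599, 'Sam',  '1003*50.00');
""")

# A check and a return are paired when they share the key AND hold the
# same position (1st, 2nd, ...) among the records with that key.
query = """
SELECT c.id AS check_id, r.id AS return_id
FROM checks c, returns_tbl r
WHERE c.match_key = r.unique1
  AND (SELECT COUNT(*) FROM checks c2
       WHERE c2.match_key = c.match_key AND c2.id <= c.id)
    = (SELECT COUNT(*) FROM returns_tbl r2
       WHERE r2.unique1 = r.unique1 AND r2.id <= r.id)
ORDER BY c.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Sam's third check (ID 5) is left out of the result because no third return exists for that key yet, which also covers the case of two returns for Sam arriving in the same weekly report.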

Complicated SQL query vs datatable iteration and processing

I have a three table structure in SQL Server 2012: people, connections and messages. The affected schema would be like this:
People: Id (pk bigint), name...
Connections: Id (pk bigint), IdPpl1 fk, IdPpl2 fk
Messages: Id (pk uniqueidentifier), Idconnection (fk), Messagetype (smallint)
In the Connections table, IdPpl1 and IdPpl2 are foreign keys to the People Id. The same two people can appear in this table twice, with their columns swapped, e.g.:
Id  IdPpl1  IdPpl2
..  ......  ......
3   101     105
8   105     101
9   101     106
10  106     101
The above situation is correct. Actually, those are the maximum occurrences of these "two people" in the table.
The Messages table holds the information of which "connection" sent a message.
Id  IdConnection  Messagetype
..  ............  ...........
24  3             1
25  8             1
26  3             2
27  8             2
28  9             3
29  10            2
(Note: the messages are one-way, that's why there can be two rows in the connections table affecting the same two people: on the first row, one person is the sender and the other the receiver, on the second row they swap)
Given a People Id, I need a SQL query to show "least connectiontype messages mutually sent by mutually connected people" and an extra column indicating whether the messagetype matches or not. The result should be like this, for People Id 101:
Person_id  Person_name  IdConnection  MatchingMsgType
.........  ...........  ............  ...............
105        John         3             1
106        Peter        9             0
The first row appears because of MsgIds 24 and 25. A potential row corresponding with messages 26 and 27 won't appear because a previous matching messagetype was found.
The second row appears because of MsgIds 28 and 29, marking the messagetype as non-matching.
Currently I get all the "messages related to a person" and iterate through the datatable sorting, filtering and operating in-memory.
Would you go with a full-SQL solution (I want to preserve full isolation between app tiers), or is the datatable iteration more suitable?
Thanks in advance!!
Obviously it depends on the size of the result set of your current db query (the one returning all rows related to a user). It is not clear whether rows are ever removed from your tables. If not, your solution does not scale, since the number of matching rows will grow forever. If instead you can assert that the number of resulting rows has some bound (for example, the maximum number of connections a user can have open at the same time), then your solution might be good enough.
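For comparison, here is a rough sketch of what a full-SQL version might look like, under one reading of the requirement: for the given person, list every person connected in both directions, report the outgoing connection, and set MatchingMsgType to 1 when at least one message type was sent in both directions. That interpretation is an assumption; it reproduces the expected output for the sample data but may not cover every rule intended by "least connectiontype". The sketch runs on SQLite through Python's sqlite3 module rather than SQL Server, message Ids are plain integers as in the sample, and the name given to person 101 is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people      (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE connections (id INTEGER PRIMARY KEY, idppl1 INTEGER, idppl2 INTEGER);
CREATE TABLE messages    (id INTEGER PRIMARY KEY, idconnection INTEGER, messagetype INTEGER);

-- 'Owner' is a placeholder name; the question does not name person 101.
INSERT INTO people VALUES (101, 'Owner'), (105, 'John'), (106, 'Peter');
INSERT INTO connections VALUES (3, 101, 105), (8, 105, 101), (9, 101, 106), (10, 106, 101);
INSERT INTO messages VALUES
    (24, 3, 1), (25, 8, 1), (26, 3, 2), (27, 8, 2), (28, 9, 3), (29, 10, 2);
""")

# c_out: connection from the given person to the other person.
# c_in:  the reverse connection, which makes the pair mutually connected.
# The flag is 1 when some message type appears on both connections.
query = """
SELECT p.id   AS person_id,
       p.name AS person_name,
       c_out.id AS id_connection,
       CASE WHEN EXISTS (
                SELECT 1
                FROM messages m_out
                JOIN messages m_in ON m_in.messagetype = m_out.messagetype
                WHERE m_out.idconnection = c_out.id
                  AND m_in.idconnection  = c_in.id
            ) THEN 1 ELSE 0 END AS matching_msg_type
FROM connections c_out
JOIN connections c_in
  ON c_in.idppl1 = c_out.idppl2 AND c_in.idppl2 = c_out.idppl1
JOIN people p ON p.id = c_out.idppl2
WHERE c_out.idppl1 = ?
ORDER BY p.id
"""
rows = conn.execute(query, (101,)).fetchall()
print(rows)
```

Each mutually connected pair yields a single row, so messages 26 and 27 do not produce a second row for John.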

Unsure of how to go about this query. Dunno what to call this type of query either please correct this title!

This query is supposed to run in MS Access 2003 using SQL. The explicit JOIN syntax is NOT supported. An implicit join in the WHERE clause is fine; anything implicit is fine as long as the words JOIN, INNER JOIN, etc. are not used.
DayNumber PastTime
.
.
.
333 Homework
333 TV
334 Date
620 Chores
620 Date
620 Homework
725 Chores
725 Date
888 Internet
888 TV
.
.
.
Hey, I would like a query that shows the most important past time done on each day (TV and Internet do not count!). The order of importance is Homework > Chores > Date. So:
DayNumber PastTime
333 Homework
334 Date
620 Homework
725 Chores
Something that might change this problem: all the different past times are listed together in one table, but only because I appended the tables. Originally the homework, chore, date, internet, and TV entries came from different tables.
eg homework 333
homework 620
Is it easier to do it without appending these tables first? I would prefer a solution that works on the appended table, though.
I was thinking of a mixture of INSERT and DELETE, but the hardest part is checking whether several things were done on the same day and keeping only the most important one. Thank you.
Create another table with:
Pri | PastTime
--------------
1 | Homework
2 | Chores
3 | Date
This is a priority list for the items.
Next do:
SELECT MIN(Pri), DayNumber
FROM PastTime_table, Priority_table
WHERE PastTime_table.PastTime = Priority_table.PastTime
GROUP BY DayNumber
This will give you the most important past time for each day. And because TV and Internet are not listed they will not show up.
But it will give you a number, and not the name.
To get the name instead of the number, look the minimum Pri up in Priority_table again. A correlated subquery in the WHERE clause can do this without the JOIN keyword.
If you are willing to change the name and call them:
A_Homework
B_Chores
C_Date
instead then you could do (without any extra table):
SELECT MIN(PastTime), DayNumber
FROM PastTime_table
GROUP BY DayNumber
Since it sorts the name alphabetically it will always give you the best one.
You can add a WHERE to remove TV and Internet.
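The priority-table approach can be sketched so that it returns the name rather than the number while still avoiding the JOIN keyword. The tables are joined implicitly in the WHERE clause, and a correlated subquery keeps only the row whose priority is the lowest for that day. The sketch runs on SQLite through Python's sqlite3 module, not on Access 2003, so treat it as an illustration of the query shape rather than tested Access SQL. The table names are the ones used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PastTime_table (DayNumber INTEGER, PastTime TEXT);
CREATE TABLE Priority_table (Pri INTEGER, PastTime TEXT);

INSERT INTO PastTime_table VALUES
    (333, 'Homework'), (333, 'TV'),
    (334, 'Date'),
    (620, 'Chores'), (620, 'Date'), (620, 'Homework'),
    (725, 'Chores'), (725, 'Date'),
    (888, 'Internet'), (888, 'TV');
-- TV and Internet are deliberately absent, so they never match.
INSERT INTO Priority_table VALUES (1, 'Homework'), (2, 'Chores'), (3, 'Date');
""")

# Implicit join in WHERE; the subquery finds the best (lowest) priority
# recorded for the same day, and only that row is kept.
query = """
SELECT t.DayNumber, p.PastTime
FROM PastTime_table t, Priority_table p
WHERE t.PastTime = p.PastTime
  AND p.Pri = (SELECT MIN(p2.Pri)
               FROM PastTime_table t2, Priority_table p2
               WHERE t2.PastTime = p2.PastTime
                 AND t2.DayNumber = t.DayNumber)
ORDER BY t.DayNumber
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Day 888 does not appear because it only has TV and Internet entries, which have no row in Priority_table.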

Table structure of a student

I want a table structure which can store the details of the student like the below format.
If the student is in
10 th standard -> I need his aggregate % from 1st standard to 9th standard.
5 th standard -> I need his aggregate % from 1st standard to 4th standard.
1 st standard -> No aggregate % has to be displayed.
And the most important thing is ' we need to use only one table'. Please form a table structure with no redundant values.
Any ideas will be greatly appreciated......
No, friends, this is not homework. This was asked in an Oracle interview conducted in Hyderabad the day before yesterday (24th July, 2010). He asked me for the table structure.
He did not even ask me for the query. He asked me how I would design the table. Please advise me.
id | name | grade | aggregate
This would do the trick: id is your primary key, name is the student's full name, grade is the grade he is in, and aggregate is the aggregate % based on the grade.
For example, some rows might be:
10 | Bill Cosby | 10 | 90
11 | Jerry Seinfeld | 4 | 60
Bill Cosby would have an aggregate of 90% over grades 1-9, and Jerry would have 60% over grades 1-3. It is one table, and it comes down to you managing the aggregation rule yourself, since it has to be one table.
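As a small illustration of that single-table layout, here is a sketch using SQLite through Python's sqlite3 module (the interview was about Oracle, so the column types are only indicative). The third row is a made-up 1st standard student, added to show one way of representing "no aggregate": storing NULL. Using NULL for that case is an assumption, not something the layout above prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    id        INTEGER PRIMARY KEY,
    name      TEXT,
    grade     INTEGER,
    aggregate REAL      -- aggregate % over all grades below the current one
);
INSERT INTO student VALUES
    (10, 'Bill Cosby',     10, 90),
    (11, 'Jerry Seinfeld',  4, 60),
    (12, 'Example Student', 1, NULL);  -- hypothetical row: 1st standard, nothing to aggregate
""")

rows = conn.execute(
    "SELECT name, grade, aggregate FROM student ORDER BY id"
).fetchall()
for name, grade, aggregate in rows:
    # A NULL aggregate comes back as None and is shown as a dash.
    print(name, grade, "-" if aggregate is None else aggregate)
```

The application (or whatever loads the data) remains responsible for recomputing aggregate whenever a student moves up a grade.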
If this is an interview question, it looks like they would like to check your knowledge on Nested Tables. Essentially you would have one column as roll number, and other column which is a nested table as Class and Percentage.