I'm working on a large financial project, building SQL queries in MS Access.
These are my tables.
CE
anomes cod_c Name NIP Nip_Grupo
201706 1 ABC 10 50
201706 1 DDD 12 50
201706 2 CCC 11 50
O
anomes cod_c ID_O Nip_1 val_1 val_2
201706 1 ACA_00 10 500 200
201706 1 ACB_01 12 100 300
201706 2 ACC_07 11 50 400
OS
anomes cod_c ID_O Stage
201706 1 ACA_00 1
201706 1 ACB_01 2
201706 2 ACC_07 3
What I need is a list like this
Name | Sum (val1 + val2) where stage =1 | Sum (val1 + val2) where stage =2 |
ABC | x | x
DDD | x | x
CCC | x | x
The list should be produced by entering only Nip_Grupo (which connects the companies in table CE) and AnoMes, which is a time code (year and month).
The second table (O) holds operations with their participants. I match Nip_1 in O to NIP in CE, then link each of that company's operations to its stage in OS, so that I can sum the total operation values per company (CE) in a group, per stage.
It seems pretty straightforward, but table OS does not always have a record that links a stage to an entry in table O. In that case I need the result to show zero.
This is my Query so far ( a simplified version to fit my example):
SELECT CE.Name, (Sum([O].[val1])+Sum(val2))
FROM CE
INNER JOIN O ON (CE.Cod_Contraparte = O.Cod_Contraparte) AND
(CE.AnoMes = O.AnoMes) AND (CE.Nip = O.Nip_1Titular))
LEFT JOIN OPERACOES_STAGING_lnk AS OS ON (O.AnoMes = OS.AnoMes) AND
(O.ID_Operacao = OS.ID_Operacao)
WHERE (((CE.Nip_Grupo)=[enter nip:]) AND ((CE.anomes)=[enter anomes:]) AND
((CE.Nip)=[O].[Nip_1])) AND ((OS.Stage)=[2])
GROUP BY CE.Nome
ORDER BY CE.Nome
This query returns the sum only for stage 2, and only when there are records in table OS. Because many operations have no stage record, I need those to show zero, and I need the full list of companies for the group (Nip_Grupo).
Conditional aggregation may help. Use a LEFT JOIN to OS so that operations without a stage record are kept and contribute zero (Access also needs parentheses around the first join when there are several):
SELECT CE.Nome,
       SUM(IIF(OS.Stage = 1, O.val1 + O.val2, 0)) AS stage1,
       SUM(IIF(OS.Stage = 2, O.val1 + O.val2, 0)) AS stage2
FROM (CE
INNER JOIN O ON (CE.Cod_Contraparte = O.Cod_Contraparte) AND (CE.AnoMes = O.AnoMes) AND (CE.Nip = O.Nip_1Titular))
LEFT JOIN OPERACOES_STAGING_lnk AS OS ON (O.AnoMes = OS.AnoMes) AND (O.ID_Operacao = OS.ID_Operacao)
WHERE (CE.Nip_Grupo = [enter nip:]) AND (CE.AnoMes = [enter anomes:])
GROUP BY CE.Nome
ORDER BY CE.Nome
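To see the effect of the outer join plus conditional aggregation on the sample tables from the question, here is a minimal sketch in Python using the standard sqlite3 module. It is not Access: SQLite has no IIF in older versions, so CASE WHEN stands in for it, and the column names follow the simplified example tables (cod_c, ID_O, Nip_1, val_1, val_2). The OS row for ACC_07 is left out on purpose, as an assumption, to simulate an operation with no stage record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CE (anomes INT, cod_c INT, Name TEXT, NIP INT, Nip_Grupo INT);
CREATE TABLE O  (anomes INT, cod_c INT, ID_O TEXT, Nip_1 INT, val_1 INT, val_2 INT);
CREATE TABLE OS (anomes INT, cod_c INT, ID_O TEXT, Stage INT);
INSERT INTO CE VALUES (201706, 1, 'ABC', 10, 50),
                      (201706, 1, 'DDD', 12, 50),
                      (201706, 2, 'CCC', 11, 50);
INSERT INTO O VALUES (201706, 1, 'ACA_00', 10, 500, 200),
                     (201706, 1, 'ACB_01', 12, 100, 300),
                     (201706, 2, 'ACC_07', 11, 50, 400);
-- Assumption for the demo: ACC_07 has no stage record in OS.
INSERT INTO OS VALUES (201706, 1, 'ACA_00', 1),
                      (201706, 1, 'ACB_01', 2);
""")

QUERY = """
SELECT CE.Name,
       SUM(CASE WHEN OS.Stage = 1 THEN O.val_1 + O.val_2 ELSE 0 END) AS stage1,
       SUM(CASE WHEN OS.Stage = 2 THEN O.val_1 + O.val_2 ELSE 0 END) AS stage2
FROM CE
JOIN O ON CE.cod_c = O.cod_c AND CE.anomes = O.anomes AND CE.NIP = O.Nip_1
LEFT JOIN OS ON O.anomes = OS.anomes AND O.ID_O = OS.ID_O  -- keeps operations without a stage
WHERE CE.Nip_Grupo = ? AND CE.anomes = ?
GROUP BY CE.Name
ORDER BY CE.Name
"""

rows = conn.execute(QUERY, (50, 201706)).fetchall()
for row in rows:
    print(row)
```

CCC still appears, with zero in both columns, because the LEFT JOIN keeps its operation and the CASE expression falls through to 0 when Stage is NULL. A company with no rows in O at all would still be dropped by the inner join between CE and O.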
I am pulling my hair out over a data retrieval function I'm trying to write. In essence this query is meant to SUM up the count of all voorwerpnummers in the Voorwerp_in_Rubriek table, grouped by their rubrieknummer gathered from Rubriek.
After that I want to keep rolling those sums up until they reach their top-level parent. Rubriek has a foreign key reference to itself, 'hoofdrubriek', which is easiest to think of as its parent in a category tree.
This also means the categories can be nested. A value of NULL in the hoofdrubriek column means that the row is a top-level parent. The idea behind this query is to SUM up the count of voorwerpnummers in Voorwerp_in_rubriek and add the counts together until they reach their top-level parent.
As the database and test data are quite large, I've decided not to add the code directly to this question but to link to a dbfiddle instead, so there's more structure.
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=8068a52da6a29afffe6dc793398f0998
I got it working in some degree using this query:
SELECT R2.hoofdrubriek ,
COUNT(Vr.rubrieknummer) AS aantal
FROM Rubriek R1
RIGHT OUTER JOIN Rubriek R2 ON R1.rubrieknummer = R2.hoofdrubriek
INNER JOIN Voorwerp_in_rubriek Vr ON R2.rubrieknummer = Vr.rubrieknummer
WHERE NOT EXISTS ( SELECT *
FROM Rubriek
WHERE hoofdrubriek = R2.rubrieknummer )
AND R1.hoofdrubriek IS NOT NULL
GROUP BY Vr.rubrieknummer ,
R2.hoofdrubriek
But that doesn't get back all items and flops in general. I hope someone can help me.
If I got it right:
declare @t table (
    rubrieknummer int,
    cnt int);
-- direct item count per category
INSERT @t(rubrieknummer, cnt)
SELECT R.rubrieknummer, COUNT(Vr.voorwerpnummer)
FROM Rubriek R
INNER JOIN voorwerp_in_rubriek Vr ON R.rubrieknummer = Vr.rubrieknummer
GROUP BY Vr.rubrieknummer, R.rubrieknummer;
--select * from @t;
-- pass each count up to the parent, level by level
with t as(
    select rubrieknummer, cnt
    from @t
    union all
    select r.hoofdrubriek, cnt
    from t
    join Rubriek r on t.rubrieknummer = r.rubrieknummer
)
select rubrieknummer, sum(cnt) cnt
from t
group by rubrieknummer;
Applying this to your fiddle data returns:
rubrieknummer cnt
<null> 42
100 42
101 26
102 6
103 10
10000 8
10100 4
10101 1
10102 3
10500 4
10501 2
10502 2
15000 18
15100 6
15101 2
15102 2
15103 2
15500 12
15501 4
15502 3
15503 5
20000 6
20001 2
20002 1
20003 1
20004 2
25000 4
25001 1
25002 1
25003 1
25004 1
30001 2
30002 1
30004 3
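The same roll-up can be tried on a tiny tree without the fiddle. The sketch below uses Python's standard sqlite3 module and a recursive CTE; the four categories and six items are made-up illustration data, not the fiddle's contents, and unlike the answer above it filters out the NULL row that collects everything above the top level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rubriek (rubrieknummer INT PRIMARY KEY, hoofdrubriek INT);
CREATE TABLE Voorwerp_in_rubriek (voorwerpnummer INT, rubrieknummer INT);
-- Hypothetical tree: 1 is top level, 10 and 11 are children of 1, 100 is a child of 10.
INSERT INTO Rubriek VALUES (1, NULL), (10, 1), (11, 1), (100, 10);
INSERT INTO Voorwerp_in_rubriek VALUES (1, 100), (2, 100), (3, 11), (4, 11), (5, 11), (6, 10);
""")

QUERY = """
WITH RECURSIVE
counts(rubrieknummer, cnt) AS (
    -- direct item count per category
    SELECT rubrieknummer, COUNT(*)
    FROM Voorwerp_in_rubriek
    GROUP BY rubrieknummer
),
t(rubrieknummer, cnt) AS (
    SELECT rubrieknummer, cnt FROM counts
    UNION ALL
    -- hand each count to the parent; stops once the parent is NULL
    SELECT r.hoofdrubriek, t.cnt
    FROM t
    JOIN Rubriek r ON t.rubrieknummer = r.rubrieknummer
)
SELECT rubrieknummer, SUM(cnt) AS cnt
FROM t
WHERE rubrieknummer IS NOT NULL
GROUP BY rubrieknummer
ORDER BY rubrieknummer
"""

totals = conn.execute(QUERY).fetchall()
for row in totals:
    print(row)
```

Each category ends up with its own items plus everything below it, so the top-level category 1 holds all six items.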
I have a data set of 300,000 rows, looking at harvested acreage in the United States. Some, but not all of my data is double counted and I am trying to remove the double counting. The data looks like this:
Year | State | Crop | Practice | Acres Harvested | Acres
-------------------------------------------------------------
2008 1 1 1 1000 or more 40
2008 1 1 1 1000 to 1999 10
2008 1 1 1 2000 to 2999 30
2008 2 1 1 1000 or more 87
2008 3 2 2 1.0 to 14.9 15
2008 3 2 2 1.0 to 4.9 5
2008 3 2 2 5.0 to 14.9 10
Some of the rows are subsets for other rows in the [Acres Harvested] column (rows 2 and 3 are a subset of row 1 and rows 6 and 7 are a subset of row 5). In situations where I have more detailed information for [Acres Harvested] (rows 2 and 3 provide more detail than row 1), I would like to keep the detailed information (row 2 and 3) and omit the general information (row 1). In other scenarios, I only have the general information (row 4), so that is what I will keep.
I am having trouble writing the code to omit the general information when the detailed information is present, but to keep the general information when the more detailed information does not exist.
I've been trying to write an "inner join" to join my table back with itself, but am unsure of how to omit rows when certain conditions are met. What I have:
SELECT *
FROM A
INNER JOIN (SELECT *
FROM A
GROUP BY [YEAR], [STATE], [CROP], [PRACTICE]
HAVING COUNT (*) > 1) AS B
ON A.Year = B.Year
AND A.State = B.State
AND A.Crop = B.Crop
AND A.Practice = B.Practice
And now I'm stuck...
Results should look like:
Year | State | Crop | Practice | Acres Harvested | Acres
-------------------------------------------------------------
2008 1 1 1 1000 to 1999 10
2008 1 1 1 2000 to 2999 30
2008 2 1 1 1000 or more 87
2008 3 2 2 1.0 to 4.9 5
2008 3 2 2 5.0 to 14.9 10
Appreciate any help!
Your question is a bit vague. For the sample data, this drops an "or more" row whenever a more detailed "X to Y" row exists for the same year, state, crop, and practice. It does not handle a range such as "1.0 to 14.9" that overlaps narrower ranges:
select a.*
from a
where a.acres_harvested not like '% or more' or
      not exists (select 1
                  from a a2
                  where a2.year = a.year and a2.state = a.state and a2.crop = a.crop and
                        a2.practice = a.practice and
                        a2.acres_harvested like '[0-9]%to%[0-9]'
                 );
Assuming your criterion for "more detailed information" is rows in a matched set that don't end in "or more", as I guessed in my comment, you can get your desired output this way. Handle the sets with only one record and the sets with multiple records separately and UNION them, instead of trying to do it with one SELECT.
-- sets with a single row: keep that row as it is
SELECT A.*
FROM A
INNER JOIN
    (SELECT [YEAR], [STATE], [CROP], [PRACTICE]
     FROM A
     GROUP BY [YEAR], [STATE], [CROP], [PRACTICE]
     HAVING COUNT(*) = 1
    ) AS S
    ON  A.[Year] = S.[Year]
    AND A.[State] = S.[State]
    AND A.[Crop] = S.[Crop]
    AND A.[Practice] = S.[Practice]
UNION
-- sets with several rows: keep only the detailed rows
SELECT A.*
FROM A
INNER JOIN
    (SELECT [YEAR], [STATE], [CROP], [PRACTICE]
     FROM A
     GROUP BY [YEAR], [STATE], [CROP], [PRACTICE]
     HAVING COUNT(*) > 1
    ) AS B
    ON  A.[Year] = B.[Year]
    AND A.[State] = B.[State]
    AND A.[Crop] = B.[Crop]
    AND A.[Practice] = B.[Practice]
WHERE A.[Acres Harvested] NOT LIKE '%or more'
If your criteria aren't what I guessed, just change the WHERE clause.
Given your updated sample data you're also going to have to check for overlapping number ranges. This question has some options on how to do that: Discard existing dates that are included in the result, SQL Server. You'll need to split your "X to Y" values into two numeric fields as well.
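As a rough check of the "drop the general row only when a detailed row exists" idea, here is a small sketch using Python's standard sqlite3 module. It covers only the "or more" rows from the sample (the first four rows), not the overlapping "1.0 to 14.9" case, and the table name harvest and its snake_case columns are names chosen for this illustration. SQLite's LIKE has no [0-9] character class, so the pattern '% to %' stands in for the T-SQL pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE harvest (year INT, state INT, crop INT, practice INT,
                      acres_harvested TEXT, acres INT);
INSERT INTO harvest VALUES
    (2008, 1, 1, 1, '1000 or more', 40),
    (2008, 1, 1, 1, '1000 to 1999', 10),
    (2008, 1, 1, 1, '2000 to 2999', 30),
    (2008, 2, 1, 1, '1000 or more', 87);
""")

QUERY = """
SELECT h.state, h.acres_harvested, h.acres
FROM harvest AS h
WHERE h.acres_harvested NOT LIKE '% or more'   -- detailed rows are always kept
   OR NOT EXISTS (                             -- general rows only when no detail exists
        SELECT 1
        FROM harvest AS d
        WHERE d.year = h.year AND d.state = h.state
          AND d.crop = h.crop AND d.practice = h.practice
          AND d.acres_harvested LIKE '% to %')
ORDER BY h.state, h.acres_harvested
"""

kept = conn.execute(QUERY).fetchall()
for row in kept:
    print(row)
```

State 1 loses its "1000 or more" row because two detailed rows exist, while state 2 keeps its only row.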
I am using the query below to pull up a list of accounts and the optional codes that go with them. There are 95 codes for each account, not just the 2 I am showing in the results below.
SELECT DISTINCT Ref1.ACCOUNT_ID as Acct_Numb,
Current_Date as DATA_DATE,
Cat.OPTIONAL_CTGRY_CD As Code,
Cat.OPTIONAL_CTGRY_CD || ' - ' || Cat.OPTIONAL_CTGRY_NM AS Code_Combo,
Class.OPTIONAL_CLASS_CD as Code_Answer,
Class.OPTIONAL_CLASS_NM as Code_Answer_Desc
FROM xxxxx.zzzzzz_OPT_REF Ref1
LEFT JOIN xxxxx.zzzzzz_OPT_CATEGORY Cat
ON Ref1.OPTIONAL_CTGRY_CD = Cat.OPTIONAL_CTGRY_CD
LEFT JOIN xxxxx.zzzzzz_OPT_CLASS Class
ON Ref1.OPTIONAL_CLASS_CD = Class.OPTIONAL_CLASS_CD
AND Cat.OPTIONAL_CTGRY_CD = Class.OPTIONAL_CTGRY_CD
LEFT JOIN xxxxx.HRTVACT_PCS Acct
ON Ref1.ACCOUNT_ID = Acct.ACCOUNTID
WHERE Acct.ACCOUNTSTATUS = 'OPEN' AND Acct.ACCOUNTID IN ('123456', '654321')
ORDER BY ACCT_NUMB ASC, CODE ASC;
Here are the results
DATA_DATE ACCT_NUMB CODE CODE_COMBO CODE_ANSWER CODE_ANSWER_DESC
11/8/2016 123456 1 1 - Reporting 0 NOT APPLICABLE
11/8/2016 123456 2 2 - System 4 SYSTEM 2
11/8/2016 654321 1 1 - Reporting 3 APPLIED
11/8/2016 654321 2 2 - System 3 N/A
I need to create the results as a pivot table that looks like the table below.
(CODE) (CODE_COMBO) (CODE) (CODE_COMBO)
DATA_DATE ACCT_NUMB 1 1 - Reporting 2 2 - System
11/8/2016 123456 0 NOT APPLICABLE 4 SYSTEM 2 (CODE_ANSWER)/(CODE_ANSWER_DESC)
11/8/2016 654321 3 APPLIED 3 N/A (CODE_ANSWER)/(CODE_ANSWER_DESC)
I have not tried this before and I am stumped.
I have accomplished this in the past using table alias so that you can join a table to itself. I have done it with 4 or 5 columns not 95 so there may be a better way that I am not aware of.
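A minimal sketch of that self-join idea, using Python's standard sqlite3 module and the four result rows shown in the question. The table name opt and its column names are placeholders invented for the example; each alias picks out one code, so every additional code would need one more alias and join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE opt (acct INT, code INT, answer INT, answer_desc TEXT);
INSERT INTO opt VALUES
    (123456, 1, 0, 'NOT APPLICABLE'),
    (123456, 2, 4, 'SYSTEM 2'),
    (654321, 1, 3, 'APPLIED'),
    (654321, 2, 3, 'N/A');
""")

QUERY = """
SELECT c1.acct,
       c1.answer AS code1_answer, c1.answer_desc AS code1_desc,
       c2.answer AS code2_answer, c2.answer_desc AS code2_desc
FROM opt AS c1                      -- alias c1 supplies the columns for code 1
JOIN opt AS c2                      -- alias c2 supplies the columns for code 2
  ON c2.acct = c1.acct AND c2.code = 2
WHERE c1.code = 1
ORDER BY c1.acct
"""

pivoted = conn.execute(QUERY).fetchall()
for row in pivoted:
    print(row)
```

With 95 codes the SQL text would probably be generated rather than typed by hand, or replaced by one aggregated expression per code (for example MAX(CASE WHEN code = 1 THEN answer END)), which avoids the 95 joins.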
I just need advice on how I could speed up my code. I'm supposed to calculate, on a yearly basis, how students' grades change, expressed as percentages. Keep in mind that I have around 100k-150k records per year.
Basically, the end result looks like this: for example, at the end of 20150131, 2% of the students who had grade A finished with grade B, and so on.
Grade Date B C
A 20150131 2% 3%
B 20150131 88% 85%
C 20150131 10% 12%
A 20140131 2% 3%
B 20140131 88% 85%
C 20140131 10% 12%
A 20130131 2% 3%
B 20130131 88% 85%
C 20130131 10% 12%
The input looks like this: just the student and the student's grade on a certain date.
Student Date Grade
1 20150131 A
2 20150131 C
3 20150131 A
1 20140131 B
2 20140131 B
3 20140131 A
My code looks like this:
WHILE @StartDateInt > @PeriodSpan
BEGIN
    WHILE @y <= @CategoriesCount
    BEGIN
        SET @CurrentGr = (SELECT Grade FROM #Categories WHERE RowID = @y)
        SET @CurrentGrCount = (SELECT COUNT(Students) FROM #TempTable WHERE Period = @PeriodSpan AND Grade = @CurrentGr)
        SET @DefaultCurrentGr = (SELECT Grade FROM #Categories WHERE RowID = @y)
        INSERT INTO Grade_MTRX (Student, Period, Grades_B, SessionID)
        SELECT temp1.Grade, @PeriodNextSpan AS Period, COUNT(Grades_B)/@CurrentGrCount AS 'Grades_B', @SessionID
        FROM #TempTable temp1
        JOIN #TempTable temp2 ON temp1.Student = temp2.Student AND temp1.Period + 10000 = temp2.Period
        WHERE temp1.Grade = @CurrentGr AND temp2.Grade = 'C' AND temp1.Period = @PeriodSpan
        GROUP BY temp1.Grade, temp1.Period
        UPDATE Grade_MTRX SET Grades_C = (
            SELECT COUNT(Grades_C)/@CurrentGrCount
            FROM #TempTable
            WHERE Grade = 'C' AND Period = @PeriodNextSpan)
        WHERE Category = @CurrentGr AND Period = @PeriodNextSpan
    END
END
I understand that SQL Server doesn't like WHILE loops and that they kill its performance. But I'm using a WHILE loop inside a WHILE loop: going over the years, then over each grade, and just counting. First I insert one row for the current grade, and then I keep updating that row until it's fully populated.
I do understand this is really bad, but that's why I am here: to learn a better way to accomplish this.
Thank you in advance!
150,000 records per year is really nothing. Let's say you had this Grade table:
CREATE TABLE Grade(
student_id INT,
date INT,
grade CHAR);
With this info:
student_id date grade
1 2013 A
1 2014 A
1 2015 B
2 2013 B
2 2014 A
2 2015 C
3 2013 C
3 2014 A
3 2015 B
Then if you just run a query like:
SELECT this_year.date, last_year.grade AS last_year, this_year.grade AS this_year, COUNT(*) AS total,
(100.0 * COUNT(*)) / (SELECT COUNT(*) FROM Grade WHERE date = this_year.date) AS percent
FROM Grade AS this_year
INNER JOIN Grade AS last_year ON this_year.date = last_year.date + 1
AND this_year.student_id = last_year.student_id
GROUP BY this_year.date, this_year.grade, last_year.grade
ORDER BY 1, 2, 3;
you end up with these results:
date | last_year | this_year | total | percent
------+-----------+-----------+-------+---------------------
2014 | A | A | 1 | 33.3333333333333333
2014 | B | A | 1 | 33.3333333333333333
2014 | C | A | 1 | 33.3333333333333333
2015 | A | B | 2 | 66.6666666666666667
2015 | A | C | 1 | 33.3333333333333333
(5 rows)
A few million rows of data shouldn't be any real trouble for this kind of query, nor should tens of millions. But if you need things to be faster still, check out the window functions available in Postgres, Oracle, and SQL Server.
I have a table containing some payments looking something like this:
id | from | to | amount
--------------------------
1 | 125 | 135 | 2.4
2 | 123 | 134 | 1.7
3 | 124 | 138 | 4.8
4 | 118 | 119 | 3.9
5 | 56 | 254 | 23.5
...
I need to know if there is a way to make SQL query that would tell me if there is a series of consecutive rows, the amount of which sums up to a certain value. For example, if I wanted value 6.5, it would return rows 2 to 3. If I wanted 12.8, it would return rows 1 to 4 and so on.
I am absolutely stuck and would appreciate some help.
I would approach this as follows. First, calculate the cumulative sum. Then, the condition that consecutive rows have a particular sum is equivalent to saying that the difference between two of the cumulative sums equals that value. The cumulative sum just before the first row of the run is p1.cumamount - p1.amount, so the run from p1 to p2 includes both rows.
with p as (
      select p.*, sum(amount) over (order by id) as cumamount
      from payments p
     )
select p1.id as first_id, p2.id as last_id
from p p1 join
     p p2
     on p1.id <= p2.id and
        ( p2.cumamount - (p1.cumamount - p1.amount) ) = 6.5;
As a note: this will probably not work if amount is stored as a floating-point number, because of very small inaccuracies. If amount were an integer, it would be fine, but it clearly is not. A fixed-point representation should be OK.
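The cumulative-sum idea can also be sketched outside the database. The Python snippet below is only an illustration of the reasoning, not a replacement for the query: it keeps a running total, remembers which row starts after each earlier total, and reports a run whenever the current total minus the target matches one of them. Amounts are parsed as decimal.Decimal to sidestep the floating-point issue mentioned above; the function name find_runs is made up for this example.

```python
from decimal import Decimal


def find_runs(payments, target):
    """Return (first_id, last_id) pairs of consecutive rows whose amounts sum to target.

    payments is a list of (id, amount) tuples already ordered by id.
    """
    target = Decimal(target)
    runs = []
    prefix = Decimal("0")
    # Maps a cumulative sum to the ids of the rows that start right after it.
    starts_by_prefix = {}
    for row_id, amount in payments:
        starts_by_prefix.setdefault(prefix, []).append(row_id)
        prefix += Decimal(amount)
        # A run ending here sums to target if an earlier prefix equals prefix - target.
        for start_id in starts_by_prefix.get(prefix - target, []):
            runs.append((start_id, row_id))
    return runs


# The five rows shown in the question, with amounts as strings for exact parsing.
payments = [(1, "2.4"), (2, "1.7"), (3, "4.8"), (4, "3.9"), (5, "23.5")]

print(find_runs(payments, "6.5"))
print(find_runs(payments, "12.8"))
```

For the rows shown in the question this finds rows 2 to 3 for 6.5 and rows 1 to 4 for 12.8, matching the examples given there.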
;with numbers as (select number from master..spt_values where type='p' and number between 1 and (Select MAX(id) from yourtable)),
ranges as ( select n1.number as start, n2.number as finish from numbers n1 cross join numbers n2 where n1.number<=n2.number)
select yourtable.* from yourtable
inner join
(
select start, finish
from ranges
inner join yourtable on id between start and finish
group by start, finish
having SUM(amount)=12.8
) results
on yourtable.id between start and finish