Progress query to remove duplicates based on number of duplicates

Progress query to remove duplicates based on number of duplicates - sql

Our accounting department needs pull tax data from our MIS every month and submit it online to the Dept. of Revenue. Unfortunately, when pulling the data, it is duplicated a varying number of times depending on which jurisdictions we have to pay taxes to. All she needs is the dollar amount for one jurisdiction, for one line, because she enters that on the website.
I've tried using DISTINCT to pull only one record of the type, in conjunction with LEFT() to pull just the first 7 characters of the jurisdiction but it ended up excluding certain results that should have been included. I believe it was because the posting date and the amount on a couple transactions was identical. They were separate transactions but the query took them as duplicates and ignored them.
Here is a couple of examples of queries I've run that have been successful in pulling most of the data, but most times either too much or not enough:
SELECT DISTINCT LEFT("Sales-Tax-Jurisdiction-Code", 7), "Taxable-Base", "Posting-Date"
FROM ARInvoiceTax
WHERE ("Posting-Date" >= '2019-09-01' AND "Posting-Date" <= '2019-09-30')
AND (("Sales-Tax-Jurisdiction-Code" BETWEEN '55001' AND '56763')
OR "Sales-Tax-Jurisdiction-Code" = 'Dakota Cty TT')
ORDER BY "Sales-Tax-Jurisdiction-Code"
Here is a query that I can to pull all of the data and the subsequent result is below that:
SELECT "Sales-Tax-Jurisdiction-Code", "Taxable-Base", "Posting-Date"
FROM ARInvoiceTax
WHERE ("Posting-Date" >= '2019-09-01' AND "Posting-Date" <= '2019-09-30')
AND (("Sales-Tax-Jurisdiction-Code" BETWEEN '55001' AND '56763')
OR "Sales-Tax-Jurisdiction-Code" = 'Dakota Cty TT')
ORDER BY "Sales-Tax-Jurisdiction-Code"
Below is a sample of the output:
Jurisdiction | Tax Amount | Posting Date
-------------|------------|-------------
5512100City | $50.00 | 2019-09-02
5512100City | $50.00 | 2019-09-03
5512100City | $70.00 | 2019-09-02
5512100Cnty | $50.00 | 2019-09-02
5512100Cnty | $50.00 | 2019-09-03
5512100Cnty | $70.00 | 2019-09-02
5512100State | $70.00 | 2019-09-02
5512100State | $50.00 | 2019-09-02
5512100State | $50.00 | 2019-09-03
5513100Cnty | $25.00 | 2019-09-12
5513100State | $25.00 | 2019-09-12
5514100City | $9.00 | 2019-09-06
5514100City | $9.00 | 2019-09-06
5514100Cnty | $9.00 | 2019-09-06
5514100Cnty | $9.00 | 2019-09-06
5515100State | $12.00 | 2019-09-11
5516100City | $6.00 | 2019-09-13
5516100City | $7.00 | 2019-09-13
5516100State | $6.00 | 2019-09-13
5516100State | $7.00 | 2019-09-13
As you can see, the data can be all over the place. One zip code could have multiple different lines. What the accounting department does now is prints a report with this information and, in a spreadsheet, only records (1) dollar amount per transaction. For example, for 55121, she would need to record $50.00, $50.00 and $70.00 (she tallies them and adds the total amount on the website) however the SQL query gives me those (3) numbers, (3) times.
I can't seem to figure out a query that will pull only one set of the data. Unfortunately, I can't do it based on the words/letters after the 00 because not all jurisdictions have all 3 (city, cnty, state) and thus trying to remove lines based on that removes valid lines as well.

Can you use select distinct? If the first five characters are the zip code and you just want that:
select distinct left(jurisdiction, 5), tax_amount
from t;

Take only City/County/.. whatever is first
select jurisdiction, tax_amount, Posting_Date
from (
select *, dense_rank() over(partition by left(jurisdiction, 7) order by substring(jurisdiction, 8, len(jurisdiction))) rnk
from taxes -- you output here
)
where rnk=1;
Sql server syntax, you may need other string functions in your dbms.
Postgresql fiddle

Related

SQL lite query - Merging time periods

I am working with a SQLite RDB and have the following problem.
PID EID EPISODETYPE START_TIME END_TIME
123 556 emergency_room 2020-03-29 15:09:00 2020-03-30 20:36:00
123 558 ward 2020-04-30 20:35:00 2020-05-04 22:12:00
123 660 ward 2020-05-04 22:12:00 2020-05-21 08:59:00
123 661 icu 2020-05-21 09:00:00 2020-07-01 17:00:00
Basically, PID represents each patient unique identifier. They all have an episode identifier for all the different beds they occupy during a unique stay.
What I wish to accomplish is to select all episodes from a single hospital stay and return it as the stay number.
I would want my query to result in this :
PID EID StayNumber
123 556 1
123 558 2
123 660 2
123 661 2
1 st row is StayNumber as it's the first.
As the 2nd, 3rd and 4th row are from the same hospital stay (we can tell by the overlapping OR relatively close start and end time period) they are all labeled StayNumber 2.
A hospital stay is defined as the period of time during which the patient never left the hospital.
I tried to write the query by starting off with a :
GROUP BY PID (to isolate the process for each individual patient)
Using datetime to compute a simple time difference rule but I have trouble writing down a query using the end time from a row and the start time from the next row.
Thank you in advance.
I am a SQL learner
UPDATE ***

Use window function LAG() to flag the groups for each hospital stay and window function SUM() to get the numbers:
SELECT PID, EID,
SUM(flag) OVER (PARTITION BY PID ORDER BY START_TIME) StayNumber
FROM (
SELECT *,
strftime('%s', START_TIME) -
strftime('%s', LAG(END_TIME, 1, datetime(START_TIME, '-1 hour')) OVER (PARTITION BY PID ORDER BY START_TIME)) > 60 flag
FROM tablename
)
See the demo.
Results:
|PID | EID | StayNumber
|:-- | :-- | ---------:
|123 | 556 | 1
|123 | 558 | 2
|123 | 660 | 2
|123 | 661 | 2

Payment fully received

We are raising bills to clients on various dates, and the payment is received in a irregular way. We need to calculate the payment delay days till full payments are received for a particular payment due. The data is sample data for only one client 0123
table Due (id, fil varchar(12), amount numeric(10, 2), date DATE)
table Received (id, fil varchar(12), amount numeric(10, 2), date DATE)
Table Due:
id fil amount date
----------------------------------------
1. 0123 1000. 2019-jan-01
2. 0123 1500 2019-jan-15
3. 0123 1200. 2019-jan-25
4. 0123 1800. 2019-feb-10
Table Received:
id. fil. amount. date
-----------------------------------------
1. 0123 1000. 2019-jan-10
2. 0123 500. 2019-jan-20
3. 0123 1300. 2019-jan-25
4. 0123 400. 2019-feb-08
5. 0123 1000. 2019-feb-20
The joined table should show:
fil. due_date due_amount. received_amount date delay
------------------------------------------------------------------------
0123 2019-jan-01 1000. 1000 9
0123 2019-jan-15. 1500. 500
0123 1300. 10(since payment completed on 25th jan)
0123 2019-jan-25. 1200. 400.
0123 1000. 26
0123 2019-feb-10. 1800.
I have tried to be as accurate as possible in calculations......Please excuse if there is some in advertant error. I was just coming around to writing a script to do this, but maybe someone will be able to suggest a proper join.
Thanks for trying..

As #DavidHempy said, this is not possible without knowing for which invoice each payment is meant. You can calculate how many days it's been since the account was at 0, which might help:
with all_activity as (
select due.date,
-1 * amount as amount
from due
union all
select received.date,
amount
from received),
totals as (
select date,
amount,
sum(amount) over (order by date),
case when sum(amount) over (order by date) >=0
then true
else false
end as nothing_owed
from all_activity)
select date,
amount,
sum,
date - max(date) filter (where nothing_owed = true) OVER (order by date)
as days_since_positive
from totals order by 1
;
date | amount | sum | days_since_positive
------------+----------+----------+---------------------
2019-01-01 | -1000.00 | -1000.00 |
2019-01-10 | 1000.00 | 0.00 | 0
2019-01-15 | -1500.00 | -1500.00 | 5
2019-01-20 | 500.00 | -1000.00 | 10
2019-01-25 | -1200.00 | -900.00 | 15
2019-01-25 | 1300.00 | -900.00 | 15
2019-02-08 | 400.00 | -500.00 | 29
2019-02-10 | -1800.00 | -2300.00 | 31
2019-02-20 | 1000.00 | -1300.00 | 41
(9 rows)
You could extend this logic to figure out the last due date from which they were above 0.

Subtract two aggregated values in Bar Chart

My data is like -
+-----------+------------------+-----------------+-------------+
| Issue Num | Created On | Closed at | Issue Owner |
+-----------+------------------+-----------------+-------------+
| 1 | 12/21/2016 15:26 | 1/13/2017 9:48 | Name 1 |
| 2 | 1/10/2017 7:38 | 1/13/2017 9:08 | Name 2 |
| 3 | 1/13/2017 8:57 | 1/13/2017 8:58 | Name 2 |
| 4 | 12/20/2016 20:30 | 1/13/2017 5:46 | Name 2 |
| 5 | 12/21/2016 19:30 | 1/13/2017 1:14 | Name 1 |
| 6 | 12/20/2016 20:30 | 1/12/2017 9:11 | Name 1 |
| 7 | 1/9/2017 17:44 | 1/12/2017 1:52 | Name 1 |
| 8 | 12/21/2016 19:36 | 1/11/2017 16:59 | Name 1 |
| 9 | 12/20/2016 19:54 | 1/11/2017 15:45 | Name 1 |
+-----------+------------------+-----------------+-------------+
What I am trying to achieve is
Number of issues created per week
Number of issues closed per week
Net number of issues remaining per week
I am able to resolve the top two points but unable to approach the last.
My attempt -
This gives me number of issues created every week.
Similarly I have done for Closed per week.
For Net number of issues (Created-Closed) -
I tried adding Closed At column along with Created On but I can't see second bar in the chart along with Created On either.
Something like this
I tried doing the same in excel -
I want something of this sort but with another column as the difference of
number of issues created that week - number of issues closed that week.
In this case, 8-6=2.

You could use a calculated field(Analysis->Create Calculated Field). Something like this:
{FIXED [Create Date]:Count(if DATEPART('year',[Create Date]) = 2016 then [Number of Records] end)} - {FIXED [Closed Date]:Count(if DATEPART('year',[Closed Date]) = 2016 then [Number of Records] end)}
This function is using LOD expressions to pull back both sets of values. It will filter on all 2016 results for both date sets and then minus them from each other.
For more on LOD's see here:
https://www.tableau.com/about/blog/LOD-expressions
Use this as your measure and pull in one of your date fields as the dimension.

The normal way to solve this problem is to reshape the data so you have one row per status change instead of one row per issue, with a column named [Date] and a column named [Action]. The action can be submit and close (or in a more complex world include approve, reject, whatever - tracking the history.
You can do the reshaping without modifying your source data by using a UNION to get two copies of each row with appropriate calculated fields to make the visible columns make sense (e.g., create calculated a field called Date that returns the submission date or closing date depending on whether the row is from the first or second union, with a similar one called Action whose value depends on that as well. Filter out Close actions that have a null date)
Or you can preprocess the data to reshape it.
Or you can use data blending to make two sources that point to the same data source but customizing the linking fields to line up the submit and close dates (e.g., duplicate the data connection and rename both date fields to have the same name). But in this case, you probably want to create scaffolding source that has every date, but no other data, to use as the primary data source to avoid filtering out data from the secondary for dates that don't appear in the primary. The blending approach can be brittle.
Assuming you used the UNION approach instead of Data Blending, then you can count the number of submissions and closures within a certain date range, or compute a running total of the difference to see the backlog size over time.

Oracle, Mysql, how to get average

How to get Average fuel consumption only using MySQL or Oracle:
SELECT te.fuelName,
zkd.fuelCapacity,
zkd.odometer
FROM ZakupKartyDrogowej zkd
JOIN TypElementu te
ON te.typElementu_Id = zkd.typElementu_Id
AND te.idFirmy = zkd.idFirmy
AND te.typElementu_Id IN (3,4,5)
WHERE zkd.idFirmy = 1054
AND zkd.kartaDrogowa_Id = 42
AND zkd.data BETWEEN to_date('2015-09-01','YYYY-MM-DD')
AND to_date('2015-09-30','YYYY-MM-DD');
Result of this query is:
fuelName | fuelCapacity | odometer | tanking
-------------------------------------------------
'ON' | 534 | 1284172 | 2015-09-29
'ON' | 571 | 1276284 | 2015-09-02
'ON' | 470 | 1277715 | 2015-09-07
'ON' | 580.01 | 1279700 | 2015-09-11
'ON' | 490 | 1281103 | 2015-09-17
'ON' | 520 | 1282690 | 2015-09-23
We can do it later in java or php, but want to get result right away from query. How should we modify above query to do that?
fuelCapacity is the number of liters of fuel that has been poured into cartank at gas station.

For one total average, what you need is the sum of the refills divided by the difference between the odometer readings at the start and the end, i.e. fuel used / distance travelled.
I don't have your table structure at hand, but this alteration to the select statement should do the trick:
select cast(sum(zkd.fuelCapacity) as float) / (max(zkd.odometer) - min(zkd.odometer)) as consumption ...
The cast(field AS float) does what the name implies, and typecasts the field as float, so the result will also be a float. (I do suspect that your fuelCapacity field is a float because there is one float value in your example, but this will make sure.)

MS Access SQL, summing a few fields and comparing the value

My table called lets say "table1" looks as follows:
Area | Owner | Numberid | Average
1200 | Fed_G | 998 | 1400
1220 | Priv | 1001 | 1600
1220 | Local_G | 1001 | 1430
1220 | Prov_G | 1001 | 1560
1220 | Priv | 1674 | 1845
1450 | Prov_G | 1874 | 1982
Ideally what I would like to do is sum a few rows in the average column if:
1. they have the same numberid (lets say three rows all had a numberid=1000 then they would have their average added together)
2.Area=1220
Then take that and append it to the existing table, while setting the Owner field equal to "ALL".
I just started working with Access so I'm not really sure how to do this, this is my horrible attempt:
SELECT ind.Area, ind.Owner, ind.numberid,
(SELECT SUM([Average]) FROM [table1]
WHERE [numberid]=ind.numberid) AS Average
FROM [table1] AS ind
WHERE (((ind.numberid)>="1000" And (ind.numberid)<"10000") AND ((ind.Area)="1220"))
Can anyone guide me through what I should do? I'm not used to sql syntax.
I tried to use "ind" as a variable to compare to.
So far it gives me column names but no output.

I'm still unsure whether I understand what you want. So I'll offer you this query and let you tell us if the result is what you want or how it differs from what you want.
SELECT
t.Area,
'ALL' AS Owner,
t.Numberid,
Sum(t.Average) AS SumOfAverage
FROM table1 AS t
GROUP BY t.Area, 'ALL', t.Numberid;
Using your sample data, that query gives me this result set.
Area Owner Numberid SumOfAverage
1200 ALL 998 1400
1220 ALL 1001 4590
1220 ALL 1674 1845
1450 ALL 1874 1982

Probably I would be able to (maybe) give you a better answer if you improve the wording of your question.
However, to <> you need to select average column and group by numberid and Area columns. Since the Owner field is <> I guess it doesn't matter in this query that I'm writing:
SELECT numberid, area, SUM(average)
FROM table1
WHERE owner = 'your-desired-owner-equal-to-all'
GROUP BY numberid, area
ORDER BY numberid, area

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Progress query to remove duplicates based on number of duplicates - sql

Can you use select distinct? If the first five characters are the zip code and you just want that: select distinct left(jurisdiction, 5), tax_amount from t;

Related

SQL lite query - Merging time periods

Payment fully received

Subtract two aggregated values in Bar Chart

Oracle, Mysql, how to get average

MS Access SQL, summing a few fields and comparing the value

Categories

Resources