I need a moving sum that starts at the current row and extends X following rows. The problem is that X is not static (i.e. it comes from another column).
The code:
SUM(column0) OVER (
    ORDER BY column1, column2, column3
    ROWS BETWEEN CURRENT ROW AND column4 FOLLOWING
) AS X
column4 is already an integer, but SQL complains and asks for a hard-coded integer literal. Casting and converting didn't work either.
Thank you in advance!
SQL engines usually cannot handle "variables" in certain parts of a statement, and this is one of those cases: the frame bound in ROWS BETWEEN generally has to be a constant, so a non-literal is not accepted at this place. You'll need to rephrase your query using a different strategy.
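One common workaround, as a sketch (assuming your engine supports CTEs and ROW_NUMBER(); my_table and the columnN names stand in for your actual names): number the rows, then replace the window frame with a self-join over the dynamic range.

WITH numbered AS (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY column1, column2, column3) AS rn
    FROM my_table t
)
SELECT a.rn,
       SUM(b.column0) AS X  -- current row through the next a.column4 rows
FROM numbered a
JOIN numbered b
  ON b.rn BETWEEN a.rn AND a.rn + a.column4
GROUP BY a.rn

This trades the window function for a join, so test the performance on realistic data volumes before committing to it.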
Here I have a query that finds the drop percentage for a set of clients based on the orders they have received (i.e. it finds the percentage difference in orders between the current month and the previous month). What I want to achieve is a field where I can see which clients had a continuous drop over 4 months, 3 months, 2 months, and 1 month.
I know it can only be achieved by comparing the last 4 months using the LAG function or subqueries. Can you please help me out with this one? I would appreciate it very much.
select
fd.customers2, fd.Month1, fd.year1, fd.variance, case when
(fd.variance < -0.00001 and fd.year1 = '2022.0' and fd.Month1 = '1')
then '1month drop' else fd.customers2 end as 1_most_host_drop
from
(SELECT
c.*,
sa.customers as customers2,
sum(sa.order) as orders,
date_part(mon, sa.date) as Month1,
date_part(year, sa.date) as year1,
(cast(orders - LAG(orders) OVER(Partition by customers2 ORDER BY
year1, Month1) as NUMERIC(10,2))/NULLIF(LAG(orders)
OVER(partition by customers2 ORDER BY year1, Month1) * 1, 0)) AS variance
FROM stats sa join (select distinct
d.id, d.customers
from configer d
) c on sa.customers=c.customers
WHERE sa.date >= '2021-04-1'
GROUP BY Month1, sa.customers, c.id, year1,
c.customers)fd
In a spirit of friendliness: I think you are a little premature in posting this here, as there are several issues with the syntax to resolve before even reaching the point where you can solve the problem:
You have at least two places with a comma immediately preceding the word FROM:
...AS variance, FROM stats_archive sa ...
...d.customers, FROM config d...
Recommend you don't use VARIANCE as an alias (it is a system function in PostgreSQL and so is likely also a system function name in Redshift)
Not super important, but there's no need for c.* - just select the columns you will use
DATE_PART requires a string as the first parameter, e.g. DATE_PART('mon', current_date)
I might be wrong about this, but I suspect you cannot use column aliases in the partition by or order by of a window function. Put the originating expressions there instead:
... OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
LAG takes three parameters: (1) the column you want to retrieve the value from, (2) the row offset, where a positive integer indicates how many rows prior to the current row to read (according to the partition and order context), and (3) the default value to return for the first row(s) in the partition. As such, you don't need NULLIF. So, to get the row immediately prior to the current row, or return 0 in case the current row is the first row in the partition:
LAG(orders,1,0) OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
If you use 0 as the default in the calculation currently aliased as variance, you will almost certainly run into a div/0 error, either now or, worse, when you least expect it in the future. Protect against that with some CASE logic, or better, provide a more appropriate default value, or better still, calculate the LAG with the default of 0 and then filter out the 0 rows before doing the calculation.
Many engines don't accept column aliases in the GROUP BY. To be safe, reference each field that is not participating in an aggregate either directly (sa.date) or via its originating expression (DATE_PART('mon',sa.date))
Your date should be '2021-04-01'
All in all, without sample data and expected results, and without first removing the syntax errors, it is a tall order to offer advice on the problem that is any more specific than this:
Build the source of the calculation as a completely separate query first, and calculate the LAG in that source query. Only when you have run that source query and verified that the LAG produces the correct result should you wrap it as a sub-query or CTE (not sure whether Redshift supports these, but presumably), at which point you can filter out the rows with a zero denominator (the first month of orders for each customer).
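As a hedged sketch of that shape (names taken from the posted query; the join to configer and the c.id column are dropped for brevity, and "order" may need quoting since it is a reserved word):

WITH source AS (
    SELECT sa.customers AS customers2,
           DATE_PART('year', sa.date) AS year1,
           DATE_PART('mon', sa.date) AS month1,
           SUM(sa.order) AS orders
    FROM stats sa
    WHERE sa.date >= '2021-04-01'
    GROUP BY sa.customers, DATE_PART('year', sa.date), DATE_PART('mon', sa.date)
), lagged AS (
    SELECT s.*,
           LAG(orders, 1, 0) OVER (PARTITION BY customers2 ORDER BY year1, month1) AS prev_orders
    FROM source s
)
SELECT customers2, year1, month1,
       CAST(orders - prev_orders AS NUMERIC(10,2)) / prev_orders AS pct_change
FROM lagged
WHERE prev_orders <> 0  -- drop the zero denominators before dividing

Verify that source and lagged each return what you expect before trusting pct_change.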
Good luck!
I need to compare the current row with the previous row and, based on the comparison, derive a column value. The approach I am currently following is to build two different record sets, apply a rank function to each, and then join the record sets on their ranks. This seems like a tedious approach; is there a better way to achieve it? I am currently writing a query something like the below:
select
    <compare columns from the two record sets and derive the column value>
from
    (select <some complex logic>, rank as rnk from a) rcdset,
    (select <some complex logic>, rank + 1 as rnk from a) rcdset2
where rcdset.rnk = rcdset2.rnk (+)
Database - Oracle 10g
Use LAG(value_expr) OVER (ORDER BY rank_col) to retrieve value_expr from the previous row (with the order defined by rank_col); see http://oracle-base.com/articles/misc/lag-lead-analytic-functions.php
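A minimal one-pass sketch of that idea (value_col and rank_col are made-up names; table a is from your pseudocode): derive the new column by comparing each row with its predecessor directly, with no self-join needed.

select t.*,
       case
         when value_col > lag(value_col) over (order by rank_col) then 'UP'
         when value_col < lag(value_col) over (order by rank_col) then 'DOWN'
         else 'SAME'
       end as direction
from a t

Note the first row in the order falls into 'SAME', because LAG returns NULL there; pass LAG a third argument as a default if you need different behavior.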
I have a column with data of the following structure:
aaa5644988
aaa4898494
aaa5642185
aaa5482312
aaa4648848
I have a range that can be anything, like 100-30000 for example. I want all the values that end in a number within that range.
I tried
like '%[100-30000]'
but this doesn't work apparently.
I have seen a lot of similar questions but none of them solved my problem.
Edit: I'm using SQL Server 2008.
Example:
Value
aaa45645695
aaa28568720
aaa65818450
8789212
6566700
For the range 600-1200, I want to retrieve rows 1, 2, and 5, because they end with a number in that range.
LIKE patterns match characters, not numbers: in SQL Server, [100-30000] is a character class that matches a single character, not a numeric range. That's why like '%[100-30000]' doesn't work.
Depending on your use case, there are two possible solutions to this problem:
If you only need to run this query a few times (and don't care how long it takes), or the data set is not very big, you can select all the data from this column and do the filtering in another programming language.
Take Ruby, for example; you can do:
# assumes each row comes back as an array with the value in the first field
column_data = @connection.execute("select your_column from your_table")
result = column_data.map { |row| row[0].gsub(/^.*[^\d]/, '').to_i }.select { |x| x >= 100 && x <= 30000 }
If you need to run this query regularly, I'd suggest adding a new column to this table holding only the trailing number from the current column, so the query becomes a simple, fast range check:
SELECT *
FROM your_table
WHERE number_column BETWEEN 100 AND 30000
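If you go that route, here is a hedged sketch of how the new column might be populated once in SQL Server 2008 (value_column and number_column are hypothetical names; this assumes every value is either all digits or ends in a run of digits):

UPDATE your_table
SET number_column = CAST(
        REVERSE(LEFT(REVERSE(value_column),
                     PATINDEX('%[^0-9]%', REVERSE(value_column) + 'x') - 1))
        AS INT)

Reversing the string lets PATINDEX locate where the trailing digit run ends; the appended 'x' guarantees a match even for all-digit values.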
I have an SQL table with geo-tagged values (longitude, latitude, value). The table grows quickly and has thousands of entries, so querying it for values in some area returns a very large data set.
I would like to know how to average the values in close proximity into one value; here is an illustration:
Table:
Long lat value
10.123001 53.567001 10
10.123002 53.567002 12
10.123003 53.567003 18
10.124003 53.568003 13
Let's say my current location is 10.123004, 53.567004. If I query for the values nearby, I will get the four rows with values 10, 12, 18, and 13. This works if the data set is relatively small. If the data set is large, I would like to query SQL for the rounded location (10.123, 53.567) and have SQL return something like:
Long lat value
10.123 53.567 10 (this is the average of 10, 12, and 18)
10.124 53.568 13
Is this possible? How can we average a large data set based on location?
Is an SQL database the right choice in the first place?
GROUP BY rounded columns, and the AVG aggregate function should work fine for this:
SELECT ROUND(Long, 3) Long,
ROUND(Lat, 3) Lat,
AVG(value)
FROM Table
GROUP BY ROUND(Long, 3), ROUND(Lat, 3)
Add a WHERE clause to filter as needed.
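For example, to pre-filter to a bounding box before grouping (the coordinates here are made-up):

SELECT ROUND(Long, 3) Long,
       ROUND(Lat, 3) Lat,
       AVG(value)
FROM Table
WHERE Long BETWEEN 10.0 AND 11.0
  AND Lat BETWEEN 53.0 AND 54.0
GROUP BY ROUND(Long, 3), ROUND(Lat, 3)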
Here's some rough pseudocode that might be a start. You need to provide the proper precision arguments for the ROUND function in the dialect of SQL you are using for your project; the 3 I provide as the second argument to ROUND is the number of decimal places to which the number is rounded, as indicated by your original post.
Select round(lat,3), round(long,3), avg(value)
From your_table
Group by round(lat,3), round(long,3)
The problem with the rounding approach is the boundary conditions -- what happens when points are close to the boundary.
However, for the neighborhood of a given point it is better to use something like:
select *
from table
where long between #MyLong - #DeltaLong and #MyLong + #DeltaLong and
lat between #MyLat - #DeltaLat and #MyLat + #DeltaLat
For this, you need to define #DeltaLong and #DeltaLat.
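As a rough rule of thumb (an approximation, not exact geodesy): one degree of latitude is about 111 km, so for a radius of r km you can set #DeltaLat ≈ r / 111.0 and #DeltaLong ≈ r / (111.0 * cos(#MyLat)), since a degree of longitude shrinks as you move away from the equator.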
Rounding works fine for summarization, if that is your problem.
I'd like to ask about one thing. I have a table in the DB. It has 2 columns and looks like this:
Name...bilance
Jane...+3
Jane...-5
Jane...0
Jane...-8
Jane...-2
Paul...-1
Paul...2
Paul...9
Paul...1
...
I have to walk through this table, and when I find a record with a different name than the previous row, I process all the rows with the previous name. (When I step onto the first Paul row, I process all the Jane rows.)
The processing goes like this:
Now I work only with the Jane records and walk through them one by one. On each record I stop and compare it with all the previous Jane rows, one by one.
The task is to summarize the bilance column (within the scope of the current person) if the values have different signs.
Summary:
I loop through this table at 3 levels (nested loops)
1st level = search for changes of "name" column
2nd level = if change was found, get all rows with previous "name" and walk through them
3rd level = on each row stop and walk through all previous rows with current "name"
Can this be solved using only a CURSOR and FETCH, or is there some smoother solution?
My real table has 30,000 rows and 1,500 people, and if I do the logic in PHP it takes long minutes and then times out. So I would like to rewrite it in MS SQL 2000 (no other DB is allowed). Are cursors a fast solution, or is it better to use something else?
Thank you for your opinions.
UPDATE:
There are lots of questions about my "summarization". The problem is a little more difficult than I explained; I simplified it just to describe my algorithm.
Each row of my table contains many more columns. The most important is the month, which is why there are several rows per person: each one is for a different month.
The "bilances" are workers' overtime and arrears hours, and I need to summarize the + and - bilances to neutralize them, using values from previous months. I want to end up with as many zeroes as possible. The whole table must stay as it is; only the bilances are changed toward zero.
Example:
Row (Jane -5) will be summarized with row (Jane +3): instead of 3 I get 0, and instead of -5 I get -2, because I used the -5 to reduce the +3.
The next row (Jane 0) won't be affected.
The next row (Jane -8) cannot be used, because all the previous bilances are negative.
etc.
You can sum all the values per name using a single SQL statement:
select
name,
sum(bilance) as bilance_sum
from
my_table
group by
name
order by
name
On the face of it, it sounds like this should do what you want:
select Name, sum(bilance)
from table
group by Name
order by Name
If not, you might need to elaborate on how the Names are sorted and what you mean by "summarize".
I'm not sure what you mean by the line "The task is to summarize the bilance column (within the scope of the current person) if the values have different signs".
But, it may be possible to use a group by query to get a lot of what you need.
select name,
       case when bilance < 0 then 'negative' else 'positive' end as sign,
       count(*)
from table
group by name, case when bilance < 0 then 'negative' else 'positive' end
That might not be perfect syntax for the case statement, but it should get you really close.