I'm asking for a solution without functions or procedures (permissions problem).
I have a table like this, where k = the number of columns (in reality, k = 500):
col1 col2 col3 col4 col5 ... col(k)
10   20   30   -50  60       100
and I need to create a cumulative row like this:
col1 col2 col3 col4 col5 ... col(k)
10   30   60   10   70       X
In Excel it's simple to write a formula and drag it across, but in SQL, if I have a lot of columns, it seems like very clumsy work to add them manually (col1 as col1, col1+col2 as col2, col1+col2+col3 as col3, and so on up to colk).
Is there a good way of solving this problem?
You say that you've changed your data model to rows. So let's say that the new table has three columns:
grp (some group key to identify which rows belong together, i.e. what was one row in your old table)
pos (a position number from 1 to 500 to indicate the order of the values)
value
You get the cumulative sums with SUM OVER:
select grp, pos, value, sum(value) over (partition by grp order by pos) as running_total
from mytable
order by grp, pos;
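As a quick sanity check of the window function, here is a minimal sketch using Python's sqlite3 module (SQLite also supports SUM ... OVER since 3.25); the mytable(grp, pos, value) layout matches the answer and the sample values match the question:

```python
import sqlite3

# Build the row-per-value table from the answer and compute the running total.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (grp INTEGER, pos INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 20), (1, 3, 30), (1, 4, -50), (1, 5, 60)],
)

rows = conn.execute(
    """
    SELECT grp, pos, value,
           SUM(value) OVER (PARTITION BY grp ORDER BY pos) AS running_total
    FROM mytable
    ORDER BY grp, pos
    """
).fetchall()

for row in rows:
    print(row)
# running totals: 10, 30, 60, 10, 70 -- matching the cumulative row in the question
```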
If this "colk" is going to be needed/used in a lot of reports, I suggest you create a computed column or a view to sum all the columns using k = cola+colb+...
There's no function in sql to sum up columns (ex. between colA and colJ)
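If you are stuck with the wide 500-column layout, one pragmatic workaround is to generate the SELECT list with a short script instead of typing it by hand. A sketch in Python (the col1..colk column names and the mytable table name are assumptions following the question's pattern):

```python
# Build "col1 AS col1, col1+col2 AS col2, ..." for k columns instead of
# typing it manually. This is pure string generation; run the resulting
# SQL against whatever engine holds the wide table.
k = 5  # in the question, k = 500
exprs = []
for i in range(1, k + 1):
    running = "+".join(f"col{j}" for j in range(1, i + 1))
    exprs.append(f"{running} AS col{i}")
sql = "SELECT " + ", ".join(exprs) + " FROM mytable"
print(sql)
```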
I was trying with cross join and unnest, but I only managed to split one column, not all three at the same time.
I have this table in Amazon Athena,
and I want to separate the columns with lists into rows, leaving a table like this:
COL1    COL2   COL3    COL4  COL5  COL6
------  -----  ------  ----  ----  --------
765045  5782   jd938   1     a     pickup
765045  5782   jd938   2     b     delivery
41118   78995  kd982   5     g     pickup
41118   78995  kd982   8     q     delivery
411620  65852  km0899  9     k     pickup
411620  65852  km0899  6     b     delivery
select
t.COL1, t.COL2, t.COL3, u.COL4
from t
cross join
unnest(t.COL4) u(COL4)
I was thinking of making subtables and repeating this code 3 times, but I wanted to know if there is a more efficient way.
unnest supports handling multiple columns in one statement. You can also use the succinct syntax that omits the CROSS JOIN:
select
t.COL1, t.COL2, t.COL3, u.COL4, u.COL5, u.COL6
from t,
unnest(t.COL4, t.COL5, t.COL6) AS u(COL4, COL5, COL6)
Note that for arrays of different cardinality it will substitute missing values with nulls. And if all arrays are empty, the row will not be included in the final result (though you can work around this by adding a dummy array with one element, as was done here).
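If a mental model helps, the null-padding behaviour for arrays of different cardinality is the same idea as Python's itertools.zip_longest:

```python
from itertools import zip_longest

# UNNEST over several arrays walks them in lockstep, padding the shorter
# ones with NULL -- the same behaviour as zip_longest with fillvalue=None.
col4 = [1, 2]
col5 = ["a", "b", "c"]  # one element longer
rows = list(zip_longest(col4, col5, fillvalue=None))
print(rows)  # [(1, 'a'), (2, 'b'), (None, 'c')]
```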
I'm storing some realtime data in SQLite. Now I want to remove duplicate records to reduce the data and enlarge its timeframe to 20 seconds, using SQL commands.
Sample data:
id t col1 col2
-----------------------------
23 9:19:18 15 16
24 9:19:20 10 11
25 9:19:20 10 11
26 9:19:35 10 11
27 9:19:45 10 11
28 9:19:53 10 11
29 9:19:58 14 13
Logic: In the sample above, records 25-28 have the same values in the col1 and col2 fields, so they are duplicates. But keeping just one of them (for example, record 25) and removing the others would cause the timeframe (= time difference between subsequent records) to exceed 20s, so I don't want to remove all of records 26-28. So in the sample above, row 25 will be kept because it's not a duplicate of its previous row. Row 26 will be kept because, although it is a duplicate of its previous row, removing it would push the timeframe over 20s (19:45 - 19:20). Row 27 will be removed because it meets both conditions, and row 28 will be kept.
I can load the data into a C# DataTable and apply this logic in a loop over the records, but that is slow compared to running SQL in the database. I'm not sure whether this can be implemented in SQL. Any help would be greatly appreciated.
Edit: I've added another row before row 25 to show rows with the same time. Fiddle is here: Link
OK, so here's an alternate answer that handles the duplicate-record scenario you've described. It uses LAG and LEAD, and as it turns out it also ends up considerably simpler!
delete from t1 where id in
(
with cte as (
select id,
lag(t, 1) over(partition by col1, col2 order by t) as prev_t,
lead(t, 1) over(partition by col1, col2 order by t) as next_t
from t1
)
select id
from cte
where strftime('%H:%M:%S',next_t,'-20 seconds') < strftime('%H:%M:%S',prev_t)
)
Online demo here
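As a quick way to try this outside the fiddle, here is a self-contained sketch using Python's sqlite3 module. One assumption worth flagging: SQLite's strftime only parses zero-padded times ('09:19:18', not '9:19:18'), so the question's sample times are stored padded here:

```python
import sqlite3

# Load the question's sample data and run the LAG/LEAD delete from the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, t TEXT, col1 INT, col2 INT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", [
    (23, "09:19:18", 15, 16),
    (24, "09:19:20", 10, 11),
    (25, "09:19:20", 10, 11),
    (26, "09:19:35", 10, 11),
    (27, "09:19:45", 10, 11),
    (28, "09:19:53", 10, 11),
    (29, "09:19:58", 14, 13),
])
conn.execute("""
    delete from t1 where id in
    (
      with cte as (
        select id,
               lag(t, 1)  over(partition by col1, col2 order by t) as prev_t,
               lead(t, 1) over(partition by col1, col2 order by t) as next_t
        from t1
      )
      select id from cte
      where strftime('%H:%M:%S', next_t, '-20 seconds') < strftime('%H:%M:%S', prev_t)
    )
""")
remaining = [r[0] for r in conn.execute("SELECT id FROM t1 ORDER BY id")]
print(remaining)  # row 27 and one of the tied 24/25 pair are deleted
```

Since rows 24 and 25 share the same timestamp, which of the two survives depends on how the engine orders peers; everything else is deterministic.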
I believe this accomplishes what you are after:
delete from t1 where id in
(
select ta.id
from t1 as ta
join t1 as tb
on tb.t = (select max(t) from t1 where t < ta.t
and col1 = ta.col1 and col2 = ta.col2)
and tb.col1 = ta.col1 and tb.col2 = ta.col2
join t1 as tc
on tc.t = (select min(t) from t1 where t > ta.t
and col1 = ta.col1 and col2 = ta.col2)
and tc.col1 = ta.col1 and tc.col2 = ta.col2
where strftime('%H:%M:%S',tc.t,'-20 seconds') < strftime('%H:%M:%S',tb.t)
)
Online demo is here, where I've gone through a couple of iterations to simplify it to the above. Basically you need to look at both the previous row and the next row to determine whether you can delete the current row; as I understand your requirement, that happens only when there's a difference of less than 20 seconds between the previous and next rows' times.
Note: You could probably achieve the same using LAG and LEAD, but I'll leave that as an exercise for anyone else who's interested!
EDIT: In case the time values are not unique, I've added additional conditions to the ta/tb and ta/tc joins to include col1 and col2, and updated the fiddle.
I think you can do the following:
Create a result set in SQL that adds the previous row's values, ordered by id (for this, use the LAG window function: https://www.sqlitetutorial.net/sqlite-window-functions/sqlite-lag/).
Calculate a new column using a CASE expression (https://www.sqlitetutorial.net/sqlite-case/). This could be a boolean column called "keep", calculated in the following way:
if the previous row col1 and col2 values are not the same => true
if the previous row col1 and col2 values are the same but the time difference > 20 sec => true
in other cases => false
Filter on this query to only select the rows to keep (keep = true).
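A sketch of these steps in SQLite (via Python's sqlite3), using the question's sample data with zero-padded times. One caveat: LAG sees the previous physical row, not the previous *kept* row, so consecutive duplicates less than 20s apart are all flagged keep = 0:

```python
import sqlite3

# Compute the "keep" flag per row using LAG + CASE, as outlined above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, t TEXT, col1 INT, col2 INT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", [
    (23, "09:19:18", 15, 16), (24, "09:19:20", 10, 11),
    (25, "09:19:20", 10, 11), (26, "09:19:35", 10, 11),
    (27, "09:19:45", 10, 11), (28, "09:19:53", 10, 11),
    (29, "09:19:58", 14, 13),
])
rows = conn.execute("""
    SELECT id,
           CASE
             WHEN lag(col1) OVER w IS NULL THEN 1              -- first row overall
             WHEN lag(col1) OVER w <> col1
               OR lag(col2) OVER w <> col2 THEN 1              -- values changed
             WHEN strftime('%s', t) - strftime('%s', lag(t) OVER w) > 20
               THEN 1                                          -- gap over 20 seconds
             ELSE 0
           END AS keep
    FROM t1
    WINDOW w AS (ORDER BY t, id)
""").fetchall()
print(rows)
```

Note that this flags rows 26 and 28 as keep = 0, which differs from the outcome the question asks for; deciding against the previous kept row needs the prev/next comparison from the other answers, or a recursive query.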
I'm using SQL Server 14 and I need to count the number of null values in a row to create a new column where a "% of completeness" for each row will be stored. For example, if 9 out of 10 columns contain values for a given row, the % for that row would be 90%.
I know this can be done via a number of CASE expressions, but the thing is, this data will be used for a live dashboard and won't be under my supervision after completion.
I would like this % to be calculated every time a function (or procedure? I'm not sure what is used in this case) is run, and I need to know the number of columns in my table in order to count the null values in a row and then divide by the number of columns to find the "% of completeness".
Any help is greatly appreciated!
Thank you
One method uses cross apply to unpivot the columns to rows and compute the ratio of non-null values.
Assuming that your table has columns col1 to col4, you would write this as:
select t.*, x.*
from mytable t
cross apply (
select avg(case when col is not null then 1.0 else 0 end) completeness_ratio
from (values (col1), (col2), (col3), (col4)) x(col)
) x
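CROSS APPLY with a VALUES constructor is SQL Server syntax, but the ratio itself is easy to sanity-check. A tiny sketch in Python (the row contents and column names are hypothetical):

```python
# Completeness = share of columns that are non-null, as in the answer above.
row = {"col1": 5, "col2": None, "col3": 7, "col4": 8}  # one of four is null
cols = ["col1", "col2", "col3", "col4"]
completeness = sum(row[c] is not None for c in cols) / len(cols)
print(completeness)  # 0.75, i.e. the row is 75% complete
```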
TableA
Col1
----------
1
2
3
4 ... all the way up to 27
I want to add a second column that assigns a number to groups of 5.
Results
Col1 Col2
----- ------
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2...and so on
The 6th group should have 2 rows in it.
NTILE doesn't accomplish what I want because of the way NTILE handles the groups if they aren't divisible by the integer.
If the number of rows in a partition is not divisible by integer_expression, this will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example if the total number of rows is 53 and the number of groups is five, the first three groups will have 11 rows and the two remaining groups will have 10 rows each. If on the other hand the total number of rows is divisible by the number of groups, the rows will be evenly distributed among the groups. For example, if the total number of rows is 50, and there are five groups, each bucket will contain 10 rows.
This is clearly demonstrated in this SQL Fiddle. Groups 4, 5, 6 each have 4 rows while the rest have 5. I have some started some solutions but they were getting lengthy and I feel like I'm missing something and that this could be done in a single line.
You can use this:
;WITH CTE AS
(
SELECT col1,
RN = ROW_NUMBER() OVER(ORDER BY col1)
FROM TableA
)
SELECT col1, (RN-1)/5+1 col2
FROM CTE;
In your sample data, col1 is sequential without gaps, so you could use it directly (if it's an INT) without ROW_NUMBER(). But in case it isn't, this answer works too. Here is the modified sqlfiddle.
A bit of math can go a long way. Subtracting 1 from all values puts the 5s (the edge cases) into the previous group, and the 6s into the next. Flooring the division by your group size and adding one gives the result you're looking for. Also, the SQLFiddle example here fixes your iterative insert - the table only went up to 27.
SELECT col1,
floor((col1-1)/5)+1 as grpNum
FROM tableA
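Both answers reduce to the same integer arithmetic, which is easy to verify (Python's // matches SQL Server's integer division for positive values):

```python
# Bucket values 1..27 into groups of 5 using (n-1) // 5 + 1, as in both answers.
groups = [(n - 1) // 5 + 1 for n in range(1, 28)]
print(groups[:6])   # [1, 1, 1, 1, 1, 2] -- the 6th value starts group 2
print(groups[-2:])  # [6, 6] -- the 6th group holds only the last two rows
```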
I have a set of rows with one column of actual data. The goal is to display this data in matrix format. The number of columns will remain the same; the number of rows may vary.
For example:
If I have 20 records and 5 columns, the number of rows would be 4.
If I have 24 records and 5 columns, the number of rows would be 5, with the 5th column of the 5th row empty.
If I have 18 records and 5 columns, the number of rows would be 4, with the 4th and 5th columns of the 4th row empty.
I was thinking of generating a column value against each row, repeating after every 5 rows. But I ran into the error "A SELECT statement that assigns a value to a variable must not be combined with data-retrieval operations".
Not sure how it can be achieved.
Any advice will be helpful.
Further addition - I have managed to generate the name-value association between column name and value. Example:
Name1 Col01
Name2 Col02
Name3 Col03
Name4 Col01
Name5 Col02
You can use ROW_NUMBER to assign a sequential integer from 0 up. Then group by the result of integer division whilst pivoting on the remainder.
WITH T AS
(
SELECT number,
ROW_NUMBER() OVER (ORDER BY number) -1 AS RN
FROM master..spt_values
)
SELECT MAX(CASE WHEN RN%5 = 0 THEN number END) AS Col1,
MAX(CASE WHEN RN%5 = 1 THEN number END) AS Col2,
MAX(CASE WHEN RN%5 = 2 THEN number END) AS Col3,
MAX(CASE WHEN RN%5 = 3 THEN number END) AS Col4,
MAX(CASE WHEN RN%5 = 4 THEN number END) AS Col5
FROM T
GROUP BY RN/5
ORDER BY RN/5
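The same ROW_NUMBER + modulo pivot can be tried in SQLite via Python's sqlite3 (master..spt_values is SQL Server-specific, so an 18-row sample table stands in, matching the 18-record example from the question):

```python
import sqlite3

# Pivot 18 sequential records into a 5-column matrix: group by RN / 5,
# pivot on RN % 5, exactly as in the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (number INTEGER)")
conn.executemany("INSERT INTO nums VALUES (?)", [(n,) for n in range(1, 19)])
rows = conn.execute("""
    WITH T AS (
      SELECT number,
             ROW_NUMBER() OVER (ORDER BY number) - 1 AS RN
      FROM nums
    )
    SELECT MAX(CASE WHEN RN % 5 = 0 THEN number END) AS Col1,
           MAX(CASE WHEN RN % 5 = 1 THEN number END) AS Col2,
           MAX(CASE WHEN RN % 5 = 2 THEN number END) AS Col3,
           MAX(CASE WHEN RN % 5 = 3 THEN number END) AS Col4,
           MAX(CASE WHEN RN % 5 = 4 THEN number END) AS Col5
    FROM T
    GROUP BY RN / 5
    ORDER BY RN / 5
""").fetchall()
for r in rows:
    print(r)
# the last row is (16, 17, 18, None, None): 4th and 5th columns empty,
# as the question's 18-record example requires
```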
In general:
SQL is for retrieving data - that is, all your X records in one column.
Making a nice display of your data is usually the job of the software that queries SQL, e.g. your web/desktop application.
However, if you really want to build the display output in SQL, you could use a WHILE loop in connection with TOP (SQL Server's counterpart to LIMIT) and PIVOT. You would just select the first 5 records, then the next ones, until finished.
Here is an example of how to use WHILE: http://msdn.microsoft.com/de-de/library/ms178642.aspx