I have a SQLite table for messages. The table has two columns used for ordering: created and sent. I need the results sorted by the sent field (descending), but when sent is 0, sorted by the created field (also descending).
I'm using the SQL function COALESCE, but the order of the result is wrong.
Normal result (without COALESCE):
SELECT * FROM messages ORDER BY sent DESC
┌─────────────┬───────┬────────────┬────────────┐
│ external_id │ body  │ created    │ sent       │
├─────────────┼───────┼────────────┼────────────┤
│ ...         │ qw    │ 1463793500 │ 1463793493 │ <-
│ ...         │ huyak │ 1463783516 │ 1463662248 │
│ ...         │ tete  │ 1463783516 │ 1463662248 │
│ ...         │ Te    │ 1463783516 │ 1463662248 │
└─────────────┴───────┴────────────┴────────────┘
Wrong result (with COALESCE):
SELECT * FROM messages ORDER BY COALESCE(sent,created)=0 DESC
┌─────────────┬───────┬────────────┬────────────┐
│ external_id │ body  │ created    │ sent       │
├─────────────┼───────┼────────────┼────────────┤
│ ...         │ Te    │ 1463783516 │ 1463662248 │
│ ...         │ huyak │ 1463783516 │ 1463662248 │
│ ...         │ tete  │ 1463783516 │ 1463662248 │
│ ...         │ qw    │ 1463793500 │ 1463793493 │ <-
└─────────────┴───────┴────────────┴────────────┘
I tried removing the =0 expression; then the order is correct, but that query doesn't work correctly when sent = 0:
SELECT * FROM messages ORDER BY COALESCE(sent,created) DESC
┌─────────────┬───────┬────────────┬────────────┐
│ external_id │ body  │ created    │ sent       │
├─────────────┼───────┼────────────┼────────────┤
│ ...         │ qw    │ 1463793500 │ 1463793493 │ <-
│ ...         │ huyak │ 1463783516 │ 1463662248 │
│ ...         │ tete  │ 1463783516 │ 1463662248 │
│ ...         │ Te    │ 1463783516 │ 1463662248 │
└─────────────┴───────┴────────────┴────────────┘
but when sent = 0, the message drops to the bottom instead of being ordered by its created time:
┌─────────────┬───────┬────────────┬────────────┐
│ external_id │ body  │ created    │ sent       │
├─────────────┼───────┼────────────┼────────────┤
│ ...         │ Te    │ 1463783516 │ 1463662248 │
│ ...         │ huyak │ 1463783516 │ 1463662248 │
│ ...         │ tete  │ 1463783516 │ 1463662248 │
│ ...         │ qw    │ 1463793500 │ 0          │ <-
└─────────────┴───────┴────────────┴────────────┘
Does anyone know why it's happening and how to fix it?
COALESCE handles NULLs, so it won't help you here: sent is never NULL, so COALESCE always returns sent. If you compare its result to zero, you're sorting only on whether that result is zero or not. You'll have to use a CASE expression:
... ORDER BY CASE sent WHEN 0 THEN created ELSE sent END DESC;
If you had NULLs where there is no timestamp then you could use COALESCE without the comparison.
COALESCE handles NULLs, not zeros.
You can convert zero values to NULL with the nullif() function:
SELECT * FROM messages ORDER BY COALESCE(NULLIF(sent,0),created) DESC;
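Both answers can be checked with a quick script using Python's built-in sqlite3 module; the table and values below are invented to mirror the question:

```python
import sqlite3

# In-memory database with a minimal messages table mirroring the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (body TEXT, created INTEGER, sent INTEGER)")
con.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("qw", 1463793500, 0),             # unsent: should sort by created
        ("huyak", 1463783516, 1463662248),
        ("old", 1463000000, 1462000000),
    ],
)

# CASE: use created when sent is 0, otherwise sent.
case_order = [r[0] for r in con.execute(
    "SELECT body FROM messages "
    "ORDER BY CASE sent WHEN 0 THEN created ELSE sent END DESC"
)]

# NULLIF turns 0 into NULL so COALESCE can fall through to created.
nullif_order = [r[0] for r in con.execute(
    "SELECT body FROM messages "
    "ORDER BY COALESCE(NULLIF(sent, 0), created) DESC"
)]

print(case_order)    # ['qw', 'huyak', 'old'] -- qw first, via its created time
print(nullif_order)  # same ordering
```

Both expressions produce the same sort key (created when sent is 0, sent otherwise), so either answer fixes the original query.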
I'm trying to compute a percentage from a table in a ClickHouse DB. I'm having difficulty writing a query that calculates the percentage of each type within each timestamp group.
SELECT
(intDiv(toUInt32(toDateTime(atime)), 120) * 120) * 1000 AS timestamp,
if(dateDiff('second', toDateTime(t1.atime), toDateTime(t2.unixdsn)) <= 5, 'sec5', if((dateDiff('second', toDateTime(t1.atime), toDateTime(t2.unixdsn)) > 5) AND (dateDiff('second', toDateTime(t1.atime), toDateTime(t2.unixdsn)) <= 30), 'sec30', if((dateDiff('second', toDateTime(t1.atime), toDateTime(t2.unixdsn)) > 30) AND (dateDiff('second', toDateTime(t1.atime), toDateTime(t2.unixdsn)) <= 60), 'sec60', 'secgt60'))) AS type,
count() AS total_count,
(total_count * 100) /
(
SELECT count()
FROM sess_logs.logs_view
WHERE (status IN (0, 1)) AND (toDateTime(atime) >= toDateTime(1621410625)) AND (toDateTime(atime) <= toDateTime(1621421425))
) AS percentage_cnt
FROM sess_logs.logs_view AS t1
INNER JOIN
(
SELECT
trid,
atime,
unixdsn,
status
FROM sess_logs.logs_view
WHERE (status = 1) AND (toDate(date) >= toDate(1621410625)) AND if('all' = 'all', 1, userid =
(
SELECT userid
FROM sess_logs.user_details
WHERE (username != 'all') AND (username = 'all')
))
) AS t2 ON t1.trid = t2.trid
WHERE (t1.status = 0) AND (t2.status = 1) AND ((toDate(atime) >= toDate(1621410625)) AND (toDate(atime) <= toDate(1621421425))) AND (toDateTime(atime) >= toDateTime(1621410625)) AND (toDateTime(atime) <= toDateTime(1621421425)) AND if('all' = 'all', 1, userid =
(
SELECT userid
FROM sess_logs.user_details
WHERE (username != 'all') AND (username = 'all')
))
GROUP BY
timestamp,
type
ORDER BY timestamp ASC
Output
┌─────timestamp─┬─type────┬─total_count─┬─────────percentage_cnt─┐
│ 1621410600000 │ sec5    │       15190 │     0.9650982602181922 │
│ 1621410600000 │ sec30   │        1525 │    0.09689103665785011 │
│ 1621410600000 │ sec60   │          33 │   0.002096658498169871 │
│ 1621410600000 │ secgt60 │          61 │  0.0038756414663140043 │
│ 1621410720000 │ secgt60 │          67 │   0.004256852102344891 │
│ 1621410720000 │ sec30   │        2082 │    0.13228009070271735 │
│ 1621410720000 │ sec60   │          65 │   0.004129781890334595 │
│ 1621410720000 │ sec5    │       20101 │     1.2771191658094723 │
│ 1621410840000 │ sec30   │        4598 │    0.29213441741166873 │
│ 1621410840000 │ sec60   │          36 │   0.002287263816185314 │
│ 1621410840000 │ secgt60 │          61 │  0.0038756414663140043 │
│ 1621410840000 │ sec5    │       17709 │     1.1251431922451591 │
│ 1621410960000 │ sec60   │          17 │  0.0010800968020875095 │
│ 1621410960000 │ secgt60 │          81 │   0.005146343586416957 │
│ 1621410960000 │ sec30   │        2057 │    0.13069171305258864 │
│ 1621410960000 │ sec5    │       18989 │      1.206468127931748 │
│ 1621411080000 │ sec60   │           9 │  0.0005718159540463285 │
│ 1621411080000 │ sec30   │        3292 │    0.20915756896894594 │
│ 1621411080000 │ sec5    │       15276 │     0.9705622793346349 │
│ 1621411080000 │ secgt60 │          78 │   0.004955738268401514 │
└───────────────┴─────────┴─────────────┴────────────────────────┘
It returns a percentage for each row, but when I sum the percentage_cnt column, the total comes to about 80% instead of 100%.
Please help me correct my query. I know the query is huge; a simpler example for my use case would also be fine. Thanks.
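No answer appears in this thread, but the core issue can be sketched in plain Python with invented numbers: for per-group percentages to sum to 100, the denominator must count the same rows (same joins and filters) that produced the numerator. The subquery above counts rows under different WHERE conditions than the outer query, which is likely why the totals fall short.

```python
from collections import defaultdict

# Hypothetical (timestamp, type, count) rows standing in for the query output.
rows = [
    (1621410600000, "sec5", 15190),
    (1621410600000, "sec30", 1525),
    (1621410720000, "sec5", 20101),
    (1621410720000, "secgt60", 67),
]

# Total per timestamp group -- the denominator uses exactly the same rows
# as the numerator, so each group's percentages must sum to 100.
group_total = defaultdict(int)
for ts, _, cnt in rows:
    group_total[ts] += cnt

percentages = [(ts, typ, 100.0 * cnt / group_total[ts]) for ts, typ, cnt in rows]

for ts in group_total:
    s = sum(p for t, _, p in percentages if t == ts)
    print(ts, round(s, 6))  # 100.0 for each group, up to float rounding
```

In SQL terms, the same idea is usually expressed by dividing each group's count by a total computed over the identical filtered row set (for example via a join on the grouped totals), rather than by an independent subquery with its own WHERE clause.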
I am looking for the opposite of the dropmissing function in DataFrames.jl so that the user knows where to look to fix their bad data. It seems like this should be easy, but the filter function expects a column to be specified and I cannot get it to iterate over all columns.
julia> df=DataFrame(a=[1, missing, 3], b=[4, 5, missing])
3×2 DataFrame
│ Row │ a       │ b       │
│     │ Int64?  │ Int64?  │
├─────┼─────────┼─────────┤
│ 1   │ 1       │ 4       │
│ 2   │ missing │ 5       │
│ 3   │ 3       │ missing │
julia> filter(x -> ismissing(eachcol(x)), df)
ERROR: MethodError: no method matching eachcol(::DataFrameRow{DataFrame,DataFrames.Index})
julia> filter(x -> ismissing.(x), df)
ERROR: ArgumentError: broadcasting over `DataFrameRow`s is reserved
I am basically trying to recreate the disallowmissing function, but with a more useful error message.
Here are two ways to do it:
julia> df = DataFrame(a=[1, missing, 3], b=[4, 5, missing])
3×2 DataFrame
│ Row │ a       │ b       │
│     │ Int64?  │ Int64?  │
├─────┼─────────┼─────────┤
│ 1   │ 1       │ 4       │
│ 2   │ missing │ 5       │
│ 3   │ 3       │ missing │
julia> df[.!completecases(df), :] # this will be faster
2×2 DataFrame
│ Row │ a       │ b       │
│     │ Int64?  │ Int64?  │
├─────┼─────────┼─────────┤
│ 1   │ missing │ 5       │
│ 2   │ 3       │ missing │
julia> @view df[.!completecases(df), :]
2×2 SubDataFrame
│ Row │ a       │ b       │
│     │ Int64?  │ Int64?  │
├─────┼─────────┼─────────┤
│ 1   │ missing │ 5       │
│ 2   │ 3       │ missing │
julia> filter(row -> any(ismissing, row), df)
2×2 DataFrame
│ Row │ a       │ b       │
│     │ Int64?  │ Int64?  │
├─────┼─────────┼─────────┤
│ 1   │ missing │ 5       │
│ 2   │ 3       │ missing │
julia> filter(row -> any(ismissing, row), df, view=true) # requires DataFrames.jl 0.22
2×2 SubDataFrame
 Row │ a        b
     │ Int64?   Int64?
─────┼──────────────────
   1 │ missing  5
   2 │ 3        missing
I have the following Dataframe
using DataFrames, Statistics
df = DataFrame(name=["John", "Sally", "Kirk"],
               age=[23., 42., 59.],
               children=[3, 5, 2],
               height=[180, 150, 170])
print(df)
3×4 DataFrame
│ Row │ name   │ age     │ children │ height │
│     │ String │ Float64 │ Int64    │ Int64  │
├─────┼────────┼─────────┼──────────┼────────┤
│ 1   │ John   │ 23.0    │ 3        │ 180    │
│ 2   │ Sally  │ 42.0    │ 5        │ 150    │
│ 3   │ Kirk   │ 59.0    │ 2        │ 170    │
I can compute the mean of a column as follow:
println(mean(df[:4]))
166.66666666666666
Now I want to get the mean of all the numeric column and tried this code:
x = [2,3,4]
for i in x
    print(mean(df[:x[i]]))
end
But got the following error message:
MethodError: no method matching getindex(::Symbol, ::Int64)
Stacktrace:
[1] top-level scope at ./In[64]:3
How can I solve the problem?
You are trying to access the DataFrame's column using an integer index specifying the column's position. You should just use the integer value without any : before i, which would create the symbol :i, but you do not have a column named i.
x = [2,3,4]
for i in x
    println(mean(df[i])) # no need for `x[i]`
end
You can also index a DataFrame using a Symbol denoting the column's name.
x = [:age, :children, :height];
for c in x
    println(mean(df[c]))
end
You get the following error in your attempt because you are trying to access the ith index of the symbol :x, which is an undefined operation.
MethodError: no method matching getindex(::Symbol, ::Int64)
Note that :4 is just 4.
julia> :4
4
julia> typeof(:4)
Int64
Here is a one-liner that actually selects all Number columns:
julia> mean.(eachcol(df[findall(x-> x<:Number, eltypes(df))]))
3-element Array{Float64,1}:
41.333333333333336
3.3333333333333335
166.66666666666666
For many scenarios describe is actually more convenient:
julia> describe(df)
4×8 DataFrame
│ Row │ variable │ mean    │ min  │ median │ max   │ nunique │ nmissing │ eltype   │
│     │ Symbol   │ Union…  │ Any  │ Union… │ Any   │ Union…  │ Nothing  │ DataType │
├─────┼──────────┼─────────┼──────┼────────┼───────┼─────────┼──────────┼──────────┤
│ 1   │ name     │         │ John │        │ Sally │ 3       │          │ String   │
│ 2   │ age      │ 41.3333 │ 23.0 │ 42.0   │ 59.0  │         │          │ Float64  │
│ 3   │ children │ 3.33333 │ 2    │ 3.0    │ 5     │         │          │ Int64    │
│ 4   │ height   │ 166.667 │ 150  │ 170.0  │ 180   │         │          │ Int64    │
In the question println(mean(df[4])) works as well (instead of println(mean(df[:4]))).
Hence we can write
x = [2,3,4]
for i in x
    println(mean(df[i]))
end
which works
I have a DataFrame variable like this:
julia> data
11×7 DataFrames.DataFrame
│ Row │ Time │ Wind1VelX │ Wind1VelY │ Wind1VelZ │ TwrBsFxt │ TwrBsFyt │ TwrBsFzt │
│ 1   │ 0.0  │ 25.17     │ 0.944     │ -0.1424   │ 325.4    │ -123.2   │ -6726.0  │
│ 2   │ 0.01 │ 25.62     │ 0.592     │ -0.335    │ 338.7    │ -131.0   │ -6749.0  │
│ 3   │ 0.02 │ 26.07     │ 0.24      │ -0.5275   │ 345.7    │ -141.7   │ -6754.0  │
I would like to know if there is a method to get the column names in an array of String like:
julia> header=["Time", "Wind1VelX", "Wind1VelY", "Wind1VelZ", "TwrBsFxt", "TwrBsFyt", "TwrBsFzt"]
Thanks in advance
You can write:
String.(names(data))
Without String., like this:
names(data)
you will get a vector of Symbol.
Note that calling String, in this case, converts a single Symbol to String and by adding a dot . after it you broadcast it over all elements of a vector returned by names(data).
Get a Vector of Strings:
names(data)
Get a Vector of Symbols:
propertynames(data)
I guess there should be some visualization tool (for MS SQL Server) to represent hierarchical SQL query data as an output result.
I have a hierarchical chain of 7 tables, and I have to query the 1st-2nd levels of it very often in order to check the bottom of the chain as well as some intermediate tables.
Any clue, guys?
Thank you in advance!
P.S. It would be cool if MS SQL Management Studio could accept plugins in its next generation... :)
Brad Schulz has a pretty amazing proc (usp_DrawTree) here:
http://bradsruminations.blogspot.com/2010/04/t-sql-tuesday-005-reporting.html
Here is one of his example outputs:
/*
                             ┌───────────┐
                             │   Anne    │
                           ┌─┤ Dodsworth │ Sales Representative
                           │ │  Ext452   │
                           │ └───────────┘
              ┌──────────┐ │
              │  Steven  │ │
            ┌─┤ Buchanan ├─┤ Sales Manager
            │ │ Ext3453  │ │
            │ └──────────┘ │
            │              │ ┌────────┐
            │              │ │ Robert │
            │              ├─┤  King  │ Sales Representative
            │              │ │ Ext465 │
            │              │ └────────┘
            │              │ ┌─────────┐
            │              │ │ Michael │
            │              └─┤ Suyama  │ Sales Representative
            │                │ Ext428  │
            │                └─────────┘
            │ ┌──────────┐
            │ │  Laura   │
            ├─┤ Callahan │ Inside Sales Coordinator
            │ │ Ext2344  │
            │ └──────────┘
┌─────────┐ │
│ Andrew  │ │
│ Fuller  ├─┤ Vice President, Sales
│ Ext3457 │ │
└─────────┘ │
            │ ┌─────────┐
            │ │  Nancy  │
            ├─┤ Davolio │ Sales Representative
            │ │ Ext5467 │
            │ └─────────┘
            │ ┌───────────┐
            │ │   Janet   │
            ├─┤ Leverling │ Sales Representative
            │ │  Ext3355  │
            │ └───────────┘
            │ ┌──────────┐
            │ │ Margaret │
            └─┤ Peacock  │ Sales Representative
              │ Ext5176  │
              └──────────┘
*/
For Oracle, anyway (I got here via the SQL tag), you can use LPAD with the associated LEVEL to format the output (similar to an expanded folder view: deeper levels get more indentation):
SELECT LEVEL,
       LPAD(' ', 2 * LEVEL - 1) || first_name || ' ' || last_name AS employee
FROM employee
START WITH employee_id = 1
CONNECT BY PRIOR employee_id = manager_id;
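For SQL Server itself, the portable counterpart of CONNECT BY is a recursive CTE. A minimal sketch, run here through Python's built-in sqlite3 (which accepts the same WITH RECURSIVE syntax; SQL Server uses plain WITH), with an invented employee table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE employee (employee_id INTEGER, manager_id INTEGER, name TEXT)"
)
con.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, None, "Andrew"), (2, 1, "Steven"), (3, 2, "Anne"), (4, 1, "Laura")],
)

# Recursive CTE: start at the root, then repeatedly join employees to the
# rows already found, carrying a depth counter for indentation.
rows = con.execute("""
    WITH RECURSIVE tree(employee_id, name, depth) AS (
        SELECT employee_id, name, 1 FROM employee WHERE manager_id IS NULL
        UNION ALL
        SELECT e.employee_id, e.name, t.depth + 1
        FROM employee e JOIN tree t ON e.manager_id = t.employee_id
    )
    SELECT name, depth FROM tree ORDER BY employee_id
""").fetchall()

# Indent by depth, mirroring the LPAD trick in the Oracle query above.
indented = [("  " * (depth - 1)) + name for name, depth in rows]
print("\n".join(indented))
```

Note that ORDER BY employee_id only yields a sensible tree order here because the invented ids were assigned depth-first; real hierarchies usually carry a path column in the CTE and sort by that instead.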