SQL BETWEEN AND operation on date is not working

This is the SQL query that I'm trying to execute:
select *,count(dummy) over(partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
WHERE (max_timestamp BETWEEN '2017-01-01' AND '2018-01-01')
ORDER BY max_timestamp DESC
As far as I know, in a BETWEEN ... AND operation both bounds are inclusive. Yet this query is unable to fetch the records corresponding to 2018-01-01.
I changed the query to this:
select *,count(dummy) over(partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
WHERE (max_timestamp >= '2017-01-01' AND max_timestamp <= '2018-01-01')
ORDER BY max_timestamp DESC
Still, it's not working.
Then I tried this:
select *,count(dummy) over(partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
WHERE (max_timestamp >= '2017-01-01' AND max_timestamp <= '2018-01-02')
ORDER BY max_timestamp DESC
It's able to fetch records related to 2018-01-01.
What could be the reason for this, and how can I fix it?
Thanks in advance.

This is your query:
select *, count(dummy) over (partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
where max_timestamp BETWEEN '2017-01-01' AND '2018-01-01'
order by max_timestamp DESC;
Simply don't use between with date times. Use explicit logic:
select *, count(dummy) over (partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
where max_timestamp >= '2017-01-01' and
max_timestamp < '2018-01-02' --> notice this is one day later
order by max_timestamp DESC;
The problem is that you have a time component on the date.
Aaron Bertrand explains this very well in his blog What do BETWEEN and the devil have in common? (I am amused by the title, given that BETWEEN definitely does exist, but there is more controversy about the existence of the devil.)
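To see why, consider a row stamped during the day on 2018-01-01. The bare date literal in the upper bound is promoted to midnight, so the row falls outside the range (a minimal sketch; Postgres/Spark-style syntax):
-- '2018-01-01' is read as 2018-01-01 00:00:00, so any row with a time
-- component on that day exceeds the upper bound
select timestamp '2018-01-01 10:30:00'
between timestamp '2017-01-01 00:00:00' and timestamp '2018-01-01 00:00:00'
as in_range; -- returns false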

This is a known issue with Spark.
Please refer to this link for more info: https://issues.apache.org/jira/browse/SPARK-10837
I've fixed this issue by using the date_add function provided by Spark.
The end date was therefore changed to date_add(endDate, 1), so that we get all the values, including those corresponding to the last date.
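In Spark SQL that fix looks roughly like this (a sketch against the original query; date_add('2018-01-01', 1) evaluates to 2018-01-02):
select *, count(dummy) over (partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
-- the exclusive < avoids also matching rows at exactly 2018-01-02 00:00:00
where max_timestamp >= '2017-01-01'
and max_timestamp < date_add('2018-01-01', 1)
order by max_timestamp desc;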

Related

Order By Month Name - PostgreSQL

I followed this post, Order by using month name in PostgreSQL, but with no success!
I have a query (it's working) and I just need to order the results by month name. This is the query I am using:
select to_char(purchase_date, 'Month') as mes_2021,
sum(gmv::float4) as soma_gmv
from tablename
where purchase_date > '2021-01-01'
GROUP BY mes_2021
I am trying:
order by to_date(purchase_date, 'Month') - No success
order by date_part(purchase_date::date, 'Month') - No success
If I use order by mes_2021, the months come out sorted alphabetically by name rather than chronologically.
One trick is to order by an aggregate of the date:
select to_char(purchase_date, 'Month') as mes_2021,
sum(gmv::float4) as soma_gmv
from tablename
where purchase_date > '2021-01-01'
group by mes_2021
order by min(purchase_date);
This, of course, assumes that the dates are all in the same year. But your where clause is taking care of that.
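If you would rather make the chronological intent explicit, ordering by the month number works as well (a sketch under the same single-year assumption):
select to_char(purchase_date, 'Month') as mes_2021,
sum(gmv::float4) as soma_gmv
from tablename
where purchase_date > '2021-01-01'
group by mes_2021
order by min(extract(month from purchase_date));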

Aggregate the variable from timestamp on BigQuery

I am planning to calculate the most frequent part_of_day for each user. First I encode the timestamp into part_of_day, then aggregate to find the most frequent part_of_day per user. I use ARRAY_AGG to calculate the mode. However, I'm not sure how to handle the timestamp with ARRAY_AGG; there is an error, so my code structure might be wrong:
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS Time
COUNT(*) AS cnt
Error received:
Syntax error: Expected ")" but got identifier "COUNT" at [19:9]
Even though you did not share any sample data, I was able to identify some issues within your code.
I have used some sample data I created, based on the formats and functions you used in your code to keep consistency. Below is the code, without any errors:
WITH data AS (
SELECT 98 as User_ID,DATETIME "2008-12-25 05:30:00.000000" AS Request_Timestamp, "something!" AS channel UNION ALL
SELECT 99 as User_ID,DATETIME "2008-12-25 22:30:00.000000" AS Request_Timestamp, "something!" AS channel
)
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day1,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS time,
COUNT(*) AS cnt
FROM data
GROUP BY User_ID, Channel, Request_Timestamp
#order by Request_Timestamp
)
GROUP BY User_ID, Time;
First, notice that I have changed the column's name in your ARRAY_AGG() call; this had to be done because it would otherwise cause a "Duplicate column name" error. Second, after your TIME_TRUNC() function, a comma was missing before COUNT(*). Then, within your inner GROUP BY, you needed to group by Request_Timestamp as well, because it was neither aggregated nor grouped. Lastly, in your last GROUP BY, you needed to aggregate or group time. After these corrections, your code executes without any errors.
Note: the Syntax error: Expected ")" but got identifier "COUNT" at [19:9] error you experienced is due to the missing comma. The other errors would only surface after that one is corrected.
If you want the most frequent part of each day, you need to use the day part in the aggregation:
SELECT User_ID,
ARRAY_AGG(part_of_day ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day
FROM (SELECT User_ID,
(case when time BETWEEN '04:00:00' AND '12:00:00' then 'morning'
when time < '04:00:00' OR time > '20:00:00' then 'night'
end) AS part_of_day,
COUNT(*) AS cnt
FROM cognitivebot2.chitchaxETL.conversations
GROUP BY User_ID, part_of_day
) u
GROUP BY User_ID;
Obviously, if you want the channel as well, then you need to include that in the queries.
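For example, a sketch that carries the channel through both levels (assuming the same conversations table):
SELECT User_ID, channel,
ARRAY_AGG(part_of_day ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day
FROM (SELECT User_ID, channel,
(case when time BETWEEN '04:00:00' AND '12:00:00' then 'morning'
when time < '04:00:00' OR time > '20:00:00' then 'night'
end) AS part_of_day,
COUNT(*) AS cnt
FROM cognitivebot2.chitchaxETL.conversations
GROUP BY User_ID, channel, part_of_day
) u
GROUP BY User_ID, channel;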

Find rows with similar date values

I want to find customers where for example, system by error registered duplicates of an order.
It's pretty easy if reg_date is EXACTLY the same, but I have no idea how to implement a query that also counts rows as duplicates when there was, for example, up to a 1 second difference between transactions.
select * from
(select customer_id, reg_date, count(*) as cnt
from orders
group by 1,2
) x where cnt > 1
Here is example dataset:
https://www.db-fiddle.com/f/m6PhgReSQbVWVZhqe8n4mi/0
Currently only customer 104's orders are counted as duplicates, because their reg_date values are identical. I want to also count orders 1, 2 and 4, 5, as there's just a 1 second difference between them.
demo: db<>fiddle
SELECT
customer_id,
reg_date
FROM (
SELECT
*,
reg_date - lag(reg_date) OVER (PARTITION BY customer_id ORDER BY reg_date) <= interval '1 second' as is_duplicate
FROM
orders
) s
WHERE is_duplicate
Use the lag() window function. It allows you to have a look at the previous record. With this value you can compute the difference and keep the records where the gap is at most one second.
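Note that the query above returns only the later row of each close pair. A sketch that flags both rows, assuming the same orders table, also looks ahead with lead():
SELECT customer_id, reg_date
FROM (
SELECT *,
-- a row is a duplicate if it is within 1 second of its neighbour on either side
(reg_date - lag(reg_date) OVER (PARTITION BY customer_id ORDER BY reg_date) <= interval '1 second'
OR lead(reg_date) OVER (PARTITION BY customer_id ORDER BY reg_date) - reg_date <= interval '1 second') AS is_duplicate
FROM orders
) s
WHERE is_duplicate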
Try the following script. It will return day/customer-wise duplicates:
SELECT
TO_CHAR(reg_date :: DATE, 'dd/mm/yyyy') reg_date,
customer_id,
count(*) as cnt
FROM orders
GROUP BY
TO_CHAR(reg_date :: DATE, 'dd/mm/yyyy'),
customer_id
HAVING count(*) >1

Check if timestamp is contained in date

I'm trying to check if a datetime is contained in the current date, but I haven't been able to do it.
This is my query:
select
date(timestamp) as event_date,
count(*)
from pixel_logs.full_logs f
where 1=1
where event_date = CUR_DATE()
How can I fix it?
Like Mikhail said, you need to use CURRENT_DATE(). Also, COUNT(*) requires you to GROUP BY the date in your example. I do not know how your data is formatted, but here is one way to modify your query:
#standardSQL
WITH
table AS (
SELECT
1494977678 AS timestamp_secs) -- Current timestamp (in seconds)
SELECT
event_date,
COUNT(*) as count
FROM (
SELECT
DATE(TIMESTAMP_SECONDS(timestamp_secs)) AS event_date,
CURRENT_DATE()
FROM
table)
WHERE
event_date = CURRENT_DATE()
GROUP BY
event_date;
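Alternatively, a minimal rewrite of the original query (a sketch; BigQuery does not allow referencing the event_date alias in WHERE, so the DATE() expression is repeated):
#standardSQL
SELECT
DATE(timestamp) AS event_date,
COUNT(*) AS count
-- if timestamp holds seconds since epoch, use DATE(TIMESTAMP_SECONDS(timestamp)) instead
FROM pixel_logs.full_logs f
WHERE DATE(timestamp) = CURRENT_DATE()
GROUP BY event_date;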

Manipulating Current_Date

Here is my SQL query:
select id, max(file_uploaded_at) as latest_date, current_date
from employee_imports
where convert(file_uploaded_at) - integer '90'< current_date AND
error_message IS NULL
group by id
order by latest_date DESC
I keep getting back values that are older than 90 days though. Do you know why?
select id, max(file_uploaded_at) as latest_date, current_date
from employee_imports
where current_date - date(file_uploaded_at) <= 90 AND
error_message IS NULL
group by id
order by latest_date DESC
This is what I changed my query to and everything appears to work fine.
Thanks anyway guys.
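For what it's worth, an index-friendly variant (a sketch, assuming Postgres-style interval arithmetic) keeps file_uploaded_at unwrapped and shifts current_date instead, so an index on file_uploaded_at can still be used:
select id, max(file_uploaded_at) as latest_date, current_date
from employee_imports
where file_uploaded_at >= current_date - interval '90 days'
and error_message is null
group by id
order by latest_date desc;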