Sum case when like then - sql

I am using a dataset on BigQuery and essentially I would like to pull a table to show the total volume of liters sold per month of a specific year. This is currently what I have written:
SELECT
SUM( CASE WHEN `date` LIKE '2012-01-%' THEN `volume_sold_liters` END) as Jan_Total
FROM `personal-projects-340200.Iowa_liquor_cedar_rapids.store_3`
This is the error message I am also getting:
No matching signature for operator LIKE for argument types: DATE, STRING. Supported signatures: STRING LIKE STRING; BYTES LIKE BYTES at [2:16]
I understand that the error message is asking me to change the string to a date, but how do I do that? I have multiple dates in the same month that I want added together. I tried switching the date to a string instead, and I get NULL in my table.
What am I doing wrong? Is there a better way to go about pulling the table I want?

You can use BigQuery's dedicated date functions here. Since you want to match on the month of the `date` column, consider something like the query below:
SELECT
  SUM(CASE
        WHEN DATE_TRUNC(`date`, MONTH) = DATE '2012-01-01' THEN volume_sold_liters
        ELSE 0
      END
  ) AS Jan_Total
FROM
  `personal-projects-340200.Iowa_liquor_cedar_rapids.store_3`
The error occurs because LIKE is only defined for STRING and BYTES operands, and you are applying it to a DATE. To fix this, you could cast the date to a string:
SELECT
  SUM(CASE
        WHEN CAST(`date` AS STRING) LIKE '2012-01-%' THEN volume_sold_liters
        ELSE 0
      END
  ) AS Jan_Total
FROM
  `personal-projects-340200.Iowa_liquor_cedar_rapids.store_3`
But it's better to use date functions when manipulating dates.
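Since the question asks for a total per month of one year, a GROUP BY on the month avoids writing one CASE per month. Below is a minimal sketch of that idea. It runs against an in-memory SQLite table through Python's sqlite3 module so it can be tried anywhere; the sample rows and the helper name monthly_totals are made up for illustration. In BigQuery the grouping expression would be DATE_TRUNC(`date`, MONTH) instead of SQLite's strftime.

```python
import sqlite3

def monthly_totals(rows, year):
    """Return [(month, total_liters), ...] for one year, one row per month."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the store_3 table from the question.
    conn.execute("CREATE TABLE store_3 (date TEXT, volume_sold_liters REAL)")
    conn.executemany("INSERT INTO store_3 VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT strftime('%Y-%m', date) AS month,
               SUM(volume_sold_liters)  AS total_liters
        FROM store_3
        WHERE strftime('%Y', date) = ?
        GROUP BY month
        ORDER BY month
        """,
        (str(year),),
    ).fetchall()
    conn.close()
    return result

# Made-up sample data: two sales in January 2012, one in February 2012,
# and one in 2013 that the year filter excludes.
sample_rows = [
    ("2012-01-03", 10.5),
    ("2012-01-20", 4.5),
    ("2012-02-11", 7.0),
    ("2013-01-05", 99.0),
]
print(monthly_totals(sample_rows, 2012))
```

Months with no sales simply do not appear in the result, which differs from the CASE approach where each month is always a column.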

Related

Hive - calculating string type timestamp differences in minutes

I'm a novice at SQL (in Hive) and am trying to calculate, for every anonymousid, the time spent between the first and last event in minutes. The source table's timestamp is stored as a string,
like: "2020-12-24T09:47:17.775Z". I've tried two approaches:
1- Cast column timestamp to bigint and calculated the difference from main table.
select anonymousid, max(from_unixtime(cast('timestamp' as bigint)) - min(from_unixtime(cast('timestamp' as bigint)) from db1.formevent group by anonymousid
I got NULLs after implementing this as a solution.
2- Create a new table from the main source, filter it with a WHERE clause, and try to convert 'timestamp' to a date format without any min/max calculation.
create table db1.successtime as select anonymousid, pagepath,buttontype, itemname, 'location', cast(to_date(from_unixtime(unix_timestamp('timestamp', "yyyy-MM-dd'T'HH:mm:ss.SSS"),'HH:mm:ss') as date) from db1.formevent where pagepath = "/account/sign-up/" and itemname = "Success" and 'location' = "Standard"
Then I got NULLs again and gave up.
Is there any way I can reformat and calculate time difference in minutes between first and last event ('timestamp') and take the average grouped by 'location'?
select anonymousid,
(max(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) -
min(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
) / 60
from db1.formevent
group by anonymousid;
From your description, this should work:
select anonymousid,
       (max(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) -
        min(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
       ) / 60
from db1.formevent
group by anonymousid;
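Note that the column name is not in single quotes. Written as 'timestamp', it is a string literal rather than a column reference, which is why the attempts in the question returned NULL.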
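To see the max-minus-min arithmetic on concrete values, here is a small sketch that runs the same shape of query against SQLite through Python's sqlite3 module. It is not Hive: SQLite's strftime('%s', ...) stands in for Hive's unix_timestamp, and the column is named ts here to avoid any keyword clash. The sample events and the helper name are made up.

```python
import sqlite3

def minutes_between_first_and_last(events):
    """Return [(anonymousid, minutes), ...] between first and last event."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE formevent (anonymousid TEXT, ts TEXT)")
    conn.executemany("INSERT INTO formevent VALUES (?, ?)", events)
    rows = conn.execute(
        """
        SELECT anonymousid,
               (MAX(CAST(strftime('%s', ts) AS INTEGER)) -
                MIN(CAST(strftime('%s', ts) AS INTEGER))) / 60.0 AS minutes
        FROM formevent
        GROUP BY anonymousid
        ORDER BY anonymousid
        """
    ).fetchall()
    conn.close()
    return rows

# Made-up events in the same string format as the question.
sample_events = [
    ("a", "2020-12-24T09:47:17.775Z"),
    ("a", "2020-12-24T09:52:47.775Z"),  # 5 min 30 s after the first event
    ("b", "2020-12-24T10:00:00.000Z"),  # single event, so the span is 0
]
print(minutes_between_first_and_last(sample_events))
```

Both functions work in whole seconds, so sub-second differences are lost before the division by 60.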

Is there a way to divide two halves of a boolean value in sql?

I'm currently working on a task where I'm dealing with a table in which every row has a boolean value, "is_important". I am trying to create a ratio of important entries to total entries grouped by date but I can't seem to get SQL to recognize that I want to divide using a WHERE clause.
One method is:
select date, avg( case when is_important then 1.0 else 0 end) as important_ratio
from t
group by date;
There may also be shortcuts, depending on the database you are using, such as:
avg( is_important )
avg( is_important::int )
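As a quick check of the AVG(CASE ...) method, the sketch below runs it on a few made-up rows in an in-memory SQLite database via Python's sqlite3 module. SQLite stores booleans as 0/1 integers, which is an assumption of this example and not necessarily how your database represents is_important.

```python
import sqlite3

def important_ratio_by_date(rows):
    """Return [(date, ratio_of_important_rows), ...], one row per date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (date TEXT, is_important INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT date,
               AVG(CASE WHEN is_important THEN 1.0 ELSE 0 END) AS important_ratio
        FROM t
        GROUP BY date
        ORDER BY date
        """
    ).fetchall()
    conn.close()
    return result

# Made-up data: 2 of 4 rows are important on the first day, 1 of 1 on the second.
sample_rows = [
    ("2024-01-01", 1),
    ("2024-01-01", 0),
    ("2024-01-01", 0),
    ("2024-01-01", 1),
    ("2024-01-02", 1),
]
print(important_ratio_by_date(sample_rows))
```

The average of a 1/0 flag is the count of ones divided by the count of rows, so no separate WHERE or second query is needed.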

Group by: calculated field to return respective date in bigquery

I need to do a user-level analysis. As the data has many rows per user (related to different events), I need to group by user and create some calculated fields that summarize those rows. One of the fields is the number of days since the user's last purchase (today - last purchase date). I have tried many different queries and done a lot of research, but could not find a solution.
The queries that make the most sense to me, but did not work, are below:
Using case when statement
SELECT CASE WHEN LAST(tr_orderid <> "") THEN
DATEDIFF(CURRENT_DATE(),event_date) ELSE NULL END AS recency_lastbooking
FROM df
GROUP BY domain_userid
Using IF statement
SELECT IF(LAST(tr_total > 0), DATEDIFF(CURRENT_DATE(),event_date), NULL)
AS recency_lastbooking
FROM df
GROUP BY domain_userid
The error that I get is: Expression 'event_date' is not present in the GROUP BY list
I think that if I use LAST(event_date), the query will return the last date across all of the user's rows instead of the last day on which the user had a purchase event.
P.S: I can use tr_total (total transaction) > 0 or tr_orderid (transaction order id) <> ""
Thank you!
I think you just want a window function:
SELECT DATE_DIFF(CURRENT_DATE(),
                 MAX(CASE WHEN tr_orderid <> '' THEN event_date END) OVER (PARTITION BY domain_userid),
                 DAY
       ) AS recency_lastbooking
FROM df;
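If one row per user is wanted, as the question describes, the same conditional MAX can be used with a plain GROUP BY instead of a window function. The sketch below shows that variant on made-up rows. It runs on SQLite through Python's sqlite3 module, so julianday differences stand in for BigQuery's DATE_DIFF, and "today" is passed in as a parameter to keep the result reproducible.

```python
import sqlite3

def days_since_last_purchase(events, today):
    """Return [(domain_userid, days_since_last_purchase_or_None), ...]."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE df (domain_userid TEXT, event_date TEXT, tr_orderid TEXT)"
    )
    conn.executemany("INSERT INTO df VALUES (?, ?, ?)", events)
    rows = conn.execute(
        """
        SELECT domain_userid,
               CAST(julianday(?) -
                    julianday(MAX(CASE WHEN tr_orderid <> '' THEN event_date END))
                    AS INTEGER) AS recency_lastbooking
        FROM df
        GROUP BY domain_userid
        ORDER BY domain_userid
        """,
        (today,),
    ).fetchall()
    conn.close()
    return rows

# Made-up events: u1 purchased on 2024-01-03 and browsed later; u2 never purchased.
sample_events = [
    ("u1", "2024-01-03", "A1"),
    ("u1", "2024-01-08", ""),
    ("u2", "2024-01-05", ""),
]
print(days_since_last_purchase(sample_events, "2024-01-10"))
```

The CASE returns NULL for non-purchase rows, and MAX ignores NULLs, so only purchase dates compete for "latest". A user with no purchases gets NULL.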

Report Next Date in Sequence

Our database keeps track of only 5 billing payments received each month. How can I write a select statement that will look, sequentially, at each payment received date and, if there is a date entered, move on to the next payment, etc., then eventually come across the date with an empty field and report that date?
I've tried the following case statement but think I am on the wrong track:
select db.identifier,
case when recdate1 is not null then recdate1
when recdate2 is not null then recdate2
when recdate3 is not null then recdate3 end
from db
You should use the COALESCE function for this. COALESCE takes any number of arguments and returns the first one that is NOT NULL, reading from left to right. In other words, if you list the recdate fields in order from recdate1 to recdate5, it will return the first one that is NOT NULL.
Here is the code to achieve this.
SELECT db.identifier
     , COALESCE(recdate1, recdate2, recdate3, recdate4, recdate5) AS recdate
FROM db
See the COALESCE documentation for more information:
https://msdn.microsoft.com/en-us/library/ms190349.aspx
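To illustrate the left-to-right behaviour, here is a small sketch on made-up rows, run against in-memory SQLite through Python's sqlite3 module. The table name payments and the sample values are hypothetical.

```python
import sqlite3

def first_recorded_date(rows):
    """Return [(identifier, first_non_null_recdate), ...]."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE payments (
            identifier TEXT,
            recdate1 TEXT, recdate2 TEXT, recdate3 TEXT,
            recdate4 TEXT, recdate5 TEXT
        )
        """
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT identifier,
               COALESCE(recdate1, recdate2, recdate3, recdate4, recdate5) AS recdate
        FROM payments
        ORDER BY identifier
        """
    ).fetchall()
    conn.close()
    return result

# Made-up rows: A has its first date in recdate3, B has no dates, C starts at recdate1.
sample_rows = [
    ("A", None, None, "2015-03-01", "2015-04-01", None),
    ("B", None, None, None, None, None),
    ("C", "2015-01-01", "2015-02-01", None, None, None),
]
print(first_recorded_date(sample_rows))
```

When every argument is NULL, COALESCE itself returns NULL, as row B shows.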

Get data between record in table

I have data like this:
For example, say today is in April 2012. Referring to the data above, I want to get the row with M_PER = 03-2012, because this month falls in the range 03-2012 to 06-2012.
Edited: In this case, I want to get the rate for a given currency code. Because today is still in April, and I want to know the rate from US Dollar (USD) to Indonesian Rupiah (IDR), I must get the row with M_PER = 03-2012 and CRR_CURRENCY_CODE = USD.
The question is what query can retrieve data like that?
Since you seem to be using quarterly values, I would use the TRUNC function with the 'Q' format model. This truncates a date to 1/1/YYYY, 1/4/YYYY, 1/7/YYYY and 1/10/YYYY, i.e. the first day of the quarter.
To fit your model which is the month at the end of the quarter, you would then have to add two months. This assumes that the MONTH_PERIOD column is a SQL date and not some other data type.
Included below is an example, using SYSDATE as the input date.
select *
from your_table
where add_months(trunc(sysdate, 'Q'),2) = month_period;
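To make the date arithmetic concrete, the sketch below mimics what add_months(trunc(d, 'Q'), 2) evaluates to, using Python's datetime module. It is only an illustration of the expression, not Oracle code, and the function name is made up.

```python
from datetime import date

def quarter_end_month(d):
    """Mimic add_months(trunc(d, 'Q'), 2): first day of the quarter's last month."""
    # trunc(d, 'Q') gives the first day of month 1, 4, 7 or 10.
    first_month_of_quarter = 3 * ((d.month - 1) // 3) + 1
    # Adding two months lands on month 3, 6, 9 or 12 of the same year.
    return date(d.year, first_month_of_quarter + 2, 1)

print(quarter_end_month(date(2012, 4, 15)))
```

For a date in April 2012 the expression yields 1 June 2012, so the query above matches the row whose MONTH_PERIOD is June 2012.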
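I used ROWNUM and ORDER BY to get the value. The sort has to happen in an inline view, because Oracle assigns ROWNUM before it applies ORDER BY:
SELECT * FROM (SELECT * FROM tables WHERE m_per > '04-2012' ORDER BY month_period ASC)
WHERE ROWNUM = 1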