Missing Expression - sql

I have 2 tables. One contains just dates:
table1: select display_date from dates; -- returns all the dates of the month (01-31)

display_date
------------
01-OCT-14
02-OCT-14
03-OCT-14
... and so on
table2: select display_date, weekday, day, month from employee_Day; -- contains only some dates of the month (01, 04, 05, etc.); it won't contain all the dates

display_date | weekday | day | month
-------------|---------|-----|------
01-OCT-14    |       7 |  01 |    10
04-OCT-14    |       5 |  04 |    10
I need to join those two tables and get all the dates in the output, with NULL values for the records missing from table2. I need output like this:

display_date | weekday | day | month
-------------|---------|-----|------
01-OCT-14    |       7 |  01 |    10
02-OCT-14    |       5 |  02 |    10
03-OCT-14    |       4 |  03 |    10

select a.display_date, b.weekday, b.day, b.month
from (subquery1) a, (subquery2) b
where TO_CHAR(TO_DATE(a.DISPLAY_DATE,'DD-MON-RR'),'DD') = TO_CHAR(b.DAY_NUMBER)(+);
subquery1: selects the first table's values
subquery2: gets the second table's values
I am getting a "missing expression" error.
I need the common values on the DISPLAY_DATE column; if there is no value for a display_date in table2, the row should still come from table1.
I can't use UNION because the columns of table 1 and table 2 are different.
Any idea?

You need to use a LEFT JOIN:
SELECT d.display_date, e.weekday, e.day, e.month
FROM Dates d
LEFT JOIN employee_Day e
ON d.display_date = e.display_date

I suppose you need something like this:
select a.display_date,
nvl(b.weekday, to_char(a.display_date, 'D')) weekday,
nvl(b.day, to_char(a.display_date, 'DD')) day,
nvl(b.month, to_char(a.display_date, 'MM')) month
from table1 a left join table2 b on a.display_date = b.display_date
order by a.display_date;
Oracle recommends avoiding the (+) syntax, as it is deprecated.
NVL(expr1, expr2) = if expr1 is null then expr2 else expr1
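A minimal runnable sketch of this LEFT JOIN + NVL pattern, using Python's sqlite3 as a stand-in for Oracle (table names and sample dates mirror the question; COALESCE plays the role of NVL and strftime the role of TO_CHAR, so the weekday numbering follows SQLite's 0=Sunday convention rather than Oracle's):

```python
import sqlite3

# In-memory stand-ins for the question's dates / employee_Day tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (display_date TEXT);
CREATE TABLE employee_day (display_date TEXT, weekday INTEGER, day TEXT, month TEXT);
INSERT INTO dates VALUES ('2014-10-01'), ('2014-10-02'), ('2014-10-03');
INSERT INTO employee_day VALUES ('2014-10-01', 7, '01', '10');
""")

# LEFT JOIN keeps every date; COALESCE (SQLite's NVL) fills the gaps
# by deriving weekday/day/month from the date itself.
rows = conn.execute("""
SELECT a.display_date,
       COALESCE(b.weekday, CAST(strftime('%w', a.display_date) AS INTEGER)) AS weekday,
       COALESCE(b.day,     strftime('%d', a.display_date)) AS day,
       COALESCE(b.month,   strftime('%m', a.display_date)) AS month
FROM dates a
LEFT JOIN employee_day b ON a.display_date = b.display_date
ORDER BY a.display_date
""").fetchall()

for row in rows:
    print(row)
```

The '2014-10-01' row comes straight from employee_day (weekday 7), while the other two dates have their columns filled in from the date itself.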

Related

SQL Query: CREATE a table with rows divided by month/year and COUNT the number of values WHERE '01/month/year' IS BETWEEN two date-columns

This is my first question here.
I'm having trouble building a query that groups values based on whether the first day of a month/year falls between two date columns.
here is an example of the table I have:
USER_ID | START_DATE | END_DATE
--------|------------|-----------
A       | 03/07/2020 | 31/07/2020
A       | 05/06/2020 | 03/07/2020
A       | 08/05/2020 | 05/06/2020
A       | 10/04/2020 | 08/05/2020
B       | 13/02/2020 | 12/03/2020
B       | 16/01/2020 | 13/02/2020
C       | 22/05/2020 | 19/06/2020
C       | 24/04/2020 | 22/05/2020
D       | 25/09/2020 | 23/10/2020
D       | 28/08/2020 | 25/09/2020
D       | 31/07/2020 | 28/08/2020
D       | 03/07/2020 | 31/07/2020
D       | 05/06/2020 | 03/07/2020
E       | 25/11/2020 | 23/12/2020
E       | 28/10/2020 | 25/11/2020
E       | 30/09/2020 | 28/10/2020
F       | 14/2/2020  | 13/3/2020
F       | 17/1/2020  | 14/2/2020
F       | 20/12/2019 | 17/1/2020
F       | 22/11/2019 | 20/12/2019
G       | 7/11/2020  | 5/12/2020
G       | 10/10/2020 | 7/11/2020
and I wish to have something like this:

YEAR | MONTH | COUNT(DISTINCT USER_ID)
-----|-------|------------------------
2019 |    11 | 0
2019 |    12 | 1
2020 |     1 | 1
2020 |     2 | 2
2020 |     3 | 2
2020 |     4 | 0
2020 |     5 | 2
2020 |     6 | 2
2020 |     7 | 2
2020 |     8 | 1
2020 |     9 | 1
2020 |    10 | 2
2020 |    11 | 2
2020 |    12 | 2
For instance, in Feb 2020 user "B" and user "F" had a range of dates that included the date 01/Feb/2020. The condition is true for:

USER_ID | START_DATE | END_DATE
B       | 16/01/2020 | 13/02/2020

and for:

USER_ID | START_DATE | END_DATE
F       | 17/1/2020  | 14/2/2020

...so the count will be 2.
Do you know any way to do it in SQL (or Ruby)?
Thanks a lot!
Try this:
WITH m AS (
  SELECT generate_series(min(date_trunc('month', start_date)), max(end_date), '1 month') :: date AS month
  FROM my_table AS t
)
SELECT to_char(m.month, 'YYYY') AS year
     , to_char(m.month, 'MM') AS month
     , count(DISTINCT t.user_id) AS "count(distinct user_id)"
FROM my_table AS t
RIGHT JOIN m
  ON daterange(t.start_date, t.end_date) #> m.month
GROUP BY m.month
ORDER BY m.month
The first query, "m", calculates the list of months that covers the start_date and end_date of my_table.
The second query joins my_table with the resulting table "m" in order to select all the users whose interval daterange(start_date, end_date) contains the 1st day of the month (see the manual).
Then the rows are grouped by m.month and the number of distinct user_id per month is calculated with the count(DISTINCT user_id) aggregate function (see the manual).
Finally, the RIGHT JOIN clause allows selecting the months with no corresponding user_id in my_table (see the manual).
See the test result in dbfiddle.
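For readers without Postgres's generate_series/daterange, the same first-of-month logic can be sketched in plain Python. This uses a subset of the question's rows; the half-open s <= probe < e check mirrors daterange's default [start, end) bounds:

```python
from datetime import date

# Subset of the question's table: (user_id, start_date, end_date), dd/mm/yyyy.
rows = [
    ("B", date(2020, 1, 16), date(2020, 2, 13)),
    ("B", date(2020, 2, 13), date(2020, 3, 12)),
    ("F", date(2020, 1, 17), date(2020, 2, 14)),
    ("F", date(2020, 2, 14), date(2020, 3, 13)),
]

def monthly_counts(rows):
    """For each month between the earliest start and latest end, count
    distinct users whose [start, end) interval contains the 1st of that
    month (same semantics as the daterange containment in the answer)."""
    first = min(r[1] for r in rows).replace(day=1)
    last = max(r[2] for r in rows)
    out = {}
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        probe = date(y, m, 1)
        users = {u for u, s, e in rows if s <= probe < e}
        out[(y, m)] = len(users)
        m += 1
        if m == 13:
            y, m = y + 1, 1
    return out

counts = monthly_counts(rows)
print(counts[(2020, 2)])  # -> 2: both B and F span 01/02/2020
```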

How to avoid transition between column-organized data processing and row-organized data processing

I'm working on DB2 BLU with column-organized tables.
My dataset is the following:
Day      | month  | year | value
---------|--------|------|------
20200101 | 202001 | 2020 |   100
20200102 | 202001 | 2020 |   110
...
20200215 | 202002 | 2020 |   120
I want to aggregate by week, month and year, for this result:

Id      | value
--------|------
2020    | 12000  -- whole year 2020
202001  |  4000  -- January
202002  |  4000  -- February
2020001 |   700  -- first week of 2020
In order to do this, I also have the table d_tps:

Type | Id       | week    | month  | year
-----|----------|---------|--------|-----
J    | 20200101 | 2020001 | 202001 | 2020
J    | 20200102 | 2020001 | 202001 | 2020
...
J    | 20200215 | 2020007 | 202002 | 2020
M    | 202001   | null    | 202001 | 2020
M    | 202002   | null    | 202002 | 2020
Y    | 2020     | null    | null   | 2020
My approach is the following:

select d.id, sum(value)
from tab1
inner join d_tps d
  on d.id = tab1.day
  or d.id = tab1.month
  or d.id = tab1.year
group by d.id

It works and returns the expected result. Unfortunately, in the query plan, the join with the OR condition causes the CTQ operator to come early, and most of the query (which is in reality more complex) is processed row-organized instead of column-organized.
How can I optimize it?
It looks like one join condition is sufficient, along with aggregation:

select d.week, sum(value)
from tab1
inner join d_tps d on d.id = tab1.day
group by d.week

If you want to aggregate by multiple time levels, then use grouping sets:

select d.week, d.month, d.year, sum(value)
from tab1
inner join d_tps d on d.id = tab1.day
group by grouping sets ((d.week), (d.month), (d.year))
You should use GROUP BY GROUPING SETS and the GROUPING function to achieve what you want.

WITH T (day, month, year, value) AS
(
  VALUES
    (20200101, 202001, 2020, 100)
  , (20200102, 202001, 2020, 110)
  , (20200215, 202002, 2020, 120)
)
SELECT
  CASE
    WHEN GROUPING(DAY)   = 0 THEN DAY
    WHEN GROUPING(MONTH) = 0 THEN MONTH
    WHEN GROUPING(YEAR)  = 0 THEN YEAR
  END AS ID
, SUM(VALUE) AS VALUE
FROM T
GROUP BY GROUPING SETS (DAY, MONTH, YEAR);
The result is:

ID       | VALUE
---------|------
2020     |   330
202001   |   210
202002   |   120
20200101 |   100
20200102 |   110
20200215 |   120
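Conceptually, GROUP BY GROUPING SETS (day, month, year) behaves like the UNION ALL of three independent GROUP BY aggregations, one per grouping column. A small Python sketch of that equivalence, using the answer's sample rows (this relies on the day/month/year ids never colliding, just as in the d_tps id scheme):

```python
from collections import defaultdict

# The sample rows from the answer: (day, month, year, value).
rows = [
    (20200101, 202001, 2020, 100),
    (20200102, 202001, 2020, 110),
    (20200215, 202002, 2020, 120),
]

def grouping_sets_sum(rows):
    """Simulate GROUP BY GROUPING SETS (day, month, year): run one
    GROUP BY per grouping column and merge the results into one map."""
    result = {}
    for col in (0, 1, 2):  # day, month, year
        sums = defaultdict(int)
        for r in rows:
            sums[r[col]] += r[3]  # aggregate value per group key
        result.update(sums)       # keys are disjoint across levels
    return result

totals = grouping_sets_sum(rows)
print(totals[2020])    # -> 330, the year total
print(totals[202001])  # -> 210, the January total
```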

Join tables with dates within intervals of 5 min (get avg)

I want to join two tables based on timestamp; the problem is that the two tables don't have exactly the same timestamps, so I want to join them on a near timestamp, using a 5-minute interval.
This query needs to be done using 2 common table expressions; each common table expression needs to get the timestamps and group them (averaging into matching buckets) so they can match.
Freezer | Timestamp           | Temperature_1
--------|---------------------|--------------
1       | 2018-04-25 09:45:00 | 10
1       | 2018-04-25 09:50:00 | 11
1       | 2018-04-25 09:55:00 | 11

Freezer | Timestamp           | Temperature_2
--------|---------------------|--------------
1       | 2018-04-25 09:46:00 | 15
1       | 2018-04-25 09:52:00 | 13
1       | 2018-04-25 09:59:00 | 12

My desired result would be:

Freezer | Timestamp           | Temperature_1 | Temperature_2
--------|---------------------|---------------|--------------
1       | 2018-04-25 09:45:00 | 10            | 15
1       | 2018-04-25 09:50:00 | 11            | 13
1       | 2018-04-25 09:55:00 | 11            | 12
The current query that I'm working on is:

WITH Temperatures_1 AS (
  SELECT Freezer, Temperature_1, Timestamp
  FROM TABLE_A
),
Temperatures_2 AS (
  SELECT Freezer, Temperature_2, Timestamp
  FROM TABLE_B
)
SELECT A.Freezer, A.Timestamp, Temperature_1, Temperature_2
FROM Temperatures_1 AS A
RIGHT JOIN Temperatures_2 AS B
  ON A.Freezer = B.Freezer
WHERE A.Timestamp = B.Timestamp
You may want to modify your join criteria instead of filtering the output. Use BETWEEN to bracket your join value on the timestamps. I chose +/-150 seconds because that's 2.5 minutes to either side (a 5-minute range to match). You may need something different.
;WITH Temperatures_1 AS (
  SELECT Freezer, Temperature_1, Timestamp
  FROM TABLE_A
),
Temperatures_2 AS (
  SELECT Freezer, Temperature_2, Timestamp
  FROM TABLE_B
)
SELECT A.Freezer, A.Timestamp, Temperature_1, Temperature_2
FROM Temperatures_1 AS A
RIGHT JOIN Temperatures_2 AS B
  ON A.Freezer = B.Freezer
  AND A.Timestamp BETWEEN DATEADD(SECOND, -150, B.Timestamp)
                      AND DATEADD(SECOND, 150, B.Timestamp)
You should change the join key of the two tables by adding the timestamp, approximating the datetime on both sides (tables A and B).
First check whether the value in the left table (A) is under 2.5 minutes past a boundary; if so, round down to the nearest 5 minutes, and if it is greater, round up to the next 5 minutes. Do the same thing on the right table (B). You can do this in the CTEs, and the RIGHT JOIN remains the same as in your query.
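A sketch of that rounding idea in Python, using the question's sample data (timestamps are rounded to the nearest 5-minute boundary and used as the join key). Note the caveat it exposes: the 09:59 reading rounds to 10:00 rather than 09:55, so pure rounding pairs only the first two readings; the 09:55/09:59 pair from the desired output would need a wider window or a nearest-row search:

```python
from datetime import datetime, timedelta

def round_to_5min(ts):
    """Round a timestamp to the nearest 5-minute (300-second) boundary."""
    secs = (ts - datetime.min).total_seconds()
    return datetime.min + timedelta(seconds=round(secs / 300) * 300)

table_a = [  # (freezer, timestamp, temperature_1)
    (1, datetime(2018, 4, 25, 9, 45), 10),
    (1, datetime(2018, 4, 25, 9, 50), 11),
    (1, datetime(2018, 4, 25, 9, 55), 11),
]
table_b = [  # (freezer, timestamp, temperature_2)
    (1, datetime(2018, 4, 25, 9, 46), 15),
    (1, datetime(2018, 4, 25, 9, 52), 13),
    (1, datetime(2018, 4, 25, 9, 59), 12),
]

# Index B by (freezer, rounded timestamp), then join A on the same key.
b_by_bucket = {(f, round_to_5min(ts)): temp for f, ts, temp in table_b}
joined = [(f, ts, t1, b_by_bucket.get((f, round_to_5min(ts))))
          for f, ts, t1 in table_a]
for row in joined:
    print(row)
```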

How to get the count of distinct values until a time period Impala/SQL?

I have a raw table recording customer IDs coming to a store over a particular time period. Using Impala, I would like to calculate the number of distinct customer IDs that have come to the store up to and including each day. (e.g., by day 3, 5 distinct customers have visited so far)
Here is a simple example of the raw table I have:
Day ID
1 1234
1 5631
1 1234
2 1234
2 4456
2 5631
3 3482
3 3452
3 1234
3 5631
3 1234
Here is what I would like to get:
Day Count(distinct ID) until that day
1 2
2 3
3 5
Is there way to easily do this in a single query?
Not 100% sure if this will work on Impala.
But it should if you have a days table, or a way to create a derived table on the fly in Impala.
CREATE TABLE days ("DayC" int);
INSERT INTO days
("DayC")
VALUES (1), (2), (3);
OR
CREATE TABLE days AS
SELECT DISTINCT "Day"
FROM sales
You can use this query
SqlFiddleDemo in Postgresql
SELECT "DayC", COUNT(DISTINCT "ID")
FROM sales
cross JOIN days
WHERE "Day" <= "DayC"
GROUP BY "DayC"
OUTPUT
| DayC | count |
|------|-------|
| 1 | 2 |
| 2 | 3 |
| 3 | 5 |
UPDATE VERSION
SELECT T."DayC", COUNT(DISTINCT "ID")
FROM sales
cross JOIN (SELECT DISTINCT "Day" as "DayC" FROM sales) T
WHERE "Day" <= T."DayC"
GROUP BY T."DayC"
try this one:
select day, count(distinct(id)) from yourtable group by day
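Note that a plain GROUP BY day (as in the last answer) gives the per-day distinct count, while the question asks for a running total of distinct IDs seen so far. The running logic, sketched in Python with the question's sample data:

```python
# Running count of distinct IDs seen up to and including each day.
rows = [  # (day, customer_id) pairs from the question
    (1, 1234), (1, 5631), (1, 1234),
    (2, 1234), (2, 4456), (2, 5631),
    (3, 3482), (3, 3452), (3, 1234), (3, 5631), (3, 1234),
]

seen = set()
running = {}
for day, cust in sorted(rows):  # process rows in day order
    seen.add(cust)
    running[day] = len(seen)    # last write per day = full count so far

print(running)  # -> {1: 2, 2: 3, 3: 5}
```

This matches the desired output (2, 3, 5), whereas a per-day distinct count for day 3 alone would be 4.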

How to create a pivot table by product by month in SQL

I have 3 tables:
users (id, account_balance)
grocery (user_id, date, amount_paid)
fishmarket (user_id, date, amount_paid)
Both the fishmarket and grocery tables may have multiple occurrences for the same user_id with different dates and amounts paid, or nothing at all for a given user. I am trying to build a pivot table with the following structure:
id | grocery_amount_paid_January | fishmarket_amount_paid_January
---|-----------------------------|-------------------------------
1  | 10                          | NULL
2  | 40                          | 71
The only idea I can come up with is to create multiple left joins, but that seems wrong, since there would be 24 joins (one per month for each product). Is there a better way?
I have provided a lot of answers on crosstab queries in PostgreSQL lately. Sometimes a "plain" query like the following does the job:
WITH x AS (SELECT '2012-01-01'::date AS _from
,'2012-12-01'::date As _to) -- provide date range once in CTE
SELECT u.id
,to_char(m.mon, 'MM.YYYY') AS month_year
,g.amount_paid AS grocery_amount_paid
,f.amount_paid AS fishmarket_amount_paid
FROM users u
CROSS JOIN (SELECT generate_series(_from, _to, '1 month') AS mon FROM x) m
LEFT JOIN (
SELECT user_id
,date_trunc('month', date) AS mon
,sum(amount_paid) AS amount_paid
FROM x, grocery -- CROSS JOIN with a single row
WHERE date >= _from
AND date < (_to + interval '1 month')
GROUP BY 1,2
) g ON g.user_id = u.id AND m.mon = g.mon
LEFT JOIN (
SELECT user_id
,date_trunc('month', date) AS mon
,sum(amount_paid) AS amount_paid
FROM x, fishmarket
WHERE date >= _from
AND date < (_to + interval '1 month')
GROUP BY 1,2
) f ON f.user_id = u.id AND m.mon = f.mon
ORDER BY u.id, m.mon;
produces this output:
id | month_year | grocery_amount_paid | fishmarket_amount_paid
---+------------+---------------------+------------------------
1 | 01.2012 | 10 | NULL
1 | 02.2012 | NULL | 65
1 | 03.2012 | 98 | 13
...
2 | 02.2012 | 40 | 71
2 | 02.2012 | NULL | NULL
Major points
The first CTE is for convenience only. So you have to type your date range once only. You can use any date range - as long as it's dates with the first of the month (rest of the month will be included!). You could add date_trunc() to it, but I guess you can keep the urge to use invalid dates in check.
First CROSS JOIN users to the result of generate_series() (m) which provides one row per month in your date range. You have learned in your last question how that results in multiple rows per user.
The two subqueries are identical twins. Use WHERE clauses that operate on the base column, so it can utilize an index - which you should have if your table runs over many years (no use for only one or two years, a sequential scan will be faster):
CREATE INDEX grocery_date ON grocery (date);
Then reduce all dates to the first of the month with date_trunc() and sum amount_paid per user_id and the resulting mon.
LEFT JOIN the result to the base table, again by user_id and the resulting mon. This way, rows are neither multiplied nor dropped. You get one row per user_id and month. Voilà.
BTW, I'd never use a column name id. Call it user_id in the table users as well.
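For the wide one-column-per-month shape the question originally asked for (rather than the long format of the query above), correlated subqueries avoid the 24 left joins. A minimal sketch using Python's sqlite3 with made-up sample rows, restricted to January for brevity (the real tables, columns for other months, and a crosstab()-style pivot are left out):

```python
import sqlite3

# Hypothetical sample data mirroring the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER);
CREATE TABLE grocery (user_id INTEGER, date TEXT, amount_paid INTEGER);
CREATE TABLE fishmarket (user_id INTEGER, date TEXT, amount_paid INTEGER);
INSERT INTO users VALUES (1), (2);
INSERT INTO grocery VALUES (1, '2012-01-05', 10), (2, '2012-01-20', 40);
INSERT INTO fishmarket VALUES (2, '2012-01-11', 71);
""")

# One correlated subquery per product/month cell; a month with no
# purchases for a user simply yields NULL, as in the desired pivot.
rows = conn.execute("""
SELECT u.id,
       (SELECT SUM(amount_paid) FROM grocery g
         WHERE g.user_id = u.id AND strftime('%Y-%m', g.date) = '2012-01')
         AS grocery_amount_paid_january,
       (SELECT SUM(amount_paid) FROM fishmarket f
         WHERE f.user_id = u.id AND strftime('%Y-%m', f.date) = '2012-01')
         AS fishmarket_amount_paid_january
FROM users u
ORDER BY u.id
""").fetchall()

print(rows)  # -> [(1, 10, None), (2, 40, 71)]
```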