Invalid identifier error when using RIGHT JOIN inside FROM clause - sql

Within the FROM clause I want to build a subset of two tables using a RIGHT JOIN (from that subset I want all the rows of ITV2_VEHICULOS whose ID is not in ITV2_HIST_VEHICULOS), so that the SELECT "takes" its data from there and the WHERE can filter it.
My query:
SELECT *
FROM
    ITV2_INSPECCIONES I,
    ITV2_HORAS_INSPECCION HI_FIN,
    ITV2_INSPECCIONES I_SIG,
    ITV2_HORAS_INSPECCION HI_SIG_INI,
    ITV2_HIST_VEHICULOS VH,
    ITV2_CATEGORIAS_VEHICULO CAT,
    ITV2_CLASIF_VEH_CONS CVC,
    ITV2_CLASIF_VEH_USO CVU,
    (
        SELECT *
        FROM ITV2_HIST_VEHICULOS VH
        RIGHT JOIN ITV2_VEHICULOS V
            ON VH.C_VEHICULO_ID = V.C_VEHICULO_ID
    ) VI
WHERE I.C_TIPO_INSPECCION = 1
    AND I.F_DESFAVORABLE IS NOT NULL
    AND I.C_RESULTADO IN (3, 4)
    AND I.C_VEHICULO_ID = VI.C_VEHICULO_ID
    AND VI.C_CATEGORIA_ID = CAT.C_CATEGORIA_ID
    AND VI.C_CLASIF_VEH_CONS_ID = CVC.C_CLASIF_VEH_CONS_ID
    AND VI.C_CLASIF_VEH_USO_ID = CVU.C_CLASIF_VEH_USO_ID
    -- HOURS
    AND I.C_ESTACION_ID = HI_FIN.C_ESTACION_ID
    AND I.C_INSPECCION_ID = HI_FIN.C_INSPECCION_ID
    AND I.N_ANNO = HI_FIN.N_ANNO
    AND HI_FIN.C_TIPO_HORA_ID = 6
    -- NEXT INSPECTION
    AND I.C_ESTACION_ID = I_SIG.C_ESTACION_ID_FASE_ANT
    AND I.C_INSPECCION_ID = I_SIG.C_INSPECCION_ID_FASE_ANT
    AND I.N_ANNO = I_SIG.N_ANNO_FASE_ANT
    AND I_SIG.N_ANNO IN (2013, 2014, 2015, 2016, 2017, 2018)
    AND I_SIG.C_ESTACION_ID IN (3, 21, 22, 26, 28, 32, 34, 37, 41, 47, 53, 59, 60)
    AND I_SIG.F_INSPECCION >= '01/09/2015'
    AND I_SIG.F_INSPECCION <= '30/09/2018'
    AND I_SIG.F_DESFAVORABLE IS NULL
    AND I_SIG.C_RESULTADO IN (1, 2)
    -- AND ITS HOURS
    AND I_SIG.C_ESTACION_ID = HI_SIG_INI.C_ESTACION_ID
    AND I_SIG.C_INSPECCION_ID = HI_SIG_INI.C_INSPECCION_ID
    AND I_SIG.N_ANNO = HI_SIG_INI.N_ANNO
    AND HI_SIG_INI.C_TIPO_HORA_ID = 1
    --GROUP BY...
I expect in the output:
C_ESTACION_ID (from I) | C_VEHICULO_ID (from I) | C_TIPO_HORA_ID (from HI_FIN) | F_HORA (from HI_FIN) | A_MATRICULA (from V) | F_CAMBIO (from VH, if matching history data exists)
-----------------------|------------------------|------------------------------|----------------------|----------------------|-----------------------------------------------------

This is what your query would look like if you use "explicit join syntax" instead of just some commas between table names:
SELECT *
FROM ITV2_INSPECCIONES I
INNER JOIN ITV2_HORAS_INSPECCION HI_FIN ON I.C_ESTACION_ID = HI_FIN.C_ESTACION_ID
AND I.C_INSPECCION_ID = HI_FIN.C_INSPECCION_ID
AND I.N_ANNO = HI_FIN.N_ANNO
INNER JOIN ITV2_INSPECCIONES I_SIG ON I.C_ESTACION_ID = I_SIG.C_ESTACION_ID_FASE_ANT
AND I.C_INSPECCION_ID = I_SIG.C_INSPECCION_ID_FASE_ANT
AND I.N_ANNO = I_SIG.N_ANNO_FASE_ANT
INNER JOIN ITV2_HORAS_INSPECCION HI_SIG_INI ON I_SIG.C_ESTACION_ID = HI_SIG_INI.C_ESTACION_ID
AND I_SIG.C_INSPECCION_ID = HI_SIG_INI.C_INSPECCION_ID
AND I_SIG.N_ANNO = HI_SIG_INI.N_ANNO
WHERE I.C_TIPO_INSPECCION = 1
AND I.F_DESFAVORABLE IS NOT NULL
AND I.C_RESULTADO IN (3, 4)
AND HI_FIN.C_TIPO_HORA_ID = 6 -- NEXT INSPECTION
AND HI_SIG_INI.C_TIPO_HORA_ID = 1
AND I_SIG.F_INSPECCION >= '01/09/2015'
AND I_SIG.F_INSPECCION <= '30/09/2018'
AND I_SIG.F_DESFAVORABLE IS NULL
AND I_SIG.N_ANNO IN (2013, 2014, 2015, 2016, 2017, 2018)
AND I_SIG.C_ESTACION_ID IN (3, 21, 22, 26, 28, 32, 34, 37, 41, 47, 53, 59, 60)
AND I_SIG.C_RESULTADO IN (1, 2) -- AND ITS HOURS
Now I had to pull out several tables and the subquery from that because, frankly, they don't make much sense to me:
ITV2_HIST_VEHICULOS VH, << no join conditions to preceding tables
ITV2_CATEGORIAS_VEHICULO CAT, << no join conditions to preceding tables
ITV2_CLASIF_VEH_CONS CVC, << no join conditions to preceding tables
ITV2_CLASIF_VEH_USO CVU, << no join conditions to preceding tables
(
    SELECT *
    FROM ITV2_VEHICULOS V
    LEFT JOIN ITV2_HIST_VEHICULOS VH
        ON VH.C_VEHICULO_ID = V.C_VEHICULO_ID
) VI
AND I.C_VEHICULO_ID = VI.C_VEHICULO_ID
AND VI.C_CATEGORIA_ID = CAT.C_CATEGORIA_ID
AND VI.C_CLASIF_VEH_CONS_ID = CVC.C_CLASIF_VEH_CONS_ID
AND VI.C_CLASIF_VEH_USO_ID = CVU.C_CLASIF_VEH_USO_ID
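If you can work out the intended join conditions for those pulled-out tables, they can be wired back in with explicit joins too. Here is a sketch (assumptions on my part: the category/classification keys live in ITV2_HIST_VEHICULOS, and listing explicit columns instead of SELECT * avoids the duplicate C_VEHICULO_ID column that the join's * produces in the inline view, a classic source of invalid-identifier/ambiguity errors):
INNER JOIN (
    SELECT V.C_VEHICULO_ID,
           V.A_MATRICULA,
           VH.F_CAMBIO,
           VH.C_CATEGORIA_ID,        -- assumed to be in VH; move to V if that is where they live
           VH.C_CLASIF_VEH_CONS_ID,
           VH.C_CLASIF_VEH_USO_ID
    FROM ITV2_VEHICULOS V
    LEFT JOIN ITV2_HIST_VEHICULOS VH
        ON VH.C_VEHICULO_ID = V.C_VEHICULO_ID
) VI ON I.C_VEHICULO_ID = VI.C_VEHICULO_ID
INNER JOIN ITV2_CATEGORIAS_VEHICULO CAT ON VI.C_CATEGORIA_ID = CAT.C_CATEGORIA_ID
INNER JOIN ITV2_CLASIF_VEH_CONS CVC ON VI.C_CLASIF_VEH_CONS_ID = CVC.C_CLASIF_VEH_CONS_ID
INNER JOIN ITV2_CLASIF_VEH_USO CVU ON VI.C_CLASIF_VEH_USO_ID = CVU.C_CLASIF_VEH_USO_ID
As an aside, comparisons like I_SIG.F_INSPECCION >= '01/09/2015' depend on the session's NLS_DATE_FORMAT; TO_DATE('01/09/2015', 'DD/MM/YYYY') makes the intent explicit.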

Related

How much unique data is there, put it all in a table

I would like to query in SQL how many unique values there are in each column and how many rows there are in total. In Python I could do the following:
import pandas as pd

d = {'sellerid': [1, 1, 1, 2, 2, 3, 3, 3],
     'modelnumber': [85, 45, 85, 12, 85, 74, 85, 12],
     'modelgroup': [2, 3, 2, 1, 2, 3, 2, 1]}
df = pd.DataFrame(data=d)
display(df.head(10))
df['Dataframe']='df'
unique_sellerid = df['sellerid'].nunique()
print("unique_sellerid", unique_sellerid)
unique_modelnumber = df['modelnumber'].nunique()
print("unique_modelnumber", unique_modelnumber)
unique_modelgroup = df['modelgroup'].nunique()
print("unique_modelgroup", unique_modelgroup)
total_rows = df.shape[0]
print("total_rows", total_rows)
[OUT]
unique_sellerid 3
unique_modelnumber 4
unique_modelgroup 3
total_rows 8
I want a query that gives me the same result. Here is the dummy table:
CREATE TABLE cars (
    sellerid INT NOT NULL,
    modelnumber INT NOT NULL,
    modelgroup INT
);
INSERT INTO cars
(sellerid , modelnumber, modelgroup )
VALUES
(1, 85, 2),
(1, 45, 3),
(1, 85, 2),
(2, 12, 1),
(2, 85, 2),
(3, 74, 3),
(3, 85, 2),
(3, 12, 1);
You could use the count(distinct column) aggregate function, like:
select
    count(distinct col1) as nunique_col1,
    count(distinct col2) as nunique_col2,
    count(1) as nb_rows
from your_table
In pandas, you can also apply the nunique() function to the whole DataFrame rather than to each column separately: df.nunique()
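Applied to the cars table from the question (same pattern, just substituting its column names), this reproduces the pandas numbers:
select
    count(distinct sellerid) as nunique_sellerid,
    count(distinct modelnumber) as nunique_modelnumber,
    count(distinct modelgroup) as nunique_modelgroup,
    count(*) as total_rows
from cars;
-- nunique_sellerid: 3, nunique_modelnumber: 4, nunique_modelgroup: 3, total_rows: 8
Note that count(distinct modelgroup) ignores NULLs, so if modelgroup contained NULLs only the distinct non-NULL values would be counted (pandas nunique() behaves the same way by default).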

Using fb-prophet Package to Predict By Group with Additional Regressors in R

prophet users of the world, hope all is well. I'm having some difficulties with a particular use case that I'll try to illustrate using some sample data and code below. First let's generate some sample data so that it will be a little bit easier to know what I am talking about.
library(data.table)
library(prophet)
library(dplyr)
# one year of months to be used for generating predictions
ds = c('2016-01-01', '2016-02-01','2016-03-01','2016-04-01','2016-05-01','2016-06-01','2016-07-01','2016-08-01','2016-09-01','2016-10-01','2016-11-01','2016-12-01' )
# historical customer counts
y = c (78498,12356,93732,5556,410,10296,9779,744,16407,100484,23954,141398,10575,850,16334,17496,1643,28074,93181,
18770,129968,11590,850,16738,17510,1376,27931,94369,18444,134850,13386,919,19075,18050,1565,31296,112094,27995,
167094,13402,1422,22766,20072,2340,37863,87346,16180,119863,7691,725,16931,12163,1241,25872,87455,16322,116390,
6994,620,13524,11059,990,22188,105473,23652,154145,13520,1008,18857,19209,1632,31105,102252,21284,138779,11670,
918,16078,16679,1257,26755,115033,22415,139835,13965,936,18027,18642,1407,28622,155371,40556,174321,25119,1859,
35326,28844,2962,51582,108817,19158,109864,8693,756,14358,13390,1091,21419)
# the segment channels of the customers
segment_channel = c('Existing_Omni', 'Existing_Retail', 'Existing_Direct', 'NTB_Omni', 'NTB_Retail', 'NTB_Direct', 'React_Omni', 'React_Retail', 'React_Direct')
# an external regressor to be added to the model (in my data there are about 40 of these regressor variables that I would like to add)
flash_sale = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3)
fake_data = merge(ds,segment_channel, all.y=TRUE)
setnames(fake_data, 'x', 'ds')
setnames(fake_data, 'y', 'segment_channel')
nrow(fake_data) # should be 108 rows, the 9 customer segments for each of the months in 2016
# next join the known customer counts, let's say we have them for the first 8 months of the year
fake_data = cbind(fake_data, y)
fake_data = cbind(fake_data, flash_sale)
# set some of the y values to NA so we can pretend we are trying to predict them using the ds time series as well as the flash sale values,
# which will be known in advance
fake_data = as.data.table(fake_data)
fake_data$ds = as.Date(fake_data$ds)
fake_data[, y := ifelse(ds >= '2016-08-01', NA, y)]
This code will generate a data set fairly similar to what I am working with, so hopefully you will be able to reproduce what I am doing. There are essentially two things I would like to be able to do with this data. The first is fairly straightforward: I want to be able to add a regressor (like flash_sale in this example) to the prophet model that I create. I can do this fairly easily like so:
christ <- tibble(
holiday = 'christ',
ds = as.Date(c('2016-11-01', '2017-11-01', '2018-11-01',
'2019-11-01')),
lower_window = 0,
upper_window = 1
)
nye <- tibble(
holiday = 'nye',
ds = as.Date(c('2016-11-01', '2017-12-01', '2018-11-01',
'2019-11-01')),
lower_window = 0,
upper_window = 1
)
holidays <- bind_rows(nye, christ)
m <- prophet(holidays = holidays)
m<- add_regressor(m, name = "flash_sale")
m <- fit.prophet(m, fake_data)
forecast <- predict(m, fake_data)
prophet_plot_components(m, forecast)
This should generate a fairly ugly plot but it's pretty easy to see that given the data this should be able to do the trick, and I could add multiple lines to add additional regressors. Ok, so we're all good so far. But the other issue is that I have 9 segment channels that I'm dealing with, and I don't want to build a separate model for each of them. Luckily I found a pretty good link on stack overflow that accomplishes the grouped prophet prediction: Using Prophet Package to Predict By Group in Dataframe in R
fcst = fake_data %>%
group_by(segment_channel) %>%
do(predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034), make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
dplyr::select(ds, segment_channel, yhat)
fcst
> fcst
# A tibble: 207 x 3
# Groups: segment_channel [9]
ds segment_channel yhat
<dttm> <fct> <dbl>
1 2016-01-01 00:00:00 Existing_Direct 38712.
2 2016-02-01 00:00:00 Existing_Direct 40321.
3 2016-03-01 00:00:00 Existing_Direct 42648.
4 2016-04-01 00:00:00 Existing_Direct 45130.
5 2016-05-01 00:00:00 Existing_Direct 46580.
6 2016-06-01 00:00:00 Existing_Direct 49437.
7 2016-07-01 00:00:00 Existing_Direct 50651.
8 2016-08-01 00:00:00 Existing_Direct 52685.
9 2016-09-01 00:00:00 Existing_Direct 54719.
10 2016-10-01 00:00:00 Existing_Direct 56687.
# ... with 197 more rows
This is more or less exactly what I want! Cool. So now all I have to do is figure out how to get my grouped predictions and my regressors added all in one step. I know I can have multi-line statements inside of do, so this is what I tried in order to get this to work:
> fcst = fake_data %>%
+ group_by(segment_channel) %>%
+ do(
+ predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+ add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+ fit.prophet(prophet(. , holidays = holidays)),
+ make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
+ dplyr::select(ds, segment_channel, yhat)
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 4
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 4
Error in add_regressor(prophet(., holidays = holidays), name = "flash_sale") :
Regressors must be added prior to model fitting.
Darn. Looks like it was running, but something about how I tried to add the regressor wasn't kosher. Next I tried it this way:
> fcst = fake_data %>%
+ group_by(segment_channel) %>%
+ do(
+ prophet(holidays = holidays),
+ add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+ fit.prophet(prophet(. , holidays = holidays)),
+ predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+ make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
+ dplyr::select(ds, segment_channel, yhat)
Error: Can only supply one unnamed argument, not 4
Call `rlang::last_error()` to see a backtrace
> fcst = fake_data %>%
+ group_by(segment_channel) %>%
+ do(
+ add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+ fit.prophet(prophet(. , holidays = holidays)),
+ predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+ make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
+ dplyr::select(ds, segment_channel, yhat)
Error: Can only supply one unnamed argument, not 3
Call `rlang::last_error()` to see a backtrace
I'm super confused at this point, so I'm just hoping someone out on the interwebs might know just the right incantation I need to get where I'm going.
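For reference, here is the shape I would expect to work, going by the error message's own hint that regressors must be added before fitting. This is an untested sketch: create one unfitted model per group inside do(), add the regressor, fit on the rows with known y, then predict over the group's full date range (fake_data already carries the future ds and flash_sale values):
fcst <- fake_data %>%
  group_by(segment_channel) %>%
  do({
    m <- prophet(holidays = holidays)           # unfitted model skeleton
    m <- add_regressor(m, name = 'flash_sale')  # must happen before fit.prophet
    m <- fit.prophet(m, filter(., !is.na(y)))   # fit on the known history only
    predict(m, .)                               # predict over all 12 months of the group
  }) %>%
  dplyr::select(ds, segment_channel, yhat)
The repeated prophet(.) calls in the attempts above each construct (and immediately fit) a brand-new model, which is why add_regressor complained: it was handed a model that prophet(.) had already fitted.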

Oracle SQL error: maximum number of expressions

Could you please help me with this issue? I'm getting the following error in Oracle SQL:
ORA-01795: maximum number of expressions in a list is 1000
I'm passing values like:
and test in (1, 2, 3, ..., 1000)
Try splitting your query into multiple IN clauses, like below:
SELECT *
FROM table_name
WHERE test IN (1,2,3,....500)
OR test IN (501, 502, ......1000);
You can try these workarounds:
Split the single IN into several ones:
select ...
from ...
where test in (1, ..., 999) or
test in (1000, ..., 1999) or
...
test in (9000, ..., 9999)
Put values into a (temporary?) table, say TestTable:
select ...
from ...
where test in (select TestField
from TestTable)
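Another workaround along the same lines (a sketch; sys.odcinumberlist is a built-in Oracle collection type for numbers): pass the values as a collection and expand it with TABLE(), since ORA-01795 only applies to literal expression lists:
select ...
from ...
where test in (select column_value
               from table(sys.odcinumberlist(1, 2, 3 /* , ... */)))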
Edit: As far as I can see, the main difficulty is building such a query. Let's implement it in C#. We are given a collection of ids:
// Test case ids are in [1..43] range
IEnumerable<int> Ids = Enumerable.Range(1, 43);
// Test case: 7, in actual Oracle query you, probably set it to 100 or 1000
int chunkSize = 7;
string fieldName = "test";
string filterText = string.Join(" or " + Environment.NewLine, Ids
.Select((value, index) => new {
value = value,
index = index
})
.GroupBy(item => item.index / chunkSize)
.Select(chunk =>
$"{fieldName} in ({string.Join(", ", chunk.Select(item => item.value))})"));
if (!string.IsNullOrEmpty(filterText))
filterText = $"and \r\n({filterText})";
string sql =
$@"select MyField
from MyTable
where (1 = 1) {filterText}";
Test:
Console.Write(sql);
Outcome:
select MyField
from MyTable
where (1 = 1) and
(test in (1, 2, 3, 4, 5, 6, 7) or
test in (8, 9, 10, 11, 12, 13, 14) or
test in (15, 16, 17, 18, 19, 20, 21) or
test in (22, 23, 24, 25, 26, 27, 28) or
test in (29, 30, 31, 32, 33, 34, 35) or
test in (36, 37, 38, 39, 40, 41, 42) or
test in (43))

count null values and group results

Problem
I want to count null values and group the results, but my query gives me wrong values.
String jpql = "select c.commande.user.login, "
    + "(select count(*) from Designation c "
    + "where c.commande.commandeTms IS NOT EMPTY "
    + "and c.etatComde = 0) AS count1, "
    + "(select count(*) from Designation c "
    + "where c.commande.commandeTms IS EMPTY) AS count2 "
    + "from Designation c GROUP BY c.commande.user.login";
I got these results:

       user1   user2
count1    10      10
count2     0       0

But I should have these:

       user1   user2
count1     4       2
count2     3       1
Sample data:
table Commande
idComdeComm, commandeTms_idComndeTms, user_idUser
6, 17, 2
8, NULL, 2
10, 28, 2
12, NULL, 2
14, NULL, 2
16, NULL, 2
21, NULL, 19
23, NULL, 19
25, 26, 19
31, NULL, 19
table designation
idDesignation, designation, etatComde, commande_idComdeComm
5, 'fef', 0, 6
7, 'ferf', 0, 8
9, 'hrhrh', 0, 10
11, 'ujujuju', 0, 12
13, 'kikolol', 0, 14
15, 'ololo', 0, 16
20, 'gdfgfd', 0, 21
22, 'gdfgfdd', 0, 23
24, 'nhfn', 0, 25
30, 'momo', 0, 31
table user
idUser, login, password, profil
1, 'moez', '***', 'admin'
2, 'user1', '**', 'comm'
3, 'log', '**', 'log'
4, 'mo', '*', 'comm'
19, 'user2', '*', 'comm'
table Commande TMS
idComndeTms, etatOperationNumCMD, numeroComndeTms
17, '', 3131
26, '', 2525
28, '', 3333
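Since the desired output is one row of counts per login, one approach (a sketch using conditional aggregation; note that IS [NOT] EMPTY is for collection-valued paths, so if commandeTms is a single-valued association the test should be IS [NOT] NULL) would be a single grouped query instead of two uncorrelated subqueries:
String jpql = "select d.commande.user.login, "
    + "sum(case when d.commande.commandeTms is not null then 1 else 0 end), "
    + "sum(case when d.commande.commandeTms is null then 1 else 0 end) "
    + "from Designation d "
    + "where d.etatComde = 0 "
    + "group by d.commande.user.login";
The original version computes each subquery over the whole Designation table, unconstrained by the outer group, which is why every login shows the same totals.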

How can I accurately display numpy.timedelta64?

As the following code makes apparent, the string representation of numpy.timedelta64 falls victim to overflow well before the objects themselves fail to work.
import numpy as np
import datetime
def showMeDifference( t1, t2 ):
dt = t2-t1
dt64_ms = np.array( [ dt ], dtype = "timedelta64[ms]" )[0]
dt64_us = np.array( [ dt ], dtype = "timedelta64[us]" )[0]
dt64_ns = np.array( [ dt ], dtype = "timedelta64[ns]" )[0]
assert( dt64_ms / dt64_ns == 1.0 )
assert( dt64_us / dt64_ms == 1.0 )
assert( dt64_ms / dt64_us == 1.0 )
print str( dt64_ms )
print str( dt64_us )
print str( dt64_ns )
t1 = datetime.datetime( 2014, 4, 1, 12, 0, 0 )
t2 = datetime.datetime( 2014, 4, 1, 12, 0, 1 )
showMeDifference( t1, t2 )
t1 = datetime.datetime( 2014, 4, 1, 12, 0, 0 )
t2 = datetime.datetime( 2014, 4, 1, 12, 1, 0 )
showMeDifference( t1, t2 )
t1 = datetime.datetime( 2014, 4, 1, 12, 0, 0 )
t2 = datetime.datetime( 2014, 4, 1, 13, 0, 0 )
showMeDifference( t1, t2 )
print "These are for " + np.__version__
1000 milliseconds
1000000 microseconds
1000000000 nanoseconds
60000 milliseconds
60000000 microseconds
-129542144 nanoseconds
3600000 milliseconds
-694967296 microseconds
817405952 nanoseconds
These are for 1.7.1
Is this just a bug in np.timedelta64? If so, what idiom / workarounds have people used when working with np.timedelta64?
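One workaround (a sketch, written for Python 3 and modern NumPy, where timedelta64 division is supported): skip the repr entirely and divide by a unit timedelta, which yields an exact float in whatever unit you pick:
import numpy as np
import datetime

t1 = datetime.datetime(2014, 4, 1, 12, 0, 0)
t2 = datetime.datetime(2014, 4, 1, 13, 0, 0)

dt = np.timedelta64(t2 - t1)         # datetime.timedelta converts exactly (microsecond resolution)
print(dt / np.timedelta64(1, 's'))   # 3600.0
print(dt / np.timedelta64(1, 'ms'))  # 3600000.0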