I would like to query in SQL how many unique values each column has and how many rows there are in total, so that I get a result like the one at the bottom. How do I do this in SQL?
In Python I could do the following:
import pandas as pd

d = {'sellerid': [1, 1, 1, 2, 2, 3, 3, 3],
     'modelnumber': [85, 45, 85, 12, 85, 74, 85, 12],
     'modelgroup': [2, 3, 2, 1, 2, 3, 2, 1]}
df = pd.DataFrame(data=d)
display(df.head(10))
df['Dataframe']='df'
unique_sellerid = df['sellerid'].nunique()
print("unique_sellerid", unique_sellerid)
unique_modelnumber = df['modelnumber'].nunique()
print("unique_modelnumber", unique_modelnumber)
unique_modelgroup = df['modelgroup'].nunique()
print("unique_modelgroup", unique_modelgroup)
total_rows = df.shape[0]
print("total_rows", total_rows)
[OUT]
unique_sellerid 3
unique_modelnumber 4
unique_modelgroup 3
total_rows 8
I want a query that produces a result like the one above. Here is the dummy table:
CREATE TABLE cars (
  sellerid INT NOT NULL,
  modelnumber INT NOT NULL,
  modelgroup INT
);
INSERT INTO cars
(sellerid, modelnumber, modelgroup)
VALUES
(1, 85, 2),
(1, 45, 3),
(1, 85, 2),
(2, 12, 1),
(2, 85, 2),
(3, 74, 3),
(3, 85, 2),
(3, 12, 1);
You could use the count(distinct column) aggregate function, like:
select
    count(distinct col1) as nunique_col1,
    count(distinct col2) as nunique_col2,
    count(1) as nb_rows
from your_table
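Applied to the cars table above, it should return the same numbers as the pandas output (3, 4, 3 and 8 rows):
select
    count(distinct sellerid)    as unique_sellerid,    -- 3
    count(distinct modelnumber) as unique_modelnumber, -- 4
    count(distinct modelgroup)  as unique_modelgroup,  -- 3
    count(*)                    as total_rows          -- 8
from cars;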
In pandas, you can also apply the nunique() function to the whole DataFrame rather than calling it on each column: df.nunique()
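For the df built above (including the extra Dataframe column the snippet adds), that gives roughly:
print(df.nunique())
[OUT]
sellerid       3
modelnumber    4
modelgroup     3
Dataframe      1
dtype: int64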
Prophet users of the world, hope all is well. I'm having some difficulties with a particular use case that I'll try to illustrate using some sample data and code below. First, let's generate some sample data so that it will be a little easier to see what I am talking about.
library(data.table)
library(prophet)
library(dplyr)
# one year of months to be used for generating predictions
ds = c('2016-01-01', '2016-02-01','2016-03-01','2016-04-01','2016-05-01','2016-06-01','2016-07-01','2016-08-01','2016-09-01','2016-10-01','2016-11-01','2016-12-01' )
# historical customer counts
y = c (78498,12356,93732,5556,410,10296,9779,744,16407,100484,23954,141398,10575,850,16334,17496,1643,28074,93181,
18770,129968,11590,850,16738,17510,1376,27931,94369,18444,134850,13386,919,19075,18050,1565,31296,112094,27995,
167094,13402,1422,22766,20072,2340,37863,87346,16180,119863,7691,725,16931,12163,1241,25872,87455,16322,116390,
6994,620,13524,11059,990,22188,105473,23652,154145,13520,1008,18857,19209,1632,31105,102252,21284,138779,11670,
918,16078,16679,1257,26755,115033,22415,139835,13965,936,18027,18642,1407,28622,155371,40556,174321,25119,1859,
35326,28844,2962,51582,108817,19158,109864,8693,756,14358,13390,1091,21419)
# the segment channels of the customers
segment_channel = c('Existing_Omni', 'Existing_Retail', 'Existing_Direct', 'NTB_Omni', 'NTB_Retail', 'NTB_Direct', 'React_Omni', 'React_Retail', 'React_Direct')
# an external regressor to be added to the model (in my data there are like 40 of these regressor variables that I would like to add)
flash_sale = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3)
fake_data = merge(ds,segment_channel, all.y=TRUE)
setnames(fake_data, 'x', 'ds')
setnames(fake_data, 'y', 'segment_channel')
nrow(fake_data) # should be 108 rows, the 9 customer segments for each of the 12 months in 2016
# next join the known customer counts, let's say we have them for the first 8 months of the year
fake_data = cbind(fake_data, y)
fake_data = cbind(fake_data, flash_sale)
# set some of the y values to NA so we can pretend we are trying to predict them using the ds time series as well as the flash sale values,
# which will be known in advance
fake_data = as.data.table(fake_data)
fake_data$ds = as.Date(fake_data$ds)
fake_data[, y := ifelse(ds >= '2016-08-01', NA, y)]
This code will generate a data set fairly similar to what I am working with, so hopefully you can reproduce what I am doing. There are essentially two things I would like to be able to do with this data. The first is fairly straightforward: I want to add a regressor (like flash_sale in this example) to the prophet model that I create. I can do this fairly easily like so:
christ <- tibble(
holiday = 'christ',
ds = as.Date(c('2016-11-01', '2017-11-01', '2018-11-01',
'2019-11-01')),
lower_window = 0,
upper_window = 1
)
nye <- tibble(
holiday = 'nye',
ds = as.Date(c('2016-11-01', '2017-12-01', '2018-11-01',
'2019-11-01')),
lower_window = 0,
upper_window = 1
)
holidays <- bind_rows(nye, christ)
m <- prophet(holidays = holidays)
m<- add_regressor(m, name = "flash_sale")
m <- fit.prophet(m, fake_data)
forecast <- predict(m, fake_data)
prophet_plot_components(m, forecast)
This should generate a fairly ugly plot, but it's pretty easy to see that, given the data, this should do the trick, and I could add more add_regressor lines for additional regressors. OK, so we're all good so far. But the other issue is that I have 9 segment channels that I'm dealing with, and I don't want to build a separate model for each of them. Luckily I found a pretty good link on Stack Overflow that accomplishes the grouped prophet prediction: Using Prophet Package to Predict By Group in Dataframe in R
fcst = fake_data %>%
group_by(segment_channel) %>%
do(predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034), make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
dplyr::select(ds, segment_channel, yhat)
fcst
# A tibble: 207 x 3
# Groups: segment_channel [9]
ds segment_channel yhat
<dttm> <fct> <dbl>
1 2016-01-01 00:00:00 Existing_Direct 38712.
2 2016-02-01 00:00:00 Existing_Direct 40321.
3 2016-03-01 00:00:00 Existing_Direct 42648.
4 2016-04-01 00:00:00 Existing_Direct 45130.
5 2016-05-01 00:00:00 Existing_Direct 46580.
6 2016-06-01 00:00:00 Existing_Direct 49437.
7 2016-07-01 00:00:00 Existing_Direct 50651.
8 2016-08-01 00:00:00 Existing_Direct 52685.
9 2016-09-01 00:00:00 Existing_Direct 54719.
10 2016-10-01 00:00:00 Existing_Direct 56687.
# ... with 197 more rows
This is more or less exactly what I want! Cool. So now all I have to do is figure out how to get my grouped predictions and my regressors added all in one step. I know I can have multi-line statements inside of do, so this is what I tried in order to get this to work:
> fcst = fake_data %>%
+ group_by(segment_channel) %>%
+ do(
+ predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+ add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+ fit.prophet(prophet(. , holidays = holidays)),
+ make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
+ dplyr::select(ds, segment_channel, yhat)
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 4
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 4
Error in add_regressor(prophet(., holidays = holidays), name = "flash_sale") :
Regressors must be added prior to model fitting.
Darn. Looks like it was running, but then something about how I tried to add the regressor wasn't kosher. Next I tried it this way:
> fcst = fake_data %>%
+ group_by(segment_channel) %>%
+ do(
+ prophet(holidays = holidays),
+ add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+ fit.prophet(prophet(. , holidays = holidays)),
+ predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+ make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
+ dplyr::select(ds, segment_channel, yhat)
Error: Can only supply one unnamed argument, not 4
Call `rlang::last_error()` to see a backtrace
> fcst = fake_data %>%
+ group_by(segment_channel) %>%
+ do(
+ add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+ fit.prophet(prophet(. , holidays = holidays)),
+ predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+ make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
+ dplyr::select(ds, segment_channel, yhat)
Error: Can only supply one unnamed argument, not 3
Call `rlang::last_error()` to see a backtrace
I'm super confused at this point, so I'm just hoping someone out on the interwebs might know just the right incantation I need to get where I'm going.
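The error message hints at the fix: the regressor has to be registered on an unfitted model, and that same model then has to be fitted and used for prediction, all inside one do() block. A minimal sketch of that pattern (untested against this exact data, and assuming, as in fake_data above, that each group already contains the flash_sale values for the periods to be predicted):
fcst = fake_data %>%
  group_by(segment_channel) %>%
  do({
    # build an unfitted model, register the regressor, then fit once per group
    m <- prophet(holidays = holidays, seasonality.mode = 'multiplicative',
                 seasonality.prior.scale = 10, changepoint.prior.scale = .034)
    m <- add_regressor(m, name = 'flash_sale')
    m <- fit.prophet(m, .)
    # predict on the group's own rows: the future months have y = NA but known flash_sale
    predict(m, .)
  }) %>%
  dplyr::select(ds, segment_channel, yhat)
The key difference from the attempts above is that prophet() is called only once per group (without a data frame), add_regressor() is applied before fit.prophet(), and predict() reuses that same fitted model instead of a freshly created one.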
Could you please help me with this issue? I'm getting the following error in Oracle SQL:
ORA-01795: maximum number of expressions in a list is 1000
I'm passing values like:
and test in (1, 2, 3.....1000)
Try splitting your query into multiple IN clauses, like below:
SELECT *
FROM table_name
WHERE test IN (1,2,3,....500)
OR test IN (501, 502, ......1000);
You can try these workarounds:
Split the single IN into several ones:
select ...
from ...
where test in (1, ..., 999) or
test in (1000, ..., 1999) or
...
test in (9000, ..., 9999)
Put values into a (temporary?) table, say TestTable:
select ...
from ...
where test in (select TestField
from TestTable)
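For the table route, a minimal sketch, assuming an Oracle global temporary table is acceptable (TestTable and TestField are just the placeholder names used above):
-- created once; each session only sees its own rows
CREATE GLOBAL TEMPORARY TABLE TestTable (
    TestField NUMBER
) ON COMMIT DELETE ROWS;

-- load the ids from the application, then run the main query in the same transaction
INSERT INTO TestTable (TestField) VALUES (1);
INSERT INTO TestTable (TestField) VALUES (2);
-- ... and so on for the remaining ids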
Edit: As far as I can see, the main difficulty is to build such a query. Let's implement it in C#. We are given a collection of ids:
using System;
using System.Collections.Generic;
using System.Linq;

// Test case: ids are in the [1..43] range
IEnumerable<int> Ids = Enumerable.Range(1, 43);
// Test case: 7; in an actual Oracle query you would probably set it to 100 or 1000
int chunkSize = 7;
string fieldName = "test";
string filterText = string.Join(" or " + Environment.NewLine, Ids
.Select((value, index) => new {
value = value,
index = index
})
.GroupBy(item => item.index / chunkSize)
.Select(chunk =>
$"{fieldName} in ({string.Join(", ", chunk.Select(item => item.value))})"));
if (!string.IsNullOrEmpty(filterText))
filterText = $"and \r\n({filterText})";
string sql =
$#"select MyField
from MyTable
where (1 = 1) {filterText}";
Test:
Console.Write(sql);
Outcome:
select MyField
from MyTable
where (1 = 1) and
(test in (1, 2, 3, 4, 5, 6, 7) or
test in (8, 9, 10, 11, 12, 13, 14) or
test in (15, 16, 17, 18, 19, 20, 21) or
test in (22, 23, 24, 25, 26, 27, 28) or
test in (29, 30, 31, 32, 33, 34, 35) or
test in (36, 37, 38, 39, 40, 41, 42) or
test in (43))
Problem
I want to count null values and group the results, but my query gives me wrong values.
String jpql = "select c.commande.user.login, (select count(*)
from Designation c
WHERE c.commande.commandeTms IS NOT EMPTY
AND c.etatComde = 0) AS count1,
(select count(*)
from Designation c WHERE c.commande.commandeTms IS EMPTY ) AS count2
from Designation c GROUP BY c.commande.user.login";
I have got these results:
        user1   user2
count1  10      10
count2  0       0
But I should have these ones:
        user1   user2
count1  4       2
count2  3       1
Sample data:
table Commande
idComdeComm, commandeTms_idComndeTms, user_idUser
6, 17, 2
8, NULL, 2
10, 28, 2
12, NULL, 2
14, NULL, 2
16, NULL, 2
21, NULL, 19
23, NULL, 19
25, 26, 19
31, NULL, 19
table designation
idDesignation, designation, etatComde, commande_idComdeComm
5, 'fef', 0, 6
7, 'ferf', 0, 8
9, 'hrhrh', 0, 10
11, 'ujujuju', 0, 12
13, 'kikolol', 0, 14
15, 'ololo', 0, 16
20, 'gdfgfd', 0, 21
22, 'gdfgfdd', 0, 23
24, 'nhfn', 0, 25
30, 'momo', 0, 31
table user
idUser, login, password, profil
1, 'moez', '***', 'admin'
2, 'user1', '**', 'comm'
3, 'log', '**', 'log'
4, 'mo', '*', 'comm'
19, 'user2', '*', 'comm'
table Commande TMS
idComndeTms, etatOperationNumCMD, numeroComndeTms
17, '', 3131
26, '', 2525
28, '', 3333
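One direction to try, assuming commandeTms is a to-one association (so IS NULL applies rather than IS EMPTY, which is meant for collections) and that your JPA provider supports CASE expressions: compute both counts in a single grouped query with conditional aggregation instead of uncorrelated subqueries. A sketch:
select c.commande.user.login,
       sum(case when c.commande.commandeTms is not null then 1 else 0 end) as count1,
       sum(case when c.commande.commandeTms is null then 1 else 0 end) as count2
from Designation c
where c.etatComde = 0
group by c.commande.user.login
Note that the where clause applies etatComde = 0 to both counts here; if count2 should not be filtered that way, move the condition into the first case expression instead.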