I have gotten a lot of help from various articles here, for which I thank you.
Right now I have a case where I need your support. It concerns an SQL procedure that not only checks whether data exists (to update or insert it), but also has to match the data against two other tables; if the data matches, the same row is inserted into a separate table with different values in some columns.
I hope the example below makes this clearer.
Main row data: db.Table1
|rowID|PurchaseDate|ProducID | ProductName | CustomerID | Qty | UnitType|
|row1 |09.09.2018 |206 | Prod1 | 1 | 10 | bl. |
|row2 |09.09.2018 |207 | Prod2 | 2 | 15 | bl. |
|row3 |12.09.2018 |203 | Prod5 | 5 | 5 | lk. |
|row4 |15.09.2018 |207 | Prod2 | 6 | 10 | lk. |
|row5 |20.09.2018 |207 | Prod2 | 8 | 3 | Pk. |
|row6 |20.09.2018 |203 | Prod5 | 8 | 6 | Pk. |
|row7 |20.09.2018 |205 | Prod0 | 2 | 5 | J. |
to match with: db.Table2
|CustomerID| CustomerName|
|1 | Customer1 |
|2 | Customer2 |
|3 | Customer3 |
|4 | Customer4 |
|5 | Customer5 |
to match with: db.Table3
|ProducID| ProductName| SubProdNAME|
|205 | Prod0 | Prod101 |
|205 | Prod0 | Prod202 |
|204 | Prod01 | Prod1001 |
|204 | Prod01 | Prod2002 |
to final table: db.TableFIN
|rowID| PurchaseDate|ProducID|ProductName|CustomerID|Qty|UnitType |Stage|
|row1 | 09.09.2018 | 206 | Prod1 | 1 |10 | bl. | DONE|
|row1 | 09.09.2018 | 206 | Prod1 | 1 |10 | bl. | NONE|
|row2 | 09.09.2018 | 207 | Prod2 | 2 |15 | bl. | DONE|
|row2 | 09.09.2018 | 207 | Prod2 | 2 |15 | bl. | NONE|
|row3 | 12.09.2018 | 203 | Prod5 | 5 |5 | lk. | DONE|
|row3 | 12.09.2018 | 203 | Prod5 | 5 |5 | lk. | NONE|
|row4 | 15.09.2018 | 207 | Prod2 | 6 |10 | lk. | DONE|
|row4 | 15.09.2018 | 207 | Prod2 | 6 |0 | lk. | NONE|
|row5 | 20.09.2018 | 207 | Prod2 | 8 |3 | Pk. | DONE|
|row5 | 20.09.2018 | 207 | Prod2 | 8 |0 | Pk. | NONE|
|row6 | 20.09.2018 | 203 | Prod5 | 8 |6 | Pk. | DONE|
|row6 | 20.09.2018 | 203 | Prod5 | 8 |0 | Pk. | NONE|
|row7 | 20.09.2018 | 205 | Prod101 | 3 |5 | bundle| DONE|
|row7 | 20.09.2018 | 205 | Prod101 | 3 |5 | bundle| NONE|
|row7 | 20.09.2018 | 205 | Prod202 | 3 |5 | bundle| DONE|
|row7 | 20.09.2018 | 205 | Prod202 | 3 |5 | bundle| NONE|
So, basically, I need to insert two rows per source row depending on Stage: one with Stage DONE and one with NONE. In addition, if the CustomerID matches (it exists in db.Table2), Qty is the same in both rows; otherwise Qty is 0 for NONE and the original value for DONE.
For ProducID, if it matches a product in db.Table3, we have to insert 4 rows, as in the table above (a DONE/NONE pair for each SubProdNAME), again matching CustomerID and product to set the Stage and Qty values.
Your support is highly appreciated.
Thank you in advance!
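One possible reading of the rules above can be sketched as a single INSERT ... SELECT. This is only a sketch under assumptions: it is run through Python's built-in sqlite3 module on a three-row subset of the sample data, "CustomerID matches" is taken to mean "exists in Table2", and a product match in Table3 replaces ProductName with each SubProdNAME. The sample output also changes CustomerID and UnitType for row7 in a way the stated rules do not explain, so that part is not reproduced, and the update-if-exists part of the procedure is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (rowID TEXT, PurchaseDate TEXT, ProducID INT,
                     ProductName TEXT, CustomerID INT, Qty INT, UnitType TEXT);
INSERT INTO Table1 VALUES
  ('row1', '09.09.2018', 206, 'Prod1', 1, 10, 'bl.'),
  ('row4', '15.09.2018', 207, 'Prod2', 6, 10, 'lk.'),
  ('row7', '20.09.2018', 205, 'Prod0', 2,  5, 'J.');
CREATE TABLE Table2 (CustomerID INT, CustomerName TEXT);
INSERT INTO Table2 VALUES (1, 'Customer1'), (2, 'Customer2'), (3, 'Customer3'),
                          (4, 'Customer4'), (5, 'Customer5');
CREATE TABLE Table3 (ProducID INT, ProductName TEXT, SubProdNAME TEXT);
INSERT INTO Table3 VALUES (205, 'Prod0', 'Prod101'), (205, 'Prod0', 'Prod202'),
                          (204, 'Prod01', 'Prod1001'), (204, 'Prod01', 'Prod2002');
CREATE TABLE TableFIN (rowID TEXT, PurchaseDate TEXT, ProducID INT, ProductName TEXT,
                       CustomerID INT, Qty INT, UnitType TEXT, Stage TEXT);
""")

conn.execute("""
INSERT INTO TableFIN
SELECT t1.rowID, t1.PurchaseDate, t1.ProducID,
       -- assumption: a Table3 match swaps in the sub-product name
       COALESCE(t3.SubProdNAME, t1.ProductName),
       t1.CustomerID,
       -- Qty is zeroed only for the NONE row of an unmatched customer
       CASE WHEN s.Stage = 'NONE' AND t2.CustomerID IS NULL THEN 0 ELSE t1.Qty END,
       t1.UnitType,
       s.Stage
FROM Table1 t1
CROSS JOIN (SELECT 'DONE' AS Stage UNION ALL SELECT 'NONE') s  -- two rows per source row
LEFT JOIN Table2 t2 ON t2.CustomerID = t1.CustomerID           -- customer match
LEFT JOIN Table3 t3 ON t3.ProducID = t1.ProducID               -- one row per sub-product
""")

rows = conn.execute("""
SELECT rowID, ProductName, Qty, Stage
FROM TableFIN
ORDER BY rowID, ProductName, Stage
""").fetchall()
for row in rows:
    print(row)
```

The cross join against the two-row Stage list is what doubles every source row; the left join to Table3 then multiplies row7 by its two sub-products, giving the four rows.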
This is a hard question for me to word but I was wondering if something like this is possible using psql. The general idea is I have a field called "label" which can contain any TEXT value. I am trying to group these with unique IDs that increment when the value of label is different than in the previous row.
Input Table
| RID | Label|
| ---- | ---- |
| 1 | |
| 2 | A |
| 3 | A |
| 4 | |
| 5 | |
| 6 | B |
| 7 | B |
| 8 | B |
| 9 | A |
|10 | A |
|11 | |
|12 | |
Desired Output Table
|RID|Label|Group ID|
|---|-----|--------|
| 1 | | 1 |
| 2 | A | 2 |
| 3 | A | 2 |
| 4 | | 3 |
| 5 | | 3 |
| 6 | B | 4 |
| 7 | B | 4 |
| 8 | B | 4 |
| 9 | A | 5 |
|10 | A | 5 |
|11 | | 6 |
|12 | | 6 |
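A common way to get this kind of numbering is to flag each row whose label differs from the previous row and then take a running sum of the flags. The sketch below runs that idea through Python's built-in sqlite3 module; the table name t and the column names rid and label are assumptions, and blank labels are assumed to be stored as empty strings. If blanks are NULL, the comparison needs a NULL-safe operator (IS DISTINCT FROM in PostgreSQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (rid INTEGER PRIMARY KEY, label TEXT NOT NULL)")
labels = ["", "A", "A", "", "", "B", "B", "B", "A", "A", "", ""]
conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(labels, start=1)))

query = """
SELECT rid, label,
       -- running total of "label changed" flags = group number
       SUM(is_new) OVER (ORDER BY rid) AS group_id
FROM (
    SELECT rid, label,
           -- 1 when the label differs from the previous row (or on the first row)
           CASE WHEN LAG(label) OVER (ORDER BY rid) = label THEN 0 ELSE 1 END AS is_new
    FROM t
) AS flagged
ORDER BY rid
"""
group_ids = [row[2] for row in conn.execute(query)]
print(group_ids)
```

On the first row LAG returns NULL, so the comparison is not true and the row is flagged as the start of group 1.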
I want to compute a cumulative count in Hive SQL over the recorrencia column, based on the other columns.
+------------+---------+-------+--------------+
| t.ano_mes | t.site | t.uf | recorrencia |
+------------+---------+-------+--------------+
| 202001 | 174 | AM | 1 |
| 202002 | 174 | AM | 1 |
| 202003 | 174 | AM | 1 |
| 202004 | 174 | AM | 1 |
| 202005 | 174 | AM | 1 |
| 202006 | 174 | AM | 1 |
| 202007 | 174 | AM | 1 |
| 202008 | 174 | AM | 1 |
| 202005 | 1JN | SP | 1 |
| 202006 | 1JN | SP | 1 |
| 202005 | 1LJ | SP | 1 |
| 202009 | 1LJ | SP | 1 |
| 202001 | 1RG | SP | 1 |
| 202002 | 1RG | SP | 1 |
| 202003 | 1RG | SP | 1 |
| 202004 | 1RG | SP | 1 |
| 202005 | 1RG | SP | 1 |
| 202006 | 1RG | SP | 1 |
| 202007 | 1RG | SP | 1 |
Desired output
+------------+---------+-------+--------------+---------+
| t.ano_mes | t.site | t.uf | recorrencia | cum_rec |
+------------+---------+-------+--------------+---------+
| 202001 | 174 | AM | 1 | 1 |
| 202002 | 174 | AM | 1 | 2 |
| 202003 | 174 | AM | 1 | 3 |
| 202004 | 174 | AM | 1 | 4 |
| 202005 | 174 | AM | 1 | 5 |
| 202006 | 174 | AM | 1 | 6 |
| 202007 | 174 | AM | 1 | 7 |
| 202008 | 174 | AM | 1 | 8 |
| 202005 | 1JN | SP | 1 | 1 |
| 202006 | 1JN | SP | 1 | 2 |
| 202005 | 1LJ | SP | 1 | 1 |
| 202009 | 1LJ | SP | 1 | 2 |
| 202001 | 1RG | SP | 1 | 1 |
| 202002 | 1RG | SP | 1 | 2 |
| 202003 | 1RG | SP | 1 | 3 |
| 202004 | 1RG | SP | 1 | 4 |
| 202005 | 1RG | SP | 1 | 5 |
| 202006 | 1RG | SP | 1 | 6 |
| 202007 | 1RG | SP | 1 | 7 |
I've tried several window functions, such as COUNT(*) OVER (t.ano_mes) and COUNT(*) OVER (t.site), but the count keeps running to the end of the table and does not restart when t.site changes.
As soon as t.site changes, the counter should restart.
That would be:
sum(recorrencia) over(partition by t.site order by t.ano_mes) as cum_rec
The partition by clause causes the sum to reset every time the site changes.
Note that if recorrencia is always 1, as shown in your sample data, then row_number() is sufficient:
row_number() over(partition by t.site order by t.ano_mes) as cum_rec
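To see the reset in action, here is a small sketch that runs both expressions on a subset of the sample rows. It uses Python's built-in sqlite3 module rather than Hive, purely so it can be run anywhere (the window syntax is the same); the table name t is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ano_mes INT, site TEXT, uf TEXT, recorrencia INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, 1)", [
    (202001, "174", "AM"), (202002, "174", "AM"), (202003, "174", "AM"),
    (202005, "1JN", "SP"), (202006, "1JN", "SP"),
    (202005, "1LJ", "SP"), (202009, "1LJ", "SP"),
])

query = """
SELECT site, ano_mes,
       SUM(recorrencia) OVER (PARTITION BY site ORDER BY ano_mes) AS cum_rec,
       ROW_NUMBER()     OVER (PARTITION BY site ORDER BY ano_mes) AS rn
FROM t
ORDER BY site, ano_mes
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Both columns restart at 1 for each site. They only diverge if recorrencia can take values other than 1, in which case the sum is the one to keep.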
I have a dataset for which I have to conditionally count rows from table B that are between two dates in table A. I have to do this without the use of a correlated subquery in the SELECT clause, as this is not supported in Netezza - docs: https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.0.3/com.ibm.nz.dbu.doc/c_dbuser_correlated_subqueries_ntz_sql.html.
Background on tables: Users can log in to a site (logins). When they log in, they can take actions, which are recorded in actions_taken. The desired output is, for each actions_taken row, a count of the logins rows whose login_date falls between lag_action_date and action_date (inclusive).
Data and attempt found here: http://rextester.com/NLDH13254
Table: actions_taken (with added calculations - see RexTester.)
| user_id | action_type | action_date | lag_action_date | elapsed_days |
|---------|---------------|-------------|-----------------|--------------|
| 12345 | action_type_1 | 6/27/2017 | 3/3/2017 | 116 |
| 12345 | action_type_1 | 3/3/2017 | 2/28/2017 | 3 |
| 12345 | action_type_1 | 2/28/2017 | NULL | NULL |
| 12345 | action_type_2 | 3/6/2017 | 3/3/2017 | 3 |
| 12345 | action_type_2 | 3/3/2017 | 3/25/2016 | 343 |
| 12345 | action_type_2 | 3/25/2016 | NULL | NULL |
| 12345 | action_type_4 | 3/6/2017 | 3/3/2017 | 3 |
| 12345 | action_type_4 | 3/3/2017 | NULL | NULL |
| 99887 | action_type_1 | 4/1/2017 | 2/11/2017 | 49 |
| 99887 | action_type_1 | 2/11/2017 | 1/28/2017 | 14 |
| 99887 | action_type_1 | 1/28/2017 | NULL | NULL |
Table: logins
| user_id | login_date |
|---------|------------|
| 12345 | 6/27/2017 |
| 12345 | 6/26/2017 |
| 12345 | 3/7/2017 |
| 12345 | 3/6/2017 |
| 12345 | 3/3/2017 |
| 12345 | 3/2/2017 |
| 12345 | 3/1/2017 |
| 12345 | 2/28/2017 |
| 12345 | 2/27/2017 |
| 12345 | 2/25/2017 |
| 12345 | 3/25/2016 |
| 12345 | 3/23/2016 |
| 12345 | 3/20/2016 |
| 99887 | 6/27/2017 |
| 99887 | 6/26/2017 |
| 99887 | 6/24/2017 |
| 99887 | 4/2/2017 |
| 99887 | 4/1/2017 |
| 99887 | 3/30/2017 |
| 99887 | 3/8/2017 |
| 99887 | 3/6/2017 |
| 99887 | 3/3/2017 |
| 99887 | 3/2/2017 |
| 99887 | 2/28/2017 |
| 99887 | 2/11/2017 |
| 99887 | 1/28/2017 |
| 99887 | 1/26/2017 |
| 99887 | 5/28/2016 |
DESIRED OUTPUT: cnt_logins_between_action_dates field
| user_id | action_type | action_date | lag_action_date | elapsed_days | cnt_logins_between_action_dates |
|---------|---------------|-------------|-----------------|--------------|---------------------------------|
| 12345 | action_type_1 | 6/27/2017 | 3/3/2017 | 116 | 5 |
| 12345 | action_type_1 | 3/3/2017 | 2/28/2017 | 3 | 4 |
| 12345 | action_type_1 | 2/28/2017 | NULL | NULL | 1 |
| 12345 | action_type_2 | 3/6/2017 | 3/3/2017 | 3 | 2 |
| 12345 | action_type_2 | 3/3/2017 | 3/25/2016 | 343 | 7 |
| 12345 | action_type_2 | 3/25/2016 | NULL | NULL | 1 |
| 12345 | action_type_4 | 3/6/2017 | 3/3/2017 | 3 | 2 |
| 12345 | action_type_4 | 3/3/2017 | NULL | NULL | 1 |
| 99887 | action_type_1 | 4/1/2017 | 2/11/2017 | 49 | 8 |
| 99887 | action_type_1 | 2/11/2017 | 1/28/2017 | 14 | 2 |
| 99887 | action_type_1 | 1/28/2017 | NULL | NULL | 1 |
You don't need a correlated subquery. Get the previous date using lag, then join the logins table to count the logins between the two dates.
with prev_dates as (
    select at.*
          ,coalesce(lag(action_date) over(partition by user_id, action_type order by action_date)
                   ,action_date) as lag_action_date
    from actions_taken at
)
select at.user_id, at.action_type, at.action_date, at.lag_action_date
      ,at.action_date - at.lag_action_date as elapsed_days
      ,count(*) as cnt_logins_between_action_dates
from prev_dates at
join logins l
  on l.user_id = at.user_id
 and l.login_date <= at.action_date
 and l.login_date >= at.lag_action_date
group by at.user_id, at.action_type, at.action_date, at.lag_action_date
order by 1, 2, 3
Note that because of the coalesce, the first action of each type gets lag_action_date equal to action_date (and elapsed_days of 0) rather than NULL.
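As a quick check of the lag-and-join approach, the sketch below runs it on the user 12345 rows for action_type_1 and action_type_2. It uses Python's built-in sqlite3 module instead of Netezza so it is easy to run, which means the dates are stored as ISO strings (an assumption made so they compare correctly) and the elapsed_days column is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actions_taken (user_id INT, action_type TEXT, action_date TEXT)")
conn.execute("CREATE TABLE logins (user_id INT, login_date TEXT)")
conn.executemany("INSERT INTO actions_taken VALUES (12345, ?, ?)", [
    ("action_type_1", "2017-06-27"), ("action_type_1", "2017-03-03"),
    ("action_type_1", "2017-02-28"), ("action_type_2", "2017-03-06"),
    ("action_type_2", "2017-03-03"), ("action_type_2", "2016-03-25"),
])
conn.executemany("INSERT INTO logins VALUES (12345, ?)", [(d,) for d in [
    "2017-06-27", "2017-06-26", "2017-03-07", "2017-03-06", "2017-03-03",
    "2017-03-02", "2017-03-01", "2017-02-28", "2017-02-27", "2017-02-25",
    "2016-03-25", "2016-03-23", "2016-03-20",
]])

query = """
WITH prev_dates AS (
    SELECT a.*,
           -- previous action of the same type; falls back to the action itself
           COALESCE(LAG(action_date) OVER (PARTITION BY user_id, action_type
                                           ORDER BY action_date),
                    action_date) AS lag_action_date
    FROM actions_taken a
)
SELECT p.action_type, p.action_date, COUNT(*) AS cnt
FROM prev_dates p
JOIN logins l
  ON l.user_id = p.user_id
 AND l.login_date BETWEEN p.lag_action_date AND p.action_date  -- inclusive on both ends
GROUP BY p.user_id, p.action_type, p.action_date, p.lag_action_date
ORDER BY p.action_type, p.action_date
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

Because this is an inner join, an action with no login inside its date range would drop out of the result instead of showing a count of 0; a left join with count(l.login_date) would keep it.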
I have a Request History table whose rows I would like to flag with a version number (a version ends at the end of a cycle). I was able to mark the end of each cycle, but I couldn't work out how to assign the version to the rows that belong to each cycle. Here is an example:
|history_id | Req_id | StatID | Time |EndCycleDate |
|-------------|---------|-------|---------- |-------------|
|1 | 1 |18 | 3/26/2017 | NULL |
|2 | 1 | 19 | 3/26/2017 | NULL |
|3 | 1 |20 | 3/30/2017 | NULL |
|4 |1 | 23 |3/30/2017 | NULL |
|5 | 1 |35 |3/30/2017 | 3/30/2017 |
|6 | 1 |33 |4/4/2017 | NULL |
|7 | 1 |34 |4/4/2017 | NULL |
|8 | 1 |39 |4/4/2017 | NULL |
|9 | 1 |35 |4/4/2017 | 4/4/2017 |
|10 | 1 |33 |4/5/2017 | NULL |
|11 | 1 |34 |4/6/2017 | NULL |
|12 | 1 |39 |4/6/2017 | NULL |
|13 | 1 |35 |4/7/2017 | 4/7/2017 |
|14 | 1 |33 |4/8/2017 | NULL |
|15 | 1 | 34 |4/8/2017 | NULL |
|16 | 2 |18 |3/28/2017 | NULL |
|17 | 2 |26 |3/28/2017 | NULL |
|18 | 2 |20 |3/30/2017 | NULL |
|19 | 2 |23 |3/30/2017 | NULL |
|20 | 2 |35 |3/30/2017 | 3/30/2017 |
|21 | 2 |33 |4/12/2017 | NULL |
|22 | 2 |34 |4/12/2017 | NULL |
|23 | 2 |38 |4/13/2017 | NULL |
Now what I would like to achieve is to derive a new column, namely VER, and update its value like the following:
|history_id | Req_id | StatID | Time |EndCycleDate | VER |
|-------------|---------|-------|---------- |-------------|------|
|1 | 1 |18 | 3/26/2017 | NULL | 1 |
|2 | 1 | 19 | 3/26/2017 | NULL | 1 |
|3 | 1 |20 | 3/30/2017 | NULL | 1 |
|4 |1 | 23 |3/30/2017 | NULL | 1 |
|5 | 1 |35 |3/30/2017 | 3/30/2017 | 1 |
|6 | 1 |33 |4/4/2017 | NULL | 2 |
|7 | 1 |34 |4/4/2017 | NULL | 2 |
|8 | 1 |39 |4/4/2017 | NULL | 2 |
|9 | 1 |35 |4/4/2017 | 4/4/2017 | 2 |
|10 | 1 |33 |4/5/2017 | NULL | 3 |
|11 | 1 |34 |4/6/2017 | NULL | 3 |
|12 | 1 |39 |4/6/2017 | NULL | 3 |
|13 | 1 |35 |4/7/2017 | 4/7/2017 | 3 |
|14 | 1 |33 |4/8/2017 | NULL | 4 |
|15 | 1 | 34 |4/8/2017 | NULL | 4 |
|16 | 2 |18 |3/28/2017 | NULL | 1 |
|17 | 2 |26 |3/28/2017 | NULL | 1 |
|18 | 2 |20 |3/30/2017 | NULL | 1 |
|19 | 2 |23 |3/30/2017 | NULL | 1 |
|20 | 2 |35 |3/30/2017 | 3/30/2017 | 1 |
|21 | 2 |33 |4/12/2017 | NULL | 2 |
|22 | 2 |34 |4/12/2017 | NULL | 2 |
|23 | 2 |38 |4/13/2017 | NULL | 2 |
One method that comes really close is a cumulative count:
select t.*,
count(endCycleDate) over (partition by req_id order by history_id) as ver
from t;
However, this is off on the rows where endCycleDate is set (the count increments on that row rather than on the next one), and the value starts at 0. Most of these problems are fixed with a windowing clause:
select t.*,
(count(endCycleDate) over (partition by req_id
order by history_id
rows between unbounded preceding and 1 preceding) + 1
) as ver
from t;
But that misses the value on the first row. So, here is a method that actually works. It counts the cycle ends backward and then subtracts that from the total number of cycle ends to get the versions in ascending order:
select t.*,
       (1 + count(endCycleDate) over (partition by req_id) -
        count(endCycleDate) over (partition by req_id
                                  order by history_id desc)
       ) as ver
from t;
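The reverse-count idea can be checked against the sample data with the sketch below, run through Python's built-in sqlite3 module. Only history_id, req_id and endCycleDate are loaded, since the other columns do not affect the version, and the dates are stored as ISO strings. The total used here is the per-request count of non-NULL endCycleDate values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (history_id INT, req_id INT, endCycleDate TEXT)")
# history_id -> endCycleDate for the rows that close a cycle in the sample data
end_dates = {5: "2017-03-30", 9: "2017-04-04", 13: "2017-04-07", 20: "2017-03-30"}
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(h, 1 if h <= 15 else 2, end_dates.get(h)) for h in range(1, 24)],
)

query = """
SELECT history_id, req_id,
       1
       + COUNT(endCycleDate) OVER (PARTITION BY req_id)           -- all cycle ends
       - COUNT(endCycleDate) OVER (PARTITION BY req_id
                                   ORDER BY history_id DESC)      -- ends at or after this row
       AS ver
FROM t
ORDER BY history_id
"""
versions = [row[2] for row in conn.execute(query)]
print(versions)
```

Counting from the end means the row that closes a cycle is still counted with its own cycle, which is what keeps history_id 5 in version 1 and history_id 6 in version 2.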
I am stuck on an issue. I have these tables:
tbl_color
+------------+
|id | name |
|---|--------|
|1 | Red |
|---|--------|
|2 | Blue |
|---|--------|
|3 | Black |
+------------+
tbl_clothes
+----------------+
|id | name |
| 1 | Pant |
| 2 | Shirt |
| 3 | T-shirt |
+----------------+
tb_sales
+---------------------------------------+
|id | id_cloth | id_color | sales_date |
|---|----------|-----------|------------|
|1 | 1 | 1 | 2016/1/1 |
|---|----------|-----------|------------|
|2 | 1 | 3 | 2016/1/1 |
|---|----------|-----------|------------|
|3 | 1 | 1 | 2016/2/2 |
+---------------------------------------+
Then I change one row of tbl_color (id 1, from Red to Orange), so the table becomes:
tbl_color
+---------------------------+
|id | name | modified_on |
|----|--------|-------------|
|1 | Orange | 2016/3/2 |
|----|--------|-------------|
|2 | Blue | 2016/1/2 |
|----|--------|-------------|
|3 | Black | 2016/1/2 |
+---------------------------+
Now, when I request the sales report for 2016/1/1:
SELECT * FROM tb_sales
JOIN tbl_clothes ON tbl_clothes.id = tb_sales.id_cloth
JOIN tbl_color ON tbl_color.id = tb_sales.id_color
WHERE sales_date = '2016/1/1'
I get the modified color name (Orange), not the name that applied at the time of the sale (Red).
How can I handle this?
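One common way to handle this is to keep every version of a color name with a validity range and join each sale to the version that was valid on its sales_date. The sketch below shows the idea with Python's built-in sqlite3 module. The tbl_color_history table, its columns and the open-ended sentinel dates are all hypothetical (nothing like it exists in the schema above), and dates are stored as ISO strings so they compare correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_sales (id INT, id_cloth INT, id_color INT, sales_date TEXT);
INSERT INTO tb_sales VALUES
  (1, 1, 1, '2016-01-01'),
  (2, 1, 3, '2016-01-01'),
  (3, 1, 1, '2016-02-02');

-- hypothetical history table: one row per name a color has had,
-- valid for valid_from <= date < valid_to
CREATE TABLE tbl_color_history (id_color INT, name TEXT, valid_from TEXT, valid_to TEXT);
INSERT INTO tbl_color_history VALUES
  (1, 'Red',    '0001-01-01', '2016-03-02'),
  (1, 'Orange', '2016-03-02', '9999-12-31'),
  (3, 'Black',  '0001-01-01', '9999-12-31');
""")

query = """
SELECT s.id, h.name
FROM tb_sales s
JOIN tbl_color_history h
  ON h.id_color = s.id_color
 AND s.sales_date >= h.valid_from   -- name already in effect on the sale date
 AND s.sales_date <  h.valid_to     -- and not yet replaced
WHERE s.sales_date = '2016-01-01'
ORDER BY s.id
"""
report = conn.execute(query).fetchall()
print(report)
```

With this layout, renaming a color means closing the current history row (setting valid_to) and inserting a new one instead of updating the name in place. A simpler alternative, if only the name matters, is to copy the color name into the sales row at the time of the sale.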