SQL - Create number of categories based on pre-defined number of splits - sql

I am using BigQuery, and trying to assign categorical values to each of my records, based on the number of 'splits' assigned to it.
The table has a cumulative count of records, grouped at the STR level - i.e., if there are 4 SKUs at 2 STR, the SKUs will be labeled 1,2,3,4. Each STR is assigned a SPLIT value, so if the STR has a SPLIT value of 2, I want it to split its SKUs into 2 categories. I want to create another column that would assign SKUs labeled 1-2 as '1', and SKUs labeled 3-4 as '2'. (The actual data is on a much larger scale, but thought this would be easier.)
+-----+------+---------------+--------+
| STR | SKU | SKU_ROW_COUNT | SPLITS |
+-----+------+---------------+--------+
| 1 | 1230 | 1 | 3 |
| 1 | 1231 | 2 | 3 |
| 1 | 1232 | 3 | 3 |
| 1 | 1233 | 4 | 3 |
| 1 | 1234 | 5 | 3 |
| 1 | 1235 | 6 | 3 |
| 2 | 1310 | 1 | 2 |
| 2 | 1311 | 2 | 2 |
| 2 | 1312 | 3 | 2 |
| 2 | 1313 | 4 | 2 |
| 3 | 2345 | 1 | 1 |
| 3 | 2346 | 2 | 1 |
| 3 | 2347 | 3 | 1 |
+-----+------+---------------+--------+
The SPLITS column is dynamic, ranging from 1 to 3. The number of SKUs in each category should be relatively equal, but that's not a priority as much as just the number of groups that are created. Ideally, the final table with the new column (HOST_NUMBER) would look something like this:
+-----+------+---------------+--------+-------------+
| STR | SKU | SKU_ROW_COUNT | SPLITS | HOST_NUMBER |
+-----+------+---------------+--------+-------------+
| 1 | 1230 | 1 | 3 | 1 |
| 1 | 1231 | 2 | 3 | 1 |
| 1 | 1232 | 3 | 3 | 2 |
| 1 | 1233 | 4 | 3 | 2 |
| 1 | 1234 | 5 | 3 | 3 |
| 1 | 1235 | 6 | 3 | 3 |
| 2 | 1310 | 1 | 2 | 1 |
| 2 | 1311 | 2 | 2 | 1 |
| 2 | 1312 | 3 | 2 | 2 |
| 2 | 1313 | 4 | 2 | 2 |
| 3 | 2345 | 1 | 1 | 1 |
| 3 | 2346 | 2 | 1 | 1 |
| 3 | 2347 | 3 | 1 | 1 |
+-----+------+---------------+--------+-------------+

You can use window functions and arithmetics:
select
t.*,
1 + floor((sku_row_count - 1) * splits / count(*) over(partition by str)) host_number
from mytable t
order by sku
Actually, ntile() seems to do exactly what you want - and you don't even need the sku_row_count column (which basically mimics row_number() anyway):
select
t.*,
ntile(splits) over(partition by str order by sku) host_number
from mytable t
order by sku

If the ordering of the values in the groups doesn't matter, just use modulo arithmetic:
select t.*, (SKU_ROW_COUNT % SPLITS) as split_group
from t

Below is for BigQuery Standard SQL
#standardSQL
SELECT *, 1 + MOD(SKU_ROW_COUNT, SPLITS) AS HOST_NUMBER
FROM `project.dataset.table`

Related

Postgresql: Group rows in a row and add array

Hi i have a table like this;
+----+----------+-------------+
| id | room_id | house_id |
+----+----------+-------------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 4 | 1 | 2 |
| 5 | 2 | 2 |
| 6 | 3 | 2 |
| 7 | 1 | 3 |
| 8 | 2 | 3 |
| 9 | 3 | 3 |
+----+-------+----------------+
and i want to create a view like this
+----+----------+-------------+
| id | house_id | rooms |
+----+----------+-------------+
| 1 | 1 | [1,2,3] |
| 2 | 2 | [1,2,3] |
| 3 | 3 | [1,2,3] |
+----+-------+----------------+
i tried many ways but i cant gruop them in one line
Thanks for any help.
You can use array_agg():
select house_id, array_agg(room_id order by room_id) as rooms
from t
group by house_id;
If you want the first column to be incremental, you can use row_number():
select row_number() over (order by house_id) as id, . . .

Join and Group Three Tables On Multiple Criteria - SQL

I am trying to join three separate tables based on certain criteria. Here are table examples:
TABLE A
+----+------------+----------+---------+
| id | entry num | line num | inv line|
+----+------------+----------+---------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 2 |
| 3 | 2 | 1 | 1 |
| 4 | 2 | 2 | 1 |
| 5 | 3 | 1 | 1 |
| 6 | 3 | 1 | 2 |
| 7 | 3 | 1 | 3 |
+----+------------+--------+-----------+
TABLE B
+----+------------+----------+---------+
| id | entry num | line num | code |
+----+------------+----------+---------+
| 1 | 1 | 1 | 100 |
| 2 | 2 | 1 | 370 |
| 3 | 2 | 2 | 120 |
| 4 | 3 | 1 | 300 |
+----+------------+--------+-----------+
TABLE C
+----+------------+--------+-----------+
| id | rate | amt | code |
+----+------------+--------+-----------+
| 1 | 25% | $50 | 100 |
| 2 | 50% | $20 | 370 |
| 3 | 50% | $25 | 120 |
| 4 | 30% | $150 | 300 |
+----+------------+----------+---------+
I need the final table to look like this, but I am at a loss on how to write the syntax:
FINAL TABLE
+----+------------+----------+---------+---------+---------+---------+
| id | entry num | line num | inv line| code | rate | amt |
+----+------------+----------+---------+---------+---------+---------+
| 1 | 1 | 1 | 1 | 100 | 25% | $50 |
| 2 | 1 | 1 | 2 | 100 | 25% | $50 |
| 3 | 2 | 1 | 1 | 370 | 50% | $20 |
| 4 | 2 | 2 | 1 | 120 | 50% | $25 |
| 5 | 3 | 1 | 1 | 300 | 30% | $150 |
| 6 | 3 | 1 | 2 | 300 | 30% | $150 |
| 7 | 3 | 1 | 3 | 300 | 30% | $150 |
+----+------------+----------+---------+---------+---------+---------+
Ultimately, I need table A and B joined where both entry num and line num match, but then I need to show each individual row for the inv line number.
For example, entry num 3 / line num 1 will has 3 invoice numbers. All entry num 3/ line num 1 will have the code 300, 30% rate, and $150 amount, but I need to visibly see that there are 3 invoice lines.
I've tried to join tables, group them, and get total counts, but to no avail. Thanks for your help!
I think that you need to create joins between TableA and Table B on EntryNum and LineNum, and then between TableB and TableC on Code. Your SQL should look like:
SELECT A.ID, A.EntryNum, A.LineNum, A.InvLine, B.Code, C.Rate, C.Amt
FROM TableC AS C INNER JOIN (TableB AS B INNER JOIN TableA AS A ON (B.LineNum = A.LineNum) AND (B.EntryNum = A.EntryNum))
ON C.Code = B.Code;
Which produces the result that you want:
Regards,

SQL order by but repeat crescent numbers

I'm using SQL Server 2014 and i'm having a trouble with a query.
I have this scenario bellow:
| Number | Series | Name |
|--------|--------|---------|
| 9 | 1 | Name 1 |
| 5 | 3 | Name 2 |
| 8 | 2 | Name 3 |
| 7 | 3 | Name 4 |
| 0 | 1 | Name 5 |
| 1 | 2 | Name 6 |
| 9 | 2 | Name 7 |
| 3 | 3 | Name 8 |
| 4 | 1 | Name 9 |
| 0 | 1 | Name 10 |
and I need to get it ordered by series column like this:
| Number | Series | Name |
|--------|--------|---------|
| 9 | 1 | Name 1 |
| 8 | 2 | Name 3 |
| 5 | 3 | Name 2 |
| 7 | 1 | Name 5 |
| 1 | 2 | Name 6 |
| 0 | 3 | Name 4 |
| 4 | 1 | Name 9 |
| 9 | 2 | Name 7 |
| 3 | 3 | Name 8 |
| 0 | 1 | Name 10 |
Actually is more a sequency in "series" column than an ordenation.
1,2,3 again 1,2,3...
Somebody could help me?
You can do this using the ANSI standard function row_number():
select number, series, name
from (select t.*, row_number() over (partition by series order by number) as seqnum
from t
) t
order by seqnum, series;
This assigns "1" to the first record for each series, "2" to the second, and so on. The outer order by then puts all the "1"s together, all the "2" together. This has the effect of interleaving the values of the series.

Alternation of positive and negative values

thank you for attention.
I have a table called "PROD_COST" with 5 fields
(ID,Duration,Cost,COST_NEXT,COST_CHANGE).
I need extra field called "groups" for aggregation.
Duration = number of days the price is valid (1 day=1row).
Cost = product price in this day.
Cost_next = lead(cost,1,0).
Cost_change = Cost_next - Cost.
Now i need to group by Cost_change. It can be
positive,negative or 0 values.
+----+---+------+------+------+
| 1 | 1 | 10 | 8,5 | -1,5 |
| 2 | 1 | 8,5 | 12,2 | 3,7 |
| 3 | 1 | 12,2 | 5,3 | -6,9 |
| 4 | 1 | 5,3 | 4,2 | 1,2 |
| 5 | 1 | 4,2 | 6,2 | 2 |
| 6 | 1 | 6,2 | 9,2 | 3 |
| 7 | 1 | 9,2 | 7,5 | -2,7 |
| 8 | 1 | 7,5 | 6,2 | -1,3 |
| 9 | 1 | 6,2 | 6,3 | 0,1 |
| 10 | 1 | 6,3 | 7,2 | 0,9 |
| 11 | 1 | 7,2 | 7,5 | 0,3 |
| 12 | 1 | 7,5 | 0 | 7,5 |
+----+---+------+------+------+`
I need to make a query, which will group it by first negative or positive value (+ - + - + -). Last one field is what i want.
Sorry for my English `
+----+---+------+------+------+---+
| 1 | 1 | 10 | 8,5 | -1,5 | 1 |
| 2 | 1 | 8,5 | 12,2 | 3,7 | 2 |
| 3 | 1 | 12,2 | 5,3 | -6,9 | 3 |
| 4 | 1 | 5,3 | 4,2 | 1,2 | 4 |
| 5 | 1 | 4,2 | 6,2 | 2 | 4 |
| 6 | 1 | 6,2 | 9,2 | 3 | 4 |
| 7 | 1 | 9,2 | 7,5 | -2,7 | 5 |
| 8 | 1 | 7,5 | 6,2 | -1,3 | 5 |
| 9 | 1 | 6,2 | 6,3 | 0,1 | 6 |
| 10 | 1 | 6,3 | 7,2 | 0,9 | 6 |
| 11 | 1 | 7,2 | 7,5 | 0,3 | 6 |
| 12 | 1 | 7,5 | 0 | 7,5 | 6 |
+----+---+------+------+------+---+`
If you're in SQL Server 2012 you can use the window functions to do this:
select
id, COST_CHANGE, sum(GRP) over (order by id asc) +1
from
(
select
*,
case when sign(COST_CHANGE) != sign(isnull(lag(COST_CHANGE)
over (order by id asc),COST_CHANGE)) then 1 else 0 end as GRP
from
PROD_COST
) X
Lag will get the value from previous row, check the sign of it and compare it to the current row. If the values don't match, the case will return 1. The outer select will do a running total of these numbers, and every time there is 1, it will increase the total.
It is possible to use the same logic with older versions too, you'll just have to fetch the previous row from the table using the id and do running total by re-calculating all rows before the current one.
Example in SQL Fiddle
James's answer is close but it doesn't handle the zero value correctly. This is a pretty easy modification. One tricky approximation uses differences between product changes:
select id, COST_CHANGE, sum(IsNewGroup) over (order by id asc) +1
from (select pc.*,
(case when sign(cost_change) - sign(lag(cost_change) over (order by id)) between -1 and 1
then 0
else 1 -- `NULL` intentionally goes here
end) IsNewGroup
from Prod_Cost
) pc
For clarity, here is a SQL Fiddle with zero values. From my understanding of the question, the OP only wants an actual sign change.
This may still not be correct. The OP simply is not clear about what to do about 0 values.

Ask about query in sql server

i have table like this:
| ID | id_number | a | b |
| 1 | 1 | 0 | 215 |
| 2 | 2 | 28 | 8952 |
| 3 | 3 | 10 | 2000 |
| 4 | 1 | 0 | 215 |
| 5 | 1 | 0 |10000 |
| 6 | 3 | 10 | 5000 |
| 7 | 2 | 3 |90933 |
I want to sum a*b where id_number is same, what the query to get all value for every id_number? for example the result is like this :
| ID | id_number | result |
| 1 | 1 | 0 |
| 2 | 2 | 523455 |
| 3 | 3 | 70000 |
This is a simple aggregation query:
select id_number, sum(a*b)
from t
group by id_number
I'm not sure what the first column is for.