Special group by (based on succeeding values) in SQL - sql

I would like to perform a special group by statement, in other programming languages I would use some kind of loop, but I have no idea on how to tackle this using sql. Hope that you guys can be of some help. Bit of sample data
NR date Code
2 1-1-2013 6
2 1-1-2013 6
2 3-1-2013 6
2 4-1-2013 7
2 5-1-2013 6
2 5-1-2013 5
3 1-1-2013 1
3 2-1-2013 1
3 2-1-2013 6
3 3-1-2013 7
I would like to do a group by on NR and Code. However I don't want to group non-succeeding Code's (they are sorted on NR and date). The desired output will make it clear i think:
NR Code #
2 6 3
2 7 1
2 6 1
2 5 1
3 1 2
3 6 1
3 7 1
After this I would like to cast it to this format (could be another question, but illustrates my need for a solution for the problem above):
NR Code_string #_string
2 6,7,6,5 3,1,1,1
3 1,6,7 2,1,1
If I did not provide a good example, please tell, this is my first question for sql (learned alot of R using SO)

select NR, Code, count(*) from yourdatabasename
group by NR, Code
order by NR, Code
This command should satisfy yor first desired output.

Related

Select maximum value where another column is used for for the Grouping

I'm trying to join several tables, where one of the tables is acting as a
key-value store, and then after the joins find the maximum value in a
column less than another column. As a simplified example, I have the following three tables:
Documents:
DocumentID
Filename
LatestRevision
1
D1001.SLDDRW
18
2
P5002.SLDPRT
10
Variables:
VariableID
VariableName
1
DateReleased
2
Change
3
Description
VariableValues:
DocumentID
VariableID
Revision
Value
1
2
1
Created
1
3
1
Drawing
1
2
3
Changed Dimension
1
1
4
2021-02-01
1
2
11
Corrected typos
1
1
16
2021-02-25
2
3
1
Generic part
2
3
5
Screw
2
2
4
2021-02-24
I can use the LEFT JOIN/IS NULL thing to get the latest version of
variables relatively easily (see http://sqlfiddle.com/#!7/5982d/3/0).
What I want is the latest version of variables that are less than or equal
to a revision which has a DateReleased, for example:
DocumentID
Filename
Variable
Value
VariableRev
DateReleased
ReleasedRev
1
D1001.SLDDRW
Change
Changed Dimension
3
2021-02-01
4
1
D1001.SLDDRW
Description
Drawing
1
2021-02-01
4
1
D1001.SLDDRW
Description
Drawing
1
2021-02-25
16
1
D1001.SLDDRW
Change
Corrected Typos
11
2021-02-25
16
2
P5002.SLDPRT
Description
Generic Part
1
2021-02-24
4
How do I do this?
I figured this out. Add another JOIN at the start to add in another version of the VariableValues table selecting only the DateReleased variables, then make sure that all the VariableValues Revisions selected are less than this date released. I think the LEFT JOIN has to be added after this table.
The example at http://sqlfiddle.com/#!9/bd6068/3/0 shows this better.

Check constraint for multiple conditions

The teacher gave us a team assignment, and me and my teammate are quite struggling with it (especially since we need to use things like TRIGGERS and PROCEDURES, things we didn't see in class yet …).
We need to implement an arc-relationship, and we fail to understand how …
But before I tell you guys what I need to accomplish, I will give you part of the description of the task, so you guys can understand the situation a bit better …
We basically need to make an ERD for a VLSI CAD-system and we need to implement it. Now, we have our CELL entity, the attributes of which aren't really relevant … The only thing you guys need to know in order to help us is that it has a primary key, CELL_CODE, which is a VARCHAR.
Each CELL has many (I think at least four, I don't think you can have triangular CELLS, but doesn't matter anyways) SIDES. A SIDE can be logically identified by its CELL, and to make matters ridiculously difficult, each SIDE has to be numbered by its CELL, like so:
CELLS:
CELL_CODE
1
2
SIDES:
SEQUENCE_NUMBER CELL_CODE
1 1
2 1
3 1
1 2
2 2
3 2
Now, each SIDE has its CONNECTION_PINS. CONNECTION_PINS is also uniquely identified by SIDES, which are basically numbered in a similar manner:
CELLS:
CELL_CODE
1
2
SIDES:
SEQUENCE_NUMBER CELL_CODE
1 1
2 1
3 1
1 2
2 2
3 2
CONNECTION_PINS:
SEQUENCE_NUMBER SIE_SEQUENCE_NUMBER CELL_CODE
1 1 1
2 1 1
1 2 1
2 2 1
1 3 1
2 3 1
1 1 2
2 1 2
1 2 2
2 2 2
1 3 2
2 3 2
I tried to explain the numbering issue we have here: Data model - PRIMARY KEY numbering issue, but yeah, I didn't really explain it the way it should be explained ...
Now, we have one final entity, which is where the Arc comes in: CONNECTIONS. CONNECTIONS has 2 CONNECTION_PINS: one for START_FROMand one for END_OF. Now, logically seen the start pin can't be the end pin as well, for a given connection. And that's our struggle. Basically, this shouldn't be allowed:
CELLS:
CELL_CODE
1
2
SIDES:
SEQUENCE_NUMBER CELL_CODE
1 1
2 1
3 1
1 2
2 2
3 2
CONNECTION_PINS:
SEQUENCE_NUMBER SIE_SEQUENCE_NUMBER CELL_CODE
1 1 1
2 1 1
1 2 1
2 2 1
1 3 1
2 3 1
1 1 2
2 1 2
1 2 2
2 2 2
1 3 2
2 3 2
CONNECTIONS:
(you shouldn't be able to put this in …)
CPI_SEQNUM_START SIE_SEQNUM_START CELL_CODE_START CPI_SEQNUM_END SIE_SEQNUM_END CELL_CODE_END
1 1 1 1 1 1
Now, this is basically the ERD for this part:
ERD with barred relationships and the arc-relationship in question
and this is the physical model:
Physical model
I basically thought a simple CHECK might do (CHECK (CPI_SEQNUM_START <> CPI_SEQNUM_END AND CELL_CODE_START <> CELL_CODE_END AND SIE_SEQNUM_START <> SIE_SEQNUM_END) ), but that prevented us from inserting anything somehow … Any advice?
Your approach was correct to use a CHECK constraint. Your logic for the constraint was wrong though. You need an OR condition. Only one of the three fields needs to be different.
CPI_SEQNUM_START <> CPI_SEQNUM_END OR
CELL_CODE_START <> CELL_CODE_END OR
SIE_SEQNUM_START <> SIE_SEQNUM
... assuming all three fields are not nullable.

SQL table structure for store value against list of combination

I have a requirement from client where I need to store a value against list of combination.
For example I have following LOBs and against each combination I need to store a value.
Auto
WC
Personal
I purposed multiple solutions he is not satisfied with anyone.
Solution 1: create single table, insert value against all possible combination(string) something like
LOB Value
Auto 1
WC 2
Personal 3
Auto,WC 4
Auto, personal 5
WC, Personal 6
Auto, WC, Personal 7
Solution 2: create lkp_lob, lob_group and lob_group_detail tables. Each group combination represent a group.
Lkp_lob
Lob_key Name
1 Auto
2 WC
3 Person
Lob_group (unique query constrain on lob_group_key and lob_key)
Lob_group_key Lob_key
1 1
2 2
3 3
4 1
4 2
5 1
5 3
6 2
6 3
7 1
7 2
7 3
Lob_group_detail
Lob_group_key Value
1 1
2 2
3 3
4 4
5 5
6 6
7 7
Any suggestion would be highly appreciated.
First of all I did not understood that terms you said.
But from database perspective it is always good to have multiple tables for each module. You will be facing less difficulties when doing CRUD. And will be more faster.

Count instances of number in Row

I have a sheet formated somewhat like this
Thing 5 6 7 Person 1 Person 2 Person 3
Thing 1 1 2 7 7 6
Thing 2 5 5
Thing 3 7 6 6
Thing 4 6 6 5
I am trying to find a query formula that I can place in the columns labeled 5,6,7 that will count the number of people who have that amount of Thing 1. For example, I filled out the Thing 1 row, showing that 1 person has 6 of Thing 1 and 2 people have 7 of Thing 1.
You can use this function: "COUNTIF".
The formula to write in the cells will look like this:
=COUNTIF(E2:G2;"=5")
For more information regarding this function, check the documentation: https://support.google.com/docs/answer/3093480?hl=en

How to match already-calculated means to the original data set?

I am now learning R. I feel that there is a very easy succinct answer to my problem, but I am having trouble solving it myself.
I have a large data set. One column contains various 'categories'. I aggregated these categories to get the mean for each one. So, right now, my aggregated table looks like this:
Category __ Average
A ________ a
B ________ b
C ________ c
etc...
I want now to take these average and combine it as another column onto my original data.
So, I want it to look something like this:
Categories _____ Averages
B _____________ b
A______________a
B______________b
C______________c
B______________b
C______________c
In other words, I want to match each category with its corresponding mean. I have tried variations of merge(), match(), and different apply functions. The fact that my aggregated table is so much smaller than my original data is causing some problems.
Is there a specific function I can use for this simple problem? Thanks in advance.
In base R:
data <- data.frame(Category=c(rep("A",3), rep("B",4), rep("C",2)), Value=1:9)
> data
Category Value
1 A 1
2 A 2
3 A 3
4 B 4
5 B 5
6 B 6
7 B 7
8 C 8
9 C 9
> avg <- lapply(split(data$Value, data$Category), mean)
$A
[1] 2
$B
[1] 5.5
$C
[1] 8.5
> data$Averages <- avg[data$Category]
> data
Category Value Averages
1 A 1 2
2 A 2 2
3 A 3 2
4 B 4 5.5
5 B 5 5.5
6 B 6 5.5
7 B 7 5.5
8 C 8 8.5
9 C 9 8.5
You can use plyr, data.table, etc. more efficiently for larger datasets.