best way to merge rows that have matching ids - sql

I have a households table that holds the address and city information, and an individuals table that lists all the people in each household.
A household could have 1 person or it could have 10.
What I want is for the individuals who belong to the same household to show up in the same row as the household information, all in 1 row.
So if there are 10 people the information will still be in 1 row, and if there are 2 people it is still only 1 row.
household table
1 bekshire st dell MA 10001 02639 50 0002 dell NULL ALRGEN
BERKSHIRE ST NULL NULL NULL NULL
individuals that belong to household id 10001
first last code
BOB BUILDER U
JESS BUILDER A
I want:
1 bekshire st dell MA 10001 02639 50 0002 dell NULL ALRGEN 1 BERKSHIRE ST BOB,JESS BUILDER U,A

The reason this is so hard is that SQL favors normalization and structure, and what you're asking for essentially goes in the opposite direction. I know I'm not directly answering your question, but your best bet may be to manipulate and display the data on the client side, and stick to simple queries to get the data out of the database.
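A minimal sketch of that client-side approach, assuming Python with an in-memory SQLite database standing in for the real one. The table and column names (`household`, `individual`, `household_id`, and so on) are hypothetical, since the question does not show the actual schema. Two simple queries fetch the data, and the merge into one row per household happens in the client code.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: table and column names are assumptions, not from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (household_id INTEGER PRIMARY KEY, address TEXT, city TEXT);
    CREATE TABLE individual (household_id INTEGER, first TEXT, last TEXT, code TEXT);
    INSERT INTO household VALUES (10001, '1 BERKSHIRE ST', 'dell');
    INSERT INTO individual VALUES
        (10001, 'BOB', 'BUILDER', 'U'),
        (10001, 'JESS', 'BUILDER', 'A');
""")


def merged_households(conn):
    """Return one dict per household, with its individuals collapsed into strings."""
    people = defaultdict(list)
    # Simple query 1: every individual, in insertion order within a household.
    for hid, first, last, code in conn.execute(
        "SELECT household_id, first, last, code FROM individual "
        "ORDER BY household_id, rowid"
    ):
        people[hid].append((first, last, code))

    rows = []
    # Simple query 2: every household; the merge itself happens in Python.
    for hid, address, city in conn.execute(
        "SELECT household_id, address, city FROM household ORDER BY household_id"
    ):
        members = people.get(hid, [])
        rows.append({
            "household_id": hid,
            "address": address,
            "city": city,
            "first": ",".join(m[0] for m in members),
            # Last names are de-duplicated, matching the desired "BOB,JESS BUILDER U,A".
            "last": ",".join(sorted({m[1] for m in members})),
            "code": ",".join(m[2] for m in members),
        })
    return rows


merged = merged_households(conn)
print(merged[0])
```

The number of people per household does not matter here: each household always produces exactly one dictionary.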

Related

Dataset interpretation Continuous vs Categorical for House Prices

I'm working with the UK house price dataset and was wanting to create a ML model to predict the price of a house based on the city (plus some other categories).
As a newb to all of this, I am stumped. I am fine creating models with continuous variables, or even carrying out one-hot encoding (dummy variables) for some of the other categories which have 4 different options (type of house for example).
However, when it comes to cities, there are about 1200 different cities in the data set and so I am not sure how to engineer the data to deal with this.
Would greatly appreciate anyone having any idea about this!
No matter how much I search, I can't find an answer to this, but this could perhaps be due to not knowing exactly what to search for.
In my view, you need a grade for every city and a base price for each house.
For example:
City        | City Grade
------------+------------
Los Angeles | 1
New York    | 4
House       | Price
------------+------------
Option1     | $200,000
Option2     | $300,000
Then calculate the final price by multiplying the house price by the city grade.
That means the Option1 house in Los Angeles would still be $200,000, but in New York it would be $800,000.
You don't need to worry about there being 1200 cities; a lookup like this is easy to query in a database.
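As a rough illustration of the arithmetic this answer describes, here is a small Python sketch. The grades and base prices are copied from the example tables; the names `city_grade`, `base_price` and `graded_price` are made up for the illustration, and how a grade would be chosen for each of the 1200 cities is left open, as it is in the answer.

```python
# City grades and base prices copied from the example tables in the answer.
city_grade = {"Los Angeles": 1, "New York": 4}
base_price = {"Option1": 200_000, "Option2": 300_000}


def graded_price(house: str, city: str) -> int:
    """Scale a house's base price by the grade of its city."""
    return base_price[house] * city_grade[city]


# Every (house, city) combination with its scaled price.
prices = {(h, c): graded_price(h, c) for h in base_price for c in city_grade}

for (house, city), price in sorted(prices.items()):
    print(f"{house} in {city}: ${price:,}")
```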

SQL Choose rows based on infant ID and whether the infant's mother ID is in the row or not

I'm very new to SQL and this is only my second post on stackoverflow. I'm trying to follow the rules but please excuse my n00bness. Thanks in advance for your time and help. I'm using MS Access.
I'm studying infant social development and mother-infant interactions. To make this easier to understand, I simplified all of the following:
I have 2 tables: biography and interactions. Biography consists of the infant identity code, date of birth of the infant, and the infant's mother. Interactions consists of data collected while observing the infants and their mothers (as well as peers). I observe an infant for a set amount of time and record their behavior at each timestamp. If the behavior involves a partner (I'm specifically interested in play behavior) I include the identity of the partner.
What I would like to do is take out all the "play" rows in which the mother is the only play partner (because I'm interested in when the infant plays with peers rather than the mother). I want to include rows in which the infant is playing with the mother AND a peer (because this counts as playing with a peer). I think this entails relating the two tables using the mom column of each infant's id. I think, in English, this could be described as: Exclude play rows where mom is the only play partner. It's important to note that who the mother is obviously depends on who the infant being observed is.
As you can see below, sometimes there are multiple play partners. Again, I do want to include rows such as the last few, where cc is playing with its mother AND aa. The partner IDs are usually separated by a space, but sometimes there are typos and there is no space or more than one space. There may even be some commas. But the ID codes are consistent and will always be typed correctly. The dataset includes tens of thousands of lines, so I'm wondering if there is an efficient way to complete this task. The tables are visualized below:
biography
id | dob | mom
-------------------------
aa 2015-01-01 mom_a
bb 2016-01-01 mom_b
cc 2017-01-01 mom_c
interactions
id | behavior | partner | time
---------------------------------------------
aa play mom_a 12:00
aa rest 12:05
aa play bb 12:10
aa play bb 12:15
aa rest 12:20
bb rest 13:00
bb rest 13:05
bb play mom_b 13:10
bb play cc 13:15
bb rest 13:20
cc rest 14:00
cc play aa bb 14:05
cc play mom_c aa bb 14:10
cc play mom_c aa 14:15
cc play mom_c aa 14:20
cc play mom_c aa 14:25
I think the query below may work, though I haven't tried it yet.
select * from biography bio
inner join interactions itr
on bio.id = itr.id
where itr.partner not like bio.mom
Please note that if your interactions table is going to hold a large amount of data, a "not like" clause will degrade performance. You may also wish to normalize the interactions table so that you do not need to keep all the partners in the same row.
As mentioned in the comments, you should normalize your data structure. But the following should work with what you have:
SELECT *
FROM interactions i
WHERE behavior = 'play'
  AND NOT EXISTS
      (SELECT 1
       FROM biography b
       WHERE i.partner = b.mom
         AND i.id = b.id)
So take the rows in interactions where:
Behavior = play
Partner does not have an exact match in biography.mom for the same id
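To check that logic against the sample data, here is a small sketch that runs the same NOT EXISTS query through Python's `sqlite3` module. SQLite stands in for MS Access here, which is an assumption made only so the example can run anywhere, and just a subset of the sample rows is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE biography (id TEXT, dob TEXT, mom TEXT);
    CREATE TABLE interactions (id TEXT, behavior TEXT, partner TEXT, time TEXT);
    INSERT INTO biography VALUES
        ('aa', '2015-01-01', 'mom_a'),
        ('bb', '2016-01-01', 'mom_b'),
        ('cc', '2017-01-01', 'mom_c');
    -- A subset of the sample rows from the question.
    INSERT INTO interactions VALUES
        ('aa', 'play', 'mom_a',    '12:00'),
        ('aa', 'play', 'bb',       '12:10'),
        ('bb', 'play', 'mom_b',    '13:10'),
        ('bb', 'play', 'cc',       '13:15'),
        ('bb', 'rest', NULL,       '13:20'),
        ('cc', 'play', 'aa bb',    '14:05'),
        ('cc', 'play', 'mom_c aa', '14:15');
""")

# Keep play rows unless the partner column is exactly the observed infant's mother.
peer_play = conn.execute("""
    SELECT i.id, i.partner, i.time
    FROM interactions i
    WHERE i.behavior = 'play'
      AND NOT EXISTS (SELECT 1
                      FROM biography b
                      WHERE i.partner = b.mom
                        AND i.id = b.id)
    ORDER BY i.time
""").fetchall()

for row in peer_play:
    print(row)
```

Rows where the mother appears together with a peer (such as `mom_c aa`) are kept, because the comparison is an exact match on the whole partner value. A mother-only value with a stray space or comma would not match exactly and would therefore also be kept, so those typos would need cleaning first.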

Eliminate duplicate records/rows?

I'm trying to list the result of a multi-table query as one row with 2 columns. I have the correct data that I need; I merely need to trim it down to 1 line of results. In other words, eliminate duplicate entries in the result. I'm using a value not shown here, school_id. Should I go with that as a distinct value? Can I do that without displaying the school_id?
SQL> select DISTINCT(school_name),Team_Name
from school, team
where team.team_name like '%B%'
AND school.school_id = team.school_id;
SCHOOL_NAME TEAM_NAME
-------------------------------------------------- ----------
Lawrence Central High School Bears
Lawrence Central High School BEars
Lawrence Central High School BEARS
The problem, as I'm sure you know, is the fact that "Bears" is in 3 different cases here. The simple fix is to do the upper or lower of "Team_Name" so it will only have 1 return record.
UPPER(Team_Name)
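A minimal sketch of that fix, run through Python's `sqlite3` module rather than the Oracle session shown in the question (an assumption made so the example is self-contained). The `school_id` value of 1 is made up. Note that `school_id` is used in the join but never has to appear in the select list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE school (school_id INTEGER, school_name TEXT);
    CREATE TABLE team (school_id INTEGER, team_name TEXT);
    -- school_id = 1 is a made-up value for the illustration.
    INSERT INTO school VALUES (1, 'Lawrence Central High School');
    INSERT INTO team VALUES (1, 'Bears'), (1, 'BEars'), (1, 'BEARS');
""")

# DISTINCT applies to the whole row, so normalizing the case of team_name
# collapses the three spellings into a single result row.
rows = conn.execute("""
    SELECT DISTINCT s.school_name, UPPER(t.team_name) AS team_name
    FROM school s
    JOIN team t ON t.school_id = s.school_id
    WHERE UPPER(t.team_name) LIKE '%B%'
""").fetchall()

print(rows)
```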

SELECT datafields with multiple groups and sums

I can't seem to group by multiple data fields and sum a particular grouped column.
I want to group person by customer, then group customer by price, and then sum the price. The results should be ordered by each person's combined sum(price).
Example:
table customer
-----------
customer | common_id
green 2
blue 2
orange 1
table invoice
----------
person | price | common_id
bob 2330 1
greg 360 2
greg 170 2
SELECT DISTINCT
min(person) As person,min(customer) AS customer, sum(price) as price
FROM invoice a LEFT JOIN customer b ON a.common_id = b.common_id
GROUP BY customer,price
ORDER BY person
The results I desire are:
**BOB:**
Orange, $2330
**GREG:**
green, $360
blue, $170
The colors are the customers that Greg and Bob handle. Each color has a price.
There are two issues that I can see. One is a bit picky, and one is quite fundamental.
Presentation of data in SQL
SQL returns tabular data sets. It's not able to return sub-sets with headings, looking something like a pivot table.
This means that this is not possible...
**BOB:**
Orange, $2330
**GREG:**
green, $360
blue, $170
But that this is possible...
Bob, Orange, $2330
Greg, Green, $360
Greg, Blue, $170
Relating data
I can visually see how you relate the data together...
table customer              table invoice
--------------              -------------
customer | common_id        person | price | common_id
green      2                greg     360     2
blue       2                greg     170     2
orange     1                bob      2330    1
But SQL doesn't have any implied ordering. Things can only be related if an expression can state that they are related. For example, the following is equally possible...
table customer              table invoice
--------------              -------------
customer | common_id        person | price | common_id
green      2                greg     170     2          \ These two have
blue       2                greg     360     2          / been swapped
orange     1                bob      2330    1
This means that you need rules (and likely additional fields) that explicitly state which customer record matches which invoice record, especially when there are multiples in both with the same common_id.
An example of a rule could be that the lowest price always matches with the first customer alphabetically. But then, what happens if you have three records in customer for common_id = 2, but only two records in invoice for common_id = 2? Or does the number of records always match, and do you enforce that?
Most likely you need an extra piece (or pieces) of information to know which records relate to each other.
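To make the ambiguity concrete, here is a small sketch using Python's `sqlite3` module with the sample data from the question (SQLite is an assumption; the question does not name its database). Joining only on common_id pairs every greg invoice with every customer that shares common_id = 2, so nothing in the result says which price belongs to which color.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer TEXT, common_id INTEGER);
    CREATE TABLE invoice (person TEXT, price INTEGER, common_id INTEGER);
    INSERT INTO customer VALUES ('green', 2), ('blue', 2), ('orange', 1);
    INSERT INTO invoice VALUES ('bob', 2330, 1), ('greg', 360, 2), ('greg', 170, 2);
""")

# The same join as in the question, without the grouping.
pairs = conn.execute("""
    SELECT a.person, b.customer, a.price
    FROM invoice a
    LEFT JOIN customer b ON a.common_id = b.common_id
    ORDER BY a.person, b.customer, a.price
""").fetchall()

for row in pairs:
    print(row)
```

Greg appears four times (2 invoices x 2 customers), with both prices attached to both colors, which is why an extra rule or key is needed before the desired output can be produced.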
You should group by all of your selected fields except the sum. Then the group_concat function (MySQL) may help you concatenate the rows that each group collapses.
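A hedged sketch of that idea, using SQLite's `group_concat` through Python's `sqlite3` module as a stand-in for MySQL's `GROUP_CONCAT` (the two differ in syntax details, such as how the separator is given). Each table is aggregated on its own before joining, which is a choice made for this sketch so that the join does not multiply the prices being summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer TEXT, common_id INTEGER);
    CREATE TABLE invoice (person TEXT, price INTEGER, common_id INTEGER);
    INSERT INTO customer VALUES ('green', 2), ('blue', 2), ('orange', 1);
    INSERT INTO invoice VALUES ('bob', 2330, 1), ('greg', 360, 2), ('greg', 170, 2);
""")

totals = conn.execute("""
    SELECT i.person, c.customers, i.total
    FROM (SELECT person, common_id, SUM(price) AS total
          FROM invoice
          GROUP BY person, common_id) AS i
    JOIN (SELECT common_id, group_concat(customer, ', ') AS customers
          FROM customer
          GROUP BY common_id) AS c
      ON c.common_id = i.common_id
    ORDER BY i.total DESC
""").fetchall()

for person, customers, total in totals:
    print(person, customers, total)
```

This gives one row per person with a combined total and a list of customers. It does not assign an individual price to each color, since the data has no rule for that.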
I'm not sure how you could possibly do this. Greg has 2 colors AND 2 prices, so how do you determine which goes with which?
Greg Blue 170 or Greg Blue 360? Or should Green be attached to either price?
I think the colors need to have unique identifiers, separate from the person's unique identifiers.
Just a thought.

Crystal reports - missing fields

I'm using Crystal Reports 10 linked to an Excel document. I would like to pull the Dinner field, but also pull Country and CompanyName for the rows that don't have them; the rows are linked via BookingRef. Example below. I've tried sub-reports and suppressing unwanted fields but can't get it right. Also, I can't make changes in the Excel document, as it has 1000+ records and is exported from an online system weekly.
Id  BookingRef  Country  CompanyName  Surname  Forname  Dinner
1   001         UK       Company1     John     Andrews
2   001                               Mary     Jane     1
3   001                               Tom      Andrews  1
4   002         Germany  Company2     Lee      Jones
5   003         Germany  Company3     Peter    Lee      1
6   003                               Sofie    Lee      1
OK I am not sure I understand the full extent of your problem but let's start with the Country and Company name and see if I can get you moving forward. Instead of putting the Country field directly on the report you could use a formula field and do something like this:
IF {#BookingRef} = "001" Then
"UK"
Else IF {#BookingRef} = "002" Then
"Germany"
Else
"Unnamed"
Now you just put the formula field where the country field used to be and it will put in the right country based on the BookingRef code. This, however, is only practical if you are working with a small number of Country / Company Names, or possibly a big list that never changes, although I would caution against the latter.
The other thing you could do is create a table in any database that holds the BookingRef, Company and Country values, link the BookingRef fields from both "databases" and then just drop the fields on your report.
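The lookup-table idea can be sketched outside Crystal Reports as follows. This is Python, not Crystal formula syntax, and it only illustrates the logic: build a BookingRef to (Country, CompanyName) lookup from the rows that have those values, then use it to fill the rows that don't. The rows are copied from the example in the question; the function name `fill_from_booking` is made up.

```python
# Rows copied from the example in the question; None marks an empty cell.
rows = [
    {"Id": 1, "BookingRef": "001", "Country": "UK", "CompanyName": "Company1",
     "Surname": "John", "Forname": "Andrews", "Dinner": None},
    {"Id": 2, "BookingRef": "001", "Country": None, "CompanyName": None,
     "Surname": "Mary", "Forname": "Jane", "Dinner": 1},
    {"Id": 3, "BookingRef": "001", "Country": None, "CompanyName": None,
     "Surname": "Tom", "Forname": "Andrews", "Dinner": 1},
    {"Id": 4, "BookingRef": "002", "Country": "Germany", "CompanyName": "Company2",
     "Surname": "Lee", "Forname": "Jones", "Dinner": None},
    {"Id": 5, "BookingRef": "003", "Country": "Germany", "CompanyName": "Company3",
     "Surname": "Peter", "Forname": "Lee", "Dinner": 1},
    {"Id": 6, "BookingRef": "003", "Country": None, "CompanyName": None,
     "Surname": "Sofie", "Forname": "Lee", "Dinner": 1},
]


def fill_from_booking(rows):
    """Copy Country and CompanyName onto rows that lack them, keyed by BookingRef."""
    lookup = {}
    for r in rows:
        if r["Country"] and r["CompanyName"]:
            # The first row that carries the values defines them for the booking.
            lookup.setdefault(r["BookingRef"], (r["Country"], r["CompanyName"]))

    filled = []
    for r in rows:
        country, company = lookup.get(r["BookingRef"], (None, None))
        filled.append({
            **r,
            "Country": r["Country"] or country,
            "CompanyName": r["CompanyName"] or company,
        })
    return filled


# Only the rows with Dinner set, now carrying the booking's country and company.
dinner_rows = [r for r in fill_from_booking(rows) if r["Dinner"] == 1]

for r in dinner_rows:
    print(r["Id"], r["BookingRef"], r["Country"], r["CompanyName"], r["Surname"])
```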
If I am missing the point of your question please be real specific about what it is you are trying to accomplish and what is and is not working in your current solution.