Database schema to map Hotel Amenities from different sources into a uniform format to remove duplicates - sql

I am creating a hotel application that fetches hotel feeds from different sources, stores them in a database in a uniform structure, and exposes an API to the mobile application. I am fetching hotels from 3 different sources using a Python/Django app.
Every hotel source has a different set of amenities, for example:
Source 1 [Expedia]
- Free WiFi
- Hairdryer In Room
- Cable TV
- Double Bed
- Single Bed
- Fireplace
Source 2 [SomeHotelProvider]
- WiFi
- Hairdryer
- Television
- and so on
So here we have the same amenity under different names (Free WiFi and WiFi, for example). The problem is that the mobile screen will display two filters [Free WiFi, WiFi] for what is really one amenity. What is the best approach to deal with these duplicate values?
I need a solution that creates a mapping table which maps all duplicates to one amenities master table.
Thanks in advance.

I would use a JSONB column, so you can accept any type of data in a hash (key/value) format.
https://www.postgresql.org/docs/9.4/static/datatype-json.html
Then you will have to go through and consolidate duplicate keys (wifi, free wifi, etc.). Unfortunately this can't be fully automated: even if you wrote a perfect program to do this, a new form you did not account for, like "included Wifi", might show up in the future.
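The mapping table the question asks for could look roughly like the sketch below. It is not the JSONB approach, just one possible layout, and every name in it (the amenity and amenity_alias tables, the canonical_amenity helper, the source labels) is an assumption made for illustration. SQLite is used only so the example runs on its own; the same two tables would work in any relational database. Names that have no alias yet come back as None, so they can be queued for the manual review mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Master list: one row per amenity shown as a filter in the app.
CREATE TABLE amenity (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- One row per (source, raw name) pointing at the master amenity.
CREATE TABLE amenity_alias (
    source     TEXT NOT NULL,
    raw_name   TEXT NOT NULL,   -- stored lower-cased and trimmed
    amenity_id INTEGER NOT NULL REFERENCES amenity(id),
    PRIMARY KEY (source, raw_name)
);
""")

conn.executemany(
    "INSERT INTO amenity (id, name) VALUES (?, ?)",
    [(1, "WiFi"), (2, "Hairdryer"), (3, "Television")],
)
conn.executemany(
    "INSERT INTO amenity_alias (source, raw_name, amenity_id) VALUES (?, ?, ?)",
    [
        ("expedia", "free wifi", 1),
        ("expedia", "hairdryer in room", 2),
        ("expedia", "cable tv", 3),
        ("somehotelprovider", "wifi", 1),
        ("somehotelprovider", "hairdryer", 2),
        ("somehotelprovider", "television", 3),
    ],
)


def canonical_amenity(source, raw_name):
    """Return the master amenity name, or None if the raw name is unmapped."""
    row = conn.execute(
        "SELECT a.name FROM amenity_alias al "
        "JOIN amenity a ON a.id = al.amenity_id "
        "WHERE al.source = ? AND al.raw_name = ?",
        (source, raw_name.strip().lower()),
    ).fetchone()
    return row[0] if row else None
```

During import, each feed's amenity string would be passed through canonical_amenity, and the hotel would be linked to the master amenity rather than to the raw string.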

Related

If row contains value, return value from column A value, for all rows

I've seen some similar posts, but none with a really helpful answer for my particular issue. I'm a programmatic advertising data analyst, and I'm trying to work with a many-to-many relationship.
We run "personas", which is a group of apps. A persona has many apps, and apps have many personas. I have the data organized by persona: each row is a persona, and each column in that row contains one of the apps comprising that persona. Each persona has a different number of apps. i.e.
Row Persona App 1 App 2 App 3
1 Casino Persona "Slot Kings" "Wild Casino" "Real Gambling App"
2 RPG Persona "Dragon Valor" "KOTOR"
3 Sports Persona "MLB: The App" "Real Soccer" "Hockey Fans 2016"
4 Gen-X Females "Scrapbook App" "Baby Monitor" "PostMates"
So I know which apps belong to each persona. I'm now trying to determine which personas belong to each app. I'd like to create another worksheet that switches "apps" and "personas", e.g.
Row App Persona 1 Persona 2 Persona 3
1 Slot Kings "Casino Persona" "Slot Persona"
2 KOTOR "RPG Persona" "Star Wars Fans" "SciFi Persona"
3 MLB: The App "Sports Persona" "Baseball Fans"
I can't figure out any way to do this without an insane nested statement, VBA, or a similarly crazy array formula.
I understand what you're asking for. However, trying to rearrange your data with either formulas or VBA would take a lot of time and effort, and is also quite unnecessary.
The layout of your data is good for viewing, but is not suitable for any kind of data analysis. If you want to perform analysis, data should always be stored in a database format. See this article for more info:
https://support.office.com/en-us/article/Guidelines-for-organizing-and-formatting-data-on-a-worksheet-90895CAD-6C85-4E02-90D3-8798660166E3
For your data, you should just have 2 columns, one for Persona and one for App. Then on every row, you list just one possible combination of Persona and App. It doesn't matter in what order you place the data.
With your data set up correctly, you can now create PivotTables that automatically arrange your data either way. It's also possible to easily count the number of apps in each persona and vice versa and then also create charts. Look up PivotTables and PivotCharts for more info.
To create the top left table, put Persona into the Rows field and then put App into Rows field as well. (Turn off subtotals and totals to make the table neater, and change the layout to Tabular to see the field names.)
To create the top right table, put App into the Data field instead.
To create the bottom tables, add the fields in the opposite order.
Note that if you have more details about either personas or apps that you want to record, they should be stored in separate tables. This is now getting into the realm of designing a relational database. Look for references on database design for more information.
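Outside Excel, the same idea can be sketched in a few lines of Python: store one (persona, app) pair per row, then group by whichever column you want as the key. This is only an illustration of the two-column layout, not a replacement for the PivotTable steps; the group helper is a made-up name, and the pairs are taken from the sample rows in the question.

```python
from collections import defaultdict

# One row per (persona, app) combination; order does not matter.
pairs = [
    ("Casino Persona", "Slot Kings"),
    ("Casino Persona", "Wild Casino"),
    ("Casino Persona", "Real Gambling App"),
    ("Slot Persona", "Slot Kings"),
    ("RPG Persona", "Dragon Valor"),
    ("RPG Persona", "KOTOR"),
    ("Star Wars Fans", "KOTOR"),
    ("SciFi Persona", "KOTOR"),
]


def group(rows, key_index):
    """Group two-column rows by the column at key_index (0 or 1)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key_index]].append(row[1 - key_index])
    return dict(grouped)


apps_by_persona = group(pairs, 0)   # persona -> list of apps
personas_by_app = group(pairs, 1)   # app -> list of personas
```

Because both views are derived from the same list of pairs, adding one new pair updates both directions at once, which is the point of the two-column layout.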

SAP Flight reservation application

I am accessing flight reservation application built in SAP.
The application has a section on catering which contains: BC_MEAL, BC_MEALT, BC_STARTER, BC_MAINCOURSE, BC_DESSERT.
However, there are no such tables prefixed with BC_.
The tables are SMEAL, SMEALT, SSTARTER, SMACOURSE, SDESSERT instead.
What is this discrepancy due to? How does SAP convert application names into table names?
You're looking at the Data Modeler (SD11) and trying to compare it to the Data Dictionary / ABAP Dictionary (SE11). The actual table names are assigned to the entities explicitly:
expand BC_FLIGHT
double-click on BC_SFLIGHT
Button Dict. (?)
--> This screen should show the tables and/or views used to represent the entity.
It is worth noting that for many applications, no explicit data model exists (which is why I personally never bothered with the Data Modeler - a tool like this is virtually useless unless everyone else uses it as well).

Linking two separate sets of data codes without a common identifier

I have two large sets of data. Both sets are a form of structured coding system, used to categorize groups of people based on their occupation. The two sets have no common identifier. Besides a column that contains a unique identifier, each table has a description for that identifier, but although the descriptions may describe similar things, they are not identical.
How do I create a table that connects the two sets of data without having to go back and manually work out the connection between the two identifiers? I am not sure if this can be done in Access or SQL. If there is a way to do this, I would like to know what software is out there.
Here's some example data:
Table 1:
Z Identifier DescriptionA
162000 Pharmacist
3123566 Electronic Repairman
143246 Banker
8444455 Doctor
Table 2:
Q Identifier DescriptionB
XX134556 COPY/PRINT/SCAN EQUIP
666Q1224 DRUGS
722WWYZ Financial Svc
8456435T Medical Services
15666PP Health Services
Desired Output:
Table 3:
Z Identifier DescriptionA Q Identifier DescriptionB
162000 Pharmacist 666Q1224 DRUGS
3123566 Electr Repairman XX134556 COPY/PRINT/SCAN EQUIP
143246 Banker 722WWYZ Financial Svc
8444455 Doctor 8456435T Medical Services
Conventional tools that you are used to (like Access, Excel, and SQL) can only go so far with comparing the meaning and usage of words.
In other words (forgive the pun), in order to do this, you need some sort of natural language processing toolkit (NLPT). Along with that, you also need some knowledge of how to program, because I don't think any front-end interface exists that can produce the output you want from the input you listed just by filling out some forms.
So with that in mind, in order to solve your problem (I'll assume you know how to program and can pick up a NLPT in a language of your choice), you need to do the following:
Put your two datasets in some tables.
Manipulate DescriptionA and DescriptionB into something meaningful to the NLPT you are using. It won't like a string such as "COPY/PRINT/SCAN EQUIP". It will want the slashes removed and the words separated.
Compare DescriptionA with DescriptionB in a permutation-style manner by using a path_similarity type of function in the library. For example path_similarity('animal.definition1', 'dog.definition1') should return a high value, say .60, while path_similarity('animal.definition1', 'book.definition1') should return a low value, like .10.
If the path_similarity is above a certain value (up to you to decide), join the two items together and append them as a single row to a results table, while removing them from their respective tables. Continue until no DescriptionA remains whose similarity to some DescriptionB exceeds that value. Then do something else with the rows that are left in Table 1 and Table 2.
This should all be fairly easy to do programmatically. You may find you are not getting proper matches in some places with this method because you are randomly choosing two words to compare. Because of that, you may want to find another algorithm other than just permutations, perhaps one that looks at the statistics of the path_similarity of every piece of your data to every other piece and acts more appropriately.
Additionally, you may want to allow more than two words to be paired up. For example, it makes more sense to group "lumberjack", "tree cutter", and "tree chopper" in one row, with two additional columns created, than to throw one of them out and leave it without a pair. I'm sure none of the problems listed in this paragraph are new, and you can search around the internet for ways to solve them. Best of luck!
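The matching loop described in the steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a finished tool: tokenize, greedy_match, toy_similarity and the RELATED score table are all made up for illustration, and toy_similarity merely stands in for a real path_similarity-style function from whatever NLP library you pick. To avoid comparing pairs in an arbitrary order, the sketch scores every DescriptionA against every DescriptionB first and then accepts the best-scoring pairs before the weaker ones.

```python
import re


def tokenize(description):
    """Lower-case a description and split it on anything that is not a letter."""
    return [t for t in re.split(r"[^a-z]+", description.lower()) if t]


# Hand-written scores standing in for a real semantic similarity measure.
RELATED = {
    frozenset(("pharmacist", "drugs")): 0.8,
    frozenset(("banker", "financial")): 0.7,
    frozenset(("doctor", "medical")): 0.9,
    frozenset(("doctor", "health")): 0.6,
    frozenset(("repairman", "equip")): 0.5,
}


def toy_similarity(tokens_a, tokens_b):
    """Best score over all word pairs; 0.0 when no pair is known."""
    return max(
        (RELATED.get(frozenset((a, b)), 0.0) for a in tokens_a for b in tokens_b),
        default=0.0,
    )


def greedy_match(table_a, table_b, similarity, threshold):
    """Pair each id in table_a with at most one id in table_b, best scores first."""
    candidates = sorted(
        (
            (similarity(tokenize(desc_a), tokenize(desc_b)), id_a, id_b)
            for id_a, desc_a in table_a.items()
            for id_b, desc_b in table_b.items()
        ),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for score, id_a, id_b in candidates:
        if score < threshold:
            break  # everything after this point scores even lower
        if id_a in used_a or id_b in used_b:
            continue  # one side is already paired
        matches.append((id_a, id_b, score))
        used_a.add(id_a)
        used_b.add(id_b)
    return matches


table_1 = {
    "162000": "Pharmacist",
    "3123566": "Electronic Repairman",
    "143246": "Banker",
    "8444455": "Doctor",
}
table_2 = {
    "XX134556": "COPY/PRINT/SCAN EQUIP",
    "666Q1224": "DRUGS",
    "722WWYZ": "Financial Svc",
    "8456435T": "Medical Services",
    "15666PP": "Health Services",
}

matches = greedy_match(table_1, table_2, toy_similarity, threshold=0.4)
```

Rows that never reach the threshold (here the "Health Services" code) stay unmatched and still need a manual decision.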

how do you merge 2 different databases?

I'd like to know what I can do to keep 2 databases in 2 different locations, with the same structure but different data, synchronized.
To be more detailed: I have 2 stores, each with its own server. The reason is that there are network issues (can't be helped), so sometimes one of the stores has no internet for over 2 days, yet we still want to be able to enter information in the system (let's say a billing system). To solve this, I will have one store enter invoices with odd numbers and the other store with even numbers, so that a store can keep working while it has no internet. Now here is what I don't know how to do: when the internet is back, I want the 2 databases to synchronize their data. How do I do that?
I'll be using MySQL
Thanks!

MySQL joins for friend feed

I'm currently logging all actions of users and want to display their actions for the people following them to see - kind of like Facebook does it for friends.
I'm logging all these actions in a table with the following structure:
id - PK
userid - id of the user whose action gets logged
actiondate - when the action happened
actiontypeid - id of the type of action (actiontypes stored in a different table - i.e. following other users, writing on people's profiles, creating new content, commenting on existing content, etc.)
objectid - id of the object they just created (i.e. comment id)
onobjectid - id of the object they did the action to (i.e. id of the content that they commented on)
Now the problem is there are several types of actions that get logged (actiontypeid).
What would be the best way of retrieving the data to display to the user?
The easiest way out would be grabbing the dataset of people the user follows and then going from there, grabbing all other info from the other tables (i.e. the names of the users the people you're following just started following, names of the user profiles they wrote on, etc.). This, however, would create a huge number of small queries and trips to the database in a while loop. Not a good idea.
I could use joins to retrieve everything in one massive data set, but how would I know where to grab the data from in just one query? - there's different types of actions that require me to look into several different tables to retrieve data, based on the actiontypeid...
i.e. To get User X is now following User Y I'd have to get my data (User Y's username) from the followers table, whereas User X commented on content Y would need me to look in the content table to get the content's title and URL.
Any tips are welcome, thanks!
Consider creating several views for different actiontypeids. Union them to have one full history.
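A minimal sketch of that idea follows. It runs on SQLite through Python so it is self-contained, but the CREATE VIEW and UNION ALL statements are plain SQL. Everything beyond the column names from the question is an assumption: the actionlog, users and content table names, the actiontypeid values (1 = follow, 2 = comment), and the convention that onobjectid holds the followed user's id for a follow action.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE actionlog (
    id INTEGER PRIMARY KEY, userid INTEGER, actiondate TEXT,
    actiontypeid INTEGER, objectid INTEGER, onobjectid INTEGER
);

-- One view per action type; each returns the same column list.
CREATE VIEW feed_follow AS
SELECT l.id, l.userid, l.actiondate,
       'is now following' AS verb, u.username AS target
FROM actionlog l JOIN users u ON u.id = l.onobjectid
WHERE l.actiontypeid = 1;

CREATE VIEW feed_comment AS
SELECT l.id, l.userid, l.actiondate,
       'commented on' AS verb, c.title AS target
FROM actionlog l JOIN content c ON c.id = l.onobjectid
WHERE l.actiontypeid = 2;

-- The full history is the union of the per-type views.
CREATE VIEW feed AS
SELECT * FROM feed_follow
UNION ALL
SELECT * FROM feed_comment;

INSERT INTO users   VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO content VALUES (10, 'Hello world');
INSERT INTO actionlog VALUES
    (1, 1, '2024-01-01 10:00', 1, NULL, 2),
    (2, 1, '2024-01-02 09:00', 2, 55, 10),
    (3, 3, '2024-01-03 08:00', 1, NULL, 1);
""")


def feed_for(followed_ids):
    """Return (username, verb, target) rows for the given user ids, newest first."""
    placeholders = ",".join("?" * len(followed_ids))
    return conn.execute(
        "SELECT u.username, f.verb, f.target FROM feed f "
        "JOIN users u ON u.id = f.userid "
        f"WHERE f.userid IN ({placeholders}) "
        "ORDER BY f.actiondate DESC, f.id DESC",
        list(followed_ids),
    ).fetchall()
```

Supporting a new action type then means adding one more view with the same columns and one more UNION ALL branch, while the query that builds the feed stays a single round trip.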