Text Analysis Project using PySpark ML
$30-250 USD
Paid on delivery
I am looking for someone to perform a theme analysis of around 5 million comments from a video-sharing website, using the PySpark ML library as the main tool. I will provide the dataset. The work environment should be Databricks Community Edition (you can create an account for free), and the deliverable is a Databricks notebook.
The data is at “video_creator – commentor_id – comment” granularity. What I want you to do is the following:
1. Remove comments that are not written in English.
2. For each commentor_id, concatenate all of that user's comments into one feature called “all_comments”. That is, aggregate the dataset to “commentor_id – all_comments” granularity.
3. Transform the “all_comments” feature using the Word2Vec module of the PySpark ML library (not the MLlib library, as I want to do everything using DataFrames).
4. Do a clustering of the transformed “all_comments” feature using the LDA module of PySpark ML.
5. Generate the most frequent words for each cluster identified in step 4. I will interpret the results myself, so you don’t need to worry about that.
So overall, it’s a straightforward task of data cleaning, aggregation, and application of standard PySpark ML modules.
I estimate this project will take 2 to 3 hours of programming for someone good at Python and PySpark. I hope to get the project done in 3 days; up to 6 days is acceptable. If you place a bid, I will share the link to the data file with you. I have no instructions beyond the five steps listed above.
Project ID: #17903811
About the project
7 freelancers are bidding an average of $271 for this job
I have good hands-on experience with advanced R, Python, BI tools and technologies, AI, and Big Data. I have good knowledge of DL/ML algorithms and have also developed dashboards and web applications. My area of e…
Hello! I am a Python developer. I looked at your project and it seems interesting. I have all the skills required for this project. Ping me to discuss in detail.
Hi, I am a very experienced statistician, data scientist and academic writer. I have completed several PhD-level thesis projects involving advanced statistical analysis of data. I have worked with data from several comp…
Hello, I have read your job description carefully. I have 7 years of Python experience. I would like to discuss the project with you via chat. Thank you, James.