Friday, June 10, 2016

>>> from sklearn import feature_extraction

If you encounter the error below when running the import above:

from ._sparsetools import csr_tocsc, csr_tobsr, csr_count_blocks, \
ImportError: DLL load failed: The specified module could not be found.
then install the missing dependency, the Visual C++ Redistributable for Visual Studio 2015, from https://www.microsoft.com/en-us/download/confirmation.aspx?id=48145


Different NLP toolkits

OpenNLP
Natural Language Toolkit (NLTK)
Stanford NLP
MAchine Learning for LanguagE Toolkit (MALLET)
LingPipe
Freeling
TreeTagger


Steps to install NumPy and SciPy on Windows for Python 3.x

pip is provided by default in Python 3.4 and later
1. Open the Windows command prompt (from Start, go to Run and type: cmd)
2. Go to the following URL to download the numpy wheel: http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy and for scipy use http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
3. Before downloading numpy/scipy from the above URLs, check which wheel versions are supported on your platform by running print(pip.pep425tags.get_supported()) in a Python shell (see the sketch after this list)
4. Download the appropriate wheel
5. Use the following command to install numpy: pip install numpy-1.10.4+mkl-cp35-cp35m-win32.whl
 for scipy: pip install scipy-0.17.0-cp35-none-win32.whl
6. The following message will be displayed:
Installing collected packages: numpy
Successfully installed numpy-1.10.4
7. Done
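
For step 3, here is a minimal sketch of the tag check in a Python shell (pep425tags shipped with the pip versions current at the time; newer pip releases have moved this API):

>>> import pip
>>> print(pip.pep425tags.get_supported())

The result is a list of (python tag, abi tag, platform) tuples; download the wheel whose file name matches one of them.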


Using the NLTK toolkit to classify text with predefined libraries

Install and import the libraries below.
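
A minimal sketch of the imports, assuming pandas, NLTK, and scikit-learn are already installed:

import pandas as pd
import nltk
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

nltk.download('punkt')  # tokenizer models, only needed once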

Read the training dataset from a CSV file; this can also be done from any other file format or source.
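
A sketch with pandas; the file name train.csv and the column names text and label are assumptions for illustration, not from the original post:

# assumed layout: one column of raw sentences, one column of class labels
train = pd.read_csv('train.csv')   # hypothetical file name
sentences = train['text']          # hypothetical column name
labels = train['label']            # hypothetical column name
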
Once the training data is read, you can tokenize and stem it if you prefer. This step can be skipped, since tokenization can also happen in the next steps while calculating TF-IDF.
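
A sketch of the optional tokenize-and-stem pass, using NLTK's word_tokenize and PorterStemmer:

stemmer = PorterStemmer()

def tokenize_and_stem(sentence):
    # split the sentence into word tokens, then reduce each token to its stem
    tokens = nltk.word_tokenize(sentence)
    return [stemmer.stem(t) for t in tokens]
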
Append the tokenized content back to the dataset; this can be skipped if you did not tokenize in the previous step.
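
A sketch that joins the stemmed tokens back into one string per row and stores them as a new column (the column name stemmed is an assumption):

# rebuild each row as a single space-separated string of stems
train['stemmed'] = sentences.apply(lambda s: ' '.join(tokenize_and_stem(s)))
docs = train['stemmed']   # use sentences instead if you skipped stemming
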
Calculate a count vectorizer to find the importance of the terms in each document.
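
A sketch using scikit-learn's CountVectorizer for raw term counts, followed by TfidfTransformer to reweight the counts by importance:

# term counts per document
count_vect = CountVectorizer()
counts = count_vect.fit_transform(docs)

# rescale the raw counts to TF-IDF weights
tfidf = TfidfTransformer()
train_features = tfidf.fit_transform(counts)
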
Use the Naive Bayes library to train and predict.
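
A sketch that fits a multinomial Naive Bayes classifier on the TF-IDF features:

clf = MultinomialNB()
clf.fit(train_features, labels)
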
Test your trained model by submitting a new sentence.
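
A sketch that classifies a new sentence; it must go through the same vectorizer and transformer that were fitted on the training data:

new_sentence = 'this is a sample sentence to classify'   # hypothetical input
new_counts = count_vect.transform([new_sentence])
new_features = tfidf.transform(new_counts)
print(clf.predict(new_features))   # prints the predicted label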