Sunday, October 29, 2017

Install Spark on Windows 10

0) Make sure you have Python 3.6 and Java 8 installed
1) Install Scala: https://www.scala-lang.org/download/
2) Install Jupyter: `pip3 install jupyter`
3) Download Spark: http://spark.apache.org/downloads.html
4) Unzip Spark to a folder such as C:\spark
5) Download winutils.exe from https://github.com/steveloughran/winutils (the file is in the `bin` folder of each Hadoop version; pick the version your Spark package was built for)
6) Set the environment variable SPARK_HOME=C:\spark, and add %SPARK_HOME%\bin to your PATH
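Before moving on, it can help to confirm that steps 4-6 took effect. The script below is a minimal sketch, not part of the original setup: `check_spark_env` is a hypothetical helper name, and it assumes winutils.exe sits in the `bin` folder of SPARK_HOME or HADOOP_HOME (adjust that check if you put it elsewhere). Run it from a newly opened terminal, because changed environment variables only reach programs started after the change.

```python
import os
import shutil


def _norm(p):
    # Normalise so "C:\spark\bin\" and "c:\Spark\bin" compare equal on Windows
    return os.path.normcase(os.path.normpath(p))


def check_spark_env(environ=None):
    """Return a list of problems with the Spark setup (empty list if none found)."""
    env = os.environ if environ is None else environ
    problems = []
    path_str = env.get("PATH", "")
    path_dirs = {_norm(p) for p in path_str.split(os.pathsep) if p}

    # Step 6: SPARK_HOME must point at the unzipped Spark folder, with its bin on PATH
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append("SPARK_HOME is not an existing folder: " + spark_home)
    elif _norm(os.path.join(spark_home, "bin")) not in path_dirs:
        problems.append("%SPARK_HOME%\\bin is not on PATH")

    # Step 5 (assumption): winutils.exe was copied into the Spark or Hadoop bin folder
    candidates = [os.path.join(env[k], "bin", "winutils.exe")
                  for k in ("SPARK_HOME", "HADOOP_HOME") if env.get(k)]
    if not any(os.path.isfile(c) for c in candidates):
        problems.append("winutils.exe was not found in the Spark or Hadoop bin folder")

    # Step 0: Spark needs Java; an empty PATH cannot contain it
    if not path_str or shutil.which("java", path=path_str) is None:
        problems.append("java was not found on PATH")

    return problems


if __name__ == "__main__":
    issues = check_spark_env()
    print("\n".join(issues) if issues else "Spark environment looks OK")
```

Passing a dict instead of the real environment makes it easy to try out a proposed setting before changing it in Windows.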

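The following steps are optional:
7) Download Hadoop: http://hadoop.apache.org/releases.html
8-10) Repeat steps 4-6 for Hadoop: unzip it, put winutils.exe in its bin folder, set HADOOP_HOME to that folder, and add %HADOOP_HOME%\bin to your PATH
11) Install PySpark: `pip3 install pyspark`
12) Install findspark: `pip3 install findspark`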
13) Test with a Monte Carlo estimate of pi, timed once with Spark and once in plain Python:

```python
import findspark
findspark.init()  # locate SPARK_HOME and make pyspark importable before importing it

import pyspark
import random
import datetime

sc = pyspark.SparkContext(appName="Pi")
num_samples = 100000000

def inside(p):
    # p is ignored; draw a random point in the unit square
    x, y = random.random(), random.random()
    return x*x + y*y < 1

# With Spark
before = datetime.datetime.now()
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples
after = datetime.datetime.now()
print(pi)
print(after - before)
sc.stop()

# Without Spark
before = datetime.datetime.now()
count = 0
for i in range(0, num_samples):
    if inside(i):
        count += 1
pi = 4 * count / num_samples
after = datetime.datetime.now()
print(pi)
print(after - before)
```
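Why does the test print something close to 3.14159? A point drawn uniformly from the unit square falls inside the quarter circle x*x + y*y < 1 with probability pi/4, so 4 * count / num_samples estimates pi. Each sample is a Bernoulli trial with p = pi/4, so the standard error of the estimate is 4 * sqrt(p * (1 - p) / N), roughly 1.6e-4 for the 100,000,000 samples used above. The plain-Python sketch below (no Spark; `estimate_pi` and `standard_error` are hypothetical names) makes this concrete, using a seeded `random.Random` so runs are repeatable.

```python
import math
import random


def estimate_pi(num_samples, seed=None):
    """Monte Carlo estimate of pi: 4 * (points inside the quarter circle) / samples."""
    rng = random.Random(seed)  # private generator, reproducible when a seed is given
    count = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1:
            count += 1
    return 4 * count / num_samples


def standard_error(num_samples):
    """Standard error of the estimate: each point is a Bernoulli trial with p = pi/4."""
    p = math.pi / 4
    return 4 * math.sqrt(p * (1 - p) / num_samples)


if __name__ == "__main__":
    for n in (10000, 1000000):
        est = estimate_pi(n, seed=42)
        print(n, est, "expected error about", round(standard_error(n), 5))
```

So both runs of the test should typically agree with pi to about three decimal places; the timings are what differ.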


Saturday, March 11, 2017

Install TensorFlow on Windows

1) Install Python 3.5: https://www.python.org/downloads/
2) Install Matplotlib: `pip install matplotlib`
3) Install TensorFlow: `pip install tensorflow`
Optional (GPU support):
4) Install the GPU build: `pip install tensorflow_gpu`
5) Install CUDA 8.0: https://developer.nvidia.com/cuda-downloads
6) Download cuDNN from https://developer.nvidia.com/cudnn (this requires a free NVIDIA developer registration), unzip it, and copy the files into the matching CUDA folders [1]:
`cuda\bin\cudnn64_5.dll` to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\`
`cuda\include\cudnn.h` to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include\`
`cuda\lib\x64\cudnn.lib` to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64\`
7) Add the CUDA bin folder (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin) to the %PATH% environment variable
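Steps 6 and 7 can also be scripted. The sketch below assumes you pass it the unzipped `cuda` folder (the one containing `bin`, `include` and `lib\x64` as listed above) and that CUDA 8.0 is in its default location; `copy_cudnn` and `check_cuda_setup` are hypothetical helper names. Writing into `C:\Program Files` usually needs an Administrator prompt.

```python
import os
import shutil

# Files from step 6, relative to the unzipped "cuda" folder; each goes to the
# same relative location under the CUDA 8.0 install folder.
CUDNN_FILES = [
    os.path.join("bin", "cudnn64_5.dll"),
    os.path.join("include", "cudnn.h"),
    os.path.join("lib", "x64", "cudnn.lib"),
]
CUDA_HOME = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0"


def _norm(p):
    # Case- and separator-insensitive comparison of folder paths on Windows
    return os.path.normcase(os.path.normpath(p))


def copy_cudnn(cudnn_dir, cuda_home=CUDA_HOME):
    """Copy the cuDNN files into the CUDA folders and return the destination paths."""
    copied = []
    for rel in CUDNN_FILES:
        dst = os.path.join(cuda_home, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(cudnn_dir, rel), dst)
        copied.append(dst)
    return copied


def check_cuda_setup(cuda_home=CUDA_HOME, path_str=None):
    """Return problems: missing cuDNN files, or CUDA's bin folder not on PATH (step 7)."""
    if path_str is None:
        path_str = os.environ.get("PATH", "")
    problems = ["missing " + rel for rel in CUDNN_FILES
                if not os.path.isfile(os.path.join(cuda_home, rel))]
    path_dirs = {_norm(p) for p in path_str.split(os.pathsep) if p}
    if _norm(os.path.join(cuda_home, "bin")) not in path_dirs:
        problems.append("CUDA bin folder is not on PATH")
    return problems


if __name__ == "__main__":
    # copy_cudnn(r"C:\path\to\cuda")  # hypothetical unzip location; needs admin rights
    print(check_cuda_setup() or "cuDNN files and PATH look OK")
```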

References:
[1] https://github.com/tensorflow/tensorflow/issues/5968

Tuesday, January 24, 2017

Install scikit-learn on Windows

1) Install Python: https://www.python.org/downloads/
2) Install Python Tools for Visual Studio (PTVS): https://github.com/Microsoft/PTVS/releases
3) Download the numpy+mkl, scipy, and scikit-learn wheels from http://www.lfd.uci.edu/~gohlke/pythonlibs
4) `pip install numpy-1.12.0+mkl-cp36-cp36m-win32.whl` (use the file name you downloaded)
5) `pip install scipy-0.18.1-cp36-cp36m-win32.whl` (use the file name you downloaded)
6) `pip install scikit_learn-0.18.1-cp36-cp36m-win32.whl` (use the file name you downloaded)
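The wheel you download must match your interpreter: `cp36` means CPython 3.6, and `win32` versus `win_amd64` means 32-bit versus 64-bit Python. Picking the wrong file is a common reason pip refuses to install it. Below is a minimal sketch for checking a file name before installing, assuming CPython on Windows and the wheel naming scheme from PEP 427; the helper names are hypothetical.

```python
import struct
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into its PEP 427 fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError("unexpected wheel name: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": parts[0], "version": parts[1], "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def current_windows_tags():
    """Python and platform tags for this interpreter (assumes CPython on Windows)."""
    python_tag = "cp%d%d" % sys.version_info[:2]
    platform_tag = "win_amd64" if struct.calcsize("P") * 8 == 64 else "win32"
    return python_tag, platform_tag


def wheel_fits(filename, tags=None):
    """True if the wheel's python and platform tags include the given ones."""
    python_tag, platform_tag = tags or current_windows_tags()
    info = parse_wheel_name(filename)
    # Tags may be compressed sets such as "py2.py3", so split on "."
    return (python_tag in info["python"].split(".")
            and platform_tag in info["platform"].split("."))


if __name__ == "__main__":
    print(current_windows_tags())
    print(wheel_fits("scikit_learn-0.18.1-cp36-cp36m-win32.whl"))
```

Copy file names with plain ASCII hyphens; pasting from a web page can bring in look-alike characters that pip will not recognise.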