I've been working with pyspark recently; step one is getting the environment set up, recorded here:
As introductory material I started with eat_pyspark_in_10_days, whose first chapter lists an installation procedure. I followed it, but things still would not run on my machine, so a round of troubleshooting began.
Then test whether the configuration worked: run `java -version` in cmd. If version information appears, you are in good shape; if not, go back and check the environment variables carefully.
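Besides running `java -version` by hand, you can confirm from Python that the JDK the script will use is actually reachable. A minimal sketch (my addition, reusing the JAVA_HOME path from the script below; note that `java -version` prints to stderr, not stdout):

```python
import os
import subprocess

# Check that the JDK configured below actually exists and runs
java_home = r'C:\Java\jdk1.8.0_301'
java_exe = os.path.join(java_home, 'bin', 'java.exe')
print('java.exe exists:', os.path.exists(java_exe))

# `java -version` writes its version string to stderr
result = subprocess.run([java_exe, '-version'], capture_output=True, text=True)
print(result.stderr.strip())
```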
```python
import pyspark
from pyspark import SparkContext, SparkConf
import os
import findspark
import warnings

warnings.filterwarnings("ignore")

# Point to the local JDK and Hadoop installations
os.environ['JAVA_HOME'] = r'C:\Java\jdk1.8.0_301'
os.environ['HADOOP_HOME'] = r'G:\hadoop-3.2.2'

# Set spark_home to the directory Spark was unpacked into,
# and point to the Python interpreter of the conda env
spark_home = r"G:\spark-3.1.2-bin-hadoop3.2"
python_path = r"E:\Anaconda\Anaconda\envs\spark\python"
findspark.init(spark_home, python_path)

# Local mode with 4 worker threads
conf = SparkConf().setAppName("test").setMaster("local[4]")
sc = SparkContext(conf=conf)

print("spark version:", pyspark.__version__)
rdd = sc.parallelize(["hello", "spark"])
print(rdd.collect())
```
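If `collect()` returns, the context is working. As an optional extra smoke test (my addition, not part of the original walkthrough), a classic word count also exercises a shuffle; a minimal sketch reusing the `sc` created above:

```python
# Word count: split lines into words, pair each with 1, sum per word
lines = sc.parallelize(["hello spark", "hello pyspark"])
counts = (lines.flatMap(lambda line: line.split())   # split each line into words
               .map(lambda word: (word, 1))          # pair each word with a count of 1
               .reduceByKey(lambda a, b: a + b))     # sum the counts per word
print(counts.collect())  # e.g. [('hello', 2), ('spark', 1), ('pyspark', 1)]
sc.stop()  # release the local context when done
```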

Running the original script produces:

```
E:\Anaconda\Anaconda\envs\spark\python.exe F:/python_project/untitled/src/scripts/test/py_spark.py
21/09/27 20:42:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
spark version: 3.1.2
21/09/27 20:42:57 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
['hello', 'spark']
```
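The WARN lines (NativeCodeLoader, SizeEstimator) are harmless in local mode on Windows. If you want a quieter console, the log output itself points at `sc.setLogLevel`; a minimal sketch:

```python
# Raise the log threshold so only errors are printed
# (hint taken from the "To adjust logging level" line in the output above)
sc.setLogLevel("ERROR")
```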