mikePietsch.com
Spark
1. What Spark Solves and the Distributed Computing Model
2. Spark Architecture: Driver, Executors, and Cluster Managers
3. Installation: Local Mode, spark-shell, and PySpark
4. RDDs: Creating, Transformations, Actions, and Lineage
5. RDD Transformations: map, flatMap, filter, join, and groupByKey
6. RDD Actions: collect, reduce, count, take, and saveAsTextFile
7. DataFrames and the Spark SQL Engine
8. Creating DataFrames from Files, RDDs, and Databases
9. Schema Inference vs Schema Definition: StructType and StructField
10. Column Operations: select, withColumn, drop, rename, and cast
11. Filtering, Sorting, and Sampling DataFrames
12. Joins: Inner, Left, Right, Cross, and Broadcast Hint
13. Aggregations: groupBy, agg, pivot, rollup, and cube
14. Window Functions: over, partitionBy, orderBy, and Frame Boundaries
15. User-Defined Functions: UDFs, Pandas UDFs, and Performance
16. Spark SQL: sql(), Temporary Views, and the Catalog
17. Reading and Writing Data: DataFrameReader and DataFrameWriter
18. Parquet Deep Dive: Columnar Storage, Predicate Pushdown, and Partitioning
19. Delta Lake: ACID Transactions, Time Travel, and MERGE
20. Structured Streaming: Sources, Output Modes, and Checkpointing
21. MLlib: Pipelines, Transformers, Estimators, and Feature Engineering
22. Spark on YARN and Kubernetes
23. Databricks: Clusters, Notebooks, Unity Catalog, and Photon
24. Performance Tuning: Partitioning, Caching, Skew, and AQE
25. The Spark UI: Stages, Tasks, DAG Visualization, and Execution Plans
26. Debugging Common Spark Errors and Failures
27. PySpark-Specific Patterns and the Pandas API on Spark