![TechWithViresh](/img/default-banner.jpg)
- 113 videos
- 770,281 views
TechWithViresh
India
Joined 25 Dec 2018
TechWithViresh specializes in technology areas such as Machine Learning, AI, Spark, Big Data, NoSQL, graph databases, Cassandra, and the Hadoop ecosystem.
Contact us at: techwithviresh@gmail.com
Facebook: Tech-Greens
Intro | Big Data Career Switch
#Bigdata #career #Hadoop #Spark #Scala
In this video, we discuss the launch of a new initiative / video series on how to prepare yourself for a career switch into the Big Data world.
Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more.
Click here to subscribe : ruclips.net/channel/UCZqHmLZxX0KC6PiJHETflOg
About us:
We are a technology consulting and training provider specializing in technology areas such as Machine Learning, AI, Spark, Big Data, NoSQL, graph databases, Cassandra, and the Hadoop ecosystem.
Mastering Spark : ruclips.net/video/bU57q5R5eTc/видео.html
Mastering Hive : ruclips.net/vide...
2,453 views
Videos
Azure DataBricks Cluster Deployment | Spark Cluster | Spark Job
3.6K views · 3 years ago
#Azure #Databricks #Spark #Scala In this video, we discuss in detail how to create a Spark job and deploy it onto an Azure Databricks cluster. Code GitHub link: github.com/vireshku/SparkAzureDataLake Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more. Click here to su...
Spark Scala | Connection with Azure Data Lake | Read Data | Write Data | Azure Active Directory
5K views · 4 years ago
#Azure #Spark #DataLake In this video, we discuss how to establish connectivity between Spark and Azure Data Lake. Code GitHub link: github.com/vireshku/SparkAzureDataLake.git Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more. Click here to subscribe : ruclips.net/chann...
Apache Spark | Delta Lake | New Features | Part-2
2.8K views · 4 years ago
#Apache #ExecutionModel #SparkUI #BigData #Spark #Partitions #Shuffle #Stage #Internals #Performance #Optimisation #DeepDive #Join #Azure #Cloud #Event #Streaming #Hadoop #HDFS #MapReduce #Tutorial #Spark3 #DeltaLake #ACID In this video, we discuss in detail the new features available as part of Apache Spark 3. Some of the major features are 1. ACID T...
Delta Lake | Spark 3 | Apache Spark New Features
6K views · 4 years ago
#Apache #Spark3 #DeltaLake #ACID In this video, we discuss in detail the new features available as part of Apache Spark 3. Some of the major features are: 1. ACID transactions in Spark 2. Schema enforcement 3. DML support - Delete, Update, Upsert 4. Time travel. Code GitHub link: gist.github.com/vireshku/1c1c34fc2d342077285c3368a0936205 Please join as a member of my cha...
Apache Spark Scala development project setup with Eclipse
12K views · 4 years ago
#Apache #Spark #Eclipse #Beginner In this series, we start a new step-by-step tutorial covering the development and deployment of a Spark Scala application. In this video, we set up the local Spark Scala development environment with Eclipse as the IDE. Please join as a member of my channel to get additional benefits like materials on Big Data and Dat...
Hadoop Tutorial | HDFS Blocks | Step by Step
1.9K views · 4 years ago
#Apache #Hadoop #Introduction #HDFS #Blocks In this series, we start a new step-by-step Hadoop tutorial, from beginner to expert. In this video, we discuss the design and architecture of Hadoop HDFS blocks. Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more. Click he...
Apache Spark 3 | New Feature | Performance Optimization | Dynamic Partition Pruning
8K views · 4 years ago
#Apache #Spark3 #Performance #DynamicPartitionPruning In this video, we discuss the new Dynamic Partition Pruning feature for query optimisation in Spark 3. Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more. Click here to subscribe : ruclips.net/channel/UCZqHmLZxX0KC...
Spark Performance Optimization | Join | UNION vs OR
8K views · 4 years ago
#Apache #Spark #Performance #Optimization In this video, we discuss Spark join performance optimization for the scenario where the 'OR' operator is used within joins. Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more. Click here to subscribe : ruclips.net/channel/UCZqHmLZxX0KC6PiJHE...
Apache Spark 3 | Design | Architecture | New Features | Interview Question
8K views · 4 years ago
#Apache #Spark3 #Design #Architecture In this video, we discuss the new features, design, and architecture of Spark 3. Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more. Click here to subscribe : ruclips.net/channel/UCZqHmLZxX0KC6PiJHETflOg About us: We are a technology consul...
Spark Performance Tuning | Memory Architecture | Interview Question
9K views · 4 years ago
#Apache #Spark #Performance #Memory In this video, we discuss Spark performance optimisation for efficient memory management. Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more. Click here to subscribe : ruclips.net/channel/UCZqHmLZxX0KC6PiJHETflOg About us: We are a technology c...
What is Hadoop | Introduction | Hadoop Tutorial | Architecture
1.1K views · 4 years ago
#Apache #Hadoop #Introduction #Hadoop1 vs #Hadoop2 In this series, we start a new step-by-step Hadoop tutorial, from beginner to expert. Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more. Click here to subscribe : ruclips.net/channel/UCZqHmLZxX0KC6PiJHETflOg About us: We are a technology consul...
Spark Interview Question | Bucketing | Spark SQL
14K views · 4 years ago
#Apache #Spark #SparkSQL #Bucketing Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more. Click here to subscribe : ruclips.net/channel/UCZqHmLZxX0KC6PiJHETflOg About us: We are a technology consulting and training provider specializing in technology areas such as Machine Learning, AI, Spark, Big Data...
Spark Interview Questions | PySpark and Apache Arrow | What is Apache Arrow
4.5K views · 4 years ago
#Apache #Spark #PySpark #Arrow Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more. Click here to subscribe : ruclips.net/channel/UCZqHmLZxX0KC6PiJHETflOg About us: We are a technology consulting and training provider specializing in technology areas such as Machine Learning, AI, Spark, Big D...
Spark Interview Question | fold vs reduce
4.3K views · 4 years ago
#Apache #Spark #fold #reduce #Analytics Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more. Click here to subscribe : ruclips.net/channel/UCZqHmLZxX0KC6PiJHETflOg About us: We are a technology consulting and training provider specializing in technology areas such as Machine Learning, AI, Spark, Big ...
Spark Interview Question | Clickstream Analytics
2.8K views · 4 years ago
Spark Scenario Based Question | ClickStream Analytics
7K views · 4 years ago
Spark Interview Question | Cost Based Optimizer
7K views · 4 years ago
Hadoop Interview Question | Split Brain Problem
3.9K views · 4 years ago
Spark Interview Question | Map vs MapPartition vs MapPartitionWithIndex
9K views · 4 years ago
Spark Interview Questions | Spark Context Vs Spark Session
19K views · 4 years ago
Spark Interview Question | Partition Pruning | Predicate Pushdown
13K views · 4 years ago
Apache Spark Basics | How Spark Works | Interview Question
6K views · 4 years ago
Spark Scenario Interview Question | Persistence Vs Broadcast
13K views · 4 years ago
RDD Size Programmatically | Hands-on Code | Interview Question
541 views · 4 years ago
Cache vs Persist | Spark Tutorial | Deep Dive
32K views · 4 years ago
Spark Execution Model | Spark Tutorial | Interview Questions
2.5K views · 4 years ago
Managing Spark Partitions | Spark Tutorial | Spark Interview Question
23K views · 4 years ago
Apache Spark Tutorial | NoSql Database
705 views · 4 years ago
Spark Performance Tuning | Avoid GroupBy | Interview Question
11K views · 4 years ago
What about the aggregateByKey function on RDDs?
For anyone wondering when to use groupByKey() over reduceByKey(): groupByKey() can be used for non-associative operations, where the order of application of the operation matters. For example, if we want to calculate the median of the values for each key, we cannot use reduceByKey(), since median is not an associative operation.
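A tiny Python sketch (illustrative only, not from the video) of why a median cannot be assembled from partial results the way an associative reduceByKey combine requires:

```python
from statistics import median

values = [1, 2, 3, 4, 100, 101]

# Median over the whole group - what a groupByKey-then-aggregate approach computes.
true_median = median(values)                      # 3.5

# Merging medians of two partial groups - what a reduceByKey-style
# partial combine would effectively do. The answer comes out wrong.
left, right = values[:3], values[3:]
merged = median([median(left), median(right)])    # median([2, 100]) = 51.0

assert true_median == 3.5
assert merged == 51.0
assert true_median != merged
```

This is why per-key medians in Spark need the full value list (or an approximate algorithm such as approxQuantile), while sums and counts can safely use reduceByKey.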
Hi, I like your Spark videos. Please create a dedicated video for top 100 most frequently used Spark Commands. - Pankaj C
Video recommendations at the end are blocking the content...
I see - for RDD the default storage level is memory only, and for DataFrame it is memory + disk.
Why are you talking like you're in sleepy mode?
Please provide aws questions and answers. Thank you 🙏
what is MSCK ?
What if I have multiple Spark jobs running in parallel in one Spark session?
For the ORC format, schema evolution is not just limited to adding new columns.
Backward compatibility:
- Adding columns: new columns can be added to the schema without affecting existing data files. When reading old ORC files with a new schema that includes additional columns, the new columns are treated as optional and filled with default values.
- Removing columns: similar to Parquet, existing columns can be removed without breaking compatibility. When reading old ORC files with a new schema that excludes certain columns, those columns are ignored.
- Changing data types: data types of existing columns can be changed, and ORC will attempt to convert the data to the new type. However, as with Parquet, this conversion might result in data loss if the types are not compatible.
Forward compatibility:
- Adding columns: new columns can be added, and existing files can still be read without errors. The new columns are filled with default values when data from the old files is read.
- Removing columns: files written with a schema that has fewer columns can still be read with a newer schema containing additional columns. The additional columns are treated as optional.
- Changing data types: forward compatibility is generally maintained for changing data types, but careful consideration is needed to avoid potential data loss or conversion issues.
These points are what I found supplementing your content. Thanks for your videos and the dedication in making them; it is really helpful for my preparation.
Hi! Why do you say Avro is row-oriented? Isn't it columnar storage as well?
Thank you
Super content thank you
Good sir
Could you please tell what is the difference between partition pruning and predicate pushdown
Both are the same.
Very Nice and clear explanation before this video i was very confused regarding executor tuning part now after this video it is now crystal clear.
Hi, does "10 nodes" include the master node? I have a configuration like this:
"Instances": {
  "InstanceGroups": [
    {
      "Name": "Master nodes",
      "Market": "SPOT",
      "InstanceRole": "MASTER",
      "InstanceType": "m5.4xlarge",
      "InstanceCount": 1
    },
    {
      "Name": "Worker nodes",
      "Market": "SPOT",
      "InstanceRole": "CORE",
      "InstanceType": "m5.4xlarge",
      "InstanceCount": 9
    }
  ],
  "KeepJobFlowAliveWhenNoSteps": false,
  "TerminationProtected": false
},
@TechWithViresh: no recent videos. Can you please add some? Your videos are very useful, brother. Thanks.
Thanks, for sure videos coming soon :)
Thanks! A great and concise explanation!
The 2nd map will not be executed, as no action is performed on the resulting data set after collect.
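The lazy-evaluation behaviour described here can be mimicked with a Python generator (an illustrative analogy, not Spark API code): building the pipeline does nothing until something iterates it, just as a Spark transformation only runs when an action is called.

```python
calls = []

def tracked_double(x):
    # Records every invocation so we can see *when* work happens.
    calls.append(x)
    return x * 2

data = [1, 2, 3]

# Like a Spark transformation: defining the generator executes nothing.
mapped = (tracked_double(x) for x in data)
assert calls == []            # no work has been done yet

# Like an action (collect): iterating forces the computation.
result = list(mapped)
assert result == [2, 4, 6]
assert calls == [1, 2, 3]     # work happened only at the "action"
```

A map transformation with no downstream action is exactly the first state: declared, never executed.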
Hello, I find the content very interesting, especially on when the hash join is better than the sort-merge join. Could you please tell me where you found the documentation on that?
Many thanks to you, sir. 😊 I learnt Spark from you.
Very good. Please make a group of videos covering Spark interview questions.
nice
Audio quality is not good; the content is good.
LIMIT comes after ORDER BY in the query execution order, so how does using LIMIT reduce the number of records to be sorted? Am I missing anything here?
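A possible answer: Spark's optimizer typically rewrites ORDER BY + LIMIT n into a top-n physical operator (TakeOrderedAndProject) that keeps at most n candidate rows per partition in a bounded heap, so the full sort is never materialized. The idea can be sketched in plain Python with `heapq` (illustrative only; the data is made up):

```python
import heapq
import random

random.seed(0)
rows = [random.randrange(1_000_000) for _ in range(100_000)]

# Naive plan: fully sort all rows, then take the limit.
top5_full_sort = sorted(rows)[:5]

# Top-n plan: a bounded heap tracks only 5 candidates at a time -
# the idea behind Spark's TakeOrderedAndProject operator.
top5_heap = heapq.nsmallest(5, rows)

assert top5_heap == top5_full_sort
```

Both plans return the same rows; the bounded-heap version just never pays for sorting the records that can't make the top n.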
Why are you converting the DataFrame to an RDD? It is very bad practice in terms of performance.
In the video from 11:30, we add a random key to the existing towerid key. For example, with tower id 101 and salt key 67, 101+67 = 168, and the hash value of 168 would be the final value, right? What about the case where the partition column is a string datatype?
In the case of strings, we can add surrogate keys based on the string column values and then do the salting.
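To make the string-key case concrete, here is a small Python sketch of salting (illustrative; the key names and salt count are made up). The salt is appended as a suffix to the key string, so no arithmetic on the key is needed and it works for any datatype:

```python
import random
from collections import Counter

random.seed(42)
NUM_SALTS = 4

# Skewed input: one hot key ("tower_101") dominates.
keys = ["tower_101"] * 900 + ["tower_202"] * 100

# Salting: append a random suffix so the hot key is spread
# across NUM_SALTS smaller buckets.
salted = [f"{k}#{random.randrange(NUM_SALTS)}" for k in keys]

after = Counter(salted)

# The hot key now occupies several smaller buckets instead of one huge one.
assert len([k for k in after if k.startswith("tower_101#")]) == NUM_SALTS
assert max(after.values()) < 900
```

After aggregating on the salted key, a second aggregation over the original (unsalted) key merges the partial results back together.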
Brother, would you lose anything by explaining this in Hindi?
Perfect 👌 explanation
Very good and descriptive comparison. Thank you!
You gave all the information about Hive. Is this enough for an interview?
How will the last map operation run on the driver? Up to collect, a job is completed, and whenever we call another action, it creates a new job with a new DAG, which is again distributed and run on the executors, right?
Good explanation.. Thank you 👍
Can we get the PPT that you show in the videos?
What if each node has only 8 cores? How does Spark allocate 5 cores per JVM?
Awesome✨
Bro, if you have 6 blocks in Hadoop 3 then it consumes 15 blocks. Suppose we have a file which consists of 2 blocks (B1 and B2).
1) With the current HDFS setup, we will have 2×3 = 6 blocks in total:
   For block B1 -> B1.1, B1.2, B1.3
   For block B2 -> B2.1, B2.2, B2.3
2) With the EC setup, we will have 2×2 + 2/2 = 5 blocks in total:
   For block B1 -> B1.1, B1.2
   For block B2 -> B2.1, B2.2
   The 3rd copy of each block is XOR'ed together and stored as a single parity block: (B1.1 xor B2.1) -> Bp
In this setup:
- If B1.1 is corrupted, we can recompute B1.1 = Bp xor B2.1
- If B2.1 is corrupted, we can recompute B2.1 = Bp xor B1.1
- If both B1.1 and B2.1 are corrupted, then we have another copy of both blocks (B1.2 and B2.2)
- If the parity block Bp is corrupted, it is recomputed as B1.1 xor B2.1
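The XOR-parity recovery described in this comment can be checked directly in Python (the byte values are arbitrary stand-ins for block contents):

```python
# Two data blocks, modelled as byte strings.
b1 = bytes([10, 20, 30])
b2 = bytes([7, 8, 9])

# Parity block: byte-wise XOR of the data blocks (Bp = B1 xor B2).
parity = bytes(x ^ y for x, y in zip(b1, b2))

# Lose b1: recover it from the parity and b2, since XOR is its own inverse.
recovered_b1 = bytes(p ^ y for p, y in zip(parity, b2))
assert recovered_b1 == b1

# Lose b2: recover it symmetrically from the parity and b1.
recovered_b2 = bytes(p ^ x for p, x in zip(parity, b1))
assert recovered_b2 == b2
```

Real HDFS erasure coding uses Reed-Solomon rather than a single XOR parity, but the recovery principle - recompute a lost block from the surviving blocks plus parity - is the same.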
@Ankit Bansal, can you please solve this using SQL?
This isn't instagram where you can tag channels lol
Are there any differences in terms of performance?
Crisp , concise and to the point explanation in great detail. Anyone can understand through this video. Extremely well done. Kudos...
Glad it was helpful!
Thank you
Welcome!
Thank you
Welcome!
Good content
Voice and explanation are not clear!
Sir will you please make a video that explains the rand() function?
How can we do percentile() while avoiding groupBy? Can you explain it?
Good one
Thank you! Cheers!