Data Engineering

Forum Posts

Ajay-Pandey
by Esteemed Contributor III
  • 781 Views
  • 3 replies
  • 7 kudos

Rename and drop columns with Delta Lake column mapping. Hi all, Databricks now supports column rename and drop. Column mapping requires the following Delta protocols: Reader version 2 or above; Writer version 5 or above. Blog URL##Available in D...

Latest Reply
Poovarasan
New Contributor II
  • 7 kudos

The above-mentioned feature is not working in the DLT pipeline if the script has more than 4 columns.

2 More Replies
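The column mapping feature discussed above is enabled through table properties and then unlocks metadata-only rename and drop. A minimal sketch of the DDL involved, using a placeholder table and column names; in a notebook each statement would be passed to `spark.sql`:

```python
def column_mapping_ddl(table):
    """Build the DDL statements that enable Delta column mapping on a
    table and then rename/drop columns (column names are placeholders)."""
    return [
        # Enable column mapping; this requires upgrading the table
        # protocol to reader version 2 / writer version 5.
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.columnMapping.mode' = 'name', "
        "'delta.minReaderVersion' = '2', "
        "'delta.minWriterVersion' = '5')",
        # Once mapping is on, rename and drop are metadata-only.
        f"ALTER TABLE {table} RENAME COLUMN old_name TO new_name",
        f"ALTER TABLE {table} DROP COLUMN obsolete_col",
    ]

for stmt in column_mapping_ddl("my_table"):
    print(stmt)
```

Note that the protocol upgrade is irreversible, so it is worth testing on a throwaway table first.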
geetha_venkates
by New Contributor II
  • 6550 Views
  • 7 replies
  • 2 kudos

Resolved! How do we add a certificate file in Databricks for a spark-submit type of job?

How do we add a certificate file in Databricks for a spark-submit type of job?

Latest Reply
nicozambelli
New Contributor II
  • 2 kudos

I have the same problem... When I worked with the hive_metastore in the past, I was able to use the file system and also use API certs. Now I'm using Unity Catalog and I can't upload a certificate. Can somebody help me?

6 More Replies
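One common pattern for the spark-submit certificate question above (a sketch, not necessarily the accepted answer in the thread) is to place the certificate on the cluster, import it into a JVM truststore with `keytool -importcert` in an init script, and then point both the driver and executor JVMs at that truststore via Spark conf. The truststore path and password below are hypothetical placeholders:

```python
def truststore_conf(truststore_path, password):
    """Spark conf entries pointing the driver and executor JVMs at a
    custom truststore that already contains the imported certificate."""
    opts = (f"-Djavax.net.ssl.trustStore={truststore_path} "
            f"-Djavax.net.ssl.trustStorePassword={password}")
    return {
        "spark.driver.extraJavaOptions": opts,
        "spark.executor.extraJavaOptions": opts,
    }

# Hypothetical DBFS location where an init script staged the truststore.
conf = truststore_conf("/dbfs/certs/my_truststore.jks", "changeit")
```

With a spark-submit task these entries would be passed as `--conf key=value` pairs; on a cluster they can go in the cluster's Spark config.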
dispersion
by New Contributor
  • 797 Views
  • 2 replies
  • 1 kudos

Running large volume of SQL queries in Python notebooks. How to minimise overheads/maintenance.

I have around 200 SQL queries I'd like to run in Databricks Python notebooks. I'd like to avoid creating an ETL process for each of the 200 SQL processes. Any suggestions on how to run the queries in a way that loops through them so I have minimum am...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Chris French​, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thank...

1 More Replies
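For the 200-queries question above, one low-maintenance approach (a sketch under the assumption that the queries live as `.sql` files) is a single notebook loop that feeds each file to a runner such as `spark.sql`, collecting failures instead of aborting on the first one:

```python
from pathlib import Path

def run_all(sql_dir, run):
    """Run every .sql file under sql_dir through `run` (e.g. spark.sql
    in a notebook), collecting per-file outcomes instead of stopping
    the loop on the first failure."""
    results = {}
    for path in sorted(Path(sql_dir).glob("*.sql")):
        try:
            run(path.read_text())
            results[path.name] = "ok"
        except Exception as exc:
            results[path.name] = f"failed: {exc}"
    return results
```

In a notebook this would be called as `run_all("/path/to/queries", spark.sql)`; the directory path is a placeholder.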
KVNARK
by Honored Contributor II
  • 1908 Views
  • 4 replies
  • 11 kudos

Resolved! Pyspark learning path

Can anyone suggest the best series of courses offered by Databricks to learn PySpark for ETL purposes, either in the Databricks partner learning portal or the Databricks learning portal?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

To learn Databricks ETL, I highly recommend the videos made by Simon on this channel: https://www.youtube.com/@AdvancingAnalytics

3 More Replies
BkP
by Contributor
  • 3637 Views
  • 15 replies
  • 9 kudos

Suggestion Needed for an Orchestrator/Scheduler to schedule and execute Jobs in an automated way

Hello Friends, we have an application which extracts data from various tables in Azure Databricks into Postgres tables (Postgres installed on top of Azure VMs). After extraction we apply transformations on those datasets in Postgres tabl...

Latest Reply
VaibB
Contributor
  • 9 kudos

You can leverage Airflow, which provides a connector for the Databricks Jobs API, or use Databricks Workflows to orchestrate your jobs, where you can define several tasks and set dependencies accordingly.

14 More Replies
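Building on the Airflow / Jobs API suggestion in the reply above, a minimal sketch of the request a scheduler would send to the Databricks Jobs API 2.1 `run-now` endpoint (the job ID and parameters are placeholders; in Airflow, `DatabricksRunNowOperator` builds this call for you):

```python
import json

def run_now_request(job_id, notebook_params=None):
    """Endpoint path and JSON body for a Jobs API 2.1 run-now call."""
    body = {"job_id": job_id}
    if notebook_params:
        # Parameters are surfaced to the notebook via widgets.
        body["notebook_params"] = notebook_params
    return "/api/2.1/jobs/run-now", json.dumps(body)

# Hypothetical job ID and run-date parameter.
endpoint, body = run_now_request(123, {"run_date": "2023-01-01"})
```

The request would be POSTed to `https://<workspace-host><endpoint>` with a bearer token; using Databricks Workflows instead keeps the dependency graph inside the platform.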
User16835756816
by Valued Contributor
  • 2447 Views
  • 4 replies
  • 11 kudos

How can I extract data from different sources and transform it into a fresh, reliable data pipeline?

Tip: These steps are built out for AWS accounts and workspaces that are using Delta Lake. If you would like to learn more, watch this video and reach out to your Databricks sales representative for more information. Step 1: Create your own notebook or ...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 11 kudos

Thanks @Nithya Thangaraj​ 

3 More Replies
KVNARK
by Honored Contributor II
  • 2815 Views
  • 8 replies
  • 28 kudos

Resolved! Can we use Databricks, or code in Databricks, without learning PySpark in depth for ETL and data engineering purposes?

Can we use Databricks, or code in Databricks, without learning PySpark in depth for ETL and data engineering purposes? Can someone throw some light on this? Currently learning PySpark (basics of Python in handling the data) a...

Latest Reply
KVNARK
Honored Contributor II
  • 28 kudos

Thanks All for your valuable suggestions!

7 More Replies
CarterM
by New Contributor III
  • 2444 Views
  • 2 replies
  • 2 kudos

Resolved! Why Spark Streaming from S3 is returning thousands of files when there are only 9?

I am attempting to stream JSON endpoint responses from an S3 bucket into a Spark DLT. I have been very successful with this previously, but the difference this time is that I am storing the responses from multiple endpoints in the same S3 buck...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Carter Mooring​ Thank you SO MUCH for coming back to provide a solution to your thread! Happy you were able to figure this out so quickly. And I am sure that this will help someone in the future with the same issue.

1 More Replies
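When several endpoints share one bucket, as in the thread above, a common way to keep a stream from matching unrelated objects (a sketch of the general idea, not the thread's confirmed fix) is to load from a single endpoint's prefix and narrow the match further with Spark's `pathGlobFilter` option; the bucket layout below is hypothetical:

```python
def stream_options():
    """Auto Loader options scoping a stream to one endpoint's objects.
    In a notebook these would be used as:
        spark.readStream.format("cloudFiles").options(**stream_options())
             .load("s3://my-bucket/endpoint_a/")
    so objects under other endpoints' prefixes are never listed."""
    return {
        "cloudFiles.format": "json",
        # Only match JSON responses, ignoring checkpoints, manifests,
        # or other file types landing in the same prefix.
        "pathGlobFilter": "*.json",
    }
```

Scoping the `load()` path to the prefix does most of the work; the glob filter is belt-and-braces against stray files.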
vjraitila
by New Contributor III
  • 1122 Views
  • 3 replies
  • 5 kudos

Strategy for streaming ETL and Delta Lake before Delta Live Tables existed

What was the established architectural pattern for doing streaming ETL with Delta Lake before DLT was a thing? And incidentally, what approach would you take in the context of delta-oss today? The pipeline definitions would not have had to be declara...

Latest Reply
Vidula
Honored Contributor
  • 5 kudos

Hi @Veli-Jussi Raitila​, does @Shanmugavel Chandrakasu​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

2 More Replies
Zair
by New Contributor II
  • 1008 Views
  • 2 replies
  • 2 kudos

How to handle 100+ tables ETL through spark structured streaming?

I am writing a streaming job which will perform ETL for more than 130 tables, and I would like to know whether there is a better way to do this. Another solution I am considering is to write a separate streaming job for each table. Source data is coming...

Latest Reply
artsheiko
Valued Contributor III
  • 2 kudos

Hi, I guess to answer your question it might be helpful to get more details on what you're trying to achieve and the bottleneck that you encounter now. Indeed, handling the processing of 130 tables in one monolith could be challenging, as the business rul...

1 More Replies
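For the 130-table case above, one middle ground between a single monolith and 130 hand-written jobs (a sketch, assuming the per-table logic is uniform) is to drive all streams from one config list, so every table shares a single code path and adding a table is a config edit. Here `start_stream` stands in for the actual readStream/writeStream code:

```python
def plan_streams(tables, start_stream):
    """Start one stream per table from shared config; returns the
    stream handles keyed by table name."""
    handles = {}
    for t in tables:
        handles[t["name"]] = start_stream(t["source"], t["target"])
    return handles

# Hypothetical config; in practice this could live in a Delta table
# or a JSON file so non-developers can extend it.
config = [
    {"name": "orders", "source": "s3://raw/orders", "target": "bronze.orders"},
    {"name": "users", "source": "s3://raw/users", "target": "bronze.users"},
]
```

Whether all streams share one cluster or are split into groups then becomes a deployment decision rather than a code change.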
sage5616
by Valued Contributor
  • 10451 Views
  • 3 replies
  • 2 kudos

Resolved! Choosing the optimal cluster size/specs.

Hello everyone, I am trying to determine the appropriate cluster specifications/sizing for my workload: run a PySpark task to transform a batch of input Avro files to Parquet files, and create or re-create persistent views on these Parquet files. This t...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

If the data is 100MB, then I'd try a single node cluster, which will be the smallest and least expensive. You'll have more than enough memory to store it all. You can automate this and use a jobs cluster.

2 More Replies
Anonymous
by Not applicable
  • 634 Views
  • 1 reply
  • 3 kudos

March Madness + Data. Here at Databricks we like to use (you guessed it) data in our daily lives. Today kicks off a series called Databrags ...

March Madness + Data. Here at Databricks we like to use (you guessed it) data in our daily lives. Today kicks off a series called Databrags. Databrags are glimpses into how Bricksters and community folks like you use data to solve everyday problems, e...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

@Lindsay Olson​, Awesome!

Kash
by Contributor III
  • 5850 Views
  • 19 replies
  • 13 kudos

Resolved! HELP! Converting GZ JSON to Delta causes massive CPU spikes and ETL's take days!

Hi there, I was wondering if I could get your advice. We would like to create a bronze Delta table using GZ JSON data stored in S3, but each time we attempt to read and write it our cluster's CPU spikes to 100%. We are not doing any transformations but s...

Latest Reply
Kash
Contributor III
  • 13 kudos

Hi Kaniz, thanks for the note, and thank you everyone for the suggestions and help. @Joseph Kambourakis​ I added your suggestion to our load but I did not see any change in how our data loads or the time it takes to load data. I've done some additional ...

18 More Replies
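Two settings worth checking on the GZ JSON thread above (a sketch of general tuning, not the poster's confirmed fix): `.gz` files are not splittable, so schema inference forces a full decompress of every file just to guess types, and each file then arrives as a single task. Supplying an explicit schema and repartitioning after the read addresses both; the schema string and partition count below are placeholders:

```python
def bronze_load_plan(schema_ddl, target_partitions):
    """Settings for loading gzipped JSON into a bronze Delta table:
    an explicit schema skips the inference pass over non-splittable
    .gz files, and a repartition spreads the one-task-per-file read
    across the cluster before the Delta write."""
    return {
        "reader": {"format": "json", "schema": schema_ddl},
        "repartition": target_partitions,
        "writer": {"format": "delta", "mode": "append"},
    }

plan = bronze_load_plan("id BIGINT, payload STRING, ts TIMESTAMP", 64)
```

In a notebook this plan would translate to `spark.read.schema(...).json(path).repartition(n).write.format("delta")...`; the CPU spike itself is often just the decompression cost, which extra cores absorb better than bigger memory.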