Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

yang
by New Contributor II
  • 1047 Views
  • 1 reply
  • 2 kudos

Resolved! Error in DE 4.1 - DLT UI Walkthrough (from Data Engineering with Databricks v3 course)

I am working on the Data Engineering with Databricks v3 course. In notebook DE 4.1 - DLT UI Walkthrough, I encountered an error in cmd 11: DA.validate_pipeline_config(pipeline_language). The error message is: AssertionError: Expected the parameter "suite" to...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

The DA validate function just checks that you named the pipeline correctly, set the number of workers to 0, and applied the other configurations. The name and directory aren't crucial to the learning process. The goal is to get familiar with the ...

eimis_pacheco
by Contributor
  • 939 Views
  • 1 reply
  • 1 kudos

How to remove more than 4 byte characters using PySpark in Databricks?

Hi community, we need to remove more than 4 byte characters using PySpark in Databricks, since these are not supported by Amazon Redshift. Does someone know how I can accomplish this? Thank you very much in advance. Regards

Latest Reply
Shalabh007
Honored Contributor
  • 1 kudos

Assuming you have a string-type column in a PySpark DataFrame, one possible way could be: identify the total number of characters for each value in the column (say a), identify the number of bytes taken by each character (say b), use the substring() function to select the first...
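A minimal PySpark sketch of another commonly suggested approach, assuming the goal is to strip the 4-byte UTF-8 characters (code points above U+FFFF) that Redshift rejects, and assuming a hypothetical DataFrame df with a string column txt:

from pyspark.sql import functions as F

# Code points above U+FFFF are exactly the characters UTF-8 encodes with 4 bytes.
# Spark's regexp_replace uses Java regex, which accepts \x{...} code-point ranges.
df_clean = df.withColumn(
    "txt",
    F.regexp_replace(F.col("txt"), r"[\x{10000}-\x{10FFFF}]", "")
)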

Ullsokk
by New Contributor III
  • 2557 Views
  • 1 reply
  • 5 kudos

How do I import a notebook from workspaces to repos?

I have a few notebooks in workspaces that I created before linking a repo to my Git. I have tried importing them from the repo (Databricks repo). The only two options are a local file from my PC or a URL. The URL for a notebook does not work. Do I need...

Latest Reply
Geeta1
Valued Contributor
  • 5 kudos

Hi @Stian Arntsen, when you click on the down arrow beside your notebook name (in your workspace), you will have an option called 'Clone'. You can use it to clone your notebook from your workspace to Repos. Hope it helps!

hare
by New Contributor III
  • 8370 Views
  • 1 reply
  • 1 kudos

Failed to merge incompatible data types

We are processing the JSON files from the storage location every day, and they get archived once the records are appended into the respective tables. source_location_path: "..../mon=05/day=01/fld1", "..../mon=05/day=01/fld2" ..... "..../mon=05/d...

Latest Reply
Shalabh007
Honored Contributor
  • 1 kudos

@Hare Krishnan the issues highlighted can easily be handled using .option("mergeSchema", "true") at the time of reading all the files. Sample code: spark.read.option("mergeSchema", "true").json(<file paths>, multiLine=True). The only scenario this w...
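A runnable sketch of the reply's suggestion, with placeholder paths standing in for the real source_location_path folders; the mergeSchema option is carried over from the reply as-is (it is primarily honored by the Parquet/Delta readers, while the JSON reader already infers a schema across all listed files):

# Placeholder paths following the .../mon=05/day=01/fldN layout from the post.
paths = ["/mnt/source/mon=05/day=01/fld1", "/mnt/source/mon=05/day=01/fld2"]

df = (spark.read
      .option("mergeSchema", "true")   # as suggested in the reply
      .json(paths, multiLine=True))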

Magnus
by Contributor
  • 2173 Views
  • 3 replies
  • 10 kudos

Resolved! How to retrieve Auto Loader client secret from Azure Key Vault?

I'm using Auto Loader in a SQL notebook and I would like to configure file notification mode, but I don't know how to retrieve the client secret of the service principal from Azure Key Vault. Is there any example notebook somewhere? The notebook is p...

Latest Reply
Geeta1
Valued Contributor
  • 10 kudos

Hi @Magnus Johannesson, you must use the Secrets utility (dbutils.secrets) in a notebook or job to read a secret: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-utils#dbutils-secrets. Hope it helps!
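A minimal Python sketch of how that could look for Auto Loader file notification mode, assuming a hypothetical secret scope kv-scope backed by the Key Vault and a secret named sp-client-secret; all of the IDs and the path are placeholders:

client_secret = dbutils.secrets.get(scope="kv-scope", key="sp-client-secret")

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")
      .option("cloudFiles.clientId", "<application-id>")       # service principal application ID
      .option("cloudFiles.clientSecret", client_secret)        # value pulled from Key Vault
      .option("cloudFiles.tenantId", "<tenant-id>")
      .option("cloudFiles.subscriptionId", "<subscription-id>")
      .option("cloudFiles.resourceGroup", "<resource-group>")
      .load("abfss://container@account.dfs.core.windows.net/input"))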

2 More Replies
Priyanka48
by New Contributor III
  • 8362 Views
  • 5 replies
  • 10 kudos

The functionality of table property delta.logRetentionDuration

We have one project requirement where we have to store only 14 days of history for Delta tables. So for testing, I have set delta.logRetentionDuration = 2 days using the command spark.sql("alter table delta.`[delta_file_path]` set TBLPROPER...
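A short sketch of that property change with a placeholder table path; delta.deletedFileRetentionDuration is included as an assumption, since the log retention setting alone does not control when old data files are removed (VACUUM does):

# Placeholder path; substitute the real Delta table location.
spark.sql("""
  ALTER TABLE delta.`/mnt/tables/my_table`
  SET TBLPROPERTIES (
    delta.logRetentionDuration = 'interval 14 days',
    delta.deletedFileRetentionDuration = 'interval 14 days'
  )
""")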

Latest Reply
Kaniz_Fatma
Community Manager
  • 10 kudos

Hi @Priyanka Mane, we haven't heard from you since the last response from @Werner Stinckens and @Uma Maheswara Rao Desula, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the c...

4 More Replies
andrew0117
by Contributor
  • 1988 Views
  • 5 replies
  • 9 kudos

Resolved! How to call a few child notebooks from the master notebook in parallel?

I am planning to use dbutils.notebook.run() to call all the child notebooks from the master notebook, but they are executed sequentially.
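One commonly used workaround (not necessarily the accepted answer in this thread, which is not shown) is to run dbutils.notebook.run on a driver-side thread pool; a minimal sketch with hypothetical child notebook paths:

from concurrent.futures import ThreadPoolExecutor

# Hypothetical child notebook paths relative to the master notebook.
children = ["./child_1", "./child_2", "./child_3"]

def run_child(path):
    # dbutils.notebook.run(path, timeout_seconds, arguments)
    return dbutils.notebook.run(path, 3600, {"source": "master"})

with ThreadPoolExecutor(max_workers=len(children)) as pool:
    results = list(pool.map(run_child, children))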

Latest Reply
Kaniz_Fatma
Community Manager
  • 9 kudos

Hi @andrew li, we haven't heard from you since the last response from @Uma Maheswara Rao Desula, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpf...

4 More Replies
lawrence009
by Contributor
  • 1496 Views
  • 3 replies
  • 2 kudos

FutureWarning: ``databricks.feature_store.entities.feature_table.FeatureTable.keys`` is deprecated since v0.3.6

I'm getting this message with the following code:
from databricks import feature_store
fs = feature_store.FeatureStoreClient()
fs.create_table(name='feature_store.user_login', primary_keys=['user_id'], df=df_x, description='user l...

Latest Reply
DavideAnghileri
Contributor
  • 2 kudos

Yes, it's a nice thing to do. You can report it here: https://community.databricks.com/s/topic/0TO3f000000CnKrGAK/bug-report, and if it's more urgent or blocking for you, you can also open a ticket with the help center: https://docs.databricks.com/resou...

2 More Replies
dragonH
by New Contributor
  • 1001 Views
  • 0 replies
  • 0 kudos

The CDC logs from AWS DMS are not applied correctly

I have a DMS task processing the full-load and ongoing replication tasks from the source (MSSQL) to the target (AWS S3), and I then use Delta Lake to handle the CDC logs. I have a notebook that inserts data into MSSQL continuously (with id as the primary key), then d...

apayne
by New Contributor III
  • 2112 Views
  • 1 reply
  • 4 kudos

Databricks Jobs API not returning notebook run results?

Calling a Databricks notebook using the REST API, I can confirm that it is executing the notebook, but it is not accepting my parameters or returning a notebook output. Any ideas on what I am doing wrong here? My code and notebook function are below, tryin...

Latest Reply
apayne
New Contributor III
  • 4 kudos

Resolved this by using dbutils within the notebook being called from the API.
# databricks notebook function
data = dbutils.widgets.get('data')  # pulls base_parameters from API call
def add_test(i):
    result = i + ' COMPLETE'
    return result
...
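A completed version of that pattern, as a hedged sketch: the reply is truncated, so the dbutils.notebook.exit call at the end is an assumption, but it is the usual way to make a result appear as the run's notebook output when calling through the Jobs API:

# In the notebook invoked by the Jobs API
data = dbutils.widgets.get("data")   # reads the "data" entry from base_parameters

def add_test(i):
    return i + " COMPLETE"

# Hand a value back to the caller; it shows up as the run's notebook_output.
dbutils.notebook.exit(add_test(data))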

Swapnil1998
by New Contributor III
  • 677 Views
  • 0 replies
  • 0 kudos

How to query a MySQL Table from Databricks?

I wanted to query a MySQL table using Databricks rather than reading the complete data using a dbtable option, which will help with incremental loads.
remote_table = (spark.read .format("jdbc") .option("driver", driver) .option("url", URL) .option("quer...
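A minimal sketch of a query-based JDBC read, with placeholder connection details and a hypothetical incremental filter; the query option pushes the SELECT down to MySQL instead of pulling the whole table via dbtable:

driver = "com.mysql.cj.jdbc.Driver"              # placeholder driver class
url = "jdbc:mysql://<host>:3306/<database>"      # placeholder JDBC URL

remote_table = (spark.read
    .format("jdbc")
    .option("driver", driver)
    .option("url", url)
    .option("query", "SELECT * FROM orders WHERE updated_at > '2022-12-01'")  # hypothetical filter
    .option("user", dbutils.secrets.get("jdbc-scope", "mysql-user"))          # hypothetical secret scope/keys
    .option("password", dbutils.secrets.get("jdbc-scope", "mysql-password"))
    .load())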

Harish14
by New Contributor III
  • 1214 Views
  • 4 replies
  • 1 kudos

Hi @Kaniz Fatma and @Nadia Elsayed, I have taken the Databricks Data Engineer Associate exam on Nov 27th. In the result mail it is mentioned I have obtain...

Hi @Kaniz Fatma and @Nadia Elsayed, I have taken the Databricks Data Engineer Associate exam on Nov 27th. In the result mail it is mentioned that I obtained below 70% in the assessment, but as per the section-wise results I gained more than 70%. Can you ...

Latest Reply
Nadia1
Honored Contributor
  • 1 kudos

Hello Harish - I have responded via email. Thank you

3 More Replies
Bartek
by Contributor
  • 3833 Views
  • 3 replies
  • 7 kudos

Resolved! Number of partitions in Spark UI Simulator experiment

I am learning how to optimize Spark applications with experiments from the Spark UI Simulator. There is experiment #1596 about data skew, and in command 2 there is a comment about how many partitions will be set by default: // Factor of 8 cores and greater ...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 7 kudos

Hi @Bartosz Maciejewski, generally we arrive at the number of shuffle partitions using the following method:
Input data size: 100 GB
Ideal partition target size: 128 MB
Cores: 8
Ideal number of partitions = (100 * 1024) / 128 = 800
To utilize the...
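A small sketch of the same calculation, with the 100 GB input size and 8 cores carried over from the reply as assumptions (the rounding to a multiple of the core count is also an assumption, since the reply is truncated):

input_size_mb = 100 * 1024      # 100 GB expressed in MB
target_partition_mb = 128       # target partition size from the reply
cores = 8

ideal = input_size_mb / target_partition_mb           # 800 partitions
# Round up to a multiple of the core count so every core stays busy in each wave.
shuffle_partitions = int(-(-ideal // cores) * cores)   # 800

spark.conf.set("spark.sql.shuffle.partitions", shuffle_partitions)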

2 More Replies
640913
by New Contributor III
  • 3839 Views
  • 2 replies
  • 1 kudos

%pip install requirements.txt - path not found

Hi everyone, I was just testing things out to come up with a reasonable way of working with version management in Databricks and was inspired by the commands specified here. For my team and me, it makes no sense to put the requirements file in the DBFS locatio...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

You created a requirements.txt and pulled it into your repo folder? I didn't quite get this part; maybe a screenshot would help my understanding. If you are not storing your text file in any storage space, you can't do what you are trying t...
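For context, a hedged sketch of the pattern the thread is circling around (not the accepted answer, which is not shown): on recent Databricks runtimes a requirements.txt synced into a Repos folder can be installed directly, with the path below being a placeholder:

# Notebook cell; adjust the user and repo name to your own workspace
%pip install -r /Workspace/Repos/<user>/<repo>/requirements.txt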

1 More Replies