Data Engineering

Forum Posts

db-avengers2rul
by Contributor II
  • 1075 Views
  • 1 replies
  • 0 kudos

Resolved! zip file not able to import in workspace

Dear Team, using the Community Edition, when I try to import a zip file it always throws an error.

Latest Reply
db-avengers2rul
Contributor II
  • 0 kudos

Please refer to the error in the attachment. My question is: does this restriction apply only to the Community Edition, or also to a premium account?

yang
by New Contributor II
  • 798 Views
  • 1 replies
  • 2 kudos

Resolved! Error in DE 4.1 - DLT UI Walkthrough (from Data Engineering with Databricks v3 course)

I am working on the Data Engineering with Databricks v3 course. In notebook DE 4.1 - DLT UI Walkthrough, I encountered an error in cmd 11: DA.validate_pipeline_config(pipeline_language). The error message is: AssertionError: Expected the parameter "suite" to...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

The DA validate function just checks that you named the pipeline correctly, set up the correct number of workers (0), and other configurations. The name and directory aren't crucial to the learning process. The goal is to get familiar with the ...

eimis_pacheco
by Contributor
  • 740 Views
  • 1 replies
  • 1 kudos

How to remove more than 4-byte characters using PySpark in Databricks?

Hi community, we need to remove characters of more than 4 bytes using PySpark in Databricks, since these are not supported by Amazon Redshift. Does someone know how I can accomplish this? Thank you very much in advance. Regards

Latest Reply
Shalabh007
Honored Contributor
  • 1 kudos

Assuming you have a string-type column in a PySpark DataFrame, one possible way could be: identify the total number of characters for each value in the column (say a), identify the number of bytes taken by each character (say b), and use the substring() function to select the first...

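A minimal PySpark sketch of that idea, assuming a DataFrame named df with a string column text_col (both names are hypothetical): it keeps only characters in the Basic Multilingual Plane, so anything that would need 4 bytes in UTF-8 is dropped before loading to Redshift.

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Characters with code points above U+FFFF take 4 bytes in UTF-8; drop them.
@F.udf(returnType=StringType())
def strip_4byte_chars(s):
    if s is None:
        return None
    return "".join(ch for ch in s if ord(ch) <= 0xFFFF)

df_clean = df.withColumn("text_col", strip_4byte_chars(F.col("text_col")))

A regexp_replace-based version may be faster at scale, but the UDF keeps the byte-length rule explicit.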
Ullsokk
by New Contributor III
  • 2062 Views
  • 1 replies
  • 5 kudos

How do I import a notebook from workspaces to repos?

I have a few notebooks in workspaces that I created before linking a repo to my Git. I have tried importing them from the repo (Databricks repo). The only two options are a local file from my PC or a URL. The URL for a notebook does not work. Do I need...

Latest Reply
Geeta1
Valued Contributor
  • 5 kudos

Hi @Stian Arntsen​, when you click on the down arrow beside your notebook name (in your workspace), you will have an option called 'Clone'. You can use it to clone your notebook from your workspace to Repos. Hope it helps!

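If you would rather script the move than click through the UI, a rough sketch using the Workspace REST API (export the notebook source, then import it under a /Repos path) could look like the following; the host, token and both paths are placeholders.

import requests

HOST = "https://<workspace-url>"    # placeholder
TOKEN = "<personal-access-token>"   # placeholder
headers = {"Authorization": f"Bearer {TOKEN}"}

# Export the notebook source (returned base64-encoded) from the workspace...
exported = requests.get(
    f"{HOST}/api/2.0/workspace/export",
    headers=headers,
    params={"path": "/Users/me@example.com/my_notebook", "format": "SOURCE"},
).json()["content"]

# ...and import it into the repo folder.
requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers=headers,
    json={
        "path": "/Repos/me@example.com/my-repo/my_notebook",
        "format": "SOURCE",
        "language": "PYTHON",
        "content": exported,
        "overwrite": True,
    },
)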
hare
by New Contributor III
  • 7351 Views
  • 1 replies
  • 1 kudos

Failed to merge incompatible data types

We are processing JSON files from the storage location every day, and they get archived once the records are appended into the respective tables. source_location_path: "..../mon=05/day=01/fld1" , "..../mon=05/day=01/fld2" ..... "..../mon=05/d...

Latest Reply
Shalabh007
Honored Contributor
  • 1 kudos

@Hare Krishnan​ The issues highlighted can easily be handled using .option("mergeSchema", "true") at the time of reading all the files. Sample code: spark.read.option("mergeSchema", "true").json(<file paths>, multiLine=True). The only scenario this w...

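Complementing that, if the failure shows up when appending the daily folders into a Delta table, the documented Delta write option is also mergeSchema; a rough sketch with placeholder paths and table name (this evolves newly added columns, but genuinely conflicting types, e.g. a string versus a struct, still need an explicit cast before the write).

# Placeholder paths for one day's folders; spark.read.json accepts a list of paths.
paths = ["/mnt/raw/mon=05/day=01/fld1", "/mnt/raw/mon=05/day=01/fld2"]
df = spark.read.json(paths, multiLine=True)

# Append with schema evolution enabled so newly appearing columns don't fail the write.
(df.write
   .format("delta")
   .mode("append")
   .option("mergeSchema", "true")
   .saveAsTable("bronze.daily_records"))   # placeholder table name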
Magnus
by Contributor
  • 1561 Views
  • 3 replies
  • 10 kudos

Resolved! How to retrieve Auto Loader client secret from Azure Key Vault?

I'm using Auto Loader in a SQL notebook and I would like to configure file notification mode, but I don't know how to retrieve the client secret of the service principal from Azure Key Vault. Is there any example notebook somewhere? The notebook is p...

Latest Reply
Geeta1
Valued Contributor
  • 10 kudos

Hi @Magnus Johannesson​, you must use the Secrets utility (dbutils.secrets) in a notebook or job to read a secret: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-utils#dbutils-secrets. Hope it helps!

2 More Replies
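For reference, the Python form of that pattern is roughly the following sketch; the secret scope/key names, Azure IDs and path are all placeholders, and the cloudFiles.* options come from the Auto Loader file notification documentation.

# Read the service principal's client secret from a Key Vault-backed secret scope.
client_secret = dbutils.secrets.get(scope="kv-backed-scope", key="sp-client-secret")  # placeholder names

df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")
        .option("cloudFiles.clientId", "<application-id>")           # placeholder
        .option("cloudFiles.clientSecret", client_secret)
        .option("cloudFiles.tenantId", "<tenant-id>")                # placeholder
        .option("cloudFiles.subscriptionId", "<subscription-id>")    # placeholder
        .option("cloudFiles.resourceGroup", "<resource-group>")      # placeholder
        .load("abfss://container@storageaccount.dfs.core.windows.net/input/"))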
Priyanka48
by New Contributor III
  • 6427 Views
  • 5 replies
  • 10 kudos

The functionality of table property delta.logRetentionDuration

We have one project requirement where we have to store only 14 days of history for Delta tables. So for testing, I have set delta.logRetentionDuration = 2 days using the command below: spark.sql("alter table delta.`[delta_file_path]` set TBLPROPER...

Latest Reply
Kaniz
Community Manager
  • 10 kudos

Hi @Priyanka Mane​, we haven't heard from you since the last response from @Werner Stinckens​ and @Uma Maheswara Rao Desula​, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the c...

4 More Replies
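For anyone landing here later, a minimal sketch of the property in question, using a placeholder table path; note that old log entries are only removed when Delta writes a new checkpoint, so the history does not shrink the moment the property is set.

spark.sql("""
  ALTER TABLE delta.`/mnt/delta/my_table`
  SET TBLPROPERTIES (
    'delta.logRetentionDuration' = 'interval 14 days',
    'delta.deletedFileRetentionDuration' = 'interval 14 days'
  )
""")

# Confirm the properties took effect.
spark.sql("DESCRIBE DETAIL delta.`/mnt/delta/my_table`").select("properties").show(truncate=False)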
andrew0117
by Contributor
  • 1403 Views
  • 5 replies
  • 9 kudos

Resolved! How to call a few child notebooks from master notebook parallelly?

I am planning to use dbutils.notebook.run() to call all the child notebooks from the master notebook, but they are executed sequentially.

Latest Reply
Kaniz
Community Manager
  • 9 kudos

Hi @andrew li​, we haven't heard from you since the last response from @Uma Maheswara Rao Desula​, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpf...

4 More Replies
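The usual answer is to keep dbutils.notebook.run() but call it from multiple threads; a minimal sketch with hypothetical child notebook paths:

from concurrent.futures import ThreadPoolExecutor

child_notebooks = ["./child_1", "./child_2", "./child_3"]   # placeholder paths

def run_child(path):
    # 600-second timeout and an empty parameter map; adjust both as needed.
    return dbutils.notebook.run(path, 600, {})

with ThreadPoolExecutor(max_workers=len(child_notebooks)) as pool:
    results = list(pool.map(run_child, child_notebooks))

print(results)

Each dbutils.notebook.run() call gets its own ephemeral run, so the children execute concurrently up to the pool size.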
lawrence009
by Contributor
  • 1091 Views
  • 3 replies
  • 2 kudos

FutureWarning: ``databricks.feature_store.entities.feature_table.FeatureTable.keys`` is deprecated since v0.3.6

I'm getting this message with the following code:

from databricks import feature_store

fs = feature_store.FeatureStoreClient()
fs.create_table(
    name='feature_store.user_login',
    primary_keys=['user_id'],
    df=df_x,
    description='user l...

Latest Reply
DavideAnghileri
Contributor
  • 2 kudos

Yes, it's a nice thing to do. You can report it here: https://community.databricks.com/s/topic/0TO3f000000CnKrGAK/bug-report and if it's more urgent or blocking for you, you can also open a ticket to the help center: https://docs.databricks.com/resou...

2 More Replies
dragonH
by New Contributor
  • 706 Views
  • 0 replies
  • 0 kudos

The CDC logs from AWS DMS are not applied correctly

I have a DMS task processing the full-load and ongoing replication tasks from the source (MSSQL) to the target (AWS S3), and I then use Delta Lake to handle the CDC logs. I have a notebook that inserts data into MSSQL continuously (with id as the primary key), then d...

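There are no replies here, but a common way to apply DMS change files to a Delta target is a MERGE keyed on the primary key, assuming the DMS output carries the usual Op column (I/U/D); the paths and the dms_timestamp ordering column are placeholders.

from delta.tables import DeltaTable
from pyspark.sql import functions as F, Window

cdc = spark.read.parquet("s3://my-bucket/dms-cdc/")   # placeholder DMS output location

# Keep only the newest change per key, since DMS emits one row per operation.
w = Window.partitionBy("id").orderBy(F.col("dms_timestamp").desc())
latest = (cdc.withColumn("rn", F.row_number().over(w))
             .filter("rn = 1")
             .drop("rn"))

target = DeltaTable.forPath(spark, "s3://my-bucket/delta/target")  # placeholder
(target.alias("t")
   .merge(latest.alias("s"), "t.id = s.id")
   .whenMatchedDelete(condition="s.Op = 'D'")
   .whenMatchedUpdateAll(condition="s.Op <> 'D'")
   .whenNotMatchedInsertAll(condition="s.Op <> 'D'")
   .execute())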
apayne
by New Contributor III
  • 1497 Views
  • 1 replies
  • 4 kudos

Databricks Jobs API not returning notebook run results?

Calling a Databricks notebook using the REST API, I can confirm that it is executing the notebook, but it is not accepting my parameters or returning the notebook output. Any ideas on what I am doing wrong here? My code and notebook function are below, tryin...

Latest Reply
apayne
New Contributor III
  • 4 kudos

Resolved this by using dbutils within the notebook being called from the API.

# databricks notebook function
data = dbutils.widgets.get('data')  # pulls base_parameters from API call

def add_test(i):
    result = i + ' COMPLETE'
    return result
...

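A hedged sketch of the two halves of that pattern: the notebook hands its result back with dbutils.notebook.exit(), and the caller reads it from the Jobs API runs/get-output endpoint once the run finishes (host, token and run id are placeholders).

# --- Inside the notebook that the job runs ---
data = dbutils.widgets.get("data")          # receives base_parameters / notebook_params
dbutils.notebook.exit(data + " COMPLETE")   # becomes notebook_output.result for the caller

# --- On the caller's side, after the run has finished ---
import requests

HOST, TOKEN, RUN_ID = "https://<workspace-url>", "<token>", 123456   # placeholders
out = requests.get(
    f"{HOST}/api/2.1/jobs/runs/get-output",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"run_id": RUN_ID},
).json()
print(out["notebook_output"]["result"])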
Swapnil1998
by New Contributor III
  • 512 Views
  • 0 replies
  • 0 kudos

How to query a MySQL Table from Databricks?

I wanted to query a MySQL table using Databricks rather than reading the complete data using the dbtable option, which will help with incremental loads.

remote_table = (spark.read
    .format("jdbc")
    .option("driver", driver)
    .option("url", URL)
    .option("quer...

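One way to avoid pulling the whole table is the JDBC source's query option, which pushes an arbitrary SELECT down to MySQL; a sketch with placeholder connection details and an assumed updated_at watermark column.

driver = "com.mysql.cj.jdbc.Driver"              # placeholder driver class
url = "jdbc:mysql://<host>:3306/<database>"      # placeholder URL

incremental_query = """
  SELECT id, name, updated_at
  FROM orders
  WHERE updated_at > '2022-11-01 00:00:00'
"""  # placeholder incremental predicate

remote_table = (spark.read
    .format("jdbc")
    .option("driver", driver)
    .option("url", url)
    .option("query", incremental_query)   # use query instead of dbtable to push the filter down
    .option("user", "<user>")             # placeholder
    .option("password", "<password>")     # placeholder
    .load())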
Harish14
by New Contributor III
  • 825 Views
  • 4 replies
  • 1 kudos

Hi @Kaniz Fatma​ and @Nadia Elsayed​, I took the Databricks Data Engineer Associate exam on Nov 27th. In the result mail it is mentioned that I obtain...

Hi @Kaniz Fatma​ and @Nadia Elsayed​, I took the Databricks Data Engineer Associate exam on Nov 27th. In the result mail it is mentioned that I obtained below 70% in the assessment, but as per the section-wise results I scored more than 70%. Can you ...

Latest Reply
Nadia1
Honored Contributor
  • 1 kudos

Hello Harish - I have responded via email. Thank you

3 More Replies
Bartek
by Contributor
  • 2601 Views
  • 3 replies
  • 7 kudos

Resolved! Number of partitions in Spark UI Simulator experiment

I am learning how to optimize Spark applications with experiments from the Spark UI Simulator. There is experiment #1596 about data skew, and in command 2 there is a comment about how many partitions will be set by default: // Factor of 8 cores and greater ...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 7 kudos

Hi @Bartosz Maciejewski​, generally we arrive at the number of shuffle partitions using the following method. Input data size - 100 GB. Ideal partition target size - 128 MB. Cores - 8. Ideal number of partitions = (100 * 1024) / 128 = 800. To utilize the...

2 More Replies
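Turning that calculation into configuration, a small sketch using the example numbers from the reply (100 GB input, 128 MB target partitions, 8 cores); the rounding step just keeps the count a multiple of the core count.

input_size_mb = 100 * 1024        # 100 GB expressed in MB
target_partition_mb = 128
cores = 8

ideal = input_size_mb // target_partition_mb        # 800
ideal = ((ideal + cores - 1) // cores) * cores      # round up to a multiple of 8, still 800

spark.conf.set("spark.sql.shuffle.partitions", str(ideal))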