Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

joseph_sf
by New Contributor
  • 647 Views
  • 1 replies
  • 0 kudos

Implement Delta tables optimized for Databricks SQL service

This question is on the Databricks Certified Data Engineer Professional exam in section 1: "Implement Delta tables optimized for Databricks SQL service". I do not understand what is being asked by this question. I would assume that there are different way...

Latest Reply
koji_kawamura
Databricks Employee
  • 0 kudos

Hi @joseph_sf, I assume you are referring to the exam guide PDF file. As you assumed, there are different techniques to optimize a Delta table. Some of them are already mentioned in the other bullet points in the same section 1, such as partitioning...

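For readers looking for concrete examples of what "optimized for Databricks SQL service" can mean in practice, below is a minimal sketch of a few techniques the reply alludes to (partitioning, file compaction, Z-ordering, statistics). The catalog, schema, table, and column names are hypothetical, and this is not the official exam answer.

```python
# A minimal sketch of common Delta table optimizations; all names are hypothetical.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.demo.sales (
        id BIGINT, region STRING, amount DOUBLE, sale_date DATE
    )
    USING DELTA
    PARTITIONED BY (sale_date)
""")

# Compact small files and co-locate rows by a frequently filtered column.
spark.sql("OPTIMIZE main.demo.sales ZORDER BY (region)")

# Collect column statistics so the SQL warehouse optimizer can build better plans.
spark.sql("ANALYZE TABLE main.demo.sales COMPUTE STATISTICS FOR ALL COLUMNS")
```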
drollason
by New Contributor II
  • 292 Views
  • 1 replies
  • 1 kudos

Resolved! Issue with UDFs and DLT where the UDF is multi-layered and externalized

Having issues getting UDFs to work within a DLT pipeline where the UDF is externalized outside of the notebook and it attempts to call other functions. The end goal is to put unit test coverage around the various functions, hence the pattern. For test purposes I cre...

Latest Reply
bgiesbrecht
New Contributor III
  • 1 kudos

Hi @drollason. In DLT pipelines, I would try packaging up your code as a wheel and then installing it via pip. I had the same scenario as you and was able to bring in my custom code this way.

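As a rough illustration of the wheel approach suggested in the reply, here is a minimal sketch of a DLT notebook cell; the wheel path, package name, UDF name, and source table are all hypothetical.

```python
# In a separate notebook cell, install the wheel containing the externalized UDF first,
# e.g.: %pip install /Volumes/main/demo/libs/my_udfs-0.1.0-py3-none-any.whl (path is hypothetical)

import dlt
from pyspark.sql import functions as F
from my_udfs.transforms import clean_name_udf  # hypothetical UDF shipped in the wheel

@dlt.table
def customers_clean():
    # The UDF resolves its helper functions inside the installed package,
    # so the DLT runtime no longer needs them defined in the notebook.
    return (
        spark.read.table("main.demo.customers_raw")
             .withColumn("name_clean", clean_name_udf(F.col("name")))
    )
```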
nolanreilly
by New Contributor II
  • 1012 Views
  • 1 replies
  • 1 kudos

Impossible to read a custom pipeline? (Scala)

I have created a custom transformer to be used in an ML pipeline. I was able to write the pipeline to storage by extending the transformer class with DefaultParamsWritable. Reading the pipeline back in, however, does not seem possible in Scala. I have...

Latest Reply
WarrenO
New Contributor III
  • 1 kudos

Hi, did you ever find a solution for this?

NaeemS
by New Contributor III
  • 1233 Views
  • 2 replies
  • 4 kudos

Custom transformers with MLflow

Hi everyone, I have created a Spark pipeline in which I have a stage that is a custom Transformer. Now I am using feature stores to log my model. But the issue is that the custom Transformer stage is not serialized properly and is not logged along wi...

Latest Reply
WarrenO
New Contributor III
  • 4 kudos

Hi @NaeemS, did you ever get a solution to this problem? I've now encountered this myself. When I save the pipeline using MLflow log_model, I am able to load the model fine. When I log it with the Databricks Feature Engineering package, it throws an erro...

1 More Replies
Unimog
by New Contributor III
  • 669 Views
  • 5 replies
  • 0 kudos

Resolved! Insert Into SQL Server Table

I'm trying to insert and update data in a SQL Server table from a Python script. No matter what I try, it seems to give me this error: The input query contains unsupported data source(s). Only csv, json, avro, delta, kafka, parquet, orc, text, unity_...

Latest Reply
Nivethan_Venkat
Contributor II
  • 0 kudos

Hi @Unimog, support for data sources is currently limited, as noted in the General Serverless Limitations: "Support for data sources is limited to AVRO, BINARYFILE, CSV, DELTA, JSON, KAFKA...

4 More Replies
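The thread itself centers on the serverless data-source limitation. Purely as a sketch of one commonly used alternative (not necessarily the accepted answer in this thread), the following writes to SQL Server through the generic JDBC connector on classic, non-serverless compute; the hostname, secret scope, and table name are placeholders.

```python
# A minimal sketch: append a small DataFrame to a SQL Server table via JDBC
# on classic (non-serverless) compute. All connection details are placeholders.
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "val"])

jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"

(df.write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", "dbo.my_table")
   .option("user", dbutils.secrets.get("my-scope", "sql-user"))
   .option("password", dbutils.secrets.get("my-scope", "sql-password"))
   .mode("append")
   .save())
```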
tommyhmt
by New Contributor II
  • 1467 Views
  • 2 replies
  • 0 kudos

Delta Live Table missing data

Got a very simple DLT pipeline which runs fine, but the final table "a" is missing data. I've found that after it goes through a full refresh, if I rerun just the final table, then I get more records (from 1.2m to 1.4m) and the missing data then comes back. When I...

(screenshots attached)
Latest Reply
NSonam
New Contributor II
  • 0 kudos

To me it seems like a timing or dependency issue. The missing data could be because intermediate tables are not being properly refreshed or triggered during the full refresh. Please check if the intermediate tables are being loaded properly before it start...

1 More Replies
nwong
by New Contributor II
  • 864 Views
  • 5 replies
  • 1 kudos

Error creating Unity Catalog external table

I tried creating an external table from a partitioned parquet folder in Unity Catalog. Initially, I created the table from the Data Ingestion UI. It worked but only a tiny portion of the table was actually loaded. Next, I tried running a SQL DDL CREA...

Latest Reply
royvansanten
New Contributor II
  • 1 kudos

You can use recursiveFileLookup in OPTIONS, as shown in this topic: https://community.databricks.com/t5/data-engineering/external-table-from-external-location/td-p/69246

4 More Replies
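Following the suggestion in the reply (and the linked thread), a minimal sketch of the DDL is below; the catalog, schema, table, and storage path are hypothetical, and whether recursiveFileLookup is the right fit depends on how the partition folders should be interpreted.

```python
# A minimal sketch: register the external table with recursiveFileLookup so that
# files in nested subfolders are discovered. All names and paths are hypothetical.
spark.sql("""
    CREATE TABLE main.demo.events
    USING PARQUET
    OPTIONS (recursiveFileLookup = 'true')
    LOCATION 'abfss://data@mystorageaccount.dfs.core.windows.net/events/'
""")
```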
kolangareth
by New Contributor III
  • 6890 Views
  • 11 replies
  • 3 kudos

Resolved! to_date not functioning as expected after introduction of arbitrary replaceWhere in Databricks 9.1 LTS

I am trying to do a dynamic partition overwrite on a Delta table using the replaceWhere option. This was working fine until I upgraded the DB runtime to 9.1 LTS from 8.3.x. I am concatenating 'year', 'month' and 'day' columns and then using the to_date functio...

Latest Reply
ltreweek
New Contributor II
  • 3 kudos

SELECT TO_DATE('20250217','YYYYMMDD'); gives the error: PARSE_SYNTAX_ERROR syntax error at or near 'select'; SQLSTATE: 42601. In DataGrip, it works no problem and displays the date.

10 More Replies
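For reference, a minimal sketch of to_date with the Spark-style pattern: Spark's datetime patterns are case-sensitive, so 'yyyyMMdd' is the usual form for values like 20250217, while 'YYYY' (week-based year) and 'DD' (day-of-year) mean something different. This does not by itself address the PARSE_SYNTAX_ERROR reported in the last reply.

```python
# Spark datetime patterns are case-sensitive: 'yyyyMMdd' parses 20250217 as a date,
# whereas 'YYYY' is week-based year and 'DD' is day-of-year.
spark.sql("SELECT to_date('20250217', 'yyyyMMdd') AS d").show()

# Equivalent DataFrame API form.
from pyspark.sql import functions as F
spark.range(1).select(F.to_date(F.lit("20250217"), "yyyyMMdd").alias("d")).show()
```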
kertsman_nm
by New Contributor
  • 902 Views
  • 0 replies
  • 0 kudos

Trying to use Broadcast to run Presidio distributed

Hello, I am currently evaluating Microsoft's Presidio de-identification libraries for my organization and would like to see if we can take advantage of Spark's broadcast capabilities, but I am getting an error message: "[BROADCAST_VARIABLE_NOT_LOA...

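Since this thread has no replies yet, here is a hedged sketch of one common alternative: instead of broadcasting the Presidio engine from the driver, create it lazily inside a pandas UDF so each Python worker builds its own instance (broadcasting complex Python objects is one pattern that can fail with BROADCAST_VARIABLE_NOT_LOADED on shared or serverless compute). Column handling and entity formatting below are illustrative only.

```python
# A minimal sketch: build the Presidio AnalyzerEngine lazily on each worker
# instead of broadcasting it.
import pandas as pd
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

_analyzer = None  # created once per Python worker, on first use

@F.pandas_udf(StringType())
def detect_pii(texts: pd.Series) -> pd.Series:
    global _analyzer
    if _analyzer is None:
        from presidio_analyzer import AnalyzerEngine
        _analyzer = AnalyzerEngine()
    return texts.map(
        lambda t: ",".join(sorted({r.entity_type
                                   for r in _analyzer.analyze(text=t, language="en")}))
    )

df = spark.createDataFrame([("My name is John and my phone is 212-555-0100",)], ["text"])
df.withColumn("entities", detect_pii("text")).show(truncate=False)
```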
SamGreene
by Contributor II
  • 2720 Views
  • 10 replies
  • 0 kudos

Use Azure Service Principal to Access Azure Devops

There is another thread marked as answered, but it is not a working solution: Solved: How to use Databricks Repos with a service princip... - Page 2 - Databricks Community - 11789. In Azure DevOps, there doesn't seem to be a way to generate a PAT for a...

Latest Reply
KrunalG
New Contributor II
  • 0 kudos

What exactly is the "databricks_token" that you are using? If it's a personal access token generated using some user account, again, I don't think you are solving the problem Sam is facing.

9 More Replies
mo_moattar
by New Contributor III
  • 5377 Views
  • 3 replies
  • 1 kudos

Does anyone know how to use the Python logger in a Databricks Python job on serverless?

I'm trying to use the standard Python logging framework in Databricks jobs instead of print. I'm doing this by using spark._jvm.org.apache.log4j.LogManager.getLogger(__name__); however, as I'm running this on serverless, I get the following error: [J...

Latest Reply
lprevost
Contributor II
  • 1 kudos

I get asyncio errors and it crashes the notebook/Python with @mo_moattar's approach. This is something Databricks needs to provide some guidance on. I am very unsure how to do logging on serverless.

2 More Replies
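As a rough sketch of the plain standard-library route (rather than the spark._jvm log4j logger, which relies on JVM access that serverless notebooks do not expose), something like the following can be tried; whether it avoids the asyncio crashes mentioned above would need to be verified in your environment.

```python
# A minimal sketch using Python's standard logging module instead of log4j via spark._jvm.
import logging
import sys

logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)
logger = logging.getLogger(__name__)

logger.info("job started")
logger.warning("row count lower than expected")
```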
ConfusedZebra
by New Contributor II
  • 510 Views
  • 3 replies
  • 0 kudos

HTML form within notebooks

Hi all, I'm trying to make a small form in Databricks notebooks. I can't currently use Apps, so I want an interim solution. I can successfully make the form using HTML, which displays correctly, but I cannot extract the values or use them, e.g. a form with thr...

Latest Reply
ConfusedZebra
New Contributor II
  • 0 kudos

Thanks both. Notebooks are a little too intimidating for some users so we are trying to make them look and feel a bit more like what they are used to. Ideally we would build an app but apps aren't available in our area yet so we need an interim solut...

2 More Replies
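One interim pattern that does allow reading values back in Python (unlike values typed into a raw displayHTML form) is notebook widgets. This is offered only as a sketch of an alternative, not what the thread concluded; the widget names and choices are illustrative.

```python
# A minimal sketch: collect simple form-like input with Databricks widgets,
# whose values can be read back in Python. Widget names are illustrative.
dbutils.widgets.text("customer_name", "", "Customer name")
dbutils.widgets.dropdown("region", "EMEA", ["EMEA", "AMER", "APAC"], "Region")

name = dbutils.widgets.get("customer_name")
region = dbutils.widgets.get("region")
print(f"Submitted: {name} / {region}")
```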
jura
by New Contributor II
  • 2085 Views
  • 3 replies
  • 1 kudos

SQL Identifier clause

Hi, I was trying to prepare some dynamic SQL to create a table using the IDENTIFIER clause and a WITH AS clause, but it seems I'm stuck on a bug. Could someone verify it or tell me whether I am doing something wrong? The code is running on a SQL Warehouse T...

(screenshots attached)
Data Engineering
identifier
Latest Reply
vinay_yogeesh
New Contributor II
  • 1 kudos

Hey, I am stuck with the same issue; did you find any workaround? I am trying to run DESCRIBE and ALTER commands using IDENTIFIER() via databricks-sql-connector. Did you figure out how to run the IDENTIFIER statements?

2 More Replies
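For anyone landing here for the basic pattern, a minimal sketch of the IDENTIFIER clause with a named parameter is below. The table name is hypothetical, it runs via spark.sql in a notebook rather than databricks-sql-connector, and it does not reproduce the WITH AS combination shown in the screenshots.

```python
# A minimal sketch of IDENTIFIER() with named parameters; the table name is hypothetical.
spark.sql(
    "CREATE TABLE IF NOT EXISTS IDENTIFIER(:tbl) (id INT, val STRING)",
    args={"tbl": "main.demo.dynamic_table"},
)
spark.sql(
    "INSERT INTO IDENTIFIER(:tbl) VALUES (1, 'a')",
    args={"tbl": "main.demo.dynamic_table"},
)
```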
lprevost
by Contributor II
  • 538 Views
  • 3 replies
  • 0 kudos

Using WorkspaceClient -- run a saved query

I've saved a query on my SQL warehouse which has a parameter called :list_parameter. I've found my query id as follows: from databricks.sdk import WorkspaceClient; w = WorkspaceClient(); for query in w.queries.list(): print(f"query: {query.displ...

Latest Reply
koji_kawamura
Databricks Employee
  • 0 kudos

Hi @lprevost, the WorkspaceClient provides APIs to manage Query objects, but it doesn't provide an API to run them. If you need to run the query from a notebook, you can pass the query text into `spark.sql`, which returns a Spark DataFrame. I hope this help...

2 More Replies
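A rough sketch of the approach described in the reply follows. The query id, the attribute that holds the SQL text, and the parameter substitution are assumptions that may differ by SDK version.

```python
# A minimal sketch: fetch the saved query's text with the SDK, then run it with spark.sql.
# The query id and the attribute holding the SQL text are assumptions.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
saved = w.queries.get("12345678-aaaa-bbbb-cccc-1234567890ab")  # hypothetical query id
sql_text = saved.query_text  # may be `saved.query` on older SDK versions

# Substitute the named parameter from the saved query before executing it.
df = spark.sql(sql_text.replace(":list_parameter", "'value_1', 'value_2'"))
df.show()
```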
JothyGanesan
by New Contributor III
  • 665 Views
  • 3 replies
  • 1 kudos

Resolved! Streaming data - Merge in Target - DLT

We have streaming inputs coming from streaming tables and also a table from apply_changes. In our target there is only one table, which needs to be merged with all the sources. Each source provides different columns in our target table. Challenge: Ev...

Latest Reply
JothyGanesan
New Contributor III
  • 1 kudos

Thank you, this worked.

2 More Replies
