Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am aware I can load anything into a DataFrame using JDBC; that works well from Oracle sources. Is there an equivalent in Spark SQL, so I can combine datasets as well? Basically something like this - you get the idea...
select
  lt.field1,
  rt.fie...
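One way to get there is to register the JDBC source as a temporary view and then join it in Spark SQL. A minimal sketch, assuming a reachable Oracle instance; the host, table, and secret names are placeholders:

# Read the Oracle table over JDBC into a DataFrame
oracle_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1")   # hypothetical host/service
    .option("dbtable", "SCHEMA1.LEFT_TABLE")                      # hypothetical table
    .option("user", "reader")
    .option("password", dbutils.secrets.get("scope", "oracle-pw"))  # hypothetical secret
    .load()
)

# Expose it to Spark SQL and join it with an existing table
oracle_df.createOrReplaceTempView("lt")
result = spark.sql("""
    SELECT lt.field1, rt.field2
    FROM lt
    JOIN right_table rt ON lt.id = rt.id
""")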
Hi everyone, our company is using Databricks on GKE. It worked fine until today, when clusters we try to create or terminate get stuck in the Pending or Terminating state for hours (now more than 6 hours). No conclusion can be drawn ...
Hi @Kurnianto Trilaksono Sutjipto: after multiple follow-ups, we figured out that this is typically a cloud provider issue. You can file a support ticket if the issue persists.
Hello! Is there an equivalent of CREATE TRIGGER on a table in Databricks SQL?
CREATE TRIGGER [schema_name.]trigger_name
ON table_name
AFTER {[INSERT],[UPDATE],[DELETE]}
[NOT FOR REPLICATION]
AS
{sql_statements}
Thank you in advance!
You can try Auto Loader. Auto Loader supports two modes for detecting new files: directory listing and file notification. Directory listing: Auto Loader identifies new files by listing the input directory. Directory listing mode allows you to quickly ...
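For illustration, a minimal directory-listing sketch, assuming JSON files land in a placeholder ADLS path and a recent runtime (availableNow triggers need DBR 10.4+):

# Incrementally pick up new JSON files by listing the input directory (no cloud notifications)
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "false")  # directory listing mode
    .load("abfss://container@account.dfs.core.windows.net/landing/")  # hypothetical path
)

(stream.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/landing")  # hypothetical checkpoint
    .trigger(availableNow=True)
    .toTable("bronze.landing_events"))  # hypothetical target table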
I'm working with the sample notebook named '1_Customer Lifetimes.py' in https://github.com/databricks-industry-solutions/customer-lifetime-value. In the notebook there is code like this: `%run "./config/Data Extract"`. This loads Excel data; however, it occu...
@Seungsu Lee It could be a destination host issue, a configuration issue, or a network issue. It's hard to guess; first check whether your cluster has access to the public internet by running this command:
%sh ping -c 2 google.com
Problem Statement: We have a scenario where we get data from the source in a nested format (in reality there are 20 levels and more than 4 fields per level, but for ease of understanding let's consider the example below). The actual code involved 20 levels of 4-5 fields ...
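The example above is truncated, but for structures this deep a common approach is to flatten nested struct columns recursively instead of hand-coding each level. A generic sketch, assuming the source has already been read into a DataFrame with nested struct columns:

from pyspark.sql.types import StructType
from pyspark.sql import functions as F

def flatten(df):
    # Repeatedly expand struct columns until none remain, whatever the depth
    while True:
        struct_fields = [fld for fld in df.schema.fields if isinstance(fld.dataType, StructType)]
        if not struct_fields:
            return df
        cols = []
        for fld in df.schema.fields:
            if isinstance(fld.dataType, StructType):
                cols += [F.col(f"{fld.name}.{c}").alias(f"{fld.name}_{c}") for c in fld.dataType.names]
            else:
                cols.append(F.col(fld.name))
        df = df.select(cols)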
I have set up a DLT pipeline with "testing" set as the target database. I need to join data that exists in a "keys" table in my "beta" database, but I get an AccessDeniedException, despite having full access to both databases from a normal notebook. A snippet d...
As an update to this issue: I was running the DLT pipeline on a personal cluster that had an instance profile defined (as per Databricks best practices). As a result, the pipeline did not have permission to access other S3 resources (e.g. other databa...
Hi fellas - I'm trying to load Parquet data (in a GCS location) into a Postgres DB (Google Cloud). For bulk-uploading data into PG we are using the spark-postgres library: https://framagit.org/interhop/library/spark-etl/-/tree/master/spark-postgres/src/main/sc...
Hi @Kaniz Fatma, @Daniel Sahal - a few updates from my side. After many hits and trials, psycopg2 worked out in my case. We can process 200+ GB of data with a 10-node cluster (n2-highmem-4: 32 GB memory, 4 cores) and a driver with 32 GB memory and 4 cores, with Run...
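For anyone landing here later, a common psycopg2 pattern for this kind of bulk load is to stream each partition into Postgres with COPY. A minimal sketch, with placeholder connection details and a target table that is assumed to already exist:

import csv, io
import psycopg2

def copy_partition(rows):
    # Buffer the partition as CSV in memory, then bulk-load it with COPY
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow(row)
    buf.seek(0)
    conn = psycopg2.connect(host="pg-host", dbname="db", user="loader", password="...")  # placeholders
    try:
        with conn.cursor() as cur:
            cur.copy_expert("COPY target_table FROM STDIN WITH CSV", buf)  # hypothetical table
        conn.commit()
    finally:
        conn.close()

spark.read.parquet("gs://bucket/path/").foreachPartition(copy_partition)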
Hello, apologies for the dumb question, but I'm new to Databricks and need clarification on the following. Are parallel and subsequent jobs able to reuse the same compute resources, to keep time and cost overhead as low as possible, or do they each spin up a new cl...
@tanja.savic You can use a shared job cluster: https://docs.databricks.com/workflows/jobs/jobs.html#use-shared-job-clusters
But remember that a shared job cluster is scoped to a single job run and cannot be used by other jobs or runs of the...
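As a sketch of what that looks like in a job definition: the Jobs 2.1 API lets multiple tasks reference the same job_cluster_key, so they share one cluster within a run. Workspace URL, token, and task details below are placeholders:

import requests

job = {
    "name": "etl-with-shared-cluster",
    "job_clusters": [{
        "job_cluster_key": "shared",
        "new_cluster": {"spark_version": "11.3.x-scala2.12", "node_type_id": "n2-highmem-4", "num_workers": 2},
    }],
    "tasks": [
        {"task_key": "ingest", "job_cluster_key": "shared",
         "notebook_task": {"notebook_path": "/Repos/etl/ingest"}},   # hypothetical notebook
        {"task_key": "transform", "job_cluster_key": "shared",
         "depends_on": [{"task_key": "ingest"}],
         "notebook_task": {"notebook_path": "/Repos/etl/transform"}},
    ],
}

resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/create",   # placeholder workspace URL
    headers={"Authorization": "Bearer <token>"},      # placeholder token
    json=job,
)
print(resp.json())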
Hi Team, can we call one dashboard from another dashboard? An example screenshot is attached. The main dashboard has 3 buttons that point to 3 different dashboards; clicking any of the buttons should redirect to the respective dashboard.
@Janga Reddy I don't think this is possible at the moment. You can raise a feature request here: https://docs.databricks.com/resources/ideas.html
I have a pandas_udf that works for 1 row, but when I tried it with more than one row I got the error below. PythonException: 'RuntimeError: The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output w...
I was testing, and your function is correct, so the error must be in the inputData type (it is all string) or with result_json. Please also check the runtime version; I was using 11 LTS.
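For context, that RuntimeError appears when a scalar iterator pandas UDF yields batches whose length differs from the input batches. A minimal correct sketch, using a hypothetical string column:

from typing import Iterator
import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf("string")
def upper_udf(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    for s in batches:
        # Each yielded Series must be exactly as long as the input batch
        yield s.str.upper()

df = spark.createDataFrame([("a",), ("b",), ("c",)], ["value"])
df.select(upper_udf("value")).show()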
Hi all. I am trying to export an R data frame variable as a CSV file. I am using this formula:
df <- data.frame(VALIDADOR_FIM)
df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save("dbfs:/FileStore/df/df.csv")
But it isn't working. ...
I am using Databricks Auto Loader to load JSON files from ADLS Gen2 incrementally in directory listing mode. All source filenames have a timestamp in them. Auto Loader works perfectly for a couple of days with the below configuration, then breaks the next day ...
Hi everyone, I'm seeing this issue as well - same configuration as the previous posts, using Auto Loader with incremental file listing turned on. The strange part is that it mostly works, despite almost all of the files we're loading having colons incl...
The Azure event hub "my_event_hub" has a total of 5 partitions ("0", "1", "2", "3", "4"). The readStream should only read events from partitions "0" and "4". Event hub configuration as the streaming source:
val name = "my_event_hub"
val connectionString = "m...
I tried using the below snippet to receive messages only from partition id=0:
ehName = "<<EVENT-HUB-NAME>>"

# Create event position for partition 0
positionKey1 = {
    "ehName": ehName,
    "partitionId": 0
}
eventPosition1 = {
    "offset": "@latest",
    ...
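The snippet above is cut off; a fuller sketch of the same idea, based on the azure-eventhubs-spark connector's PySpark interface, follows. One starting position is supplied per desired partition via eventhubs.startingPositions (the connection string is a placeholder, and whether the connector skips unlisted partitions or falls back to a default position for them can depend on the connector version):

import json

ehName = "my_event_hub"
connectionString = "Endpoint=sb://<namespace>.servicebus.windows.net/;..."  # placeholder

# One (partition key, position) pair per partition we want to consume: 0 and 4
positionMap = {
    json.dumps({"ehName": ehName, "partitionId": p}): {
        "offset": "@latest", "seqNo": -1, "enqueuedTime": None, "isInclusive": True
    }
    for p in (0, 4)
}

ehConf = {
    # Newer connector versions expect the connection string to be encrypted
    "eventhubs.connectionString": sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString),
    "eventhubs.startingPositions": json.dumps(positionMap),
}

df = spark.readStream.format("eventhubs").options(**ehConf).load()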
I just want to add color to specific cells of an Excel sheet with Python, and I have done that, but I need to exclude the header row. When I tried the same method on another sheet it didn't work; the background color addition is reflected in one sheet but...
Convert your DataFrame to pandas on Spark.
Color cells using the style property: https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.style.html
Export to Excel using to_excel: https://spark.apache.org/d...
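A minimal sketch of that flow with plain pandas (the pandas-on-Spark style property behaves similarly): Styler formatting applies only to data cells, so the header row stays untouched, and writing the styled frame needs openpyxl. The path, column, and highlight rule are placeholders:

pdf = df.toPandas()  # or df.pandas_api() to stay in pandas-on-Spark

def highlight(value):
    # Hypothetical rule: shade negative numbers, no styling otherwise
    return "background-color: #ffcccc" if isinstance(value, (int, float)) and value < 0 else ""

styled = pdf.style.applymap(highlight, subset=["amount"])  # hypothetical column
styled.to_excel("/dbfs/FileStore/report.xlsx", engine="openpyxl", index=False)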